HTML2text is a set of Tcl procedures for converting HTML to ASCII text. It differs from other converters I've seen in that it gives the user control of the rendering of the various tags.
It uses the HTML parser from Sun's Hippo HTML display library (by Stephen Uhler). A copy of this library is included in the HTML2text distribution.
NOTE: Before using the program (on Unix), you'll need to change the first line of the HTML2text file to point to the location of a Tcl 7.4 (or newer) shell.
To convert from HTML to text you can simply run the HTML2text utility. It works as a filter, or you can specify an input file and, optionally, an output file as arguments.
All of the following could be used to generate the text version of the README:
    HTML2text < README.html > README
    HTML2text README.html > README
    HTML2text README.html README
In order to do the conversion in a Tcl program, you need to read in the html_library.tcl file from the Hippo library and the html_text.tcl file. You can do this either by "source"ing them directly or by arranging for them to be autoloaded.
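As a rough sketch of the direct "source" approach: the helper below is not part of the HTML2text distribution. The procedure name HT_load_library and the directory in the example call are made up for illustration, and the sketch assumes both files have been copied into a single directory.

```tcl
# Hypothetical helper (not part of HTML2text): source the two library
# files from one directory. Returns the list of files that could not
# be found, which is empty when both were loaded.
proc HT_load_library {dir} {
    set missing {}
    foreach f {html_library.tcl html_text.tcl} {
        set path $dir/$f
        if {[file exists $path]} {
            # Evaluate at global level, as a plain top-level "source" would.
            uplevel #0 [list source $path]
        } else {
            lappend missing $path
        }
    }
    return $missing
}

# Example call; the directory is a placeholder, not a real default.
set missing [HT_load_library /usr/local/lib/html2text]
if {[llength $missing] > 0} {
    puts stderr "HTML2text library files not found: $missing"
}
```

Returning the missing paths instead of raising an error leaves it to the caller to decide whether to abort or try another directory.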
You can then convert HTML to ASCII by calling the HTconvert_html command. For example:
    puts [HTconvert_html "<em>This text is emphasized</em>"]
would output the text in emphasized form. Using the defaults it would be:
    _This text is emphasized_
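For converting a whole file from within a Tcl program, something like the following might do. It is only a sketch: HT_convert_file is a hypothetical name, the two library files are assumed to be loaded already so that HTconvert_html is defined, and the trailing newline added by puts simply mirrors the example above.

```tcl
# Hypothetical helper (not part of HTML2text): read HTML from inPath
# and write the converted text to outPath.
# Assumes html_library.tcl and html_text.tcl have already been loaded.
proc HT_convert_file {inPath outPath} {
    # Read the whole HTML document into memory.
    set in [open $inPath r]
    set html [read $in]
    close $in

    # HTconvert_html takes the HTML as a string and returns the text.
    set text [HTconvert_html $html]

    # Write the result; puts appends a newline after the converted text.
    set out [open $outPath w]
    puts $out $text
    close $out
}
```

A call such as "HT_convert_file README.html README" would then correspond to the third command-line form shown earlier.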
See the contents of the HTML2text file for a couple of examples of changing the renderings.
If you wish to install the HTML2text program into a standard location for executables, you'll need to first install the html_library-0.3/html_library.tcl and html_text.tcl files into a library directory. Then modify the HTML2text program to load in the two library files (as well as changing the first line to point to tclsh, if you haven't already done so). The HTML2text file can then be copied into place.
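If you would rather have the installed files autoloaded than sourced, the standard Tcl autoload mechanism (a tclIndex file plus the auto_path variable) is one way to arrange it. The sketch below is an assumption about how that could be set up, not something taken from the HTML2text file: the procedure names are hypothetical, and whether the generated index covers everything the two files need has not been verified here.

```tcl
# Hypothetical helpers (not part of HTML2text).

# One-time step, run after the two .tcl files are installed:
# write a tclIndex file in libdir listing the procedures they define.
proc HT_build_index {libdir} {
    auto_mkindex $libdir *.tcl
}

# Run at startup by any script that wants the library autoloaded:
# add libdir to the autoload search path, but only once.
proc HT_use_autoload {libdir} {
    global auto_path
    if {[lsearch -exact $auto_path $libdir] < 0} {
        lappend auto_path $libdir
    }
}
```

With the index built, calling HT_use_autoload with the library directory near the top of the HTML2text file would take the place of explicit "source" lines.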