Latest revision as of 13:58, 19 January 2022

A tool to convert HTML documents into plain text.

For example, this HTML code


is rendered like this:


For a complete description of the supported HTML inline elements, see here.

For a complete description of the supported HTML block elements, see here.


Motivation

The goal was to generate the "usage" page that a command line tool usually prints when you invoke it with a "-help" or "--help" option, rather than maintain it manually (e.g. in the form of "println()" statements in the code).

The chosen solution is to put a big DOC comment before the "main()" method, generate an HTML page from it with JAVADOC, convert that page into a plain text file, put the text file into the application's JAR file, and copy its contents to STDOUT when the user wants to see it.

The command line version of html2txt itself uses that technique, and you can see the results above.
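The last step of that technique, copying the bundled text file to STDOUT, could look roughly like the sketch below. It is not taken from html2txt's source: the class name "UsageDemo" and the resource name "usage.txt" are hypothetical, and the sketch assumes the text file sits next to the class inside the JAR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class UsageDemo {

    /** Copies all bytes from 'in' to 'out'; neither stream is closed here. */
    public static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        boolean wantsHelp = args.length == 1
            && ("-help".equals(args[0]) || "--help".equals(args[0]));

        if (wantsHelp) {
            // "usage.txt" is a hypothetical name; a relative resource name is
            // resolved against the package of UsageDemo on the classpath.
            InputStream in = UsageDemo.class.getResourceAsStream("usage.txt");
            if (in == null) {
                System.err.println("usage.txt not found on the classpath");
                return;
            }
            try {
                copy(in, System.out);
            } finally {
                in.close();
            }
            return;
        }

        // ... normal operation of the tool would go here ...
    }
}
```

Because the text is read as a classpath resource rather than from the file system, the same code should work whether the application runs from a JAR or from a directory of class files.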

Download

You can download the latest version of the runnable JAR file here.

Limitations

Since the tool uses the JRE's built-in XML parser, it supports "numeric character references" (like "&#220;" for "Ü"), but not "named HTML character entity references" (like "&Uuml;" for "Ü").

For the same reason, the HTML markup in the DOC comments must be "well-formed", i.e. every start tag must be matched by an end tag (like "<li>...</li>"), and void tags must end with a slash, like "<br />".
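Both limitations can be reproduced with the JDK's standard XML API alone. The following sketch is not html2txt's own code; the class "WellFormedDemo" and its "parse()" helper are hypothetical and only illustrate how a plain XML parser treats such markup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WellFormedDemo {

    /**
     * Parses 'markup' with the JDK's default XML parser.
     * Returns the text content of the root element, or null if the
     * parser rejects the markup as not well-formed.
     */
    public static String parse(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            // Thrown for undeclared entities, unmatched tags, and similar errors
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        String[] samples = {
            "<p>&#220;ber</p>",   // numeric character reference
            "<p>&Uuml;ber</p>",   // named HTML entity, not declared in plain XML
            "<ul><li>one</ul>",   // start tag without matching end tag
            "<p>a<br />b</p>",    // void tag closed with a slash
            "<p>a<br>b</p>",      // void tag without a slash
        };
        for (String s : samples) {
            String text = parse(s);
            System.out.println(s + " : " + (text == null ? "rejected" : "accepted"));
        }
    }
}
```

With the default parser, the first and fourth samples should be accepted and the other three rejected. The parser may additionally print its own "[Fatal Error]" lines to STDERR for the rejected samples.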

Usage

Command line tool

see here.

ANT task

see here.

Library

see the JAVADOC.

Source Code

see the source code repository.

Change Log

Version 1.0.2, 2016-11-25
  • Modified the text of the copyright notice slightly: Replaced "author" with "copyright holders and contributors".
Version 1.0.1, 2016-11-07
  • Resurrected Java 6 compatibility.

License

html2txt is published under the "New BSD License".

Contact

If you have issues, don't hesitate to submit a ticket.

To discuss in public, check the forum and/or subscribe to it (envelope icon).
