Html2txt

A tool to convert HTML documents into plain text.

Html2txt is written in Java; it is available as a command line tool and as an APACHE ANT task.

For example, this HTML code

[[File:Main.main.jpg]]

is rendered like this:

[[File:usage.txt.jpg]]

Some HTML elements are converted into "markup" characters; for example,

<pre>This is a <var>variable</var></pre>

converts into

<pre>This is a &lt;variable&gt;</pre>

Other elements are simply ignored because they cannot reasonably be converted into text.

For a complete description of the supported HTML inline elements, see
<span class="plainlinks">[http://html2txt.unkrig.de/javadoc/de/unkrig/html2txt/Html2Txt.html#ALL_INLINE_ELEMENTS here]</span>.

For a complete description of the supported HTML block elements, see
<span class="plainlinks">[http://html2txt.unkrig.de/javadoc/de/unkrig/html2txt/Html2Txt.html#ALL_BLOCK_ELEMENTS here]</span>.

== Motivation ==

The goal was to generate the "usage" page that a command line tool usually prints when you invoke it with a "<tt>-help</tt>" or "<tt>--help</tt>" option, rather than maintain it manually (e.g. in the form of "<tt>println()</tt>" statements in the code).
 
The chosen solution is to put a big DOC comment before the "<tt>main()</tt>" method, generate an HTML page with JAVADOC, convert that page into a plain text file, put the file into the application's JAR file, and copy its contents to STDOUT when the user wants to see it.
 
The command line version of <tt>html2txt</tt> itself uses that technique, and you can see the results above.
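
The last step of that technique, copying the bundled text file to STDOUT, can be sketched roughly as follows. This is only an illustration, not the code that <tt>html2txt</tt> itself uses: the class name <tt>UsagePrinter</tt>, the helper methods and the resource name <tt>usage.txt</tt> are assumptions made for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;
import java.io.UncheckedIOException;

/** Hypothetical helper that prints a usage text bundled on the classpath. */
public final class UsagePrinter {

    private UsagePrinter() {}

    /** Copies all bytes from {@code in} to {@code out} unchanged. */
    static void copy(InputStream in, PrintStream out) {
        byte[] buffer = new byte[4096];
        try {
            for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        out.flush();
    }

    /**
     * Looks up {@code resourceName} relative to {@code owner} (for example
     * inside the application's JAR file) and copies it to {@code out}.
     *
     * @return false if the resource does not exist, true otherwise
     */
    static boolean printUsage(Class<?> owner, String resourceName, PrintStream out) {
        try (InputStream in = owner.getResourceAsStream(resourceName)) {
            if (in == null) {
                return false;
            }
            copy(in, out);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean helpRequested = args.length > 0
                && ("-help".equals(args[0]) || "--help".equals(args[0]));

        // "usage.txt" is an assumed name for the converted plain text file.
        if (helpRequested && !printUsage(UsagePrinter.class, "usage.txt", System.out)) {
            System.err.println("usage.txt not found on the classpath");
        }
    }
}
```

Because the text is loaded as a classpath resource, the same code works whether the application runs from a JAR file or from a directory of class files.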
 
== Download ==
 
You can download the latest version of the runnable JAR file [https://repository.sonatype.org/service/local/artifact/maven/redirect?r=central-proxy&g=de.unkrig&a=html2txt&v=LATEST&c=jar-with-dependencies here].
 
== Limitations ==
 
Since the tool uses the JRE's built-in XML parser, it supports "numeric character references" (like "&amp;#220;" for "Ü"), but not "named HTML character entity references" (like "&amp;Uuml;" for "Ü").
 
For the same reason, the HTML markup in the DOC comments must be "well-formed", i.e. every start tag must be matched by an end tag (like "<code>&lt;li>...&lt;/li></code>"), and void tags must end with a slash (like "<code>&lt;br /></code>").
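
Both limitations can be reproduced with the JRE's XML parser alone. The following sketch does not call <tt>html2txt</tt>; the class <tt>WellFormednessDemo</tt> and its method <tt>textOf()</tt> are hypothetical names that exist only for this illustration. It feeds small fragments to a <tt>DocumentBuilder</tt> and reports whether they parse.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo of what the JRE's XML parser accepts and rejects. */
public final class WellFormednessDemo {

    private WellFormednessDemo() {}

    /**
     * Parses {@code markup} as XML and returns the text content of its root
     * element, or {@code null} if the parser rejects the markup.
     */
    static String textOf(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to STDERR.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(markup)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (SAXException e) {
            return null; // not well-formed, or an undeclared entity
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<p>&#220;</p>",        // numeric character reference
            "<p>&Uuml;</p>",        // named HTML entity, not declared in XML
            "<p>one<br />two</p>",  // void tag closed with a slash
            "<p>one<br>two</p>",    // void tag left open
        };
        for (String sample : samples) {
            String text = textOf(sample);
            System.out.println(sample + " -> " + (text == null ? "rejected" : "accepted"));
        }
    }
}
```

The first and third fragments parse; the second and fourth are rejected, which is why DOC comments meant for conversion should stick to numeric references and closed tags.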
 
== Usage ==
 
=== Command line tool ===
 
See [http://html2txt.unkrig.de/Main.main(String%5b%5d).html here].
 
=== ANT task ===
 
See [http://html2txt.unkrig.de/antdoc/index.html here].
 
=== Library ===
 
See [http://html2txt.unkrig.de/javadoc/index.html the JAVADOC].
 
=== Source Code ===
 
See [https://github.com/aunkrig/html2txt the source code repository].
 
== Change Log ==
 
; Version 1.0.2, 2016-11-25:
:* Modified the text of the copyright notice slightly: Replaced "author" with "copyright holders and contributors".
 
; Version 1.0.1, 2016-11-07:
:* Resurrected Java 6 compatibility.
 
== License ==
 
<code>html2txt</code> is published under the "[[New BSD License]]".
 
== Contact ==
 
If you have issues, don't hesitate to [https://sourceforge.net/p/html2txt/tickets/ submit a ticket].
 
To discuss in public, check the [https://sourceforge.net/p/html2txt/discussion/ forum] and/or subscribe to it (envelope icon).