
## From LaTeX to HTML

The translation of a LaTeX source file into HTML involves loading tex4ht.sty and the *.4ht style files, choosing the desired options for the translation, compiling the source into DVI code with the native LaTeX engine, and postprocessing the outcome with the tex4ht and t4ht programs (see overview).
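
The steps listed above can be sketched as a dry run. The helper name `manual_steps` is hypothetical, and the sketch assumes the source file already loads tex4ht.sty itself; the distributed scripts may repeat the LaTeX run and pass further arguments, so this only illustrates the order of the stages.

```shell
#!/bin/sh
# Dry run: print the manual translation steps for one job; nothing is executed.
# Assumption: the source file loads tex4ht.sty itself, e.g. through \usepackage.
manual_steps() {
    job=$1
    echo "latex $job"     # compile the source into DVI code
    echo "tex4ht $job"    # first postprocessing pass
    echo "t4ht $job"      # second postprocessing pass
}

manual_steps filename
```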

The htlatex command runs a script that invokes the different steps of the process without user intervention. The command takes the form

htlatex filename "options1" "options2" "options3" "options4"

where the first set of options is for the tex4ht.sty and *.4ht style files, the second set is for the tex4ht postprocessor, the third for the t4ht postprocessor, and the last one is for the LaTeX compiler. For instance,

htlatex filename
This command requests a translation according to the default conditions, which are set to produce HTML transitional 4.0 code.
htlatex filename "html,2,info"
This command is equivalent to the previous one, specifying explicitly the option html for tex4ht.sty instead of doing so implicitly.

In addition, the command requests that the output be broken up into separate web pages, in accordance with the two top sectioning levels of the document.

Moreover, it asks for a listing in the log file of the information available for the style files in use. That information, among other things, also introduces additional values available for the first list of options.

htlatex filename "" "dbcs/!"
This command requests the loading of the dbcs branch of Chinese hypertext fonts (on top of those already requested by the default setting).
htlatex filename "foo,frames" "" "-p"
This command requests LaTeX to load a private configuration file, named foo.cfg, and to place the content and table of contents in separate frames. In addition, it asks t4ht not to produce bitmaps for pictures.
htlatex filename "" " -ciso2htf" "" "-translate-file=il2-pl"
This command invokes the LaTeX compiler with the instruction ‘latex -translate-file=il2-pl filename’.
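
Because the four option groups are positional, a group that is skipped must still be passed as an empty string whenever a later group is used. The sketch below makes the quoting visible; `show_htlatex` is a hypothetical helper, not part of TeX4ht, and it only prints the call rather than running it.

```shell
#!/bin/sh
# Hypothetical helper, not part of TeX4ht: print (do not run) an htlatex call.
# $1 = file name, $2..$5 = the four option groups; missing groups become "".
show_htlatex() {
    file=$1
    printf 'htlatex %s "%s" "%s" "%s" "%s"\n' \
        "$file" "${2-}" "${3-}" "${4-}" "${5-}"
}

# The first and third groups are empty but keep their positions.
show_htlatex filename "" " -ciso2htf" "" "-translate-file=il2-pl"
```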

## Available Values for the Options

The fields of options1 should be separated by commas. An ‘info’ field requests a listing in the .log file of many of the available values. If the list is not empty, it must start with the entry ‘html’, ‘xhtml’, or the name of a private configuration file.

The fields of options2 and options3 should be separated by spaces. The available values can be listed by running the postprocessors tex4ht and t4ht (compiled from tex4ht.c and t4ht.c), respectively, without arguments (or with wrong sets of arguments).

The first field of options2 should be empty or a subdirectory of ht-fonts (typically augmented with an exclamation mark ‘!’). A space should separate the first field from the second one, even when the first field is empty.

The output formats underlying the available htlatex-like commands are selected through fields of options1. The names of these fields are defined in tex4ht.4ht and tex4ht.usr (see General Configuration Files). These values should be of little interest to most users.

Different variants of the htlatex command may be invoked by introducing the commands as arguments to a driver named mk4ht. When provided without arguments, the driver lists the commands it recognizes.

mk4ht mzlatex filename "html,3"
(htlatex filename "html,3,xhtml,mozilla" " -cmozhtf")

mk4ht oolatex filename
(htlatex filename "xhtml,ooffice" "ooffice/! -cmozhtf" "-coo")

Alternatively, a compilation ‘latex mkht-scripts.4ht’ produces differently named scripts of similar functionality.

## XHTML and Unicode

The ‘xhlatex’ command is a variant of the ‘htlatex’ command requesting XHTML output. It consists just of a call to ‘htlatex’ with the entry ‘xhtml’ in the first list of options and ‘-cvalidate’ in the third list. For instance, ‘xhlatex filename’ or ‘htlatex filename "xhtml"’.

To request a Unicode representation of symbols, the first list of options should include the ‘uni-html4’ entry, and the second list should include the ‘-cunihtf’ entry preceded by a space. For instance, ‘xhlatex filename "xhtml,uni-html4" " -cunihtf"’.

Unicode representations of symbols in UTF-8 encoding may be requested with the entry ‘-utf8’ added to the second list. For instance, ‘xhlatex filename "xhtml,charset=utf-8" " -cunihtf -utf8"’.

To request expanded usage of Unicode values in iso-8859-1 output, employ commands similar to

htlatex file "" "iso8859/1/charset/uni/!"

or introduce a similar charset path in tex4ht.env. Otherwise, non-iso-8859-1 characters might obtain bitmap representations.
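
The three variants in this section differ only in their option strings, so they can be collected in one place. The following is a minimal sketch: the function name `unicode_cmd` and the mode labels `unicode`, `utf8`, and `latin1` are hypothetical, while the option strings are copied from the examples above. It prints the command line instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print the command line for a chosen symbol encoding.
# The option strings are taken verbatim from the examples in this section.
unicode_cmd() {
    file=$1
    case $2 in
        unicode) printf 'xhlatex %s "xhtml,uni-html4" " -cunihtf"\n' "$file" ;;
        utf8)    printf 'xhlatex %s "xhtml,charset=utf-8" " -cunihtf -utf8"\n' "$file" ;;
        latin1)  printf 'htlatex %s "" "iso8859/1/charset/uni/!"\n' "$file" ;;
        *)       echo "unknown mode: $2" >&2; return 1 ;;
    esac
}

unicode_cmd filename utf8
```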

## XHTML with MathML

TeX4ht has different configurations for different modes of output. It is distributed with pre-tailored base configurations for translating LaTeX math into MathML, and extra configurations for adjusting the outcome to Mozilla, MathPlayer, and PMathML CSS. Only presentational MathML is supported.

mzlatex filename
mzlatex filename "html,pmathml"
mzlatex filename "html,mathml-"
mzlatex filename "html,mathplayer"
xhmlatex filename

For XHTML+MathML to be served by both Mozilla and MSIE+MathPlayer, use the command line option ‘mathplayer’.

The mzlatex command is a shortcut for the command ‘htlatex filename "xhtml,mozilla" " -cmozhtf" "-cvalidate"’. It takes into account the special needs of browsers. The xhmlatex command is a shortcut for the command ‘htlatex filename "xhtml,mathml" " -cunihtf" "-cvalidate"’; it does not make any compromises toward browsers.

It might be worthwhile to notice some of the more common sources of problems for MathML. The ‘mathml-’ option asks for a degraded MathML output that sidesteps some of the problems.

## OpenDocument, OpenOffice, and MS Word

A translation for an OpenDocument format can be requested by the ‘oolatex’ command. The command is a variant of htlatex in which the first list of options holds the entries ‘xhtml,ooffice’, the second list holds the entry ‘-cmozhtf’ preceded by a space, and the third list contains ‘-coo’ (htlatex filename "xhtml,ooffice" "ooffice/! -cmozhtf" "-coo -cvalidate"). The output of the command ‘oolatex filename’ is a zipped file with an ‘.odt’ extension.

The OpenDocument code employs MathML for formulas, and XSL-FO for formatting. It can be viewed by the OpenOffice word processor which, in turn, can export RTF and other Microsoft-based formats (see also Maarten Wisse, “Hacking TeX4ht for XML Output: The Road toward a TeX to Word Convertor”, MAPS 28 (2002), pp. 28-35).

A command of the form ‘htlatex filename "html,word" "symbol/!" "-cvalidate"’ asks for HTML output tuned toward Microsoft Word. Such a format, however, relies on bitmaps for mathematical formulas.

## DocBook and TEI

The following commands may be used for requesting DocBook and TEI output.

dbmlatex: htlatex filename "xhtml,docbook-mml" " -cunihtf" "-cdocbk"
dblatex: htlatex filename "xhtml,docbook" " -cunihtf" "-cdocbk"
teimlatex: htlatex filename "xhtml,tei-mml" " -cunihtf" "-cdocbk"
teilatex: htlatex filename "xhtml,tei" " -cunihtf" "-cdocbk"

## JavaHelp

JavaHelp is an online documentation system for use by Java-based applications. Such documents can be produced from LaTeX files through commands similar to ‘jhlatex filename’ for JavaHelp version 2.0.

The above invocation stands for ‘htlatex filename "html,3.2,xml,javahelp,unicode" " -cmozhtf -u10" "-dfilename-doc/ -cjavahelp"’. The ‘-u10’ asks for entity references in base 10, since JavaHelp doesn’t seem to support hexadecimal representations. The ‘-cjavahelp’ invokes the JavaHelp indexer to produce the search database at ‘jobname-doc/jobname-jhs’ with a command similar to ‘java -jar ${HOME}/jh2.0/javahelp/bin/jhindexer.jar -db jobname-doc/jobname-jhs jobname*.html’.

The Java program is to be distributed with the jobname-doc directory.

## Private Configuration Files

The leading entry, in the first list of options of the htlatex-like commands, can equal ‘html’ or ‘xhtml’. If this is not the case, the entry is assumed to be the name of a configuration file. The extension ‘cfg’ is assumed for names of configuration files that are listed without their extension.

A configuration file should take the following form for LaTeX files.

...early definitions...
\Preamble{options}
...definitions...
\begin{document}
...insertions into the header of the html file...
\EndPreamble

It is up to the user to decide the distribution of entries between the \Preamble and the htlatex-like commands.

Example: The command ‘htlatex myfile "mycfg,2"’ requests the compilation of a file named myfile.tex, in the presence of a configuration file named mycfg.cfg. The configuration file might have the following content.
\Preamble{html}
\begin{document}
\Css{body { color : red; }}
\EndPreamble
Notes
• Notice that for a LaTeX file the \begin{document} instruction should be present both in the configuration file and the source file.

• Instructions defined within a source file may be redefined in a configuration file. This feature makes it possible to keep source files intact for compilation to different formats by different tools.

For instance, a definition of the form \renewcommand\mycommand{...} may be placed within a configuration file provided for the following LaTeX source.

\documentclass{...}
\newcommand\mycommand{...}
\begin{document}
Use \mycommand{...}
\end{document}
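
The redefinition described in the note above can be sketched as a small script that writes such a configuration file and prints the call that would load it. This is only an illustration under stated assumptions: the file name `mycfg.cfg`, the helper `write_cfg`, and the body given to `\mycommand` (one argument, set with `\emph`) are hypothetical and not taken from the TeX4ht distribution.

```shell
#!/bin/sh
# Write a hypothetical configuration file that redefines \mycommand.
# Assumption: \mycommand takes one argument in the source file.
write_cfg() {
    cfg=$1
    # The quoted 'EOF' keeps backslashes and # characters literal.
    cat > "$cfg" <<'EOF'
\Preamble{html}
\renewcommand\mycommand[1]{\emph{#1}}
\begin{document}
\EndPreamble
EOF
}

workdir=$(mktemp -d)
write_cfg "$workdir/mycfg.cfg"
cat "$workdir/mycfg.cfg"          # show the generated file
echo 'htlatex myfile "mycfg"'     # the call that would load it (not run here)
rm -r "$workdir"
```

The source file stays untouched; only the configuration file changes from one output format to another.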

## Creating Private Command Lines

An htlatex-like script foolatex.bat can be obtained from the compilation under LaTeX of a file similar to the following one.

\def\script{bat}
\input mkht.4ht
\one{,html,next,3}
\two{-ic:\tex4ht\texmf\tex4ht\ht-fonts\#1
-ic:\tex4ht\texmf\tex4ht\ht-fonts\symbol\!}
\three{#1 -dc:\my\dir}
\make{foo}
\end{document}

A call of the form

foolatex filename

is then equivalent to a call of the following form.

htlatex filename "html,next,3" "symbol/!" "-dc:\my\dir"

Scripts obtained in such a manner can embed parameters within their bodies instead of expecting the parameters to be provided in command lines.

Details regarding the available options can be found by compiling under LaTeX a file of the following form.

\input mkht.4ht
\end{document}

The compilation requires the ProTex.sty and AlProTex.sty files available at http://www.cse.ohio-state.edu/~gurari/systems.html.

## An Insight into the Commands

Given a LaTeX file

\documentclass{article}
\begin{document}
..................
\end{document}

the ‘htlatex filename’ command produces a call ‘latex filename’ to LaTeX on an implicit file of the following form.

\documentclass{article}
\usepackage{tex4ht}
\begin{document}
..................
\end{document}

Similarly, the command ‘htlatex filename "options"’ produces a call to a ‘latex filename’ command on an implicit file of the following form.

\documentclass{article}
\usepackage[options]{tex4ht}
\begin{document}
..................
\end{document}
The command ‘ht latex filename’ may be used instead of the ‘htlatex filename "options"’ command in cases where the \usepackage instruction is explicitly included in the source file.
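
The choice between the two commands can be automated with a simple text check. The sketch below is a heuristic under stated assumptions: `choose_cmd` is a hypothetical helper, the test is a plain pattern match that would also match a commented-out \usepackage line, and the command is printed rather than run.

```shell
#!/bin/sh
# Print the command suited to a source file, depending on whether the file
# already loads tex4ht through \usepackage. Heuristic text match only.
choose_cmd() {
    src=$1
    job=$(basename "$src" .tex)
    if grep -q 'usepackage.*{tex4ht}' "$src"; then
        echo "ht latex $job"      # tex4ht is loaded by the source itself
    else
        echo "htlatex $job"       # let htlatex load tex4ht implicitly
    fi
}

# Demo on a throwaway file that loads tex4ht explicitly.
demo=$(mktemp -d)
printf '%s\n' '\usepackage[html]{tex4ht}' > "$demo/sample.tex"
choose_cmd "$demo/sample.tex"     # prints: ht latex sample
rm -r "$demo"
```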

## A Deeper Insight

From the perspective of TeX4ht, the htlatex-like commands and the \usepackage instruction are indirect approaches for obtaining LaTeX files of the following form. Such files can be explicitly provided for compilations requested through the ‘ht latex filename’ command.

\documentclass{article}
.....
\input tex4ht.sty
.....
\Preamble{options}
.....
\begin{document}
.....
\EndPreamble
..................
\end{document}

## TeX, ConTeXt, and TeXi

Commands similar to those offered for LaTeX are also offered for TeX (dbmtex, dbtex, ht, httex, mztex, ootex, t4ht, teimtex, teitex, tex4ht, xhmtex, xhtex) and TeXi (dbmtexi, dbtexi, httexi, mztexi, ootexi, teimtexi, teitexi, xhmtexi, xhtexi). In the case of TeX, the fragment of code ‘\csname tex4ht\endcsname’ should be introduced by the user into the source file, after the preamble of the file where the document definitions reside (example). In the case of TeXi, such a code fragment is introduced implicitly.

The private configuration files are similar to those of LaTeX, with the instruction ‘\begin{document}’ excluded.

...
\Preamble{options}
...
\begin{document}
...
\EndPreamble
...
The ‘ht tex filename’ and ‘ht texi filename’ commands may be used for TeX and TeXi sources that embed such code fragments in their body. The embedded code should replace the ‘\csname tex4ht\endcsname’ fragment in TeX sources, be placed at the start of the file in TeXi sources, and not include the \begin{document} instruction.

For ConTeXt, similar instructions apply with the suffix ‘context’ instead of ‘latex’, ‘tex’, or ‘texi’. For instance, ‘htcontext’.

## Other Options

• XeTeX files can be compiled with htlatex-like instructions (e.g., htxelatex, htxetex, mzxelatex). Currently only partial support is provided and only TeX-based fonts are handled.
• A jsMath mode of output may be requested with instructions similar to the following one.
htlatex file "xhtml,jsmath" " -cmozhtf"

• The dvipng utility might be activated for bitmap constructions through a request ‘-cdvipng’ in the third options list. For instance,
htlatex file "" "" "-cdvipng"

This utility is reported to produce fast, high-quality output with much smaller files than other converters.

• TeX4ht also offers speech output formats.

## Validation

The outcome of the translations should be checked by validators for proper syntax. When validators are available, errors are typically easy to detect and correct, but doing so requires human intervention.

TeX4ht doesn’t offer a built-in parser to verify the correctness of the outcome. However, external validator(s) can quite easily be integrated into the compilation process.
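
One way to add such a check after a translation is a small wrapper that runs a validator over every generated file. The sketch below is an assumption rather than a TeX4ht feature: `validate_outputs` and the `VALIDATOR` variable are hypothetical names, and the default `xmllint --noout` only tests XML well-formedness, so it suits XHTML output better than HTML 4 output.

```shell
#!/bin/sh
# Run an external validator over every .html file in a directory.
# VALIDATOR may be any command that takes a file name and exits non-zero
# on failure; the xmllint default is an assumption, not a TeX4ht setting.
validate_outputs() {
    dir=$1
    rc=0
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue              # no HTML files: nothing to check
        if ! ${VALIDATOR:-xmllint --noout} "$f"; then
            echo "validation failed: $f" >&2
            rc=1
        fi
    done
    return $rc
}
```

A call such as `validate_outputs .` after the translation reports each failing file on standard error and returns a non-zero status if any file fails.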

## Recommendations

In keeping with the spirit of LaTeX and hypertext, in which style is assumed to be separated from content, users are encouraged to avoid inserting TeX4ht code into their source files. Instead, they should place their modifications to the default settings within private configuration files to be loaded by htlatex-like commands.

On the other hand, it should be noted that hypertext markings should adhere to strict rules specified by different standards. Consequently, it is strongly advised to check the output obtained from the default configurations, before trying to tailor new ones.