[luatex] dump a tfm to a file

Thu Jun 23 23:56:47 CEST 2011

On 22 June 2011 04:23, luigi scarso <luigi.scarso at gmail.com> wrote:
> On Wed, Jun 22, 2011 at 10:56 AM, Ulrike Fischer <luatex at nililand.de> wrote:
>>
>> tex4ht.exe is a dvi-driver. It takes a dvi and generates e.g. an
>> html-file. To be able to do this the dvi must contain a lot of
>> \specials. These specials are inserted by tex4ht.sty and various
>> 4ht-files during the previous (lua)latex runs.
>>
>> [...]
>>
>> The htf-files don't do only reencoding or mapping. They are used to
>> control the "look" of the output. E.g.
>>
>> 'b' ''     98
>>
>> will give the expected "b" if the input is char98 (= b). But in
>> another htf-file you find at position 98 this:
>>
>> 'B' '4'    98
>>
>> and this will give
>>
>> <span class="small-caps">B</span>
>>
>> (The <span>  comes from the '4' which is a class number).
>>
>> So the htf-files give you a low-level mapping of characters to other
>> representations (like html entities) and of fonts to font features
>> in html like small-caps, bold etc.
>>
>>
>> The generation of the dvi works fine with luatex. The problems
>> start at the dvi -> html step with tex4ht (if the document uses
>> system fonts). The dvi contains font names like "file:lm-modern..."
>> and tex4ht looks (for still unknown reasons) for its tfm and can't
>> find it.
>>
>> For a simple document I got around the problem by using the
>> low-level command \font\test=Arial and renaming an arbitrary tfm to
>> arial.tfm.  Currently I seem to be able to use ASCII and öäü, but
>> the € is output as ÿ. This looks like a 256-barrier ;-(. But
>> perhaps one can get around it by extending the htf-files.
> With context mkii (texlive 2009, pdftex engine; test.mkii is a utf-8 file)
>
> %%% test.mkii
> \enableregime[utf]
> \starttext
> goo
> €æß@¢¢
> \stoptext
> %%%
>
> $>texexec --dvi  test.mkii
> $>tex4ht test.dvi
>
> [...]
>
> The result is
>
> 1
> goo &#x20AC;æß@cc
>

From the above I infer that the translation works only for the UTF8
characters that fit in ISO-8859-1 (Latin 1): since ISO-8859-1 is
effectively a superset of ASCII and a subset of Unicode, all the slots
that are in the right position ("goo" and "æß@" in the example above)
translate properly; any other character, whether inside (¢) or above
(€) the ISO-8859-1 range, breaks the remapping/transcoding of UTF8 (the
euro is represented as a hexadecimal xml entity, and I don't know why
¢ breaks down).
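
To make the slot argument concrete, here is a small Python sketch (an
illustration only, not anything tex4ht does internally) that checks
which characters of the test line have a slot in ISO-8859-1. The helper
name latin1_slot is made up for this example.

```python
import unicodedata

# Illustration only: which characters of the test line have an
# ISO-8859-1 slot? (latin1_slot is a hypothetical helper name.)

def latin1_slot(ch):
    """Return the ISO-8859-1 slot of ch, or None if it has none."""
    try:
        return ch.encode("iso-8859-1")[0]
    except UnicodeEncodeError:
        return None

test_line = "goo €æß@¢"

for ch in test_line:
    slot = latin1_slot(ch)
    where = "no Latin 1 slot" if slot is None else "Latin 1 slot %d" % slot
    # Print code point and Unicode name so the output stays plain ASCII.
    print("U+%04X %s: %s" % (ord(ch), unicodedata.name(ch), where))
```

Only the euro (U+20AC) has no Latin 1 slot; ¢ sits at slot 162, inside
Latin 1, so its failure must have some other cause. The entity
&#x20AC; in the output is simply the Unicode code point of the euro
written in hexadecimal.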

If Ulrike's only problem is the Euro character, she may try dumping
(and/or patching) a tfm/htf with ISO-8859-15 (Latin 9) encoding for
the remapping. I doubt tex4ht will complain, as long as the character
is in the single 8 bit range allowed by the dvi format, and she
provides the appropriate slot numbers in the translation tables. So
instead of yielding the hexadecimal entity, the tex4ht font would
produce the Euro. In fact, that tfm/htf file may already be lurking in
the massive pile of htf files in tex4ht's distribution.
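
As a rough check of the slot numbers involved, the following Python
sketch lists the slots where ISO-8859-15 differs from ISO-8859-1 and
prints them in the shape of the htf lines Ulrike quoted. The entity
strings and the empty class field are assumptions about what a patched
entry might contain; they have not been checked against a real htf
file, and latin9_differences is a made-up name.

```python
# Sketch: slots where Latin 9 (ISO-8859-15) differs from Latin 1.
# The output line shape imitates the quoted htf entries; the entity
# text and the empty class field are assumptions, not tested htf.

def latin9_differences():
    """Map slot number -> Latin 9 character, for slots that differ."""
    diffs = {}
    for slot in range(256):
        raw = bytes([slot])
        ch9 = raw.decode("iso-8859-15")
        if ch9 != raw.decode("iso-8859-1"):
            diffs[slot] = ch9
    return diffs

for slot, ch in sorted(latin9_differences().items()):
    # Hypothetical htf-like line: entity, empty class, slot number.
    print("'&#x%04X;' ''   %d" % (ord(ch), slot))
```

In Latin 9 the euro lands at slot 164 (0xA4), which Latin 1 uses for
the generic currency sign ¤.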

<speculation>

If indeed tex4ht is a dvi driver, it should *not* work with UTF8 (or
even BMP encoded ofm files), given the specification (single 8 bit
integer) of the dvi format, regardless of how you patch the tfm/htf
files. She really needs, as Robin suggests, a successor to tex4ht that
ignores the single 8 bit limitation of the input (dvi) format.

Now I wonder whether LuaTeX may be used to implement that successor.
Perhaps an appropriate Lua script could call LuaTeX to process the UTF8
encoded tex source and write the result in a UTF8 extension to the dvi
format (let's call it dui, for the sake of this argument ;-). The very
same Lua script would then call TeXLua to run some Lua code that remaps
all the \special-ly embedded xml entities and produces the xml or html
file(s) required, leaving the UTF8 encoded characters alone.
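
Purely as an illustration of that last step, here is a Python stand-in
for the Lua code (Python only because it is quick to test; nothing
below is an existing tex4ht or LuaTeX facility). It assumes that
"remapping" means resolving hexadecimal numeric entities to the
characters they name; resolve_hex_entities is a made-up name.

```python
import re

# Stand-in for the remapping step: resolve hexadecimal numeric
# entities and leave everything else, including UTF8 text and named
# entities such as &lt;, untouched. This is one possible reading of
# "remapping", not a description of any existing tool.
HEX_ENTITY = re.compile(r"&#x([0-9A-Fa-f]+);")

def resolve_hex_entities(text):
    """Replace each &#xHHHH; with the character it denotes."""
    return HEX_ENTITY.sub(lambda m: chr(int(m.group(1), 16)), text)

# Result is compared rather than printed, to stay terminal-safe.
assert resolve_hex_entities("goo &#x20AC;æß@") == "goo €æß@"
```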

</speculation>

-- 
Luis Rivera
O< http://www.asciiribbon.org/ campaign