[tex4ht] [bug #242] Spurious semicolon produced by \textregistered command

Michal Hoftich michal.h21 at gmail.com
Thu Jan 22 21:51:52 CET 2015


2015-01-22 19:21 GMT+01:00 Karl Berry <karl at freefriends.org>:
>     characters which should be replaced with entities in \url commands.
>
> I think that's right.
>
>     this declaration is used only when the `url-il2-pl` command line
>     option is used. not all special characters are declared. now, the
>     problem is with lualatex: as it is a unicode engine, it reports an
>     invalid utf-8 sequence even if it doesn't use url-encoders at all.
>
>     as it is unlikely that anybody still uses latin2 encoding and special
>     characters in urls at the same time, and given that the list of these
>     escaped special characters isn't comprehensive anyway, maybe we should
>     take that away? because it causes compile errors every time tex4ht is
>     used with lualatex.
>
> Oh.  Clearly we need to solve it somehow, but I don't much like the idea
> of getting rid of functionality, even something as obscure and probably
> little-used as this.  Plenty of people still use Latin N encodings, and
> there is an active TeX community in Poland -- I surmise that's who asked
> for that option in the first place.
>

OK. clearly someone needed it, as it is the only configuration provided
for any input encoding for url-encoder.

> LuaTeX can certainly read files in any encoding, including plain bytes,
> not just UTF-8.  I'm afraid I don't have any recipes at hand, though
> it seems like it should be doable.
>

it is possible to use a luatex callback to convert the file being read
to utf8 on the fly. I did that when I tried to use callbacks to write
html directly from LuaLaTeX:

https://github.com/michal-h21/lua4ht/blob/math/l4patchlatin1.lua
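
A minimal sketch of the same idea, written in Python rather than Lua and
not taken from the linked file: each input line is decoded from a legacy
8-bit encoding and re-encoded as UTF-8 before the engine would see it.
The choice of ISO 8859-2 (Latin-2) and the names `latin2_to_utf8` and
`recode_lines` are assumptions made for illustration. In LuaTeX the
equivalent per-line hook would, I believe, be a callback such as
`process_input_buffer`.

```python
def latin2_to_utf8(data: bytes) -> bytes:
    """Decode ISO 8859-2 (Latin-2) bytes and re-encode them as UTF-8."""
    return data.decode("iso8859_2").encode("utf-8")


def recode_lines(lines):
    """Recode an iterable of raw Latin-2 lines one at a time,
    mimicking a per-line input callback (hypothetical helper)."""
    for line in lines:
        yield latin2_to_utf8(line)


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Raw Latin-2 bytes above 0x7F are generally not valid UTF-8, which is why
a Unicode engine complains about them; after recoding, the same text
passes the `is_valid_utf8` check.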


> But a simpler idea comes to mind: how about replacing the problematic
> characters with TeX's ^^xx notation?  I'm not sure if the conversions
> will happen at the right time, given that \url is changing everything
> around anyway, but we can wait and see if anyone notices.  At least it
> would go through at the input level and is one step beyond just deleting it.
>
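
To illustrate the ^^xx idea, here is a small hedged sketch in Python
(the helper name `to_caret_notation` is made up for this example). It
rewrites every byte above 0x7F as `^^` followed by two lowercase hex
digits, which is the form TeX's input notation expects, so the resulting
source is pure ASCII. Whether \url then sees the characters at the right
time is a separate question that this sketch does not address.

```python
def to_caret_notation(data: bytes) -> str:
    """Rewrite bytes >= 0x80 in TeX's ^^xx notation (lowercase hex);
    ASCII bytes are passed through unchanged."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))
        else:
            out.append("^^%02x" % b)
    return "".join(out)
```

For example, the Latin-2 byte 0xB1 becomes the five-character ASCII
string `^^b1`, so the file no longer contains any byte that a UTF-8
reader could reject.
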
> Another idea is to move that chunk of input to a separate file, which
> only gets read when that option is in effect.

that might perhaps be the best solution?

thanks,
Michal

>
> Thanks,
> Karl

