[tex4ht] [bug #242] Spurious semicolon produced by \textregistered command

Karl Berry karl at freefriends.org
Thu Jan 22 19:21:04 CET 2015

    characters which should be replaced with entities in \url commands.

I think that's right.

    this declaration is used only when the `url-il2-pl` command line option
    is used. not all special characters are declared. now, the problem is
    with lualatex: as it is a unicode engine, it reports an invalid utf-8
    sequence even if it doesn't use the url encoders at all.

    as it is unlikely that anybody still uses the latin2 encoding and
    special characters in urls at the same time, and given that the list of
    these escaped special characters isn't comprehensive anyway, maybe we
    should take that away? because it causes compile errors every time
    tex4ht is used with lualatex.

Oh.  Clearly we need to solve it somehow, but I don't much like the idea
of getting rid of functionality, even something as obscure and probably
little-used as this.  Plenty of people still use Latin N encodings, and
there is an active TeX community in Poland -- I surmise that's who asked
for that option in the first place.

LuaTeX can certainly read files in any encoding, including plain bytes,
not just UTF-8.  I'm afraid I don't have any recipes at hand, though
it seems like it should be doable.

But a simpler idea comes to mind: how about replacing the problematic
characters with TeX's ^^xx notation?  I'm not sure if the conversions
will happen at the right time, given that \url is changing everything
around anyway, but we can wait and see if anyone notices.  At least it
would go through at the input level and is one step beyond just deleting it.
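
As a rough sketch of that replacement (a one-off preprocessing step, not
anything tex4ht does today), the following rewrites every byte from 0x80
upward as ^^xx with two lowercase hex digits, which is the form TeX
expects.  The function name and the sample input are made up for
illustration; the real Latin 2 chunk would have to be fed in instead.

```python
def to_caret_notation(data: bytes) -> bytes:
    """Rewrite every byte >= 0x80 as TeX's ^^xx form (two lowercase hex digits).

    Hypothetical helper for illustration; ASCII bytes are copied unchanged.
    """
    out = bytearray()
    for b in data:
        if b >= 0x80:
            out += b"^^%02x" % b
        else:
            out.append(b)
    return bytes(out)


# Assumed sample input: "a with ogonek" is the single byte 0xB1 in ISO 8859-2,
# which on its own is not a valid UTF-8 sequence.
latin2_line = "ą".encode("iso-8859-2")
converted = to_caret_notation(latin2_line)
print(converted.decode("ascii"))  # prints ^^b1
```

The result is pure ASCII, so a UTF-8 engine no longer sees an invalid
sequence when it reads the file.  This only gets the text past the input
stage; whether the characters still mean the right thing once \url has
processed them is the part that would need testing.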

Another idea is to move that chunk of input to a separate file, which
only gets read when that option is in effect.

