[tex-hyphen] UTF-8 Hyphen

Javier Múgica javier at digi21.eu
Mon Jun 9 15:55:51 CEST 2008


Hello:

I got the message from Mojca today. I was planning to write UTF-8 patterns
for LuaTeX, but I was waiting for the July release of LuaTeX to get started
with it and then see whether I could write a single file that would be read
properly by both current TeX and LuaTeX. Something like

\ifx\undefined\SomethinginLuanotinTeX
\else
   % Set up a LuaTeX callback that reads Latin-1 characters and converts
   % them to UTF-8
\fi

And at the end of the file

\ifx\undefined\SomethinginLuanotinTeX
\else
   % Restore the previous behaviour of LuaTeX
\fi
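
What such a callback would have to do to each input line can be illustrated
outside TeX. The following is only a Python sketch of the byte-level
conversion, not LuaTeX code; the function name latin1_line_to_utf8 is made up
for this illustration.

```python
def latin1_line_to_utf8(raw_line: bytes) -> bytes:
    """Reinterpret a Latin-1 encoded line and re-encode it as UTF-8."""
    # Every byte value 0x00-0xFF is a valid Latin-1 character,
    # so the decoding step cannot fail.
    return raw_line.decode("latin-1").encode("utf-8")


# In Latin-1 the letter "á" is the single byte 0xE1;
# in UTF-8 it becomes the two bytes 0xC3 0xA1.
sample = b"1\xe1"
print(latin1_line_to_utf8(sample))
```

Pure ASCII lines pass through unchanged, so only the accented letters in a
pattern file would be affected.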

But for the existing engines I didn't want a UTF-8 file. Indeed, since
hyphenation patterns are currently written for a specific encoding, it makes
sense to have them stored in that encoding. initex and friends (pdftex --ini)
will always read them correctly.
I never use XeTeX and don't need it at all, nor do I read the xu- files (I
don't even have them on my computer; I have no need for all that stuff). I'm
probably not the only one, so for anything prior to LuaTeX, pattern files
without multibyte characters need to be available.

That said, I don't mind if you make a copy of my patterns and transform them
to UTF-8. Galician is a good language to experiment with: very few people
will be affected if something breaks, so you may sacrifice it along with
Slovenian :-)


> Since your patterns are auto-generated, it might mean that tools that
> auto-generate them might need to be adjusted a bit


Fortunately not. Indeed, I can use the old initex to generate the patterns in
any single-byte encoding I wish, just by changing the line

\def\Ti{\encodingreplacements{á é í ó ú ñ ü ï}{^^e1 ^^e9 ^^ed ^^f3 ^^fa ^^f1 ^^fc ^^ef}}

in the generating file. This does not work for UTF-8, though not because of
the tool itself but because the patterns are processed with initex.
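
The replacement line above pairs each accented letter with its ^^xx code in
the target encoding. As a rough illustration only (this is not the generating
tool, and the helper name caret_notation is invented here), a Python sketch
can derive those codes for any single-byte encoding. It also shows why UTF-8
does not fit the scheme: each of these letters takes more than one byte there.

```python
def caret_notation(chars: str, encoding: str = "latin-1") -> str:
    """Return the TeX ^^xx code of each space-separated character."""
    codes = []
    for ch in chars.split():
        encoded = ch.encode(encoding)
        if len(encoded) != 1:
            # A multibyte character cannot be written as a single ^^xx pair.
            raise ValueError(f"{ch!r} is not a single byte in {encoding}")
        # TeX expects lowercase hexadecimal digits after ^^.
        codes.append("^^%02x" % encoded[0])
    return " ".join(codes)


print(caret_notation("á é í ó ú ñ ü ï"))
```

Calling caret_notation("á", "utf-8") raises ValueError, because "á" occupies
two bytes in UTF-8.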

I will start playing with LuaTeX at the end of July or even in September,
not before.


Regards, Javier