[tex-hyphen] Unicode patterns for Unicode in Pdflatex/Babel?

Mojca Miklavec mojca.miklavec.lists at gmail.com
Sun Jun 12 23:25:50 CEST 2011

On Sun, Jun 12, 2011 at 20:38, Arthur Reutenauer wrote:
>    Hi Daniel,
>> I am trying to make myself clear: I would like to know if it wouldn't be possible to employ a custom
>> Unicode hyphenation rules/pattern file also for Pdflatex/Babel when the text there is Unicode, too

As Arthur pointed out, you cannot simply load a hyph-foo.tex file with
UTF-8 patterns in pdfTeX without loading other macros that handle UTF-8
*before* the patterns themselves. (You may try it and see what happens.)
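To illustrate why, the Python sketch below shows what an 8-bit engine such as pdfTeX reads from a UTF-8 file. Each non-ASCII character arrives as two or more separate bytes, while a Unicode-aware engine (XeTeX, LuaTeX) sees a single character. The patterns `1č` and `क` are hypothetical inputs chosen for illustration. This is not code from hyph-utf8 or any TeX engine.

```python
from typing import List


def bytes_seen_by_8bit_engine(pattern: str) -> List[int]:
    """Byte values an 8-bit engine (e.g. pdfTeX) reads for a UTF-8 encoded pattern."""
    return list(pattern.encode("utf-8"))


def chars_seen_by_unicode_engine(pattern: str) -> List[str]:
    """Characters a Unicode-aware engine (e.g. XeTeX, LuaTeX) reads for the same pattern."""
    return list(pattern)


# A Latin-script pattern: 2 characters, but 3 bytes for an 8-bit engine.
print(chars_seen_by_unicode_engine("1č"), bytes_seen_by_8bit_engine("1č"))
# A single Devanagari letter: 1 character, but 3 bytes for an 8-bit engine.
print(chars_seen_by_unicode_engine("क"), bytes_seen_by_8bit_engine("क"))
```

Without extra macros that reassemble such byte sequences first, pdfTeX would read a pattern built from multi-byte characters as a different, longer pattern.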

But we tried to make sure that all patterns that can reasonably work
in pdfTeX actually do work. Indic languages, Sanskrit, "Ethiopic", Lao, etc.
were disabled in pdfTeX because we had no idea how to handle them.
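For 8-bit engines, UTF-8 patterns are usable only if every character can be mapped to a slot in some 8-bit font encoding. The sketch below is a simplified Python model of that idea, not the actual TeX macros. `T1_SUBSET` is a hypothetical three-entry excerpt of the T1 (Cork) encoding, and `to_8bit` is an illustrative helper. A script with no 8-bit encoding, such as Devanagari, has no table to map into, so conversion fails.

```python
from typing import Dict, Optional

# Hypothetical excerpt of the T1 (Cork) font encoding: Unicode character -> 8-bit slot.
T1_SUBSET: Dict[str, int] = {"č": 0xA3, "š": 0xB2, "ž": 0xBA}


def to_8bit(pattern: str, table: Dict[str, int]) -> Optional[bytes]:
    """Map a Unicode pattern to single-byte slots; return None if any character has no slot."""
    out = bytearray()
    for ch in pattern:
        if ch.isascii():
            out.append(ord(ch))      # ASCII letters and digits keep their codes
        elif ch in table:
            out.append(table[ch])    # non-ASCII letter that the encoding provides
        else:
            return None              # no slot in this 8-bit encoding
    return bytes(out)


print(to_8bit("1č", T1_SUBSET))  # b'1\xa3'
print(to_8bit("क1", T1_SUBSET))  # None
```

The `None` case corresponds to the languages disabled for pdfTeX: without a suitable 8-bit encoding, there is nothing to convert the patterns into.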

>  However, I'm going to venture a wild guess and assume that the language you're interested in is Sanskrit, a language which actually has patterns disabled for pdfTeX, because we couldn't determine what font encoding was appropriate when the patterns were submitted: for the vast majority of languages that had patterns when Mojca and I took over work on hyphenation files three years ago, there was one single 8-bit encoding, that was used by both the pattern file and the Babel support files.  Several languages, though, have been added in the meantime, including Sanskrit, that had no dedicated 8-bit encoding that we could use(*).  We thus decided to make them available for Unicode-aware TeX engines only; hence, you don't have access to them from pdfTeX.  But if you have a reason to want to use them, we'll gladly make them available as well.  That won't be a problem at all; we only never considered the issue because we didn't think it would come up -- Mojca, what do you think?

I would really like to know what language Daniel is referring to
before any further discussion.

If it is really about Sanskrit ... I have no idea how one could type it
in UTF-8 in pdfTeX. All the Devanagari examples that I found used
ASCII input and pointed to Velthuis' package, which states:

% Hyphenation
% ~~~~~~~~~~~
% The responsibility for hyphenating Devanagari text is taken over
% completely by the preprocessor, devnag.c. The preprocessor inserts
% discretionary hyphenation points (\-) in all the places it thinks are
% appropriate.

or to some other, older package.

I would need more input to provide any reasonable answer.

