[tex-hyphen] Unicode patterns for Unicode in Pdflatex/Babel?

Daniel Stender daniel at danielstender.com
Fri Jun 17 12:32:03 CEST 2011

Hi guys,

thanks for the detailed comments! You guessed right, it's about Sanskrit, but it's too early to
present anything (Dominik already knows).

The other person, who had asked me to find out whether custom Unicode patterns could also be used
with pdfLaTeX/ucs.sty/Babel, is about to switch over to XeLaTeX/Polyglossia now, so this is no
longer a pressing issue. Thanks again for the comments anyway.

Daniel Stender

On 12.06.2011 23:27, Mojca Miklavec wrote:
> I'm not sure whether you are subscribed to the list. Arthur forgot to CC
> you, so I'm forwarding this in case you aren't.
> Mojca
> ---------- Forwarded message ----------
> From: Arthur Reutenauer <arthur.reutenauer at normalesup.org>
> Date: Sun, Jun 12, 2011 at 20:38
> Subject: Re: [tex-hyphen] Unicode patterns for Unicode in Pdflatex/Babel?
> To: "About TeX hyphenation patterns." <tex-hyphen at tug.org>
> Cc: "tex-hyphen at tug.org" <tex-hyphen at tug.org>
>  Hi Daniel,
>> To make myself clear: I would like to know whether it wouldn't be possible to use a custom
>> Unicode hyphenation rules/pattern file with pdfLaTeX/Babel as well, when the text there is Unicode, too?
>  That shouldn't be necessary: the patterns that are presented to
> pdfTeX are encoded in some *font* encoding, distinct from the input
> encoding that you use in your document.  The inputenc and fontenc
> packages take care of mapping the code positions from the input
> encoding to the font encoding, be it UTF-8 or an 8-bit encoding.  As
> far as the patterns are concerned, they're always encoded in UTF-8 in
> the various hyph-<lang>.tex files, and are converted on the fly when input
> by pdfTeX, at format generation time, to whatever font encoding is
> appropriate for the language at hand.  The inputenc / fontenc packages
> then do the job for you, and you can use any encoding you wish in your
> document.
> However, I'm going to venture a wild guess and assume that the
> language you're interested in is Sanskrit, a language which actually
> has patterns disabled for pdfTeX, because we couldn't determine what
> font encoding was appropriate when the patterns were submitted: for
> the vast majority of languages that had patterns when Mojca and I took
> over work on hyphenation files three years ago, there was one single
> 8-bit encoding, that was used by both the pattern file and the Babel
> support files.  Several languages, though, have been added in the
> meantime, including Sanskrit, that had no dedicated 8-bit encoding that we
> could use(*).  We thus decided to make them available for
> Unicode-aware TeX engines only; hence, you don't have access to them
> from pdfTeX.  But if you have a reason to want to use them, we'll
> gladly make them available as well.  That won't be a problem at all;
> we simply never considered the issue because we didn't think it would
> come up -- Mojca, what do you think?
> Arthur
> (*) Note that packages to typeset Devanagari in TeX, as well as
> several other Indic scripts, have existed for a long time, but they
> didn't have any hyphenation patterns attached.  These have only been
> added recently, partly by different contributors and partly after Mojca
> found out that OpenOffice shipped many pattern files for modern Indian
> languages.  All the files were encoded using UTF-8.
> ---------- Forwarded message ----------
> From: Mojca Miklavec <mojca.miklavec.lists at gmail.com>
> Date: Sun, Jun 12, 2011 at 23:25
> Subject: Re: [tex-hyphen] Unicode patterns for Unicode in Pdflatex/Babel?
> To: "About TeX hyphenation patterns." <tex-hyphen at tug.org>
> As Arthur pointed out, you cannot simply load a hyph-foo.tex file with
> UTF-8 patterns without loading other macros to handle UTF-8 *before*
> patterns themselves. (You may try and see what happens.)
> But we tried to make sure that all patterns that can reasonably work
> in pdfTeX do work. Indic languages, Sanskrit, "Ethiopic", Lao etc.
> were disabled in pdfTeX because we had no idea how to handle them.
> I would really like to know what language Daniel is referring to
> before any further discussion.
> If it is really about Sanskrit ... I have no idea how one can type it
> in UTF-8 in pdfTeX. All the examples about Devanagari that I found
> were using ASCII and were pointing to Velthuis' package claiming:
> % Hyphenation
> % ~~~~~~~~~~~
> % The responsibility for hyphenating Devanagari text is taken over
> % completely by the preprocessor, devnag.c. The preprocessor inserts
> % discretionary hyphenation points (\-) in all the places it thinks are
> % appropriate.
> or to some other, older package.
> I would need more input to provide any reasonable answer.
> Mojca
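
For the archive, the inputenc/fontenc route that Arthur describes above can be sketched
roughly as follows. This is only a minimal illustration, not something taken from the
thread: the choice of German (ngerman), the T1 font encoding and the sample words are my
own assumptions, picked because German is a language whose patterns are enabled for
pdfTeX. It will not work for Sanskrit, whose patterns are disabled there.

```latex
% Minimal pdfLaTeX sketch: UTF-8 input, 8-bit font encoding, Babel patterns.
% Assumption: German/T1 is used purely as an illustration of the mechanism.
\documentclass{article}
\usepackage[T1]{fontenc}     % font encoding that the loaded patterns are expressed in
\usepackage[utf8]{inputenc}  % maps UTF-8 input bytes to LaTeX's internal character names
\usepackage[ngerman]{babel}  % selects the German patterns stored in the format

\begin{document}
% Writes the hyphenation points found by the patterns to the log file.
\showhyphens{Größenverhältnisse Straßenbahnhaltestelle}

% Ordinary text is hyphenated with the same patterns.
Die Größenverhältnisse an der Straßenbahnhaltestelle sind bemerkenswert.
\end{document}
```

The document itself never touches the pattern files; they were already converted to the
font encoding when the format was generated, which is why no custom UTF-8 pattern file
is needed for languages that are supported this way.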

GPG key ID: 1654BD9C
