[tex-hyphen] ptex-specific patterns

Mojca Miklavec mojca.miklavec.lists at gmail.com
Sun May 30 11:56:15 CEST 2010


On Sun, May 30, 2010 at 00:53, Karl Berry  wrote:
>    Akira's email seems to be bouncing (maybe I need to use another one),
>
> fuk... is good, previous jupiter... is bad.

The address with fsci.fuk.kindai.ac.jp was failing, so I have removed
it from my address book. The one without fsci seems to work fine.

>    But I'm still a bit confused about the fact of whether the engine is
>    supposed to behave more like 8-bit pdfTeX or more like UTF-8 XeTeX,
>
> As I understand it, it is neither.  It does not support UTF-8.  It does
> not support the European 8-bit encodings.  It supports Japanese
> encodings (which are multi-byte).

What exactly does "does not support the European 8-bit encodings" mean?

There are a few specific questions that I have:

1.) How do I input special characters when I want to typeset a
Slovenian, French, Greek, Russian, ... document (not while loading
patterns, but while typesetting)? I notice that
\usepackage[utf8]{inputenc} works for some basic characters, but maybe
it only works in some special cases. Does it break whenever the first
byte of a UTF-8-encoded character happens to coincide with some
Japanese character? That would make it pretty "unpredictable" when
exactly inputenc and/or fontenc is going to break. For example, is it
right that the uppercase Greek Tau (U+03A4) is considered to be a
single character, resulting in
    ** ERROR ** Could not find encoding file "H".
when I try to use it in a simple document, while Latin c with caron
(U+010D) is considered to be two characters? In such cases (when
hyphenation patterns with the current 8-bit-engine approach break),
it's not really clear to me how one would be able to typeset Bulgarian
with ptex at all.
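
A minimal plain TeX probe for this question could look as follows. It
is only a sketch: the macro names \probe, \rest and \report are made up
for illustration, and the file is assumed to be saved as UTF-8. It
reports whether the engine hands a character to a macro as one token
or as several bytes.

```tex
% Hypothetical helper names; save the file as UTF-8 and run it with
% the engine under test.
% \probe takes the first token as #1 and everything up to "!" as #2.
\def\probe#1#2!{\def\rest{#2}}
% \report{X} prints whether X arrived as one token or as several.
\def\report#1{%
  \probe#1!%
  \ifx\rest\empty
    \message{[one token]}%
  \else
    \message{[several tokens]}%
  \fi}
\report{Τ}% uppercase Greek Tau, U+03A4, bytes CE A4 in UTF-8
\report{č}% Latin c with caron, U+010D, bytes C4 8D in UTF-8
\bye
```

An 8-bit engine should report several tokens for both characters and a
Unicode engine one token for both; a mixed result would match the
behaviour described above.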

2.) Which font encoding is used for European languages (Latin,
Cyrillic, Greek scripts) then? I guess it's still 8-bit and "EC/T1"
for most languages (or QX, texnansi, T2A, ...)? At least, the
following still works fine:

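\documentclass{article}
\usepackage{lmodern}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
% babel works after fixing the other issue with language synonyms
% reported on the TL list
\usepackage[slovene]{babel}
\begin{document}
čšž ČŠŽ % sample text only; any Slovenian text will do
\end{document}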

OK, \showhyphens behaves a bit differently: LaTeX shows "?" where a
non-ASCII letter occurs, while platex shows a space for both the
non-ASCII letter and the character after it.

2010/5/30 Akira Kakuto wrote:
>
> ptex understands UTF-8 encoding (the default of ptex in TeXLive),
> however ptex is not a native UTF-8 engine.
>
> Furthermore, ptex is not a native 8bit engine, because
> it regards particular sequences of bytes as Japanese
> characters. Therefore 8bit pattern files should be
> replaced by using ^^ab expressions.

3.) So the ^^ab expressions need to be the 8-bit patterns, not the
Unicode ones? I have also tried to use ^^010d in documents, but it
doesn't work. (It would help if I understood point 1 first.)
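
On the notation itself: as far as I know, classic TeX reads ^^
followed by exactly two lowercase hex digits as one byte, so ^^010d
is read as ^^01 followed by "0" and "d", not as U+010D. A small plain
TeX sketch of that, where the macro name \split is made up for
illustration:

```tex
% Plain TeX sketch; \split is a hypothetical helper name.
% #1 is the first token, #2 is everything up to "!".
\def\split#1#2!{\message{[rest: #2]}}
\split^^010d!% ^^01 is taken as #1, so the rest is "0d"
% For comparison: in the T1 (EC) font encoding, c with caron sits in
% slot "A3, which an 8-bit pattern file would write as ^^a3.
\bye
```

If that reading is right, an 8-bit pattern file for ptex would have to
use the two-digit form with the slot from the font encoding, not the
Unicode code point.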

>    Is there any simple test file & building instructions?
>
> Not that I am aware of.

I have managed to compile the format, which at least enabled me to
test what definitely doesn't work. However, I'm still not sure how one
is supposed to typeset documents in European languages (which matters
if one wants to add support for hyphenation patterns for them).

> Thanks for the observations.  If Akira wants to proceed further, that is
> fine with me.  As far as I'm concerned, we do not need to settle this
> for TL 2010, and (as I said before) I would just as soon not.  There is
> no harm in ptex living in its own universe for a while.

I'm ready to either fix loadhyph-foo.tex files or create new
ptex-specific ones. I'm also ready to auto-generate new patterns which
satisfy the ptex requirements if there's no way to make the necessary
transformations out of UTF-8 patterns in the way we did it for "usual
8-bit engines" (in particular I'm talking about Greek and Cyrillic
patterns). We can also make a separate package (separated from
hyph-utf8, but maintained in the same repository) if that would make
more sense. The question remains, though: would anyone need them at
all if the program doesn't allow an easy way to typeset those
languages?

But the decision of whether we should use our "architecture for
patterns" should probably be left to Akira (or maybe someone else; I'm
not sure who's the main maintainer of ptex).

Mojca


