[XeTeX] [tex-hyphen] Help with UTF-8 Language

Fri Oct 10 03:54:04 CEST 2014

Thank you all for your replies!
My programming abilities are quite limited and I realize there aren't many
people who need to make hyphenation dictionaries, hence the lack of good
Unicode support. But would someone be willing to walk me through the
process a little more step by step? I am a little confused as to how best
to map the Khmer Unicode characters to 8-bit values.
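
To make the question concrete, here is a rough sketch of the kind of
mapping I understand is being suggested. It is only an illustration in
Python, not an established workflow: the function names (build_mapping,
encode, decode) are made up, the hyphen position in the sample word is
arbitrary, and I am assuming that ASCII characters can pass through
unchanged and that the byte values 0x80-0xFF are free to use. Whatever
patgen itself requires of its input and translate file is not covered.

```python
# Hypothetical sketch: give every non-ASCII character found in a word list
# its own single-byte code in 0x80-0xFF, and convert in both directions.

def build_mapping(words):
    """Return {character: byte value} for each non-ASCII character seen."""
    chars = sorted({ch for word in words for ch in word if ord(ch) > 0x7F})
    if len(chars) > 128:
        raise ValueError("more than 128 distinct characters; "
                         "they do not fit in 0x80-0xFF")
    return {ch: 0x80 + i for i, ch in enumerate(chars)}


def encode(text, mapping):
    """Unicode text -> 8-bit bytes. ASCII passes through unchanged."""
    out = bytearray()
    for ch in text:
        if ch in mapping:
            out.append(mapping[ch])
        elif ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            raise KeyError(f"no 8-bit code assigned to U+{ord(ch):04X}")
    return bytes(out)


def decode(data, mapping):
    """8-bit bytes -> Unicode text, reversing encode()."""
    reverse = {code: ch for ch, code in mapping.items()}
    return "".join(reverse[b] if b >= 0x80 else chr(b) for b in data)


# One Khmer word written with escapes; the "-" marks an arbitrary,
# purely illustrative hyphenation point.
words = ["\u1780\u1798\u17d2\u1796\u17bb-\u1787\u17b6"]
mapping = build_mapping(words)
encoded = encode(words[0], mapping)
assert decode(encoded, mapping) == words[0]  # the round trip is lossless
```

The idea would be to run the 8-bit tool on the encoded word list and then
pass the resulting patterns through decode() with the same mapping to get
UTF-8 patterns back.
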
I think it would be quite useful to post a tutorial of the process once I
am done so others can more easily create hyphenation dictionaries for
languages that don't have them yet (I have yet to find a good tutorial
anywhere).
Thanks again for your help,
Nathan

On Fri, Oct 10, 2014 at 3:57 AM, Philip Taylor <P.Taylor at rhul.ac.uk> wrote:

>
>
> Mojca Miklavec wrote:
>
> > No, the patterns should work just fine with a large alphabet.
>
> This part I do not understand, Mojca; surely the patterns /define/ the
> size of the alphabet, do they not ?  If letter <xqqyn> is not in the
> patterns, then TeX cannot hyphenate a word containing letter <xqqyn>,
> can it ?
>
> > 2.) The fact that "patgen" is limited, "opatgen" is defunct and
> > nobody else stepped up yet to create a new tool in some modern
> > programming language with built-in Unicode support (or with some
> > modern C(++) libraries) has nothing to do with XeTeX's ability to
> > interpret patterns. If we don't have a tool that can generate
> > patterns for large alphabets that doesn't mean that XeTeX cannot
> > handle such patterns.
>
> It was, in part, the existence or otherwise of such a tool that
> interested me, as well as whether XeTeX could natively handle such
> patterns were they to be generatable ...  A /very/ quick look at
> Patgen.web suggests (to me) that a re-implementation in Perl might be
> the fastest way forward (Patgen is run so infrequently that the run-time
> overheads of an interpreted language are irrelevant) but I regret I have
> too much on my plate at the moment to volunteer to investigate further.
>
> > The patterns can probably be created with patgen with some ugly
> > tweaking (as Jonathan suggested).
>
> That is indeed seriously ugly.  The sooner the whole of the TeX suite
> has native UTF-8 clones, the more chance there is of TeX surviving into
> the 22nd century, it seems to me.
>
> ** Phil.
>