[tex-hyphen] how to add hyphenation for a new language?

Mojca Miklavec mojca.miklavec.lists at gmail.com
Fri Feb 22 09:50:43 CET 2013


Hello,

On Fri, Feb 22, 2013 at 3:10 AM, levan shoshiashvili wrote:
> hello,
> I have added Georgian language support for LaTeX.
> Have introduced 2 8bit encoding T8M, T8K.
> Everything works fine.
> I just get:
>
> Package babel Warning: No hyphenation patterns were loaded for
>
> (babel) the language `Georgian'
>
> (babel) I will use the patterns loaded for \language=0 instead.
>
>
> Yes i have generated patterns(using patgen) in utf-8 encoding. Converted
> them to T8M encoding.
>
> How to add this patterns to Babel and hyph-utf8 package?

Babel maintenance has recently been taken over by Javier Bezos, but
you might also want to have support in Polyglossia (Arthur Reutenauer)
for XeTeX (and LuaTeX?). Please ask the two of them.

For hyph-utf8:
If you point me to the files or if you send the files to me, I will
add the patterns to hyph-utf8.
(I would be very grateful if you could also send a list of words that
you used to generate the patterns.)

> I need to make work hyphenation for "Classic"(8 bit) encoding mentioned
> above.

I apologise in advance for what I have written below; I don't mean
anything bad by it. I'm willing to create


Not to be taken seriously: if I were your teacher, I would make you
write "Yes, I really want to make it work in a new 8-bit encoding" on
the blackboard a thousand times, hoping that you would change your
mind while writing. Users will only be able to use those fonts that
someone prepares for them. (I'm not sure how many fonts currently
exist in your encoding, but I am fairly confident that at some
point everyone will want to switch to XeTeX and LuaTeX anyway,
provided that they are sufficiently supported.)

You need to be aware that you are in a very fortunate position at the
moment: if the encoding is new, chances are that very few documents
written for pdfTeX in that encoding exist yet (most probably only
those written by you and your colleagues?). If you need to educate
users anyway, teaching them XeTeX/LuaTeX is no harder than teaching
them pdfTeX, and it would be a lot more future-proof. You could try to
perfect the support for XeTeX and LuaTeX and possibly make an OpenType
font (or find someone to make one) in case the existing free fonts
aren't good enough.

At the time when all those 8-bit encodings came into existence, there
was no XeTeX around. It would make life easier if everyone switched to
XeTeX/LuaTeX now, but there are simply waaaaay too many documents
written for pdfTeX already, and it will never be possible to remove
support for those documents. With the luxury you have, you could
easily steer Georgian writers towards XeTeX/LuaTeX now, before they
realize a couple of years later that they need to modify all their
documents.

I took a look at http://www.ctan.org/tex-archive/fonts/georgian/ and
none of these fonts comes even in Type 1 format, let alone OpenType.

Still, I admire the amount of work that you did already.

I took a look at the package (geotex), but I wasn't able to find any
documentation. The fd files all contain a disclaimer:

%% This file may only be distributed together with a copy of the LaTeX
%% `Cyrillic Bundle'. You may however distribute the LaTeX `Cyrillic
%% Bundle' without such generated files.
%%
%% The list of all files belonging to the `Cyrillic Bundle' is
%% given in the file `manifest.txt'.

And the babel file contains:

%% This is file `russianb.sty',
%% generated with the docstrip utility.
%%
%% The original source files were:
%%
%% bbcompat.dtx  (with options: `russianb')
%% This is a generated file.

I believe that this wasn't intentional?

The font seems to be a derivative of DejaVu and still contains
    Copyright (c) 2003 by Bitstream, Inc. All Rights Reserved.
as its copyright notice. There is no trace of your name or a list of
your modifications, and only the filename has been changed, not the
font name. I can imagine that the unchanged font name would cause
problems for XeTeX users doing a name-based search for DejaVu.

What changes exactly did you have to make to the DejaVu font? That is:
why wasn't the original DejaVu font good enough? Did you draw some
glyphs or make some other changes?

Where can I find the encoding files for TeX, like
    texmf-dist/tex/latex/base/t1enc.def
    texmf-dist/tex/latex/base/t1enc.dfu
?

Do you also have any support for writing in XeTeX?


Now, on a more serious note, if you would really like the patterns to
work for 8-bit: to make the patterns work in both UTF-8 and 8-bit I
would also need the mapping from your encoding to UTF-8 as plain
text. Here's an example:
    http://tug.org/svn/texhyphen/trunk/hyph-utf8/source/generic/hyph-utf8/data/encodings/ec.dat?view=markup
but any other format would do as long as I can write a simple script
to get the desired format.
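
To illustrate what such a plain-text mapping makes possible, here is a
minimal Python sketch. It assumes a hypothetical line format of
"0xNN<TAB>U+NNNN" (not necessarily what ec.dat uses), and the slot
numbers in SAMPLE_MAPPING are invented for the example; they are not
the real T8M positions. The function names are made up as well.

```python
# Sketch only: the "0xNN<TAB>U+NNNN" format and the slot numbers below are
# assumptions for illustration, not the actual T8M layout or ec.dat syntax.

SAMPLE_MAPPING = """\
# slot    code point
0xC0\tU+10D0
0xC1\tU+10D1
"""


def parse_mapping(text):
    """Return {unicode_char: byte_value} from lines like '0xC0<TAB>U+10D0'."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        slot, codepoint = line.split()[:2]
        mapping[chr(int(codepoint[2:], 16))] = int(slot, 16)
    return mapping


def encode_pattern(pattern, mapping):
    """Re-encode one UTF-8 pattern (a str) into bytes of the 8-bit encoding.

    ASCII characters (digits, dots, Latin letters) pass through unchanged;
    any other character must be present in the mapping.
    """
    out = bytearray()
    for ch in pattern:
        if ch in mapping:
            out.append(mapping[ch])
        elif ord(ch) < 128:
            out.append(ord(ch))
        else:
            raise ValueError("no 8-bit slot for U+%04X" % ord(ch))
    return bytes(out)


if __name__ == "__main__":
    table = parse_mapping(SAMPLE_MAPPING)
    # A made-up pattern: U+10D0, the digit 1, U+10D1.
    print(encode_pattern("\u10d0" + "1" + "\u10d1", table))
```

The same table read in the other direction would turn 8-bit patterns
back into UTF-8, which is why a single plain-text mapping is enough to
serve both cases.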

Also, before the 8-bit patterns are included it would help a lot to
have at least one font in TeX Live that supports that encoding.

One thing is definitely not clear to me though. You created two
encodings, and there's the third(?) encoding used by those older
metafont fonts. Hyphenation patterns cannot be used with two encodings
as you probably know already, so I have a question: what exactly is
the relation between the two encodings (in the Georgian part)? You
mention that the patterns should be using T8M, but what about texts in
T8K? Aren't the words written with letters from T8K hyphenated at all?
Or are the corresponding glyphs hyphenated in the same way? How did
those who were using Georgian in LaTeX deal with hyphenation issues
until now and which fonts were usually used?
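
The question about T8M and T8K can be made concrete: in an 8-bit
engine the patterns match character codes, so one pattern file can
only serve two encodings if every letter the patterns use sits in the
same slot in both. A small Python sketch of that check follows; the
two tables are invented stand-ins, not the real T8M/T8K layouts, and
the function name is hypothetical.

```python
# Sketch only: enc_a and enc_b are invented example tables mapping a
# Unicode character to its 8-bit slot; they do not describe T8M or T8K.

def conflicting_letters(enc_a, enc_b):
    """Return letters present in both encodings but at different slots."""
    shared = enc_a.keys() & enc_b.keys()
    return sorted(ch for ch in shared if enc_a[ch] != enc_b[ch])


if __name__ == "__main__":
    enc_a = {"\u10d0": 0xC0, "\u10d1": 0xC1}
    enc_b = {"\u10d0": 0xC0, "\u10d1": 0xE1}
    for ch in conflicting_letters(enc_a, enc_b):
        print("U+%04X: 0x%02X vs 0x%02X" % (ord(ch), enc_a[ch], enc_b[ch]))
```

An empty result for all letters that occur in the patterns would mean
the same 8-bit pattern file could be used with both encodings;
anything else would call for one converted pattern file per encoding.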

> Package with fonts and other files are hosted here (I have submitted
> previous version to ctan, but this is new version) http://tex.tsu.ge/files/

Mojca

