[XeTeX] hyphenation in Ethiopian languages

Mojca Miklavec mojca.miklavec.lists at gmail.com
Thu Nov 4 21:04:44 CET 2010


(I'm adding the TeX hyphenation mailing list to the recipients; I
apologise for cross-posting. Discussion of the hyphenation patterns
may continue on the hyphenation list (or off-list if needed). XeLaTeX
issues, in particular how to prevent a line from starting with a word
or sentence separator, may stay on the XeTeX list, since that is more
or less an engine- and polyglossia-related problem.)

On Thu, Nov 4, 2010 at 15:53, Gareth Hughes wrote:
> Dear Adam,
>
> Line 7 of gloss-amharic.ldf in the polyglossia package has
>
>  hyphennames={amharic,nohyphenation},
>
> which I take to mean that you'll get no hyphenation wherever 'amharic'
> is active. The next line is commented out
>
>  %hyphenmins={2,2},
>
> so I presume that some rules were intended (François?). If the rules are
> that hyphenation can occur anywhere, I'm sure this would be fairly
> easy to implement.

An example of hyphenation patterns is attached. I do not claim that
the patterns work perfectly (they probably don't, but they might be a
starting point). I simply added the digit 1 after each valid Unicode
character between U+1200 and U+135A, which allows a break after every
syllable. I did not remove characters that are not used in Amharic, and
I did not include the Ethiopic Extended block (U+2D80–U+2DDF).
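The attachment itself is not preserved in this archive, so here is a sketch of what such a file looks like, reconstructed only from the description above; it is not the actual hyph-am.tex. Each entry is one syllable followed by the digit 1, and an odd digit permits a break at that position. Only the first few entries are shown. This assumes the format already gives these characters nonzero \lccode values, since TeX rejects pattern characters whose \lccode is zero.

```latex
% Sketch of the structure of hyph-am.tex (not the actual attachment):
% each pattern is one Ethiopic syllable followed by 1, which allows a
% line break after that syllable in any word.
\patterns{
ሀ1
ሁ1
ሂ1
ሃ1
ሄ1
ህ1
ሆ1
% ... one such line for each remaining assigned character up to U+135A
}
```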

1.) You need to put the file hyph-am.tex into
    /usr/local/texlive/2010/texmf-dist/tex/generic/hyph-utf8/patterns/tex/hyph-am.tex
2.) Put loadhyph-am.tex into
    /usr/local/texlive/2010/texmf-dist/tex/generic/hyph-utf8/loadhyph/loadhyph-am.tex
3.) Add
    amharic loadhyph-am.tex
to
    /usr/local/texlive/2010/texmf-var/tex/generic/config/language.dat
4.) Change "%hyphenmins={2,2}," into "hyphenmins={1,1}," in
    /usr/local/texlive/2010/texmf-dist/tex/xelatex/polyglossia/gloss-amharic.ldf
5.) Run
    sudo mktexlsr
so that the new files are found, and then
    sudo fmtutil-sys --byfmt xelatex

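To check whether the rebuilt format actually contains the new patterns, one option is to test in the preamble whether \l@amharic is defined. This is a diagnostic sketch, assuming the format was built with the usual LaTeX hyphen.cfg, which defines a \l@<name> language register for every entry in language.dat; it is not part of the installation itself.

```latex
\makeatletter % \l@amharic contains @
\ifdefined\l@amharic
  % hyphen.cfg created a language register for the "amharic" entry
  \typeout{amharic patterns are loaded (language number \the\l@amharic)}
\else
  % no such entry: the format was built without the new language.dat line
  \typeout{no amharic patterns in this format: check steps 3 and 5}
\fi
\makeatother
```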

You can also test the patterns by replacing the amharic environment in
Adam's example below with the following (keep the rest of the document
unchanged):

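% Typeset #1 in a 1pt-wide paragraph to reveal every permitted break point
\def\test#1{\par\begingroup
  \hsize=1pt % a 1pt measure forces a break wherever one is allowed
  \noindent\hskip0pt\relax #1\par % \hskip0pt lets TeX hyphenate the first word too
  \endgroup}% the group restores the original \hsize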

\begin{amharic}
\test{እስመ ፡ አግዚአብሔር ፡ አምላክ ፡ ማእምር ፡ ውእቱ ። እግዚአብሔር ፡ አስተደወ ፡ መንብሮ ።
ወአድከመ ፡ ቅሥተ ፡ ኀያላን ። ወአቅነቶሙ ፡ ኀይለ ፡ ለድኩማን ። ጽጉማን ፡ እክል ፡ ርኅቡ ። ወርኁባን ፡
ጸግቡ ። እስመ ፡ መካን ፡ ወለደት ፡ ሰብዐተ ፡ ወወለድሰ ፡ ስእነት ፡ ወሊደ ፡ እግዚአብሔር ፡ ይቀትል ፡
ወየሐዩ ። ያወርድኒ ፡ ውስተ ፡ ሲእል ፡ ወየዐርግ ። እግዚአብሔር ፡ ያነዲ ፡ ወያብዕል ። ያኀስርሂ ፡
ወያከብር ፡ ዘያነሥኦ ፡ እምድር ፡ ለነዳይ ። ከመ ፡ ያንብሮ ፡ ምስለ ፡ ዓበይ[ተ] ።}
\end{amharic}

The problem of word separators (፡ and ።) that must not start a new line
has to be solved at a different level. You could type the text like
this, with a tie (~) before each separator:

እስመ~፡ አግዚአብሔር~፡ አምላክ~፡ ማእምር~፡ ውእቱ~። እግዚአብሔር~፡ አስተደወ~፡ መንብሮ~። ወአድከመ~፡
ቅሥተ~፡ ኀያላን~። ወአቅነቶሙ~፡ ኀይለ~፡ ለድኩማን~። ጽጉማን~፡ እክል~፡ ርኅቡ~። ወርኁባን~፡ ጸግቡ~።
እስመ~፡ መካን~፡ ወለደት~፡ ሰብዐተ~፡ ወወለድሰ~፡ ስእነት~፡ ወሊደ~፡ እግዚአብሔር~፡ ይቀትል~፡ ወየሐዩ~።
ያወርድኒ~፡ ውስተ~፡ ሲእል~፡ ወየዐርግ~። እግዚአብሔር~፡ ያነዲ~፡ ወያብዕል~። ያኀስርሂ~፡ ወያከብር~፡
ዘያነሥኦ~፡ እምድር~፡ ለነዳይ~። ከመ~፡ ያንብሮ~፡ ምስለ~፡ ዓበይ[ተ]~።

This works fine, but you probably don't want to type text like that. I
leave it to others to solve that problem. The \hyphenchar, however, can
easily be changed to "nothing".
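As a stopgap, the ties need not be typed by hand. A minimal, untested sketch, assuming XeLaTeX (which allows catcode changes for any Unicode character) and assuming polyglossia's Amharic support does not itself redefine these two characters: make U+1361 (wordspace, ፡) and U+1362 (full stop, ።) active in the preamble, so that each one removes the space typed before it and attaches itself to the previous word with an unbreakable space. The change applies to the whole document.

```latex
% Sketch only: make the Ethiopic wordspace (U+1361) and full stop (U+1362)
% active so that neither can ever begin a line. Put this in the preamble.
\catcode"1361=13 % U+1361 ETHIOPIC WORDSPACE is now an active character
\catcode"1362=13 % U+1362 ETHIOPIC FULL STOP is now an active character
% \unskip drops the space typed before the separator, ~ puts back an
% unbreakable one, and \char prints the glyph itself (no recursion, since
% \char does not produce an active token).
\def፡{\unskip~\char"1361 }
\def።{\unskip~\char"1362 }
```

With this in the preamble, the text can be typed with ordinary spaces, as in Adam's example, and the line breaks should behave like the hand-tied version above.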

Mojca

> Adam McCollum wrote:
>> Dear list members,
>>
>> I've recently drawn up a short document in Ge`ez (classical Ethiopic) using
>> Polyglossia and I see that the hyphenation is wrong. As some of you know,
>> languages that use the Ethiopic script, including Ge`ez and Amharic, place a
>> word divider—it looks somewhat like a thick colon—between each word and two
>> of these dividers side by side between sentences; see some Amharic examples
>> here<http://books.google.com/books?id=r87yh5z66TEC&printsec=frontcover&dq=amharic&hl=en&ei=U7TSTIX-Ds2r8AaT6LxF&sa=X&oi=book_result&ct=book-thumbnail&resnum=6&ved=0CEwQ6wEwBQ#v=onepage&q&f=false>.
>> That being the case, a word may be broken at any syllable (the script is a
>> syllabary, not an alphabet) at the end of a line, but there is nothing
>> corresponding to a hyphen. An additional matter of importance is that no
>> line should begin with the single or double word divider. How should this be
>> fixed?
>>
>> Here is a minimal example:
>>
>> \documentclass[12pt]{article}
>>
>> \usepackage{fontspec}
>> \usepackage{polyglossia}
>>
>> \setmainlanguage{english}
>> \setotherlanguage{amharic}
>>
>> \newfontfamily\amharicfont[Script = Ethiopic, Scale = 1.3]{Abyssinica SIL}
>>
>> \begin{document}
>>
>> \title{Sample in Gǝ`ǝz}
>> \maketitle
>>
>> \begin{amharic}
>> እስመ ፡ አግዚአብሔር ፡ አምላክ ፡ ማእምር ፡ ውእቱ ። እግዚአብሔር ፡ አስተደወ ፡ መንብሮ ። ወአድከመ ፡ ቅሥተ ፡
>> ኀያላን ። ወአቅነቶሙ ፡ ኀይለ ፡ ለድኩማን ። ጽጉማን ፡ እክል ፡ ርኅቡ ። ወርኁባን ፡ ጸግቡ ። እስመ ፡ መካን ፡
>> ወለደት ፡ ሰብዐተ ፡ ወወለድሰ ፡ ስእነት ፡ ወሊደ ፡ እግዚአብሔር ፡ ይቀትል ፡ ወየሐዩ ። ያወርድኒ ፡ ውስተ ፡ ሲእል
>> ፡ ወየዐርግ ። እግዚአብሔር ፡ ያነዲ ፡ ወያብዕል ። ያኀስርሂ ፡ ወያከብር ፡ ዘያነሥኦ ፡ እምድር ፡ ለነዳይ ። ከመ ፡
>> ያንብሮ ፡ ምስለ ፡ ዓበይ[ተ] ።
>> \end{amharic}
>>
>> \end{document}
>>
>> With many thanks in advance for the help,
>>
>> Adam McCollum, Ph.D.
>> Lead Cataloger, Eastern Christian Manuscripts
>> Hill Museum & Manuscript Library
>> Saint John's University
>> P.O. Box 7300
>> Collegeville, MN 56321
>>
>> (320) 363-2075 (phone)
>> (320) 363-3222 (fax)
>> www.hmml.org
> --
> Gareth Hughes
> Doctoral candidate in Syriac studies
>
> Department of Eastern Christianity
> Oriental Institute
> Pusey Lane
> Oxford
> OX1 2LE
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hyph-am.tex
Type: application/x-tex
Size: 1644 bytes
Desc: not available
URL: <http://tug.org/pipermail/xetex/attachments/20101104/5a8afe40/attachment.tex>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: loadhyph-am.tex
Type: application/x-tex
Size: 296 bytes
Desc: not available
URL: <http://tug.org/pipermail/xetex/attachments/20101104/5a8afe40/attachment-0001.tex>

