[XeTeX] hyphenation in Ethiopian languages

Mojca Miklavec mojca.miklavec.lists at gmail.com
Fri May 6 19:03:25 CEST 2011


On Thu, Nov 4, 2010 at 14:42, Adam McCollum wrote:
> Dear list members,
> I've recently drawn up a short document in Ge`ez (classical Ethiopic) using
> Polyglossia and I see that the hyphenation is wrong. As some of you know,
> languages that use the Ethiopic script, including Ge`ez and Amharic, place a
> word divider—it looks somewhat like a thick colon—between each word and two
> of these dividers side by side between sentences; see some Amharic examples
> here. That being the case, a word may be broken at any syllable (the script
> is a syllabary, not an alphabet) at the end of a line, but there is nothing
> corresponding to a hyphen. An additional matter of importance is that no
> line should begin with the single or double word divider. How should this be
> fixed?

Dear Adam,

We submitted Ethiopic hyphenation patterns to CTAN (and TL) a
while ago, so once you update your TeX Live, it should work out of the
box.

However, there is a nasty limitation in XeTeX: words hyphenate only up
to 64 characters, so unless somebody fixes XeTeX, you need other
tricks and workarounds. The code below inserts a breakable space
after every word separator (and thus allows XeTeX to start breaking
the next word from scratch). In addition to that you also need to make
sure that:
- there is no hyphenation character at the end of a line
- lines are properly aligned
- you might want (or not) some extra space around word and sentence delimiters

Together with Arthur we created the following working example, but it
would be great if François included some of that code in
Polyglossia.

If you want to have space around word delimiters, you need to create
some non-breakable space in front of the delimiter and some breakable
space after the delimiter. The amount of space might need to be
configurable. My estimates might not be the best ones (0.4 +/- 0.1
em), so feel free to adjust them to the most suitable values. Apart from
that you might want to have both spaces of equal size (I wasn't sure how
to achieve that).

\documentclass[12pt]{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{amharic}
\newfontfamily\amharicfont[Script = Ethiopic, Scale = 1.3]{Abyssinica SIL}

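% Character classes: Ethiopic letters and Ethiopic word/sentence separators
\newXeTeXintercharclass \ethiletter
\newXeTeXintercharclass \ethispace
\newcount\tmp
% \setclass[first-last]\class assigns \class to every code point in the range
\def\setclass[#1-#2]#3{%
  \tmp=#1
  \XeTeXcharclass\tmp=#3
  \loop\ifnum\tmp<#2
    \advance\tmp by 1
    \XeTeXcharclass\tmp=#3
  \repeat}
\setclass["1200-"139F]\ethiletter

\XeTeXinterchartokenstate=1
% U+1361 (word separator) and U+1362 (full stop) override the letter class
\XeTeXcharclass"1361\ethispace
\XeTeXcharclass"1362\ethispace

% Breakable, slightly stretchable space after a separator
\XeTeXinterchartoks \ethispace \ethiletter = {\egroup\hskip.4em plus .1em minus .1em}
% Non-breakable kern before a separator, so no line starts with one
\XeTeXinterchartoks \ethiletter \ethispace = {\kern.4em\bgroup}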

\begin{document}
\title{Sample in Gǝ`ǝz}
\maketitle

% \hsize=8cm

\begin{amharic}

\hyphenchar\font=0
እስመ፡አግዚአብሔር፡አምላክ፡ማእምር፡ውእቱ።እግዚአብሔር፡አስተደወ፡መንብሮ።ወአድከመ፡ቅሥተ፡ኀያላን።ወአቅነቶሙ፡ኀይለ፡ለድኩማን።ጽጉማን፡እክል፡ርኅቡ።ወርኁባን፡ጸግቡ።እስመ፡መካን፡ወለደት፡ሰብዐተ፡ወወለድሰ፡ስእነት፡ወሊደ፡እግዚአብሔር፡ይቀትል፡ወየሐዩ።ያወርድኒ፡ውስተ፡ሲእል፡ወየዐርግ።እግዚአብሔር፡ያነዲ፡ወያብዕል።ያኀስርሂ፡ወያከብር፡ዘያነሥኦ፡እምድር፡ለነዳይ።ከመ፡ያንብሮ፡ምስለ፡ዓበይ[ተ]~።
\end{amharic}

\end{document}
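
On the question of equal space on both sides of a delimiter: one possible approach, given here only as an untested sketch, is to use the same stretchable glue on both sides and protect the one in front of the delimiter with \nobreak, so that it stretches and shrinks like the other one but still does not allow a line break. The macro name \ethisepglue is my own invention for illustration; the two \XeTeXinterchartoks lines are meant to replace the corresponding two lines in the preamble of the example above.

```latex
% Hypothetical helper (not part of the example above): a single place
% to tune the glue used on both sides of a delimiter.
\def\ethisepglue{.4em plus .1em minus .1em}

% After the delimiter: ordinary glue, so a line break is allowed here.
\XeTeXinterchartoks \ethispace \ethiletter = {\egroup\hskip\ethisepglue\relax}

% Before the delimiter: the same glue, preceded by \nobreak (\penalty10000),
% so TeX may stretch or shrink it but cannot break the line at it.
\XeTeXinterchartoks \ethiletter \ethispace = {\nobreak\hskip\ethisepglue\relax\bgroup}
```

Since both spaces now come from the same glue specification, they should be stretched by the same amount within a given line; whether the result looks right for Ethiopic typography is something you would have to judge.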

Please let us know if that works the way you want it to work. If you
need a LuaTeX solution, please let us know as well.

Mojca

PS: You could also simply use
    \XeTeXinterchartoks \ethiletter \ethiletter = {\hskip0pt}
and thus avoid the need for any hyphenation patterns at all.
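
A minimal sketch of that pattern-free variant, written out as a complete document. It is untested and makes a few assumptions: the font is the one from the example above, the character range is filled with a plain \loop instead of \setclass, and I left out Polyglossia as well as the \bgroup/\egroup pair to keep the sketch short.

```latex
\documentclass[12pt]{article}
\usepackage{fontspec}
% Font taken from the example above; any font covering Ethiopic should do.
\newfontfamily\amharicfont[Script = Ethiopic]{Abyssinica SIL}

\newXeTeXintercharclass \ethiletter
\newXeTeXintercharclass \ethispace

% Put U+1200..U+139F into the letter class.
\newcount\tmp
\tmp="1200
\loop
  \XeTeXcharclass\tmp=\ethiletter
  \ifnum\tmp<"139F
  \advance\tmp by 1
\repeat
% The two separators get their own class (overriding the loop above).
\XeTeXcharclass"1361=\ethispace
\XeTeXcharclass"1362=\ethispace

\XeTeXinterchartokenstate=1
% Zero-width glue between any two letters: a break is allowed after every
% syllable, and no hyphenation patterns are consulted.
\XeTeXinterchartoks \ethiletter \ethiletter = {\hskip 0pt}
% Separator spacing as in the example above, simplified (no grouping).
\XeTeXinterchartoks \ethispace \ethiletter = {\hskip.4em plus .1em minus .1em}
\XeTeXinterchartoks \ethiletter \ethispace = {\kern.4em}

\begin{document}
{\amharicfont እስመ፡አግዚአብሔር፡አምላክ፡ማእምር፡ውእቱ።}
\end{document}
```

Because the glue between letters already ends each "word" as far as TeX is concerned, the 64-character limit should not matter in this variant.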


