[XeTeX] help with hyphenation

John Was john.was at ntlworld.com
Sat Feb 2 19:58:19 CET 2008


Hello

I don't know about the intricacies of Babel and so on - so far I have 
confined myself to using plain TeX.  But I can certainly say that all of the 
hyphenations you have listed would normally be considered acceptable/correct 
by modern conventions:

The -ing ending of the verb is treated as separable, and always has been as 
far as I can tell.  That takes care of four of your examples.

pro-ducts conforms to the usual rule that a single consonant is taken over, 
and in this word it corresponds to the correct etymological division (Latin 
pro plus ducere).

under-stand is also a correct etymological division, and conforms to the 
usual rule that two consonants are split.

pos-sesses conforms to the usual rule that a double consonant may be 
divided.

This leaves ope-rating and natu-ral, which seem - indeed are - 
contradictory, and here we are up against the difficult problem of 'r':  the 
four letters l m n r (the liquids) have been seen as special since 
ancient times (a mute followed by a liquid can have an unexpected effect in 
Latin metre).  Latin can sometimes help in these Latin-derived words, but 
here that would give:

ope-rat-ing (treating the -ing as separable as indicated above)
na-tu-ral

But for reasons quite difficult to define the 'r' is often kept back.  Your 
hyphenation algorithm has taken it over, and that is perfectly defensible, 
though I see that the Oxford Spelling Dictionary has:

op-er-at-ing, specifying the division AFTER the r as the optimum one for the 
word

and

nat-ural, disallowing any other hyphenation except after the 't' (I find 
this weird!).
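
To compare these dictionary divisions with what the loaded hyphenation 
patterns actually allow, TeX can be asked directly.  The following is only a 
minimal sketch: it uses the standard \showhyphens command in a bare LaTeX 
document (the article class is an arbitrary choice, not taken from the 
original source), and the result depends on which language patterns are 
active in your format.

```latex
\documentclass{article}
\begin{document}
% \showhyphens typesets nothing; it reports the break points that the
% current patterns permit for each word on the terminal and in the
% .log file (the report looks like an underfull \hbox warning).
\showhyphens{designing operating products flying possesses
  understand making natural}
\end{document}
```

Any word whose reported break points differ from the ones you want is a 
candidate for an explicit exception.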

But the basic message is that your implementation is producing good hyphens 
in English so there's no need to worry unduly!  You can always force 
hyphenation explicitly in any given word that is proving awkward - plain TeX 
already does that with manu-script, since the algorithm it employs would 
produce manus-cript.
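
As a hedged illustration of forcing hyphenation, here is a small LaTeX 
sketch using the two standard mechanisms: a global \hyphenation exception 
list and the discretionary hyphen \-.  The break points shown are simply the 
Oxford divisions quoted above, used as an example rather than a 
recommendation, and the sample sentence is adapted from the poster's text.

```latex
\documentclass{article}
% Global exceptions: these words may break only at the marked points,
% wherever they occur in the document.
\hyphenation{op-er-at-ing nat-ural}
\begin{document}
% A discretionary hyphen (\-) applies to this one occurrence only and
% permits a break at the marked position and nowhere else in the word.
The method of oper\-ating it would be apparent.
\end{document}
```

Exceptions declared with \hyphenation apply to the language that is current 
when the command is issued, so in a multilingual document the declaration 
needs to be made while the right language is selected.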

You certainly don't want to implement Welsh rules - Welsh keeps the 
consonant back, whereas English usually takes it over to the next line.



John






----- Original Message ----- 
From: <ashinpan at gmail.com>
To: <xetex at tug.org>
Sent: Saturday, February 02, 2008 5:45 PM
Subject: [XeTeX] help with hyphenation


Hi! all

I have been using XeTeX as part of TeX Live 2007 on Ubuntu (Gutsy
Gibbon). My documents are mainly in English but Pali and Sanskrit are
often embedded in them.

The problem I am facing is that some words are getting hyphenated,
seemingly at random, in the middle of a line. I tried to change the
language to Welsh, for which I have no language file, to force manual
hyphenation, but to no avail. I also tried removing the Babel package,
but the problem still persists.

Below is the TeX source that I use. I have also attached a PDF file
that XeTeX produces on my machine from that source. In that PDF file,
I see the following unnatural mid-line hyphenations, given together
with their respective line numbers:

design-ing (1)
ope-rat-ing (2)
pro-ducts (5)
fly-ing (6)
pos-sesses (8)
under-stand (9)
mak-ing (18)
natu-ral (32)

I hope someone would kindly help me out.

Ven. Pandita

--------------------------------------------------------------------------
The TeX Source
--------------------------------------------------------------------------

\documentclass[11pt,welsh]{scrbook}
\usepackage{geometry}
\geometry{verbose,paperwidth=6in,paperheight=9in,tmargin=63pt,bmargin=36pt,lmargin=54pt,rmargin=36pt,headheight=15pt,headsep=18pt,footskip=36pt}
\usepackage{fancyhdr}
\pagestyle{fancy}
\usepackage{jurabib}[2004/01/25]

\makeatletter
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% User specified LaTeX commands.
\usepackage{fontspec}
\usepackage{xunicode}
\usepackage{xltxtra}
\setmainfont[Mapping=tex-text]{Charis SIL}
\setsansfont[Mapping=tex-text]{Tahoma}
\setmonofont[Mapping=tex-text]{FreeMono}
\usepackage{lineno}
\linenumbers
\let\ps@plain = \ps@empty
\fancyhead{}
\fancyfoot{}
\rhead{Pandita \   \thepage}
\renewcommand{\headrulewidth}{0.1pt}
\setlength{\parindent}{20pt}
\deffootnote{1em}{1em}{\thefootnotemark .\ }
\date{}
\makeatother

\begin{document}

\begin{quote}
In design­ing to accommodate visibility, each function and the method
of ope­rat­ing it would be apparent (to most people in the culture for
which it is intended) by merely looking at it. (Raskin ch. 3.4)
\end{quote}
Raskin's standard of visibility may not be applicable to certain
pro­ducts which absolutely require instruction or training (E.g.
fly­ing an aeroplane). But whitespace does have the kind of visibility
he prefers. It not only makes word boundaries visible but also
pos­sesses visibility in itself, i. e., readers need no instruction to
under­stand and make use of its function. Why? Because it is already a
part of the reader model; at the time most readers come to meet
Romanized Pali texts, they have already been familiar with English
and/or other modern European
languages, which use whitespace for the same function. When they meet
it again in Pali texts, they would be tempted to treat it similarly
only to succeed. This is a typical case of {}``a system behaving
exactly as users thought it would\char`\"{}.

On the contrary, word boundaries in manuscripts are not visible,
mak­ing it harder to read.


\subsection{Whitespace provides good mappings }

Mapping is {}``a relationship between controls and their movements or
effects\char`\"{} (Lidwell, et al. 128) and, in the case of texts,
punctuation and whitespace are controls that readers use for guidance
through a text being read. But how should we define a good mapping?

\begin{quote}
Good mapping is primarily a function of similarity of layout,
behavior, or meaning. When the layout of stovetop controls corresponds
to the layout of burners, this is similarity of layout; when turning a
steering wheel left turns the car left, this is similarity of
behavior; when an emergency shut-off button is colored red, this is
similarity of meaning (e. g., most people associate red with stop)
(ibid)
\end{quote}

In modern Pali texts, whitespace maps to a word boundary while a
continuous string of text maps to a word. Those mappings are natu­ral
in the sense that they are similar in layout to those in modern
European language texts, which readers are already familiar with, and
consequently which they can readily understand.

On the contrary, manuscripts provide no useful mappings at all.

\end{document}


