[XeTeX] help with hyphenation
John Was
john.was at ntlworld.com
Sat Feb 2 19:58:19 CET 2008
Hello
I don't know about the intricacies of Babel and so on - so far I have
confined myself to using plain TeX. But I can certainly say that all of the
hyphenations you have listed would normally be considered acceptable/correct
by modern conventions:
The -ing ending of the verb is treated as separable, and always has been as
far as I can tell. That takes care of four of your examples.
pro-ducts conforms to the usual rule that a single consonant is taken over,
and in this word it corresponds to the correct etymological division (Latin
pro plus ducere).
under-stand is also a correct etymological division, and conforms to the
usual rule that two consonants are split.
pos-sesses conforms to the usual rule that a double consonant may be
divided.
This leaves ope-rating and natu-ral, which seem, and indeed are,
contradictory, and here we are up against the difficult problem of 'r': the
four letters l, m, n, r (the 'liquids' of ancient grammar) have been seen as
special since antiquity (they can have an unexpected effect in Latin metre).
Latin can sometimes help in these Latin-derived words, but here that would
give:
ope-rat-ing (treating the -ing as separable as indicated above)
na-tu-ral
But for reasons that are difficult to define, the 'r' is often kept back. Your
hyphenation algorithm has taken it over, and that is perfectly defensible,
though I see that the Oxford Spelling Dictionary has:
op-er-at-ing, specifying the division AFTER the r as the optimum one for the
word
and
nat-ural, disallowing any other hyphenation except after the 't' (I find
this weird!).
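For anyone who would rather follow the Oxford Spelling Dictionary divisions quoted above, LaTeX (like plain TeX) accepts a list of hyphenation exceptions in the preamble. This is a minimal sketch, not something proposed in the thread; the two entries simply transcribe the divisions quoted above. One limitation: `\hyphenation` cannot mark one break point as preferred over another, so every listed point counts equally.

```latex
\documentclass{article}
% Hyphenation exceptions: each hyphen marks a permitted break point, and
% a listed word may then break ONLY at those points. The entries belong to
% the hyphenation language active when \hyphenation is executed.
\hyphenation{op-er-at-ing nat-ural}
\begin{document}
% A deliberately narrow box, so that TeX has to consider breaking words.
\parbox{5em}{The operating costs were natural enough, and operating them
  was natural.}
\end{document}
```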
But the basic message is that your implementation is producing good hyphens
in English so there's no need to worry unduly! You can always force
hyphenation explicitly in any given word that is proving awkward - plain TeX
already does that with manu-script, since the algorithm it employs would
produce manus-cript.
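To fix a single awkward occurrence rather than every occurrence, a discretionary hyphen `\-` can be typed into the word itself; TeX then breaks that instance only at the marked points. (Plain TeX's own fix is a global exception, `\hyphenation{man-u-script man-u-scripts ap-pen-dix}`, in plain.tex.) A minimal sketch using the word from the message:

```latex
\documentclass{article}
\begin{document}
% \- affects only this occurrence: the word may break after "manu" and
% nowhere else, while other occurrences still use the normal patterns.
\parbox{5em}{The original manu\-script was lost, and no copy survives.}
\end{document}
```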
You certainly don't want to implement Welsh rules - Welsh keeps the
consonant back, whereas English usually takes it over to the next line.
John
----- Original Message -----
From: <ashinpan at gmail.com>
To: <xetex at tug.org>
Sent: Saturday, February 02, 2008 5:45 PM
Subject: [XeTeX] help with hyphenation
Hi all!
I have been using XeTeX as part of TeX Live 2007 on Ubuntu (Gutsy
Gibbon). My documents are mainly in English but Pali and Sanskrit are
often embedded in them.
The problem I am facing is that some words are being hyphenated,
seemingly at random, before line breaks. I tried changing the language
to Welsh, for which I have no language file, to force manual
hyphenation, but to no avail. I also tried removing the Babel package,
but the problem persists.
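If the aim is simply to stop automatic hyphenation, which is what switching to an unsupported language was meant to achieve, a common approach in plain TeX and LaTeX is to make hyphenated breaks prohibitively expensive. This is a minimal sketch, not a recommendation from the thread. As far as I can tell, the `welsh` class option in the source below does nothing unless a package such as Babel acts on it, and when no patterns exist for a requested language, Babel typically falls back to the patterns loaded as `\language=0` (usually US English), which would explain why neither attempt changed anything. Expect more loose or overfull lines once hyphenation is off; `\emergencystretch` gives TeX some extra room.

```latex
\documentclass{article}
% A penalty of 10000 or more forbids a break outright, so these settings
% stop TeX from ending lines at hyphenation points.
\hyphenpenalty=10000    % breaks at discretionary hyphens (from patterns or \-)
\exhyphenpenalty=10000  % breaks after explicit hyphens, e.g. "mid-line"
% Extra interword stretch so paragraphs can still be set without
% hyphenation (2em is an illustrative value, not a tuned one).
\setlength{\emergencystretch}{2em}
\begin{document}
In designing to accommodate visibility, each function and the method
of operating it would be apparent by merely looking at it.
\end{document}
```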
Below is the TeX source that I use. I have also attached a PDF file
that XeTeX produces on my machine from that source. In that PDF file,
I see the following unnatural mid-word hyphenations, listed with their
respective line numbers:
design-ing (1)
ope-rat-ing (2)
pro-ducts (5)
fly-ing (6)
pos-sesses (8)
under-stand (9)
mak-ing (18)
natu-ral (32)
I hope someone can kindly help me out.
Ven. Pandita
--------------------------------------------------------------------------
The TeX Source
--------------------------------------------------------------------------
\documentclass[11pt,welsh]{scrbook}
\usepackage{geometry}
\geometry{verbose,paperwidth=6in,paperheight=9in,tmargin=63pt,bmargin=36pt,lmargin=54pt,rmargin=36pt,headheight=15pt,headsep=18pt,footskip=36pt}
\usepackage{fancyhdr}
\pagestyle{fancy}
\usepackage{jurabib}[2004/01/25]
\makeatletter
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% User specified LaTeX commands.
\usepackage{fontspec}
\usepackage{xunicode}
\usepackage{xltxtra}
\setmainfont[Mapping=tex-text]{Charis SIL}
\setsansfont[Mapping=tex-text]{Tahoma}
\setmonofont[Mapping=tex-text]{FreeMono}
\usepackage{lineno}
\linenumbers
\let\ps@plain=\ps@empty
\fancyhead{}
\fancyfoot{}
\rhead{Pandita \ \thepage}
\renewcommand{\headrulewidth}{0.1pt}
\setlength{\parindent}{20pt}
\deffootnote{1em}{1em}{\thefootnotemark .\ }
\date{}
\makeatother
\begin{document}
\begin{quote}
In designing to accommodate visibility, each function and the method
of operating it would be apparent (to most people in the culture for
which it is intended) by merely looking at it. (Raskin ch. 3.4)
\end{quote}
Raskin's standard of visibility may not be applicable to certain
products which absolutely require instruction or training (e.g.,
flying an aeroplane). But whitespace does have the kind of visibility
he prefers. It not only makes word boundaries visible but also
possesses visibility in itself, i.e., readers need no instruction to
understand and make use of its function. Why? Because it is already a
part of the reader model; by the time most readers come to meet
Romanized Pali texts, they are already familiar with English
and/or other modern European
languages, which use whitespace for the same function. When they meet
it again in Pali texts, they would be tempted to treat it similarly
only to succeed. This is a typical case of ``a system behaving
exactly as users thought it would''.
By contrast, word boundaries in manuscripts are not visible,
making manuscripts harder to read.
\subsection{Whitespace provides good mappings}
Mapping is ``a relationship between controls and their movements or
effects'' (Lidwell et al. 128) and, in the case of texts,
punctuation and whitespace are controls that readers use for guidance
through a text being read. But how should we define a good mapping?
\begin{quote}
Good mapping is primarily a function of similarity of layout,
behavior, or meaning. When the layout of stovetop controls corresponds
to the layout of burners, this is similarity of layout; when turning a
steering wheel left turns the car left, this is similarity of
behavior; when an emergency shut-off button is colored red, this is
similarity of meaning (e.g., most people associate red with stop)
(ibid.)
\end{quote}
In modern Pali texts, whitespace maps to a word boundary while a
continuous string of text maps to a word. Those mappings are natural
in the sense that they are similar in layout to those in modern
European language texts, which readers are already familiar with, and
consequently which they can readily understand.
By contrast, manuscripts provide no useful mappings at all.
\end{document}
--------------------------------------------------------------------------------
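To check which break points TeX would choose for the words listed in the original message, without hunting through the PDF, the standard `\showhyphens` command can be used. This is a minimal sketch; the exact form of the log report may differ between engines and versions.

```latex
\documentclass{article}
\begin{document}
% \showhyphens sets its argument in a scratch box and reports the
% hyphenation points TeX found in the .log file and on the terminal;
% nothing is added to the PDF, so this document produces no pages.
\showhyphens{designing operating products flying possesses understand
  making natural}
\end{document}
```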
> _______________________________________________
> XeTeX mailing list
> postmaster at tug.org
> http://tug.org/mailman/listinfo/xetex