[XeTeX] xdvipdfmx, line breaks and hyphenated words

William Adams will.adams at frycomm.com
Fri Jan 12 21:02:32 CET 2007


On Jan 12, 2007, at 2:02 PM, Pablo Rodríguez wrote:

> one of the things that I think would be interesting to implement in
> xdvipdfmx is a feature that Adobe-generated documents (such as
> http://pdf.codev2.cc/Lessig-Codev2.pdf) have: the searchable text
> contains no line breaks (within the same paragraph), and hyphenated
> words aren't hyphenated in that text.

I'm not sure I fully understand what you're saying here.

The first line of the book (p. ix) already has a hyphenated word,
``transla-tion'', broken across two lines.

The book was created with QuarkXPress 7, so keeping already-hyphenated
words from being hyphenated again was probably done by manually
inserting a discretionary hyphen at the beginning of each such compound
(that's how QXP 6.5 and earlier handle it; see a recent post on this to
comp.text.tex by yours truly about having to do it by hand).

AFAIK TeX won't hyphenate a word that already contains a hyphen (apart
from breaking at the hyphen itself). I'm not sure whether LaTeX changes
this. If it doesn't, it's easy enough to introduce an ``\allowbreak''
where needed. You could do a variation of what I do: search for all
instances of ``-'' and replace those that warrant it with
``-\allowbreak '' or, to end the source line there, ``-\allowbreak%''
followed by a line break.
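A minimal test file for that replacement might look like the following.
It is only a sketch: the sample compound and the 4em \parbox width
(chosen just to push the break toward the hyphen) are assumptions for
illustration. What the replacement adds is a zero-penalty break point
right after the hyphen; it does not change TeX's hyphenation patterns,
so whether it is enough for a given paragraph is worth checking in the
output.

\documentclass{minimal}
\begin{document}
% Plain compound: TeX can already break after the explicit hyphen,
% at a cost of \exhyphenpenalty.
\parbox{4em}{compound-interest}

\bigskip
% After the search-and-replace: \allowbreak (a \penalty of 0) adds a
% cost-free break point right after the hyphen. The space after the
% control word is swallowed, so no space appears between the halves.
\parbox{4em}{compound-\allowbreak interest}
\end{document}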

If you mean that Acrobat lets a search for ``translation'' find
``transla-tion'' split across two lines, that works for TeX documents
too: Adobe simply chose to ignore the hyphen and stitch together the
parts of a word that a hyphen separates across lines.

Try it:

\documentclass{minimal}
\begin{document}
\noindent Transla-\\
tion
\end{document}

This has some unintended consequences, though. Consider that
``compound-interest'', split across lines after its hyphen, will be
found even when one searches for ``compoundinterest''. (I can't think
offhand of a good example word pair where that would make a real
difference.)

William

-- 
William Adams
senior graphic designer
Fry Communications


