[XeTeX] xdvipdfmx, line breaks and hyphenated words

Pablo Rodríguez oinos at web.de
Sun Jan 14 14:06:50 CET 2007

William Adams wrote:
> On Jan 12, 2007, at 2:02 PM, Pablo Rodríguez wrote:
>> one of the things that I think it would be interesting to implement in
>> xdvipdfmx is one feature that Adobe generated documents (such as
>> http://pdf.codev2.cc/Lessig-Codev2.pdf) have: that the searchable text
>> contains no line breaks (within the same paragraphs) and hyphenated
>> words aren't hyphenated in the text within.
> I'm not fully understanding what you're saying here.

Thanks, William, for your answer. Sorry, I expressed myself badly, and
I'm afraid that I chose the wrong example. The right one is
freeculture.pdf (discussed below).

> There's a hyphen ``transla-
> tion'' on the first line of the book (pg.ix).

Let's take “soft-ware” on page xiii. Acrobat does not find a hyphen
there, and copying the text yields the unhyphenated word.

> If you mean that Acrobat allows a search for ``translation'' to find  
> ``transla-
> tion'', well that works for TeX document too --- Adobe simply chose  
> to ignore / stitch together parts of words on different lines  
> separated by a hyphen.

My experience is that Acrobat does not find “transla-tion” when
searching for “translation” on page ix in Lessig-Codev2.pdf, but it does
find “soft-ware” on page xiii in freeculture.pdf. I think that Acrobat
finds it because of the way that PDF document was generated (and not
because of a general feature like the one you described).
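For comparison, the general stitching that William describes could be
sketched roughly as below. This only illustrates the heuristic on
already extracted text; it is not a description of what Acrobat does
internally, and the function name stitch is made up for the example.
Note that it would also wrongly join words that carry a real hyphen at
a line end (such as "well-" followed by "known").

```python
import re

def stitch(text):
    """Join words split by a line-end hyphen, then unwrap line breaks.

    Assumption: `text` is plain text with "\n" line ends and blank
    lines between paragraphs.
    """
    # "transla-\ntion" -> "translation"
    joined = re.sub(r"(\w)-\n[ \t]*(\w)", r"\1\2", text)
    # A single newline inside a paragraph becomes a space;
    # blank lines (paragraph breaks) are left alone.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", joined)

sample = "a new transla-\ntion of the\nbook"
print("translation" in stitch(sample))
```

With text stitched this way, a plain substring search for the
unhyphenated word succeeds regardless of where the line was broken.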

I have uncompressed freeculture.pdf and opened it in vim, but I was
not able to find the reason why Acrobat is able to find “soft-ware” when
searching for “software”. That is no surprise, since I have no knowledge
of the PDF specification.
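In case somebody wants to repeat the uncompressing step, here is a
minimal sketch. It assumes that the streams use the FlateDecode filter
(zlib) and that the file is not encrypted; the function name
inflate_streams is made up, and the regular expression is a crude
shortcut rather than a real PDF parser.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(pdf_bytes):
    """Return the inflated contents of every stream that zlib accepts."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes that
            # precede "endstream"; they end up in unused_data.
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not a Flate stream (or another filter is used); skip it.
            continue
    return results
```

Calling inflate_streams(open("freeculture.pdf", "rb").read()) would
then return the readable page content, in which the text-showing
operators can be searched.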

Thanks for your help,


