[XeTeX] search arabic text in pdf using adobe reader 7.0

François Charette firmicus at ankabut.net
Wed Feb 6 10:00:47 CET 2008


sh wrote:
> I am using MiKTeX-XeTeX 2.7.2904 (0.997 svn 539) (MiKTeX 2.7) in
> Microsoft Windows XP . I have been successful in creating pdf using
> arabxetex and Scheherazade opentype font.
>
> I am using adobe reader 7.0 to read the pdf. When I copy the arabic
> characters from the pdf, I get garbage characters when I paste it to
> MS Word (which is set to use the unicode Arial MT font). The
> individual characters copy just fine, but the characters that are in
> the intermediate form do not get copied.
>
> What do I need to do to be able to copy the characters from the pdf? The
> pdf is encoded with identity-H/CID. I suspect I need to do something
> with Cmap or mapping?
>   
This seems to be an issue (not only for copying but also for searching) 
with the font Scheherazade, which also occurs when it is typeset with 
plain xetex (and so is not related to your operating system or your PDF 
viewer). In fact, only *isolated* characters can be copied or searched 
correctly; the other characters come out, as you say, as "garbage" 
(actually as characters with code points above U+100000, in the 
so-called "Supplementary Private Use Area-B" of Unicode). I suppose 
Jonathan should be able to tell us more about this...
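
To check whether text pasted from a PDF shows this symptom, something like 
the following Python sketch may help. It only counts how many characters 
fall in Supplementary Private Use Area-B (U+100000 to U+10FFFD) versus the 
basic Arabic block (U+0600 to U+06FF). The function names and the sample 
string are my own illustrative assumptions; they are not taken from XeTeX 
or from any of the fonts.

```python
# Supplementary Private Use Area-B spans U+100000..U+10FFFD.
PUA_B_START = 0x100000
PUA_B_END = 0x10FFFD

# Basic Arabic block; presentation forms and supplements are not counted here.
ARABIC_START = 0x0600
ARABIC_END = 0x06FF


def is_pua_b(ch: str) -> bool:
    """Return True if the character lies in Supplementary Private Use Area-B."""
    return PUA_B_START <= ord(ch) <= PUA_B_END


def summarize_pasted_text(text: str) -> dict:
    """Count basic Arabic, PUA-B, and other non-space characters in text."""
    counts = {"arabic": 0, "pua_b": 0, "other": 0}
    for ch in text:
        if is_pua_b(ch):
            counts["pua_b"] += 1
        elif ARABIC_START <= ord(ch) <= ARABIC_END:
            counts["arabic"] += 1
        elif not ch.isspace():
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample: two real Arabic letters and one PUA-B code point.
    sample = "\u0628\U00100010 \u0627"
    print(summarize_pasted_text(sample))
```

A non-zero "pua_b" count for text copied from the PDF would be consistent 
with the behaviour described above, although it says nothing about why the 
font maps its glyphs that way.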

In a PDF file with two identical Arabic paragraphs, the first set in 
Scheherazade (with heading in Lateef) and the second in Lotus Linotype 
(a commercial font), copying and searching work without problems with 
the latter, but not with Scheherazade or Lateef. (Note that all three 
fonts are encoded with Identity-H/CID.) See the attachment, where the 
first paragraph is Lateef+Scheherazade and the second Lotus Linotype.

I intend to test this with other Arabic fonts later.

FC
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ArabicOpenType.pdf
Type: application/pdf
Size: 55469 bytes
Desc: not available
Url : http://tug.org/pipermail/xetex/attachments/20080206/6cabeed9/attachment-0001.pdf 

