[XeTeX] Fontspec question
Jonathan Kew
jonathan_kew at sil.org
Thu Sep 7 14:19:21 CEST 2006
On 7 Sep 2006, at 1:01 pm, Ralf Stubner wrote:
> Peter Dyballa <Peter_Dyballa at Web.DE> writes:
>
>> On a side mark, Will: could you add \usepackage{cmap} to your TeX
>> source of fontspec?
>
> cmap.sty is specific for pdfTeX in PDF-mode with non-virtual fonts. So I
> doubt it would help here. Recently pdfTeX has acquired an automatic
> CMAP-generator, which uses the glyph names as basis. Similar things
> exist in (x)dvipdfmx.
Right; it attempts to synthesize CMap resources based on glyph names
in OpenType/TrueType fonts.
With xdv2pdf (which Peter may have been using), I have no real
control over what happens, as it's all handled by Apple's Quartz
(CoreGraphics) framework. It often seems to work pretty well, but
there may well be cases that aren't handled.
> For XeTeX it would of course be best if a CMAP were
> generated based on the Unicode /input/, since that would work even when
> glyph names are wrong or missing.
However, the whole business of extracting text/searching/etc in PDF
files based on CMap resources is a mess, and my advice would be to
regard PDF as a medium for viewing and printing, not for text data
exchange. The stream of glyphs present in the PDF may have very
complex relationships to the underlying Unicode text -- consider, for
example, Indic scripts where there is extensive reordering of
elements within the syllable. As I understand it, to search for
"hindi" in a PDF with Acrobat, you'd effectively have to type "ihndi"
as the search string (and that's just a small example; it gets much
worse).
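To make the reordering point concrete, here is a minimal Python sketch. It
is a toy model, not how any real shaping engine or PDF producer works: the
function name visual_order and the one-entry PRE_BASE_MATRAS set are
assumptions made for illustration, and real Indic shaping moves the vowel
sign across whole consonant clusters and does a great deal more.

```python
# Toy model (an assumption for illustration): a pre-base vowel sign is drawn
# to the LEFT of the consonant it follows in the Unicode text.
PRE_BASE_MATRAS = {"\u093f"}  # DEVANAGARI VOWEL SIGN I


def visual_order(text: str) -> str:
    """Return the characters in the rough order their glyphs are drawn."""
    out = []
    for ch in text:
        if ch in PRE_BASE_MATRAS and out:
            # Swap the vowel sign in front of the preceding consonant.
            base = out.pop()
            out.extend([ch, base])
        else:
            out.append(ch)
    return "".join(out)


# "hindi" in Devanagari, in logical (typed) order: ha, i, na, virama, da, ii
logical = "\u0939\u093f\u0928\u094d\u0926\u0940"
visual = visual_order(logical)

# A search that compares the typed string against the glyph-order string
# does not find a match.
found = logical in visual
```

Here visual starts with the vowel sign followed by ha, which is the
"ihndi" effect described above, and found is False.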
Sure, it's nice (especially for plain English text) when copy/paste
and text search give you a good approximation of what you'd expect,
but until there's a (widely-supported) way to "annotate" the glyph
stream in the PDF with the associated Unicode text, rather than
attempting to recover Unicode characters from the actual sequence of
glyphs, it will never really be universal and reliable. The
character-to-glyph process is not fully reversible; there's too much
complexity and potential ambiguity in the mappings and transformations.
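The ambiguity can be shown with a small Python sketch. The mapping table
below is hypothetical (the glyph name "f_i" and the TOY_SHAPER table are
invented for illustration and do not come from any particular font): two
different Unicode inputs end up as the same glyph, so the glyph alone cannot
tell you which text was typed.

```python
from collections import defaultdict

# Hypothetical character-to-glyph table; glyph names are illustrative only.
TOY_SHAPER = {
    "fi": ("f_i",),      # f + i, merged into a ligature glyph
    "\ufb01": ("f_i",),  # U+FB01 LATIN SMALL LIGATURE FI, same glyph
    "f": ("f",),
    "i": ("i",),
}


def invert(shaper):
    """Map each glyph sequence back to every text that could have produced it."""
    inverse = defaultdict(set)
    for text, glyphs in shaper.items():
        inverse[glyphs].add(text)
    return dict(inverse)


INVERSE = invert(TOY_SHAPER)

# Two candidate source strings for one glyph: the reverse mapping is ambiguous.
candidates = INVERSE[("f_i",)]
```

A tool that recovers text from glyphs has to pick one of the candidates, and
whichever it picks will be wrong for some documents.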
JK