[XeTeX] XeTeX for Linux and xdvipdfmx
jonathan_kew at sil.org
Tue Jun 6 10:33:27 CEST 2006
On 6 Jun 2006, at 9:14 am, Ralf Stubner wrote:
> Pablo Rodríguez <oinos at web.de> writes:
>> PDF documents generated from XeLaTeX have fonts embedded many times
>> (with characters duplicated in subset fonts) and some characters
>> cannot be copied/extracted with pdftotext or Adobe Reader.
> I have encountered a similar, maybe related issue. For text extraction
> to work reliably it is useful for the PDF file to contain a cmap/
> toUnicode table. (x)dvipdfmx when working on a dvi file generates such a
> table based on the glyph names when using Type1 fonts. In particular,
> when (x)dvipdfmx finds a glyph named <base>.<variant> and the unicode
> position for a glyph named <base> is known, one will get <base> from
> text extraction. Typical example would be small caps named 'a.sc' etc,
> where text extraction would find 'a'.
Yes, that's how it is supposed to work.
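
As a rough illustration of the glyph-name rule described above, here is a minimal Python sketch. It is not the actual (x)dvipdfmx code: the function name glyph_name_to_text and the tiny BASE_GLYPHS table are made up for this example, and a real implementation would consult a full glyph list such as the Adobe Glyph List. The handling of '_' (ligature components) and 'uniXXXX' names follows the usual Adobe glyph naming convention.

```python
import re

# Hypothetical, tiny stand-in for a real glyph-name table
# (e.g. the Adobe Glyph List); only here to make the sketch runnable.
BASE_GLYPHS = {"a": 0x0061, "b": 0x0062, "f": 0x0066, "i": 0x0069, "one": 0x0031}


def glyph_name_to_text(name, table=BASE_GLYPHS):
    """Return the text a glyph name stands for, or None if unknown."""
    # Drop the variant suffix: 'a.sc' -> 'a', 'one.oldstyle' -> 'one'.
    base = name.split(".", 1)[0]
    if not base:
        return None  # names such as '.notdef' carry no text
    chars = []
    # Ligature names join their components with '_', e.g. 'f_i'.
    for part in base.split("_"):
        if part in table:
            chars.append(chr(table[part]))
            continue
        # 'uni' followed by one or more groups of four uppercase hex digits.
        m = re.fullmatch(r"uni((?:[0-9A-F]{4})+)", part)
        if m:
            digits = m.group(1)
            chars.extend(
                chr(int(digits[i:i + 4], 16)) for i in range(0, len(digits), 4)
            )
            continue
        return None  # no known Unicode value for this component
    return "".join(chars)


print(glyph_name_to_text("a.sc"))
print(glyph_name_to_text("f_i.liga"))
print(glyph_name_to_text("glyph123"))
```

With this rule, 'a.sc' yields 'a', while a name that carries no usable base (such as 'glyph123') yields nothing, which is the case where a warning about a missing unicode mapping would be expected.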
> When using xetex in conjunction with an OpenType font that no longer
> works. If the small caps are encoded in the PUA (eg MinionPro),
> xdvipdfmx seems to embed this into the toUnicode table. If the small
> caps are unencoded (eg Palatino Linotype), xdvipdfmx gives warning
> messages that for certain glyphs there is no unicode mapping.
> I don't know what information is present in the xdv file. I assume it is
> only information about glyphs, not about characters which xetex still
> knows (after all, a.sc is accessed as 'a + smcp feature').
Right - the xdv information is entirely glyph-based.
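
To make the consequence concrete: with glyph-based data, a toUnicode table has to map each glyph code back to the text it represents. The Python sketch below shows roughly what the bfchar part of such a CMap looks like. It is only an illustration under stated assumptions, not what xdvipdfmx actually does: it assumes glyph IDs are used as 2-byte character codes, and both the function name bfchar_section and the gid_to_text input are hypothetical.

```python
def bfchar_section(gid_to_text, chunk=100):
    """Format glyph-ID -> text pairs as ToUnicode CMap bfchar blocks.

    gid_to_text is a hypothetical dict {glyph_id: unicode_string}.
    Entries are grouped in blocks of at most `chunk` (100 is the usual
    per-block limit for CMap operators).
    """
    items = sorted(gid_to_text.items())
    lines = []
    for start in range(0, len(items), chunk):
        block = items[start:start + chunk]
        lines.append(f"{len(block)} beginbfchar")
        for gid, text in block:
            # Destination strings are written as UTF-16BE hex.
            dst = text.encode("utf-16-be").hex().upper()
            lines.append(f"<{gid:04X}> <{dst}>")
        lines.append("endbfchar")
    return "\n".join(lines)


# Glyph 3 stands for 'a' (say, a small-cap a); glyph 7 is an fi ligature.
print(bfchar_section({3: "a", 7: "fi"}))
```

A small-cap glyph reached via 'a + smcp' would ideally get an entry pointing at plain 'a' rather than at a PUA code point or at nothing.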
> But maybe the
> method with glyph names used by (x)dvipdfmx when working on dvi files
> with Type1 fonts could be used here, too.
That's the idea, but this area is incomplete. There's plenty of work
still to be done in xdvipdfmx to improve the font management; the
current code is just an initial attempt to get basic output.
Does it make a difference whether you're using TrueType- or CFF-
flavored OpenType, BTW?