[XeTeX] PDFs and advanced font features

Thu Oct 28 17:05:30 CEST 2010

If you need an example of the problem, see the XeTeX manual at

http://tug.ctan.org/tex-archive/info/xetexref/XeTeX-reference.pdf

On page 4 there are two examples of small caps usage. On my computer,
at least, the first one (Warnock Pro in italic+small caps) cannot be
copied correctly. The second example (in Hoefler Text, bold+small
caps) does work, however. I suspect Hoefler Text uses a different font
file for the small caps rather than feature tags in a font with normal
minuscules.
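
If you want to check what a viewer actually puts on the clipboard, one option is to look for private-use codepoints in the pasted text. Below is a minimal sketch in Python; the function name and the sample strings are made up for illustration (they are not real output from the manual), and it relies only on the Unicode general category "Co" (private use) reported by the standard unicodedata module.

```python
import unicodedata


def private_use_chars(text):
    """Return (index, codepoint) pairs for every private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" = private use
    ]


# Hypothetical clipboard contents, invented for this example.
pasted_ok = "Hoefler Text"
pasted_bad = "W\uf6e1\uf6f2nock"

print(private_use_chars(pasted_ok))   # no private-use characters expected
print(private_use_chars(pasted_bad))  # reports the two private-use characters
```

An empty list means the pasted text contains only ordinary codepoints; any entries give the position and value of the characters that a PDF viewer could not map back to "semantic" Unicode.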

--Bogdan Butnaru

On Thu, Oct 28, 2010 at 16:18, Bogdan Butnaru <bogdanb+xetex at gmail.com> wrote:
> Hello!
>
> I’m having a problem with the way the advanced font features of XeTeX
> interact with PDF reader programs. I’m not sure exactly where the
> culprit lies, so I apologize if this is not the right place to ask
> for help; redirections are welcome if that is the case.
>
> I’ve been writing my CV (I think the more correct US term is resume)
> in LaTeX, using xelatex to compile it to PDF. I managed to get it to
> look pretty much exactly as I wanted. (I’m no typography expert, but
> I’m quite pleased with the result, if I may say so.)
>
> The document uses a nice font with many OpenType features like small
> and titling capitals, lining and old-style numerals, and superscripts
> and the like. (Those are the ones I use; there are others.) Therein
> lies the problem: as far as I can tell, “variant” characters, like
> small-caps or superscript letters, are represented as additional
> (private) codepoints within the font, rather than as separate fonts.
> For display and printing, this is not a problem: the font is embedded
> in the PDF, and everywhere I tried it, it looks as it should.
>
> However, copying and pasting the contents into another program fails
> badly. Everything that isn’t displayed in the “normal” variant is
> copied to the clipboard as a set of (what I believe to be) private
> codepoints rather than the “semantic” Unicode codepoints it
> represents.
>
> This is a big problem for this document, as I expect a potential
> employer might try to copy and paste parts of it (e.g., the address)
> and unexpectedly get gibberish.
>
> I’ve tried searching for solutions or workarounds, with little
> success. If (as I assume) this is a well-known problem, don’t hesitate
> to just point me towards a document that explains it.
>
> I’ve seen PDF documents that seemed to have a kind of “text overlay”:
> these were all scanned documents with (I assume) some kind of OCR
> processing. For display and printing purposes, only the scanned image
> was used (i.e., the OCRed text was invisible). However, when selecting
> (and copy/pasting), a text layer was used.
>
> I’ve no idea what PDF feature this used or whether it’s accessible via
> LaTeX. I was hoping there was a way to add a “replacement” text for
> affected areas (I searched the hyperref documentation for it, without
> success), such that on copy-paste the replacement is used rather than
> just private characters. Since it’s a one-page document, it wouldn’t be
> a lot of work to add the replacements.
>
> The only alternative I could think of was to take FontForge and
> manually split the font into pieces (e.g., one for small caps, one for
> superscripts, etc.), such that each variant glyph is encoded in its
> “semantic” position. But it’s a big and complex font, so that would
> take a lot more work than just “hinting” the document. I also worry
> that messing around with it in FontForge will cause me to lose
> hinting and other features I (or it) may not be aware of.
>
> I welcome all ideas, and thank you in advance.
>
> --Bogdan Butnaru
>
> PS. What I’m using identifies itself as “XeTeX 3.1415926-2.2-0.9995.2
> (TeX Live 2009/Debian)” on Ubuntu. Fontspec reports itself as
> “2008/08/09 v1.18”. The problem manifests itself on every PDF viewer I
> tried (about one each for Linux, Windows and Mac OS X, and also Google
> Docs’ viewer).
>