[XeTeX] Fwd: Text figures (Old style figure) in XeTeX
Jonathan Kew
jonathan_kew at sil.org
Wed Dec 19 22:20:59 CET 2007
On 19 Dec 2007, at 8:36 pm, Michael B. Trausch wrote:
> On Wed, 2007-12-19 at 19:28 +0000, Jonathan Kew wrote:
>> I think this is happening because Adobe encodes these glyphs in the
>> Private Use Area of Unicode, rather than leaving them unencoded and
>> relying solely on OpenType features to access them. Because of this,
>> the ToUnicode mapping that xdvipdfmx embeds in the PDF will map these
>> to PUA codepoints (U+F643..F64C, in the case of the OldStyle
>> numerals) rather than the proper digit codepoints.
>>
>> IMO, this is a poor design choice by the font developer; they should
>> not be using PUA character codes for things that are not distinct
>> characters but glyph variants of existing standard characters. To
>> some extent, it may be a legacy of the pre-OpenType days when every
>> glyph had to be directly encoded in some way, in order to be
>> accessible (hence "expert sets", etc.). In these days of Unicode and
>> OpenType, this is no longer necessary or appropriate.
>>
>> It may be possible to modify xdvipdfmx's algorithms for ToUnicode
>> generation to handle such fonts better; I'll look into it when time
>> permits.
>
> I have (recently; as in yesterday) just noticed this behavior for these
> fonts when typeset with small caps, as well; I am using XeTeX from
> TeXLive, and viewing the resulting PDF in Evince.
Right; the small caps are also encoded in the Private Use Area in
Adobe's fonts -- or at least in older Adobe OpenType fonts. (They
seem to be moving away from this practice in newer fonts I have seen.
However, xdvipdfmx still needs some further work to make the text
properly searchable/extractable; it doesn't handle CFF fonts as well
as TrueType in this respect, I think.)
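
Until the ToUnicode generation is improved, one possible workaround is to
post-process text copied or extracted from the PDF. The following is only a
minimal sketch in Python, not anything xdvipdfmx does itself. It assumes that
the ten codepoints U+F643..F64C correspond, in order, to the digits 0..9; that
ordering is an assumption and should be checked against the font in question.
The function names are made up for illustration.

```python
# Sketch: map Private Use Area old-style figures back to ASCII digits.
# ASSUMPTION: U+F643..U+F64C stand for the digits 0..9 in that order.
# Verify this against the actual font before relying on it.

PUA_OLDSTYLE_FIRST = 0xF643  # first PUA codepoint of the old-style numerals
DIGIT_COUNT = 10             # U+F643..U+F64C covers exactly ten codepoints


def build_pua_digit_table():
    """Return a str.translate() table from PUA codepoints to '0'..'9'."""
    return {PUA_OLDSTYLE_FIRST + i: ord("0") + i for i in range(DIGIT_COUNT)}


def normalize_extracted_text(text):
    """Replace PUA old-style figures in extracted text with plain digits.

    Characters outside the assumed PUA range are left untouched.
    """
    return text.translate(build_pua_digit_table())


if __name__ == "__main__":
    extracted = "Dec \uf645\uf643\uf643\uf64a"  # as it might come out of the PDF
    print(normalize_extracted_text(extracted))
```

The small caps would need a similar table, but their PUA codepoints are not
given here, so they are left out of the sketch.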
JK