[XeTeX] Detect, whether a font contains a certain character

Tobias Schoel liesdiedatei at googlemail.com
Tue Nov 29 16:24:28 CET 2011


Thanks a lot for the macro. A small summary (as I understand the code): 
\IfXeTeXTextCharExists takes three arguments:
1. the character (as direct input or in ^^^^ notation)
2. the then-clause (executed if the character exists in the _current_ font)
3. the else-clause (executed otherwise)
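A usage sketch of that summary follows. It makes several assumptions: the document is compiled with XeLaTeX, fontspec is loaded, the font Trebuchet MS is installed, and the lines between %%% Begin %%% and %%% End %%% have been saved in a file with the hypothetical name charexists.tex.

```tex
% Usage sketch (XeLaTeX). charexists.tex is a hypothetical file holding
% the \IfXeTeXTextCharExists definition from the %%% Begin %%% block.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Trebuchet MS}% assumption: this font is installed
\input{charexists.tex}
\begin{document}
% argument 1: the character, 2: then-clause, 3: else-clause;
% the test is made against the *current* font (here Trebuchet MS)
\IfXeTeXTextCharExists{^^^^018e}{U+018E: ^^^^018e}{U+018E is missing}
\end{document}
```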

As I'm not a TeXnician, only a XeLaTeX user: can I copy the code from
%%% Begin %%% to %%% End %%% into my own package and have it work as
described above, or do I have to take care of something else?
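One way the block might look inside a LaTeX package is sketched below, using the \@firstoftwo / \@secondoftwo alternatives that the macro's own comments mention. This is an untested sketch under stated assumptions: the package name mycharcheck is hypothetical, and it only works under XeLaTeX, because \XeTeXfonttype and \XeTeXcharglyph are XeTeX primitives.

```tex
% mycharcheck.sty -- hypothetical package name, for illustration only.
% Requires XeLaTeX: \XeTeXfonttype and \XeTeXcharglyph are XeTeX primitives.
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mycharcheck}
\newcommand*\IfXeTeXTextCharExists[1]{%
  \begingroup
    \let\next\@secondoftwo          % default: take the else-clause
    \ifnum\XeTeXfonttype\font>0 %   % native (OpenType/TrueType/AAT) font
      % glyph ID 0 is .notdef, so >0 means the character is mapped
      \ifnum\XeTeXcharglyph`#1>0 %
        \let\next\@firstoftwo
      \fi
    \else                           % TFM font
      % a missing character is silently dropped, so \kern1sp stays the
      % last item; \expandafter evaluates \ifnum\lastkern before the
      % box is closed
      \setbox0=\hbox{%
        \tracinglostchars=0 %
        \kern1sp#1%
        \expandafter
      }%
      \ifnum\lastkern=1 %
      \else
        \let\next\@firstoftwo
      \fi
    \fi
  % \next is expanded (taking the two clauses) before \endgroup runs
  \expandafter\endgroup
  \next
}
\endinput
```

In a .sty file @ is already a letter, so \@firstoftwo needs no \makeatletter; the plain \def version from the %%% Begin %%% block should behave the same way there.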

Toscho

On 29.11.2011 17:05, Heiko Oberdiek wrote:
> On Tue, Nov 29, 2011 at 07:40:13AM +0000, Jonathan Kew wrote:
>
>> On 28 Nov 2011, at 08:06, Heiko Oberdiek wrote:
>>
>>> \catcode`\{=1
>>> \catcode`\}=2
>>> \catcode`\^=7
>>> \showboxdepth=10000
>>> \showboxbreadth=10000
>>> \tracingonline=1
>>> \font\rm=cmr10\relax
>>> \rm
>>> \setbox0=\hbox{\kern1pt^^^^018e}
>>> \showbox0
>>> \csname @@end\endcsname\end
>>>
>>> And where is the inserted ".notdef" glyph?
>>
>> There won't be one with cmr10: that's a TFM font, so missing chars get dropped, just like in standard TeX. But if \rm is a native truetype/opentype font, it'll be there:
>>
>> \catcode`\{=1
>> \catcode`\}=2
>> \catcode`\^=7
>> \showboxdepth=10000
>> \showboxbreadth=10000
>> \scrollmode
>> \tracingonline=1
>> \font\rm="Trebuchet MS"
>> \rm
>> \setbox0=\hbox{\kern1pt^^^^018e}
>> \showbox0
>> \showthe\wd0
>> \end
>>
>> -->
>>
>> This is XeTeX, Version 3.1415926-2.3-0.9997.5 (TeX Live 2011)
>>   restricted \write18 enabled.
>> entering extended mode
>> (./x.tex
>> Missing character: There is no Ǝ in font Trebuchet MS!
>>> \box0=
>> \hbox(5.45789+0.0)x6.0
>> .\kern 1.0
>> .\rm Ǝ
>>
>> ! OK.
>> l.11 \showbox0
>>
>>> 6.0pt.
>> l.12 \showthe\wd0
>>
>>   )
>>
>> Which tells us that the width of .notdef in Trebuchet MS is 5pt (the 6pt
>> box minus the 1pt kern), but tells us nothing (from within the document;
>> the "Missing character" message tells us externally, of course) about the
>> presence or absence of U+018E in this font.
>
> Thanks for clarifying.
>
> To summarize: the state of the art for testing the existence
> of a glyph is the following algorithm, implemented in the
> macro \IfXeTeXTextCharExists. I have added a local
> \tracinglostchars=0 to get rid of the warning in the .log file.
>
> \catcode`\{=1
> \catcode`\}=2
> \catcode`\#=6
> \catcode`\^=7
> \showboxdepth=10000
> \showboxbreadth=10000
> \scrollmode
> \tracingonline=1
>
> %%% Begin %%%
> \def\IfXeTeXTextCharExists#1{%
>    \begingroup
>      \long\def\next##1##2{##2}%
>      % or in LaTeX: \let\next\@secondoftwo
>      \ifnum\XeTeXfonttype\font>0 %
>        \ifnum\XeTeXcharglyph`#1>0 %
>          \long\def\next##1##2{##1}%
>          % or in LaTeX: \let\next\@firstoftwo
>        \fi
>      \else
>        \setbox0=\hbox{%
>          \tracinglostchars=0 %
>          \kern1sp#1%
>          \expandafter
>        }%
>        \ifnum\lastkern=1 %
>        \else
>          \long\def\next##1##2{##1}%
>          % or in LaTeX: \let\next\@firstoftwo
>        \fi
>      \fi
>    \expandafter\endgroup
>    \next
> }
> %%% End %%%
>
> \def\Test#1#2{%
>    \begingroup
>      \font\test=#1\relax
>      \test
>      \IfXeTeXTextCharExists{#2}{%
>        \immediate\write16{YES (\detokenize{#1/#2})}%
>      }{%
>        \immediate\write16{NO (\detokenize{#1/#2})}%
>      }%
>    \endgroup
> }
> \Test{"Trebuchet MS"}{A}
> \Test{cmr10}{A}
> \Test{"Trebuchet MS"}{^^^^018e}
> \Test{cmr10}{^^^^018e}
>
> \end
>
>>>>> the problem is rather that an existing glyph can have width zero
>>>>> (not likely in your case)
>
> The algorithm doesn't look at the width, which avoids that problem.
>
>> and that there is a warning in the .log file.
>
> Solved by a local setting of \tracinglostchars=0.
>
>>> Or what do you suggest for a general test of glyph existence?
>>
>> For native Unicode fonts, as I said, use \XeTeXcharglyph.
>
> I agree, see above.
>
>> For TFM fonts, I
>> don't think the question is particularly interesting or worthwhile.
>
>> TFM fonts do not have a standard encoding, so querying them for a
>> particular "character code" is meaningless - you have to know the encoding
>> of the font you're using in order to do anything useful with it, in which
>> case you should already know what characters it supports.
>
> If TFM fonts are used, then the encoding and character code have to
> be known. But that does not answer the question whether the character
> is available in the font. There are incomplete fonts; see the
> subencodings of TS1.
>    Testing this revealed another general problem with glyph tests: a font
> might not support a character but provide a funny replacement glyph
> instead; at TeX level this cannot be detected, because a glyph exists.
>
> Yours sincerely
>    Heiko Oberdiek
>
>
> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
>    http://tug.org/mailman/listinfo/xetex
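Following Jonathan Kew's advice quoted above (for native Unicode fonts, use \XeTeXcharglyph), a minimal plain XeTeX check without the wrapper macro might look like the sketch below. It assumes the native font "Trebuchet MS" is installed and that the file (hypothetical name glyphcheck.tex) is run with the xetex command. It is only meaningful for native fonts; for a TFM font such as cmr10 the box/\lastkern branch of the macro is needed instead.

```tex
% Plain XeTeX sketch, run as: xetex glyphcheck.tex (file name hypothetical).
% Assumes the native font "Trebuchet MS" is installed.
\font\test="Trebuchet MS"
\test
% \XeTeXcharglyph gives the glyph ID the current native font maps the
% character to; 0 is .notdef, i.e. no glyph for that character.
\message{[glyph ID for U+018E: \the\XeTeXcharglyph"018E]}
\ifnum\XeTeXcharglyph"018E>0
  \message{[U+018E present]}
\else
  \message{[U+018E absent]}
\fi
\end
```

Unlike typesetting the character, this query produces no "Missing character" warning, so no local \tracinglostchars=0 is needed.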

