[XeTeX] Detect, whether a font contains a certain character

Heiko Oberdiek heiko.oberdiek at googlemail.com
Tue Nov 29 16:05:50 CET 2011


On Tue, Nov 29, 2011 at 07:40:13AM +0000, Jonathan Kew wrote:

> On 28 Nov 2011, at 08:06, Heiko Oberdiek wrote:
> 
> > \catcode`\{=1
> > \catcode`\}=2
> > \catcode`\^=7
> > \showboxdepth=10000
> > \showboxbreadth=10000
> > \tracingonline=1
> > \font\rm=cmr10\relax
> > \rm
> > \setbox0=\hbox{\kern1pt^^^^018e}
> > \showbox0
> > \csname @@end\endcsname\end
> > 
> > And where is the inserted ".notdef" glyph?
> 
> There won't be one with cmr10: that's a TFM font, so missing chars get dropped, just like in standard TeX. But if \rm is a native truetype/opentype font, it'll be there:
> 
> \catcode`\{=1
> \catcode`\}=2
> \catcode`\^=7
> \showboxdepth=10000
> \showboxbreadth=10000
> \scrollmode
> \tracingonline=1
> \font\rm="Trebuchet MS"
> \rm
> \setbox0=\hbox{\kern1pt^^^^018e}
> \showbox0
> \showthe\wd0
> \end
> 
> -->
> 
> This is XeTeX, Version 3.1415926-2.3-0.9997.5 (TeX Live 2011)
>  restricted \write18 enabled.
> entering extended mode
> (./x.tex
> Missing character: There is no ?? in font Trebuchet MS!
> > \box0=
> \hbox(5.45789+0.0)x6.0
> .\kern 1.0
> .\rm ??
> 
> ! OK.
> l.11 \showbox0
>               
> > 6.0pt.
> l.12 \showthe\wd0
>                  
>  )
> 
> Which tells us that the width of .notdef in Trebuchet MS is 5pt, but tells
> us nothing (from within the document - the "Missing character" message
> tells us externally, of course) about the presence or absence of U+018E in
> this font.

Thanks for clarifying.

To summarize: the state of the art for testing whether a glyph exists
is the following algorithm, implemented in the macro
\IfXeTeXTextCharExists. I have added a local \tracinglostchars=0
to suppress the warning in the .log file.

\catcode`\{=1
\catcode`\}=2
\catcode`\#=6
\catcode`\^=7
\showboxdepth=10000
\showboxbreadth=10000
\scrollmode
\tracingonline=1

%%% Begin %%%           
\def\IfXeTeXTextCharExists#1{%
  \begingroup
    \long\def\next##1##2{##2}%
    % or in LaTeX: \let\next\@secondoftwo
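    % \XeTeXfonttype is 0 for a TFM font and positive for a native font.
    \ifnum\XeTeXfonttype\font>0 %
      % Native font: glyph index 0 is .notdef, i.e. the character is missing.
      \ifnum\XeTeXcharglyph`#1>0 %
        \long\def\next##1##2{##1}%
        % or in LaTeX: \let\next\@firstoftwo
      \fi
    \else
      % TFM font: a missing character is dropped, so the kern stays last.
      \setbox0=\hbox{%
        \tracinglostchars=0 %
        \kern1sp#1%
        % Expand \ifnum (and thus \lastkern) before the box is closed.
        \expandafter
      }%
      \ifnum\lastkern=1 %
      \else
        \long\def\next##1##2{##1}%
        % or in LaTeX: \let\next\@firstoftwo
      \fi
    \fi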
  \expandafter\endgroup
  \next
}
%%% End %%%

\def\Test#1#2{%
  \begingroup
    \font\test=#1\relax
    \test
    \IfXeTeXTextCharExists{#2}{%
      \immediate\write16{YES (\detokenize{#1/#2})}%
    }{%
      \immediate\write16{NO (\detokenize{#1/#2})}%
    }%
  \endgroup
}
\Test{"Trebuchet MS"}{A}
\Test{cmr10}{A}
\Test{"Trebuchet MS"}{^^^^018e}
\Test{cmr10}{^^^^018e}

\end

> >>> the problem is rather that an existing glyph can have width zero
> >>> (not likely in your case)

The algorithm does not look at the width at all, which avoids that problem.
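
To illustrate why the width is irrelevant, here is a minimal sketch of
the kern probe on its own. It assumes XeTeX with the plain format and a
TFM font; \probe and \result are names made up for this example and are
not part of the macro above. The test only asks whether the 1sp kern is
still the last item in the list after the character was requested.

%%% Begin sketch %%%
\font\probe=cmr10\relax
\def\result{present}%
\setbox0=\hbox{%
  \probe
  \tracinglostchars=0 %
  \kern1sp ^^^^018e%
  % If the character was dropped, the 1sp kern is still the last item.
  \ifnum\lastkern=1 \gdef\result{missing}\fi
}
\immediate\write16{U+018E in cmr10: \result}
\bye
%%% End sketch %%%

A zero-width glyph would still be appended after the kern, so \lastkern
would no longer see the kern and the character would count as present.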

> and that there is a warning in the .log file.

Solved by a local setting of \tracinglostchars=0.

> > Or what do you suggest for a general test of glyph existence?
> 
> For native Unicode fonts, as I said, use \XeTeXcharglyph.

I agree, see above.
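
For the native font case on its own, the check can be sketched as
follows. Again this assumes XeTeX with the plain format and that the
font "Trebuchet MS" is installed; \probe is a made-up name. The outer
\XeTeXfonttype guard mirrors the macro above, so that \XeTeXcharglyph
is only asked about a native font.

%%% Begin sketch %%%
\font\probe="Trebuchet MS"\relax
\probe
\ifnum\XeTeXfonttype\font>0 %
  % Glyph index 0 is .notdef, so a positive index means a real glyph.
  \ifnum\XeTeXcharglyph"018E>0 %
    \immediate\write16{U+018E: glyph found}%
  \else
    \immediate\write16{U+018E: no glyph}%
  \fi
\fi
\bye
%%% End sketch %%%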

> For TFM fonts, I
> don't think the question is particularly interesting or worthwhile.

> TFM fonts do not have a standard encoding, so querying them for a
> particular "character code" is meaningless - you have to know the encoding
> of the font you're using in order to do anything useful with it, in which
> case you should already know what characters it supports.

If TFM fonts are used, then the encoding and the character code have to
be known. But that does not answer the question of whether the character
is actually available in the font. There are incomplete fonts; see the
subencodings of TS1.
  Testing this revealed another general problem with glyph tests: a font
might not support a glyph but provide a funny replacement instead. At
TeX level this cannot be detected, because the glyph exists.

Yours sincerely
  Heiko Oberdiek

