[XeTeX] anti-xunicode ;-)

Tue Jul 25 20:58:20 CEST 2006

Hi Ross,

Ross Moore <ross at ics.mq.edu.au> writes:

> One thing that I noticed in doing this work is that it's
> impossible to tell exactly what goes into the PDF file,
> as the streams are encoded for compression.
> Is there a way to turn off compression ?  (on a Mac)
> JK ?
>
>   xetex --help   tells nothing about this.

Just for the record: with xdvipdfmx this can be done with the option
'-z 0'. I don't know whether xdv2pdf has a similar option. One could probably
also look at the xdv file, but I haven't tried that.
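
If a compressed PDF is all one has, the streams can also be inflated
after the fact. A minimal Python sketch, assuming an unencrypted file
whose streams use plain FlateDecode (zlib) without predictors; the name
inflate_streams is made up for illustration and is not part of any tool
mentioned here:

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the content of every stream, inflated where zlib can do so."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes that sit
            # between the compressed data and "endstream".
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not zlib data (other filter or uncompressed): keep it as-is.
            results.append(raw)
    return results
```

Reading a file in binary mode and passing its bytes to inflate_streams
would then give the page content operators in readable form, at least
for simple files.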

>> I am just not sure if LaTeX needs to know that a certain accented
>> character is available, since XeTeX seems to find that character when
>> presented with suitable decomposed input.
>
> That's true when you know that you have a "smart"(-ish) font.
>
> It's not so with old 7- or 8-bit fonts that you may be using
> within the same document, and which may still be needed for
> accents.
> So you need the means to be able to tell the difference.

I am unable to produce an example file for this. Inputting
<e><ogonekcomb> gives me the prebuilt eogonek with every native font I
have tried, including "dumb" Type 1 fonts.

> However, we don't want people writing ad hoc macros that solve the
> problem in a very limited way. Later they will try to extend these
> macros into more complicated situations, fail at this, then ask
> for help fixing them.
>
> Better is to do it right, in the most general way, first off.

Agreed. I just think there are things which cannot be done via TeX
macros at the moment.

>> * With XeTeX one can test for a glyph for the precomposed character. One
>>   can also test for a combining accent. One cannot test for the
>>   existence of suitable smart font features that make the latter work
>>   properly, though. Imagine the case of U+1E0B ḋ together with a font
>>   that contains a combining dot accent but no prebuilt ḋ. The ideal
>>   rendering for this character might have the dot to the left of the
>>   ascender of the d. Such a behaviour can be implemented via a 'mark'
>>   feature, but we can't be sure. Just letting XeTeX render <d><dotcomb>
>>   might also produce results where TeX's fallback of centering the dot
>>   above the <d> would be preferable.
>
> Such things can only be determined by a human, having a look and
> being dissatisfied with what the system can give automatically.

I think if a font contains a 'mark' feature for a given base-accent
combination, one can safely assume that the accent will be correctly
positioned using it. The problem is that you cannot check within XeTeX
if such a feature exists.

> So yes, get around it with the active-character trick.

The problem with active characters is that you cannot extend them to cases
where Unicode does not provide a precomposed character. And these cases
exist in the real world since AFAIK Unicode has basically stopped adding
more precomposed characters in the Latin script.
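
To illustrate which combinations have a precomposed form at all, here is
a small Python sketch using the standard unicodedata module. The helper
name has_precomposed is made up, and the test is only an approximation:
it asks whether NFC normalization composes the pair into a single code
point, which ignores the few characters excluded from composition.

```python
import unicodedata


def has_precomposed(base, combining):
    """True if NFC composes base + combining mark into one code point."""
    return len(unicodedata.normalize("NFC", base + combining)) == 1


# U+0307 COMBINING DOT ABOVE, U+0328 COMBINING OGONEK, U+0303 COMBINING TILDE
print(has_precomposed("d", "\u0307"))  # d + dot above
print(has_precomposed("e", "\u0328"))  # e + ogonek
print(has_precomposed("m", "\u0303"))  # m + tilde
```

The first two pairs compose to U+1E0B and U+0119 respectively, while
m + tilde stays a two-code-point sequence, so an active-character
definition has nothing to attach to in that case.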

cheerio
ralf