[XeTeX] misplaced combining diacritical marks 2

Alexander Schultheiß aschulth at googlemail.com
Wed Sep 1 11:34:10 CEST 2010


Hey,

I'm starting this second thread because I couldn't figure out how to
reply to a single message within a digest :(. I have turned digest
mode off by now, but decided to open a second thread so that I can
reply to everybody in one mail. Thanks for all the help.

@ Ross Moore

> Please show the complete preamble for the coding that you are using.
> We need to see what packages you use, and how the font is loaded.

The preamble I'm using is:

----------------------------------------------------------------------
\documentclass[a4paper,12pt]{article}
\usepackage{fontspec}
\usepackage{xltxtra}
\usepackage{xunicode}

\fontspec{Junicode}
\setmainfont{Junicode}
----------------------------------------------------------------------

I don't think it's a problem with the preamble, though. This is the
same preamble David suggested except for the command
\defaultfontfeatures{Mapping=tex-text} which I haven't included
because I don't know what it does and haven't really bothered yet to
find out. But even with this command included the problem persists.
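
For reference, the variant with that command included would look
roughly like the sketch below. The body line is only an illustrative
test string, not the text from my actual file, and the extra
\fontspec{Junicode} line is left out on the assumption that
\setmainfont alone is enough to select the font.

```latex
\documentclass[a4paper,12pt]{article}
\usepackage{fontspec}
\usepackage{xltxtra}
\usepackage{xunicode}

% Must come before \setmainfont so that it applies to the main font.
\defaultfontfeatures{Mapping=tex-text}
\setmainfont{Junicode}

\begin{document}
% Illustrative test string (assumption, not from the original file):
% accent macros for acute, macron and dot below on "a".
\'{a} \={a} \d{a}
\end{document}
```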

> Also provide a screenshot of what you see.
> For example, the attached image shows what various fonts
> (including styles of Junicode) should produce.

I have attached an image (see below).

@ Fr. Michael Gilmary

Thank you for your suggestion. I have to admit I don't understand what
the code really does. Also, I'm not merely trying to get one particular
glyph working, and I'm hesitant to define every combination in latex
and kern them by hand, since it should be latex's job to do this. Even
more so if anchors are provided and the program doesn't have to
guess. I'll keep it in mind, however, as a last resort.

@ David J. Perry

> I happen to have the Unicode values for the diacritics memorized, so
> I just typed them in
> [...]
> la\char"0304 \char"0323 m
> hy\char"0304\char"0301 s

I tried it and it works to some extent, but the results are not as good
as with the standard latex commands \d{} &c. I've attached an image
where you can see in line 4 that neither the macron nor the acute is
placed correctly. I think this is due to the fact that xelatex just
puts the glyphs one after the other, and since combining diacritical
marks have negative width they roughly happen to end up above the a.
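
A minimal sketch of a file that puts the two input methods side by
side is given below. The particular lines are illustrative and are not
a reproduction of the lines in the attached image; the empty groups
after the \char codes are only there to end the hexadecimal number
cleanly.

```latex
\documentclass[a4paper,12pt]{article}
\usepackage{fontspec}
\usepackage{xunicode}
\setmainfont{Junicode}

\begin{document}
% Method 1: standard accent macros (handled by xunicode).
\'{a} \={a} \d{a}

% Method 2: base letter followed by raw combining characters,
% U+0301 (acute), U+0304 (macron), U+0323 (dot below).
a\char"0301{} a\char"0304{} a\char"0323{}

% Method 2 with two marks on one base: macron, then acute.
a\char"0304\char"0301{}
\end{document}
```

Comparing the first two lines shows whether the macros and the raw
combining characters lead to the same glyphs; the last line has no
pre-composed equivalent, so it can only come out right if the marks
are actually positioned.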

@ Khaled Hosny

> In an ideal world, fonts should have proper GPOS anchors for
> positioning arbitrary combining marks, and sometimes this is only
> way to get proper output as there isn't a pre-composed form (and, as
> a matter of policy, Unicode is not adding any new pre-composed
> forms), but almost all fonts fail on that, fortunately TeX can still
> provide workarounds for that.

My suspicion is that xelatex does not honor anchors, at least when it
comes to Latin script. Rather, it relies on pre-composed glyphs and
the combining diacritical marks.

1. Observation:

As noted before, if there are no pre-composed Latin glyphs like
a+acute at the appropriate Unicode codepoints, xelatex prints a blank.
This is confirmed by the second image I've attached, for which I
deleted the a+macron glyph from the Junicode font.

2. Observation:

The order of commands does play an important role. If you look at line
3 of the second image, you'll find that there is at least an a+macron,
as opposed to lines 1 & 2. The commands used in lines 1 and 2 look for
the Unicode codepoint, whereas the commands in line 3 try to assemble
the glyph from the combining diacritical marks, it seems.

3. Observation:

Xelatex does not honor mark-to-mark anchors in Latin script. This
seems to be confirmed by the glyphs a+macron+acute in lines 3 and 4
where, according to Obs. 2, the combining diacritical marks are used
but are not stacked as they should be (Junicode has mark-to-mark
anchors. The macron has an "Anchor-3 base" and the acute has the
corresponding "Anchor-3 mark").
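
One way to isolate this would be the sketch below, which sets the same
base + macron + acute sequence twice, once with the font loaded
plainly and once with the Latin OpenType script requested explicitly
through fontspec's Script option. This is an untested assumption on my
part: I don't know whether requesting the script changes mark
placement at all, and depending on the fontspec version the two lines
may turn out identical.

```latex
\documentclass[a4paper,12pt]{article}
\usepackage{fontspec}

\begin{document}
% Font loaded without any explicit script selection.
{\fontspec{Junicode} a\char"0304\char"0301}\par

% Hypothetical experiment: same sequence, Latin script requested.
{\fontspec[Script=Latin]{Junicode} a\char"0304\char"0301}
\end{document}
```

xunicode is deliberately not loaded here, so only the raw combining
characters and the font's own positioning data are involved.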

Summary

If there is no pre-composed glyph, xelatex _never_ assembles the glyph
correctly. Look at image 1 for example. a+acute and a+macron in line 1
are correct because pre-composed glyphs do exist. The a+macron+acute
does not exist in Unicode, and there the distance between the macron
and the acute is too big. According to the mark-to-mark anchors within
Junicode, the distance between the acute and the macron should be 109
(at an em size of 2048) while the distance between the macron and the
a is 234, more than double. This confirms Obs. 3 above. In line 2 the
distance between the acute and the macron is even bigger. How can this
be?
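
To put these numbers into perspective, here is the conversion from
font units to points, assuming the 12pt body size from my preamble and
the 2048 units per em mentioned above (the symbol names are just
labels I made up for this calculation):

```latex
% d_mark: acute-to-macron distance, d_base: macron-to-a distance
\[
  d_{\mathrm{mark}} = \frac{109}{2048} \times 12\,\mathrm{pt}
                    \approx 0.64\,\mathrm{pt},
  \qquad
  d_{\mathrm{base}} = \frac{234}{2048} \times 12\,\mathrm{pt}
                    \approx 1.37\,\mathrm{pt},
  \qquad
  \frac{d_{\mathrm{base}}}{d_{\mathrm{mark}}} = \frac{234}{109}
                    \approx 2.15
\]
```

So at this size the acute should sit only about two thirds of a point
above the macron.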

In addition, in line 3 the acutes are off to the left a bit.
Interestingly, the same is true for the dots in line 2. Further, in
line 2 the dot of a+acute+dotbelow is even more off to the left than
the already misplaced dots in the remaining two glyphs!?

To sum up, in line 2 both the accents and the dots are placed
incorrectly, while in line 3 the dots are placed correctly, as are the
macrons (making the middle glyph the only correctly assembled one!),
but the accents are off. This seems to be due to the order of
commands. The first diacritic applied is placed correctly _because_
there is a pre-composed glyph for it (if there is none, see image 2).

So it really seems that xelatex doesn't care about the GPOS table for
Latin script and just assumes that there are pre-composed glyphs. I
wonder why this is the case. Wouldn't it be easier to make xelatex
recognize anchors so that users can compose arbitrary glyphs? Believe
it or not, some fields in the humanities use weird combinations :).

Thanks for the help, Alex

