Google Disk (was: XeLaTeX to Word/OpenOffice - the state of the art?)

Ross Moore ross.moore at
Sun Mar 17 22:19:39 CET 2019

Hi Zdenek,

> Yes, but it is better for the CMaps to at least be appropriate, rather than inaccurate or missing altogether, as can be the case. Different software tools get information from different places, so ideally one needs to provide the best values for all those possible places.
No, CMaps help only for simple scripts.

Fine. But a CMap must be present for validation.
As I said earlier (repeated below), they will not always be sufficient for proper Accessibility.

Let's imagine a person's name
written বৌমিক in the Bengali script and transliterated as Bowmik. OW
is a two-part matra (dependent vowel) which looks like an e-matra
preceding the consonant and an o-matra following the consonant. The
i-matra is always drawn before the consonant, so with a CMap alone the
word would become eboimak, with two spelling errors. An editor will
complain about an e-matra at the beginning of a word and an i-matra
following an o-matra, and it will indicate missing consonants.
Similarly, the Hindi word स्थापित (sthaapit) would be extracted as
sthaaipat, which is wrong because an i-matra must not follow an
aa-matra. If I had time, I could give you several thousand examples
where CMaps fail. In the past I did many tests with Devanagari, and
without ActualText the problem cannot be solved.
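
The Hindi example can be reproduced with a small Python sketch. It is only
an illustration under stated assumptions: it pretends that each glyph maps
back to exactly one character (a real font would likely use a conjunct glyph
for the first cluster), and the names glyph_order and romanize, along with
the toy romanization rules, are hypothetical and not taken from any PDF tool.

```python
# Code points of the Hindi word "sthaapit" (assumed simplified model:
# one glyph maps back to exactly one character, as a CMap alone would do).
SA, VIRAMA, THA, AA, PA, I, TA = (
    "\u0938", "\u094D", "\u0925", "\u093E", "\u092A", "\u093F", "\u0924"
)

# Logical (stored) order: the i-matra follows the consonant PA.
logical = SA + VIRAMA + THA + AA + PA + I + TA

# Visual (glyph) order: the i-matra is drawn before PA, so text recovered
# glyph by glyph through a CMap comes out in this order instead.
glyph_order = SA + VIRAMA + THA + AA + I + PA + TA

CONSONANTS = {SA: "s", THA: "th", PA: "p", TA: "t"}
MATRAS = {AA: "aa", I: "i"}


def romanize(text):
    """Toy romanization (hypothetical helper, covers only this word).

    A consonant gets the inherent vowel "a" only when another consonant
    follows it directly; a virama contributes nothing.
    """
    out = []
    for pos, ch in enumerate(text):
        nxt = text[pos + 1] if pos + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in CONSONANTS:
                out.append("a")
        elif ch in MATRAS:
            out.append(MATRAS[ch])
    return "".join(out)


print(romanize(logical))      # sthaapit
print(romanize(glyph_order))  # sthaaipat
```

Supplying the logical string as /ActualText for the whole word is what lets
extraction and search bypass the glyph order.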

I’m really happy that you have done such tests, and determined this.
It’s certainly not an area that I could have researched.
It demonstrates that supporting Accessibility properly can be a lot more complicated
than any single simple-minded approach would support.

This is the very reason why \XeTeXgenerateactualtext was implemented.
It is not just a problem of saving as text/RTF/DOC; search does not
work either.

Great addition.
However, it’s useful only insofar as AT (assistive technology) gets its information from /ActualText.
Some screen-readers go for other places.

Indeed, the PDF/A and PDF/UA specifications expect the Accessible text to come
from the /Alt(…) tagging of the structure element that is the parent of the tagged text.
(Obviously not all AT follow these specifications.)

This is why I suggest populating more than one place with helpful information...

> And Zdenek's comment emphasises how what might work well in one language setting can be quite insufficient for others. We need to be able to accommodate all things that are helpful.
> That is surely what the U (for Universal) means in PDF/UA.

… requiring an appreciation for the intricacies of the language and intended audiences.

> Cheers,
> Ross

Zdeněk Wagner

I don’t see us as arguing against each other; rather we are sharing
experiences which indicate the depth of what is needed.

Cheers again,


Dr Ross Moore
Department of Mathematics and Statistics
E: ross.moore at

