Google Disk (was: XeLaTeX to Word/OpenOffice - the state of the art?)

Ross Moore ross.moore at mq.edu.au
Sun Mar 17 22:19:39 CET 2019


Hi Zdenek,

On 18 Mar 2019, at 7:58 am, Zdenek Wagner <zdenek.wagner at gmail.com> wrote:

On Sun, 17 Mar 2019 at 19:57, Ross Moore <ross.moore at mq.edu.au> wrote:
>

> Yes, but it is better for the CMaps to at least be appropriate, rather than inaccurate or missing altogether, as can be the case. Different software tools get information from different places, so ideally one needs to provide the best values for all those possible places.
>
No, CMaps help only for simple scripts.

Fine. But a CMap must be present for validation.
As I said earlier (repeated below), they will not always be sufficient for proper Accessibility.

Let's imagine a person's name
written বৌমিক in the Bengali script and transliterated as Bowmik. OW
is a two-part matra (dependent vowel) which looks like an e-matra preceding
the consonant and an o-matra following the consonant. The i-matra always
precedes the consonant, so using a CMap only, the word would become
eboimak, with two spelling errors. An editor will complain about an
e-matra at the beginning of a word and an i-matra following an o-matra, and
it will indicate missing consonants. Similarly, the Hindi word स्थापित
(sthaapit) would be extracted as sthaaipat, which is wrong because
an i-matra must not follow an aa-matra. If I had time, I could give you
several thousand examples where CMaps fail. In the past I did many tests
with Devanagari, and without ActualText the problem cannot be solved.
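
To make the reordering concrete, here is a minimal Python sketch of what a
glyph-by-glyph extraction does to the two words above. It is only a model
under stated assumptions: every glyph maps back to exactly one code point
(as a one-to-one CMap would), each matra follows a single base consonant,
and the function name glyph_order and the small tables are hypothetical,
not part of any real tool.

```python
# Simplified model: extraction returns one code point per glyph, in the
# order the glyphs are drawn, which is what a one-to-one CMap can offer.

# Vowel signs that are drawn to the left of their base consonant.
PREBASE = {"\u093F", "\u09BF", "\u09C7", "\u09C8"}
# Two-part Bengali vowel signs, split as (left part, right part).
SPLIT = {
    "\u09CB": ("\u09C7", "\u09BE"),
    "\u09CC": ("\u09C7", "\u09D7"),
}

def glyph_order(text):
    """Return the code points of `text` in visual (glyph) order.

    Assumption: each matra follows a single base consonant; conjuncts
    and other reordering rules are deliberately ignored.
    """
    out = []
    for ch in text:
        base = max(len(out) - 1, 0)  # index of the preceding consonant
        if ch in SPLIT:
            left, right = SPLIT[ch]
            out.insert(base, left)   # left part goes before the consonant
            out.append(right)        # right part stays after it
        elif ch in PREBASE:
            out.insert(base, ch)     # pre-base matra moves before the consonant
        else:
            out.append(ch)
    return "".join(out)

bengali = "\u09AC\u09CC\u09AE\u09BF\u0995"            # the Bengali name above
hindi = "\u0938\u094D\u0925\u093E\u092A\u093F\u0924"  # the Hindi word above

for word in (bengali, hindi):
    extracted = glyph_order(word)
    print(word, "->", extracted, "| same as original:", extracted == word)
```

In both cases the extracted string differs from the original, so neither
copy-and-paste nor search can work from glyph order alone.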

I’m really happy that you have done such tests, and determined this.
It’s certainly not an area that I could have researched.
It demonstrates that supporting Accessibility properly can be a lot more complicated
than any single simple-minded approach would support.

This is the very reason why \XeTeXgenerateactualtext was implemented.
It is not just a problem of saving as text/RTF/DOC; search does not
work either.
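
For readers less familiar with the PDF side: an /ActualText entry lives in
the property list of a marked-content sequence that wraps the
glyph-showing operators. The Python sketch below only illustrates that
shape. The helper name actual_text_span and the glyph IDs are
hypothetical, and the exact operators that XeTeX's output driver writes
may well differ.

```python
def actual_text_span(text, content_ops):
    """Wrap PDF content-stream operators in a /Span carrying /ActualText.

    The replacement text is written as a PDF hex string holding UTF-16BE
    with a leading byte-order mark (FEFF).
    """
    hex_utf16 = (b"\xfe\xff" + text.encode("utf-16-be")).hex().upper()
    return f"/Span << /ActualText <{hex_utf16}> >> BDC\n{content_ops}\nEMC"

# Hypothetical glyph IDs standing in for the shaped, reordered glyphs.
glyph_ops = "<0012003400560078> Tj"
hindi = "\u0938\u094D\u0925\u093E\u092A\u093F\u0924"
print(actual_text_span(hindi, glyph_ops))
```

A consumer that honours /ActualText takes the logical-order string from
the property list instead of mapping the wrapped glyphs one by one.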

Great addition.
However, it is useful only insofar as the AT (assistive technology) gets its information from the /ActualText.
Some screen-readers go for other places.

Indeed the PDF/A and PDF/UA specifications expect the Accessible text to come
from the /Alt(…) tagging of the structure element parent of the tagged text.
(Obviously not all AT follow these specifications.)

This is why I suggest populating more than one place with information that is helpful...


> And Zdenek's comment emphasises how what might work well in one language setting can be quite insufficient for others. We need to be able to accommodate all things that are helpful.
> That is surely what the U (for Universal) means in PDF/UA.

… requiring an appreciation for the intricacies of the language and intended audiences.

>
> Cheers,
>
> Ross
>

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz

I don’t see us as arguing against each other; rather we are sharing
experiences which indicate the depth of what is needed.


Cheers again,

Ross


Dr Ross Moore
Department of Mathematics and Statistics
12 Wally’s Walk, Level 7, Room 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955  |  F: +61 2 9850 8114
M: +61 407 288 255  |  E: ross.moore at mq.edu.au
http://www.maths.mq.edu.au
