Google Disk (was: XeLaTeX to Word/OpenOffice - the state of the art?)

Zdenek Wagner zdenek.wagner at gmail.com
Sun Mar 17 21:58:54 CET 2019


On Sun, 17 Mar 2019 at 19:57, Ross Moore <ross.moore at mq.edu.au> wrote:
>
> Hi Andrew,
>
> On 18/03/2019, at 0:18, "Andrew Cunningham" <lang.support at gmail.com> wrote:
>
> Ross,
>
> It is also dependent on the fonts themselves and the scripts the language is written in.
>
>
> Absolutely.
>
> Depending on the language and script the only way to ensure accessibility is to include the ActualText attributes for each relevant tag.
>
>
> Indeed, provided you have supplied tagging at all, as of course should be done.
>
> Considering how complex OpenType fonts can become for some scripts, the simplistic ToUnicode mappings in a PDF can be insufficient.
>
>
> Yes, but it is better for the CMaps to at least be appropriate, rather than inaccurate or missing altogether, as can be the case. Different software tools get information from different places, so ideally one needs to provide the best values for all those possible places.
>
No, CMaps help for simple scripts only. Let's imagine a personal name
written বৌমিক in the Bengali script and transliterated as Bowmik. OW
is a two-part matra (dependent vowel) which looks like an e-matra preceding
the consonant and an o-matra following the consonant. The i-matra is always
rendered before the consonant, so using a CMap only, the word would become
eboimak, with two spelling errors. An editor will complain about an
e-matra at the beginning of a word and an i-matra following an o-matra, and
it will indicate missing consonants. Similarly, the Hindi word स्थापित
(sthaapit) would be extracted as sthaaipat, which is wrong because
an i-matra must not follow an aa-matra. If I had time, I could give you
several thousand examples where CMaps fail. In the past I did many tests
with Devanagari, and without ActualText the problem cannot be solved.
This is the very reason why \XeTeXgenerateactualtext was implemented.
It is not just a problem of saving as text/rtf/doc; search does not
work either.
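
To make the reordering concrete, here is a minimal Python sketch of the
Hindi example. It is only a toy model, not what XeTeX or any PDF reader
actually does: the function name glyph_order and the simplified rule
"move the i-matra in front of the preceding consonant cluster" are
assumptions made for illustration. It shows why text recovered glyph by
glyph differs from the logical Unicode order, and why a search for part
of the word then fails.

```python
# Toy model (an assumption for illustration): emulate text recovered
# glyph by glyph, where the Devanagari i-matra is drawn before the
# consonant cluster it logically follows.

I_MATRA = "\u093f"   # DEVANAGARI VOWEL SIGN I, rendered before the consonant
VIRAMA = "\u094d"    # DEVANAGARI SIGN VIRAMA, joins consonants into a cluster


def is_consonant(ch):
    """True for the basic Devanagari consonants KA (U+0915) .. HA (U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def glyph_order(text):
    """Return the characters in the order their glyphs appear on the page."""
    out = []
    for ch in text:
        if ch != I_MATRA:
            out.append(ch)
            continue
        # Walk back over the consonant cluster (consonant, virama, consonant, ...)
        i = len(out)
        while i > 0 and is_consonant(out[i - 1]):
            i -= 1
            if i >= 2 and out[i - 1] == VIRAMA and is_consonant(out[i - 2]):
                i -= 1  # step over the virama and continue with the consonant
            else:
                break
        out.insert(i, ch)  # the matra glyph comes first in visual order
    return "".join(out)


# "sthaapit" in logical (Unicode) order: SA VIRAMA THA AA PA I TA
WORD = "\u0938\u094d\u0925\u093e\u092a\u093f\u0924"
extracted = glyph_order(WORD)

# Show both code point sequences side by side.
print([f"U+{ord(c):04X}" for c in WORD])
print([f"U+{ord(c):04X}" for c in extracted])

# Searching the extracted text for the syllables "pit" (PA I TA) fails.
print("\u092a\u093f\u0924" in WORD, "\u092a\u093f\u0924" in extracted)
```

In this model the extracted string has the i-matra directly after the
aa-matra, which corresponds to the "sthaaipat" reading above. With
ActualText the logical string is stored alongside the glyphs, so the
extractor does not have to guess the order.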

> And text in a PDF may by WCAG definition be non-textual content.
>
>
> Presumably you mean, adding descriptive text to graphics that convey meaningful information; e.g. a company logo, and most illustrations.
> Of course this should be done too. But this can only be useful if the alternate descriptive text can be found via the structure tagging; hence the need for fully tagged PDF, navigable via that tagging.
>
> And Zdenek's comment emphasises how what might work well in one language setting can be quite insufficient for others. We need to be able to accommodate all things that are helpful.
> That is surely what the U (for Universal) means in PDF/UA.
>
>
> Cheers,
>
>       Ross
>

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz
>
>
>
> On Sunday, 17 March 2019, Ross Moore <ross.moore at mq.edu.au> wrote:
>>
>> Hi Karljürgen,
>>
>> On 17/03/2019, at 1:42, "Karljürgen Feuerherm" <kfeuerherm at kfeuerherm.ca> wrote:
>>
>> > Ross,
>> >
>> > Your reply caught my eye, and I am now looking at the pdfx package documentation.
>> >
>> > May I ask, if accessibility is a concern, why a-2b/-2u rather than -ua-1, which seems directly targeted at this?
>>
>> PDF/UA and PDF/A-1a, -2a, -3a require a fully tagged PDF.
>> This is a highly non-trivial task, which requires adding much extra to the document, done almost entirely through \special commands. The pdfx package does not provide this, but is useful for meeting the Metadata and other requirements of these formats.
>>
>> Abstractly, accessibility is about having sufficient information stored in the PDF for software tools to be able to build and present a description of the content and structure, other than the visual one. The same can be said of software for converting into a different format.
>>
>> A significant part of this is being able to correctly identify each character in the fonts used within the TeX-produced PDF. Even this is a non-trivial problem, due to TeX's non-standard font encodings and the virtual font technique.
>>
>> >
>> > Many thanks,
>> >
>> > K
>> >
>> >> You should use the  pdfx  package and prepare for  PDF/A-2b or -2u.
>> >> This fixes many of these things that affect conversions, as well as Accessibility and Archivability.
>> >>
>> >> It's not fully tagged PDF, but handles many other technical issues.
>> >>
>>
>>
>> Hope this helps.
>>
>> Ross
>>
>
>
> --
> Andrew Cunningham
> lang.support at gmail.com
>
>
>


