[XeTeX] Roman Numerals as stylistic alternatives

Mon Jun 20 23:45:33 CEST 2011

Hi Tobias,

On 20/06/2011, at 7:40 AM, Tobias Schoel wrote:

>> And both /Alt and /ActualText allow multiple values having been preceded
>> by a /Lang tag, so that the actual vocalization generated by the
>> screen-reader can be adjusted for different languages --- the document
>> author normally would provide this, but a sophisticated PDF browser
>> plug-in might be programmed to produce a translation on-the-fly.
>> 
> 
> What exactly is the intention of the /Alt tagging?

Clearly one use is the same as the  alt="..."  attribute in HTML,
attached to an image or other visually-based layout of material.
It allows you to provide a short description of an image,
e.g.  <img alt="TUG logo" src="..." ... />
You don't describe the contents of the image, just its purpose,
or what it is about overall.

The same idea can be applicable to other content, including words.

But more generally than this, you need to appreciate that Adobe's
Acrobat Pro lets you save two different views of the
text in a PDF document. It has export options:

   Save As Text
   Save As Text (Accessible)

It is this 2nd view that is used by Assistive Technology, such as
(so-called) "screen-readers" for people with poor eyesight.
This is a misnomer, since they need not read the screen at all,
but rather an underlying text view based upon the words that
are shown. The /Alt tag lets you substitute something else
for anything that would otherwise not read particularly well.

For example, consider the different uses and pronunciations of 'a',
as in "a dog", "the letter a", or "the variable $a$".
A priori, you would expect these all to be read the same way,
but in the latter cases this can be altered by, say,
    /Span<</Alt( ay )>> BDC ... (a) ... EMC

(I think I had 'BMC' in my previous post. That was incorrect syntax;
it should have been 'BDC', which begins marked content with an
attached property dictionary, whereas 'BMC' is "begin marked content",
used when there is no extra dictionary affecting anything about the
marked content. 'EMC' is "end marked content", used in both situations.)
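For anyone generating such content streams programmatically, here is a minimal Python sketch of the BMC/BDC choice. It assumes ASCII-only strings, a single Tj text-showing operator per sequence, and that the surrounding BT ... ET and font selection are handled elsewhere; pdf_string and marked_content are illustrative names, not part of any PDF library.

```python
from __future__ import annotations


def pdf_string(text: str) -> str:
    """Return a PDF literal string, escaping backslashes and parentheses.

    Assumes plain ASCII; other text would need PDFDocEncoding or UTF-16BE.
    """
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def marked_content(tag: str, shown: str, props: dict[str, str] | None = None) -> str:
    """Wrap one text-showing operation in a marked-content sequence.

    No property list -> BMC; with a property list -> BDC.
    EMC closes the sequence in both cases.
    """
    show_op = f"{pdf_string(shown)} Tj"
    if not props:
        return f"/{tag} BMC {show_op} EMC"
    entries = " ".join(f"/{key} {pdf_string(value)}" for key, value in props.items())
    return f"/{tag} <<{entries}>> BDC {show_op} EMC"


print(marked_content("Span", "a"))                   # plain marked content
print(marked_content("Span", "a", {"Alt": " ay "}))  # with an /Alt entry
```

The first call yields `/Span BMC (a) Tj EMC`, the second `/Span <</Alt ( ay )>> BDC (a) Tj EMC`.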

>>> 
>>> Actually, Roman numerals are mostly used when the numerical information is
>>> almost irrelevant as such. Nobody uses the "XIV" in "Louis XIV" to perform
>>> calculations. That's just a different way of writing "quatorze".

In this context, you might use:

    /Span<</Alt( roman numeral X I V, meaning 14, )>> BDC ... (XIV) ... EMC

explaining exactly what is meant, since a visually impaired person
cannot use the visual cue of a sequence of (specific) capital letters
to interpret the special meaning. A screen reader getting just 'XIV'
might otherwise try to pronounce this as something like "ksiv",
which would be quite confusing to the listener.
By the way, the commas in the /Alt string affect the timing
of how it is read out.
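The producing software could build such an /Alt string automatically. A minimal Python sketch, assuming the input is already known to be an upper-case Roman numeral (no validation is done here); roman_to_int and roman_alt_text are hypothetical helper names:

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Evaluate a Roman numeral using the subtractive rule (IV = 4, XC = 90).

    No check is made that the numeral is in canonical form.
    """
    total = 0
    for i, letter in enumerate(numeral):
        value = ROMAN_VALUES[letter]
        # A smaller value written before a larger one is subtracted.
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


def roman_alt_text(numeral: str) -> str:
    """Spell the letters out separately and state the value, with commas for pauses."""
    letters = " ".join(numeral)
    return f" roman numeral {letters}, meaning {roman_to_int(numeral)}, "


print(repr(roman_alt_text("XIV")))
```

This produces the same /Alt string as in the XIV example, ready to be escaped and wrapped in a /Span ... BDC ... EMC sequence.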

Another example is words borrowed from another language,
especially names:
 e.g.  /Span<</Alt( Lennard Oiler, a famous Swiss mathematician, )>> BDC
            ... (Leonhard) ... (Euler) ... EMC

While this extra verbosity could be generally useful, you would not
normally expect to get it when you use "Save As Text", since the /Alt
string can deliberately contain nonsense words that just happen to be
pronounced the way you want the listener to hear them.

Hence the distinction between /ActualText (an exact replacement for
the text shown) and /Alt (a description meant for reading aloud)
for non-image content.
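To make the distinction concrete, here is a sketch of a property list carrying both entries for the Euler example: /ActualText holds the exact replacement text, /Alt the spoken form. Python again, with hypothetical helper names and ASCII-only strings assumed:

```python
from __future__ import annotations


def pdf_string(text: str) -> str:
    """Return a PDF literal string, escaping backslashes and parentheses (ASCII assumed)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def span_properties(actual_text: str | None = None, alt: str | None = None) -> str:
    """Build a marked-content property list with /ActualText and/or /Alt.

    /ActualText: exact replacement for the shown text.
    /Alt: human-readable form intended for reading aloud.
    """
    entries = []
    if actual_text is not None:
        entries.append(f"/ActualText {pdf_string(actual_text)}")
    if alt is not None:
        entries.append(f"/Alt {pdf_string(alt)}")
    return "<<" + " ".join(entries) + ">>"


props = span_properties(
    actual_text="Leonhard Euler",
    alt=" Lennard Oiler, a famous Swiss mathematician, ",
)
print(f"/Span {props} BDC ... EMC")
```

Plain text extraction can then take the correctly spelled name, while an accessible reading gets the phonetic spelling.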

>> 
>> Right. So /ActualText tagging can support this distinction in meaning.
>> It is *not* intended to support calculations --- that is the domain
>> of "Content Tagging" using MathML.
> 
> As nearly all roman numerals used in practice are in the range up to 5000, no on-the-fly calculation should be needed. That can be done by the producing software.

Exactly.
The producing software should do all the hard work, since it can
analyse the text using quite sophisticated algorithms, and the
results can be checked before "going to print".
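One way to do that work up front, for numerals below 5000, is to precompute every canonical numeral and look candidate words up in that table. This is a Python sketch with hypothetical names, not how any particular TeX engine does it; writing 4000 as MMMM is an assumption, since conventions for large numerals vary.

```python
from typing import Optional

# Value/symbol pairs in descending order, including the subtractive forms.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Encode 1..4999 as a canonical Roman numeral (4000 written as MMMM)."""
    if not 1 <= n <= 4999:
        raise ValueError(f"{n} is outside the supported range 1..4999")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


# Precomputed once by the producing software: canonical numeral -> value.
CANONICAL = {to_roman(n): n for n in range(1, 5000)}


def check_numeral(word: str) -> Optional[int]:
    """Return the value if `word` is a canonical numeral, otherwise None."""
    return CANONICAL.get(word)


for word in ["XIV", "MMXI", "IIII", "VX", "MIX"]:
    print(word, check_numeral(word))
```

Non-canonical forms such as IIII are rejected by this table, but an ordinary word like MIX (= 1009) still passes, so the context still needs checking before going to print.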

Furthermore, the reading software is of unknown quality and 
sophistication, so it is much better for the producer to enrich
the document with as much extra potentially useful information
as possible. Good Assistive Technology should then have settings
to select whatever level of verbosity is appropriate
to the person using it.
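On the reading side, a verbosity setting might choose between these entries roughly as follows. This is purely a hypothetical Python sketch (MarkedSpan and text_for_output are invented names), not the behaviour of any actual Assistive Technology product:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MarkedSpan:
    shown: str                         # the glyphs actually painted on the page
    actual_text: Optional[str] = None  # exact replacement text, if supplied
    alt: Optional[str] = None          # spoken description, if supplied


def text_for_output(span: MarkedSpan, verbose: bool) -> str:
    """Prefer /Alt when verbose, else /ActualText, else the shown text."""
    if verbose and span.alt is not None:
        return span.alt
    if span.actual_text is not None:
        return span.actual_text
    return span.shown


spans = [
    MarkedSpan("Louis "),
    MarkedSpan("XIV", actual_text="XIV", alt=" roman numeral X I V, meaning 14, "),
]
print("".join(text_for_output(s, verbose=False) for s in spans))
print("".join(text_for_output(s, verbose=True) for s in spans))
```

The point of the design is that the producer supplies all the entries once, and the reader decides how much of it to use.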

> 
>> 
>>> 
>>> I see it just as the ability to copy "quatorze" from a text and paste it into a
>>> worksheet cell accepting numbers to get 14. In the case of Roman numerals
>>> it may be simpler, of course. But is it useful?
>> 
>> Most certainly it is useful.
>> It is part of the way of the future for smart PDF documents.
> 
> Exactly. It is a different representation form of numbers, not the actual letters. It doesn't matter when the PDF is only intended to be printed, but for electronic use it does matter.
> 
> bye
> 
> Toscho

Hope this helps,

	Ross

------------------------------------------------------------------------
Ross Moore                                       ross.moore at mq.edu.au 
Mathematics Department                           office: E7A-419      
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------