[XeTeX] Roman Numerals as stylistic alternatives

Tobias Schoel liesdiedatei at googlemail.com
Sun Jun 19 23:40:48 CEST 2011


Hi,

there are some commercial fonts which claim to have an arabic->roman 
feature (possibly limited to numbers up to 5000 or so), e.g. P22 
Operina Pro.

Btw: Arabic->Roman is probably much easier to implement than the 
inverse. I don't know font details, but from what I have read, it could 
be as simple as

sub one ? ?? ??? by one.M ? ?? ???
...
sub five ? by five.L

where the ? are wildcards (do these exist?)
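
To make the positional idea concrete outside of a font, here is a minimal 
Python sketch. It is not feature-file code and not how any particular font 
does it. The names NUMERALS and arabic_to_roman are made up for 
illustration, and the upper limit of 3999 is my assumption (no special 
glyph for 5000). Each digit is replaced according to how many digits 
follow it, which is what the wildcards above are meant to express.

```python
# Replacement strings per decimal position: units, tens, hundreds, thousands.
# Index into the inner list is the digit itself (0 maps to nothing).
NUMERALS = [
    ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"],
    ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"],
    ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"],
    ["", "M", "MM", "MMM"],
]


def arabic_to_roman(digits: str) -> str:
    """Replace every digit depending on the number of digits after it."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a string of ASCII digits")
    if not 1 <= int(digits) <= 3999:  # assumed limit, see above
        raise ValueError("only 1..3999 is supported in this sketch")
    digits = digits.lstrip("0")
    out = []
    for i, ch in enumerate(digits):
        following = len(digits) - 1 - i  # plays the role of the wildcards
        out.append(NUMERALS[following][int(ch)])
    return "".join(out)


print(arabic_to_roman("115"))
```

A font would have to express the same table as contextual substitutions, 
one rule per digit and position.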



>> What the OP wants is that "CXV" is stored as a unique glyph representing 115.
Am I the OP? (I don't know that abbreviation and therefore I am 
confused.) But this question is interesting:

Roman Numerals derive from an alphabetic writing system and thus are 
"words" consisting of single "letters", each with a meaning based on its 
identity and position. But nowadays, they are more often read as 
logograms. That's presumably the reason why Unicode has codepoints for 
the roman numerals from 1 to 12 (why 12?).
So should CXV be stored as an ideogram for 115 or composed of three 
glyphs 1.C, 1.X and 5.V?

> In the PDF ISO standard, you have the option of using /ActualText tagging.
> The PDF would contain a portion of the page contents stream, such as:
>
>    /Span<</ActualText(115)>>BDC .... (graphics to position and produce
> the letters 'C' 'X' and 'V' ) ... EMC
>
> Now *any* attempt to select any portion of the visible string "CXV"
> is supposed to result in the whole string being included when copying.

That seems to be a good solution for PDF targets. Copy-pasting parts of 
words is mostly pointless or wrong, so this shouldn't be a problem. I 
can't think of an example with a non-PDF target in which I use roman 
numerals. But someone else might. In that case, the discussion should 
be moved.
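
For the record, a minimal Python sketch of how a producing program might 
emit such a span. The function names pdf_literal and actual_text_span are 
made up, the drawing operators in the example are only a placeholder, and 
I assume the replacement text is plain ASCII (other text would need a 
different string encoding). As far as I can tell from the PDF 
specification, the operator that takes a property dictionary is BDC; BMC 
takes only the tag.

```python
def pdf_literal(text: str) -> str:
    """Write text as a PDF literal string, escaping \\, ( and )."""
    escaped = (
        text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    )
    return f"({escaped})"


def actual_text_span(actual: str, drawing_ops: str) -> str:
    """Wrap drawing operators in a marked-content span with /ActualText."""
    return f"/Span <</ActualText {pdf_literal(actual)}>> BDC\n{drawing_ops}\nEMC"


# Placeholder operators that would show the visible letters "CXV".
print(actual_text_span("115", "BT /F1 12 Tf 72 720 Td (CXV) Tj ET"))
```
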
>
> The problem is that not all PDF browsers are fully conformant, so this
> behaviour may not be what you actually get with a particular piece of
> software.  (BTW, Apple is one of the biggest offenders.)

That's the non-conformant PDF browsers' fault and I don't give a damn 
about Apple. The only time I care about writing apple with a capital A 
is at the beginning of a sentence. (Pun intended.)

> Note that ISO PDF also has an alternative method of tagging.
> E.g.
>      /Span<</Alt(123)>>BDC .... EMC
> Screen-reading software is meant to use the /Alt tagging.
>
> And both /Alt and /ActualText allow multiple values having been preceded
> by a /Lang tag, so that the actual vocalization generated by the
> screen-reader can be adjusted for different languages --- the document
> author normally would provide this, but a sophisticated PDF browser
> plug-in might be programmed to produce a translation on-the-fly.
>

What exactly is the intention of the /Alt tagging?
>>
>> Actually, Roman numerals are mostly used when the numerical information is
>> almost irrelevant as such. Nobody uses the "XIV" in "Louis XIV" to perform
>> calculations. That's just a different way of writing "quatorze".
>
> Right. So /ActualText tagging can support this distinction in meaning.
> It is *not* intended to support calculations --- that is the domain
> of "Content Tagging" using MathML.

As nearly all roman numerals used in practice are in the range up to 
5000, no on-the-fly calculation should be needed. That can be done by 
the producing software.
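
A minimal Python sketch of that step, as the producing software might do 
it once while writing the file. The name roman_to_int is made up, and the 
sketch assumes a well-formed numeral; it does not reject malformed input 
such as "IIII" or "VX".

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Sum the letter values, subtracting a letter that precedes a larger one."""
    letters = numeral.upper()
    total = 0
    for i, ch in enumerate(letters):
        value = VALUES[ch]  # raises KeyError for non-numeral letters
        next_value = VALUES[letters[i + 1]] if i + 1 < len(letters) else 0
        total += -value if value < next_value else value
    return total


print(roman_to_int("CXV"))
```

The result could then be stored next to the visible letters, so that no 
reader software has to compute anything.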

>
>>
>> I see it just as the ability to copy "quatorze" from a text and paste it into a
>> worksheet cell accepting numbers to get 14. In the case of Roman numerals
>> it may be simpler, of course. But is it useful?
>
> Most certainly it is useful.
> It is part of the way of the future for smart PDF documents.

Exactly. It is a different form of representing numbers, not the actual 
letters. It doesn't matter when the PDF is only intended to be printed, 
but for electronic use it does.

bye

Toscho

