[XeTeX] Roman Numerals as stylistic alternatives

Sun Jun 19 23:40:48 CEST 2011

Hi,

there are some commercial fonts that claim to have an Arabic->Roman 
feature (possibly limited to numbers up to 5000 or so), e.g. P22 
Operina Pro.

Btw: Arabic->Roman is probably much easier to implement than the 
inverse. I don't know font details, but from what I have read, it could 
be as simple as

sub one ? ?? ??? by one.M ? ?? ???
...
sub five ? by five.L

where the ? are wildcards (do these exist?)
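
For illustration only, here is a minimal Python sketch of the same 
per-position idea, outside any font: each digit is replaced according to 
its decimal place. The names PLACES and arabic_to_roman are made up for 
this example, and the range 1 to 4999 with "MMMM" for 4000 is my 
assumption, not something a particular font is known to do.

```python
# Roman spelling of each digit 0-9, indexed by decimal place.
# Hypothetical tables for illustration; a font would hold glyph
# substitutions instead of strings.
ONES = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]
TENS = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]
HUNDREDS = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]
THOUSANDS = ["", "M", "MM", "MMM", "MMMM"]  # assumption: stop at 4999

PLACES = [ONES, TENS, HUNDREDS, THOUSANDS]


def arabic_to_roman(digits: str) -> str:
    """Convert a string of ASCII digits (1..4999) to a Roman numeral."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected ASCII digits only")
    value = int(digits)
    if not 1 <= value <= 4999:
        raise ValueError("supported range is 1..4999")
    stripped = str(value)  # drops any leading zeros
    parts = []
    for index, char in enumerate(stripped):
        # Place 0 is the ones digit, 1 the tens digit, and so on.
        place = len(stripped) - 1 - index
        parts.append(PLACES[place][int(char)])
    return "".join(parts)


if __name__ == "__main__":
    print(arabic_to_roman("115"))
```

Every digit is looked up on its own, knowing only how many digits follow 
it, which is exactly the context the "? ?? ???" patterns above express.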

>> What the OP wants is that "CXV" is stored as a unique glyph representing 115.
Am I the OP? (I don't know that abbreviation and therefore I am 
confused.) But this question is interesting:

Roman numerals derive from an alphabetic writing system and thus are 
"words" consisting of single "letters", each with a meaning based on its 
identity and position. But nowadays, they are more often read as 
logograms. That's presumably the reason why Unicode has code points for 
the Roman numerals from 1 to 12 (why 12?).
So should CXV be stored as an ideogram for 115 or composed of three 
glyphs 1.C, 1.X and 5.V?

> In the PDF ISO standard, you have the option of using /ActualText tagging.
> The PDF would contain a portion of the page contents stream, such as:
>
>    /Span<</ActualText(115)>>BDC .... (graphics to position and produce
> the letters 'C' 'X' and 'V' ) ... EMC
>
> Now *any* attempt to select any portion of the visible string "CXV"
> is supposed to result in the whole string being included when copying.

That seems to be a good solution for PDF targets. Copy-pasting parts of 
words is mostly senseless or wrong, so this shouldn't be a problem. I 
can't think of an example with a non-PDF target in which I use Roman 
numerals. But someone else might. In that case, the discussion should 
be moved.
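
To make the idea concrete, here is a rough Python sketch of how a 
producing program might build such a tagged span as text. The function 
names, the font resource name /F1 and the coordinates are all made up 
for this example. I use the BDC operator, which PDF defines for marked 
content that carries a property dictionary. This is not how XeTeX or any 
particular driver does it.

```python
def pdf_literal_string(text: str) -> str:
    """Return text as a PDF literal string, escaping \\ ( and ).

    Assumption: ASCII input only; other encodings are out of scope here.
    """
    if not text.isascii():
        raise ValueError("this sketch handles ASCII text only")
    escaped = (
        text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    )
    return "(" + escaped + ")"


def actual_text_span(actual: str, drawing_ops: str) -> str:
    """Wrap content-stream operators in a /Span with /ActualText."""
    header = "/Span <</ActualText " + pdf_literal_string(actual) + ">> BDC"
    return header + "\n" + drawing_ops + "\nEMC"


if __name__ == "__main__":
    # Hypothetical text-showing operators that draw the letters "CXV".
    draw_cxv = "BT /F1 12 Tf 72 720 Td (CXV) Tj ET"
    print(actual_text_span("115", draw_cxv))
```

The visible glyphs stay "C", "X" and "V"; only the replacement text that 
a conforming viewer hands to copy and paste becomes "115".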
>
> The problem is that not all PDF browsers are fully conformant, so this
> behaviour may not be what you actually get with a particular piece of
> software.  (BTW, Apple is one of the biggest offenders.)

That's the non-conformant PDF browsers' fault and I don't give a damn 
about Apple. The only time I care about writing apple with a capital A 
is at the beginning of a sentence. (Pun intended.)

> Note that ISO PDF also has an alternative method of tagging.
> E.g.
>      /Span<</Alt(123)>>BDC .... EMC
> Screen-reading software is meant to use the /Alt tagging.
>
> And both /Alt and /ActualText allow multiple values having been preceded
> by a /Lang tag, so that the actual vocalization generated by the
> screen-reader can be adjusted for different languages --- the document
> author normally would provide this, but a sophisticated PDF browser
> plug-in might be programmed to produce a translation on-the-fly.
>

What exactly is the intention of the /Alt tagging?
>>
>> Actually, Roman numerals are mostly used when the numerical information is
>> almost irrelevant as such. Nobody uses the "XIV" in "Louis XIV" to perform
>> calculations. That's just a different way of writing "quatorze".
>
> Right. So /ActualText tagging can support this distinction in meaning.
> It is *not* intended to support calculations --- that is the domain
> of "Content Tagging" using MathML.

As nearly all Roman numerals used in practice are in the range up to 
5000, no on-the-fly calculation should be needed. That can be done by 
the producing software.
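
As a sketch of what the producing software would have to do when the 
source only contains the letters, here is a small Python function that 
turns a Roman numeral into its value. The names VALUES and roman_to_int 
are made up for this example, and it assumes the input is already a 
well-formed numeral: it does not reject odd spellings such as "IIII" or 
"IC".

```python
# Value of each Roman letter.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Return the value of a Roman numeral such as "CXV".

    Assumption: the numeral is well formed; only unknown letters and
    empty input are rejected.
    """
    if not numeral:
        raise ValueError("empty numeral")
    total = 0
    previous = 0
    # Read right to left: a letter smaller than its right-hand
    # neighbour is subtracted (the I in IV), otherwise it is added.
    for char in reversed(numeral.upper()):
        if char not in VALUES:
            raise ValueError("not a Roman letter: " + repr(char))
        value = VALUES[char]
        if value < previous:
            total -= value
        else:
            total += value
        previous = value
    return total


if __name__ == "__main__":
    print(roman_to_int("CXV"))
```

The result could then be written into the /ActualText entry once, at 
production time, so the viewer never has to calculate anything.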

>
>>
>> I see it just as the ability to copy "quatorze" from a text and paste it into a
>> worksheet cell accepting numbers to get 14. In the case of Roman numerals
>> it may be simpler, of course. But is it useful?
>
> Most certainly it is useful.
> It is part of the way of the future for smart PDF documents.

Exactly. It is a different form of representing numbers, not the actual 
letters. It doesn't matter when the PDF is only intended to be printed, 
but for electronic use it does matter.

bye

Toscho