[XeTeX] Roman Numerals as stylistic alternatives

Mon Jun 20 00:03:26 CEST 2011

It seems to me (having worked with OpenType fonts for some years) that 
while it might be possible to make an Arabic-->Roman converter at the 
font level, that's going to be one of the most inefficient possible ways 
to handle it. With OT you can make a set of rules that says

Here's a 1 followed by two digits; substitute C.
Here's a 2 followed by one digit; substitute XX.
Here's a 3 followed by something other than a digit; substitute III.

But OT can't understand numbers the way a programming language can. 
If you want to be able to write XC for 90, the task gets somewhat more 
complex, because OT definitely can't say

For a number in the range 90-99, do the following . . .
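
By contrast, TeX's built-in conversion works on the number itself, so 
the 90-99 range needs no special rule. A minimal sketch to check this 
(the counter name n is just a placeholder I made up for the example); 
90 should come out as xc:

\documentclass{article}

\newcounter{n}% hypothetical scratch counter, used only for this test

\begin{document}

% Print 88 through 100 in both notations, one per line.
\setcounter{n}{88}
\loop
  \arabic{n} = \roman{n}\par
  \stepcounter{n}
\ifnum\value{n}<101 \repeat

\end{document}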

Surely a programmatic solution would be better; and (La)TeX has an 
understanding of roman numerals built in. With a little Googling I was 
able to come up with this file, which works:

%&program=xelatex
%&encoding=UTF-8 Unicode

\documentclass[11pt,letterpaper,twoside,openany]{book}

\usepackage{fontspec}

\makeatletter
\newcommand{\rmnum}[1]{\romannumeral #1}
\newcommand{\Rmnum}[1]{\expandafter\@slowromancap\romannumeral #1@}
\makeatother

\begin{document}

There are \rmnum{123}\ fish in the sea.

And there are \Rmnum{5123}\ leaves on the tree.

\end{document}

But I don't know how to make a file that will use /ActualText. Maybe 
someone here can explain that.
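
One possible starting point, offered only as an untested sketch: the 
accsupp package provides \BeginAccSupp and \EndAccSupp, which wrap 
material in a marked-content span carrying /ActualText. The sketch 
assumes that accsupp is installed and that the driver it selects works 
with XeTeX's PDF output; \ATRmnum is a hypothetical name chosen for this 
example. Whether copy and paste then yields the digits still depends on 
the PDF viewer.

\documentclass{article}

\usepackage{fontspec}
\usepackage{accsupp}% assumption: its driver choice suits XeTeX

\makeatletter
% \ATRmnum (hypothetical name): typeset #1 as an uppercase roman numeral
% and tag the printed letters with the arabic digits as /ActualText.
\newcommand{\ATRmnum}[1]{%
  \BeginAccSupp{method=escape,ActualText={#1}}%
  \expandafter\@slowromancap\romannumeral #1@%
  \EndAccSupp{}%
}
\makeatother

\begin{document}

There are \ATRmnum{123}\ fish in the sea.

\end{document}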

Peter Baker

On 6/19/11 4:43 PM, Ross Moore wrote:
> Hello Enrico,
>
> On 20/06/2011, at 5:42 AM, enrico.gregorio at univr.it wrote:
>
>> What the OP wants is that "CXV" is stored as a unique glyph representing 115.
>> Maybe this can be done by reserving, say, five thousand slots in Unicode to
>> contain the numbers from 1 to 5000 in Roman form that are built from the basic
>> digits, embedding in the font (or in the typesetting engine) the algorithm for building
>> them from the Western/Arabic representation.
> No.
> In the PDF ISO standard, you have the option of using /ActualText tagging.
> The PDF would contain a portion of the page contents stream, such as:
>
>    /Span<</ActualText(115)>>BDC .... (graphics to position and produce
> the letters 'C' 'X' and 'V' ) ... EMC
>
> Now *any* attempt to select any portion of the visible string "CXV"
> is supposed to result in the whole string being included when copying.
>
> The problem is that not all PDF browsers are fully conformant, so this
> behaviour may not be what you actually get with a particular piece of
> software.  (BTW, Apple is one of the biggest offenders.)
>
>> This might be done in two passes:
>> represent the number using the codes for Roman numerals and start a ligaturing
>> process.
> Trying to do it character by character at the font level doesn't seem
> overly practical to me. The concept is the number "123" but represented
> in a non-standard way. The use of /ActualText tagging seems to be much
> more helpful to readers, and also to other software that tries to
> extract the meaning being represented with a PDF, for whatever purpose.
>
> Note that ISO PDF also has an alternative method of tagging.
> E.g.
>      /Span<</Alt(123)>>BDC .... EMC
> Screen-reading software is meant to use the /Alt tagging.
>
> And both /Alt and /ActualText allow multiple values, each preceded
> by a /Lang tag, so that the actual vocalization generated by the
> screen-reader can be adjusted for different languages --- the document
> author normally would provide this, but a sophisticated PDF browser
> plug-in might be programmed to produce a translation on-the-fly.
>
>> Actually, Roman numerals are mostly used when the numerical information is
>> almost irrelevant as such. Nobody uses the "XIV" in "Louis XIV" to perform
>> calculations. That's just a different way of writing "quatorze".
> Right. So /ActualText tagging can support this distinction in meaning.
> It is *not* intended to support calculations --- that is the domain
> of "Content Tagging" using MathML.
>
>> I see it just as the ability to copy "quatorze" from a text and paste it into a
>> worksheet cell accepting numbers to get 14. In the case of Roman numerals
>> it may be simpler, of course. But is it useful?
> Most certainly it is useful.
> It is part of the way of the future for smart PDF documents.
>
>
>> Ciao
>> Enrico
>
> ------------------------------------------------------------------------
> Ross Moore                                       ross.moore at mq.edu.au
> Mathematics Department                           office: E7A-419
> Macquarie University                             tel: +61 (0)2 9850 8955
> Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
> ------------------------------------------------------------------------