# [texhax] unicode

pierre.mackay pierre.mackay at comcast.net
Fri Aug 5 18:34:59 CEST 2005

Alexander Grahn wrote:

>On Fri, Aug 05, 2005 at 03:13:25PM +0200, Karl Berry wrote:
>
>
>>Sorry, but I've lost track of the original question.  What are you
>>trying to accomplish?
>>
>>
>
>I'm trying to expand my LaTeX-Package for generating PDF with embedded
>multimedia. I want to add some hack written in JavaScript which might
>
>I want to write out a name tree into the document Catalog, mapping names
>to object refs. The PDF-Spec says that name strings must be Unicode
>formatted (actually Big Endian UTF16).
>
>As the strings that I want to write out consist of [a-zA-Z0-9_] only,
>every UTF16 representation of the character in question to be written
>out is formed from a Zero byte x00 followed by the byte with the ASCII
>code of the character, e. g.
>
>A --> x0041
>B --> x0042
>etc.
>
>
>
When in doubt, go straight to the primitives. \char places an eight-bit
value directly into the output, and works for NULL as well as everything
else.

From the input: cdef\char0 A\char0 Bghijk

From the DVI file:

cdef^@A^@Bghijk

From the output of dvips:

(cdef\000A\000Bghijk)

which appears to be a string with two unmistakable 16-bit wide chars in
it, where each octal \000 provides the leading null byte. The ggv default
font appears to have an uppercase Gamma there.

What ps2pdf will do with that I don't quite know, because I find it
nearly impossible to read PDF binaries.

So, inserting \char0 will work, and this can be done relatively
painlessly with a tail-recursion macro.

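You could also \def\0{\char0 } to make the string more readable and
easier to type. The space after the 0 matters: without it, a digit
following \0 would be read as part of the character number.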
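
A minimal sketch of such a tail-recursion macro in plain TeX might look
like the following. The names \widestring, \widestringloop and
\widestringend are made up for this illustration, and it assumes the
argument holds only letters and digits. In plain TeX an underscore has
category code 8, so it would first have to be made an ordinary character,
for instance with \catcode`\_=12 inside a group.

```tex
% Sketch only: \widestring, \widestringloop and \widestringend are
% hypothetical names, not part of any package.
\def\0{\char0 }% the space ends the number, so a following digit is safe
\def\widestringend{\widestringend}% end marker: compared with \ifx, never expanded
\def\widestring#1{\widestringloop#1\widestringend}
\def\widestringloop#1{%
  \ifx#1\widestringend
    \let\next\relax% marker reached: stop
  \else
    \0#1% null byte, then the character itself
    \let\next\widestringloop% tail call on the remaining tokens
  \fi
  \next}

\widestring{AB12}% same as \char0 A\char0 B\char0 1\char0 2
\bye
```

Choosing \next inside the conditional and calling it only after the \fi
is what makes this tail recursion: no unfinished \if...\fi is left
pending for each character, so the string can be of any length.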

This is really neat, because it shows an---admittedly complex---way of
inserting multicharacter UTF8 and wide-chars without altering TeX to
enable set2 and set3.

Pierre MacKay