[pdftex] Revisiting (About CJKbookmarks)

Heiko Oberdiek oberdiek at uni-freiburg.de
Thu Mar 2 09:32:57 CET 2006

On Thu, Mar 02, 2006 at 02:28:40PM +1100, Ross Moore wrote:

> Indeed, for my application it might be useful to have a macro:
>   \texorpdforXMLorHTMLorliteral
> in which appropriate \catcode changes were made with each
> variant of the macro-expansion.    :-)

\texorpdfstring is usually used in the argument of \section-like
commands, which is already tokenized when it is read, so there is
no chance to change catcodes.
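Here is a minimal sketch of that usual pattern. It assumes the unicode option, so that \textlambda can appear in the PDF string. The title is read as an ordinary macro argument, so both branches of \texorpdfstring keep the catcodes that were in force when \section read it:

  \documentclass{article}
  \usepackage[unicode]{hyperref}
  \begin{document}
  % First argument: typeset text. Second argument: bookmark string.
  % Both are tokenized when \section reads its argument, so no
  % catcode change can be applied to either branch afterwards.
  \section{\texorpdfstring{The $\lambda$-calculus}{The \textlambda-calculus}}
  \end{document}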

> >>So if I want to replace the strings 'lambda', 'alpha', 'omega', etc.
> >>by appropriate unicode representations,
> >>
> >> a.  what needs to go into the .out file ?
> >>
> >> b.  what else needs to be done ?
> >>      e.g.  options to hyperref, or \hypersetup
> >
> >
> >Many Greek letters are already supported, given as \text... macros.
> >
> >\usepackage[unicode]{hyperref}
> >
> >\pdfstringdefDisableCommands{%
> >  \let\lambda\textlambda
> >  \let\alpha\textalpha
> >  \let\omega\textomega
> >  % etc.
> >}
> OK. It's the double-octal notation used for Unicode strings
> that I'd not encountered before. Thanks for the heads-up.
> This works (so far) in my setting, with the following provisos:
>  a.  the .out  file more than doubles in size, which
>      increase occurs also in the PDF.
>      But this is only ~5kb increase, so no big deal really.

Unicode takes two bytes per character instead of one. But you can run
the PDF file through a post-processor that optimizes strings
(using Unicode only where needed, choice of string representation, ...)
and other things (white space). I think this would be more efficient
than doing it in a few places at the TeX macro level, where full
control is often missing (it is driver dependent).

> Presumably this could be reduced by using Unicode only for
> those bookmarks that really need it.

The code for the bookmark strings is already quite large, so
it is probably more efficient to do this in a post-processor.
Also, hyperref supports several drivers, e.g. PostScript drivers.
With those, the string representation in the PDF file depends
on the distiller application.

Thus a program that reads a PDF file and rewrites it in an optimized
form (strings, white space, free objects, ...) would also be
useful for PDF files in general.
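Newer hyperref releases provide a TeX-level switch for the choice Ross asks about. The sketch below assumes a hyperref version recent enough to have the pdfencoding option. With pdfencoding=auto, each string is written in PDFDocEncoding if possible and in Unicode otherwise. Only titles like the second one below should then need the Unicode form:

  \documentclass{article}
  % Assumes a hyperref version that provides the pdfencoding option.
  % auto: PDFDocEncoding where possible, Unicode (UTF-16) otherwise.
  \usepackage[pdfencoding=auto]{hyperref}
  \begin{document}
  \section{Introduction}% plain ASCII title, no Unicode needed
  \section{\texorpdfstring{The $\omega$-rule}{The \textomega-rule}}% needs Unicode
  \end{document}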

>  b.  the loading of  puenc.def  causes a macro-name clash,
>      with those math-authors who like to define \C
>      as a shorthand for \mathbb{C} or  \mathcal{C}
>      --- easily fixed, but most annoying.

puenc.def also supports Cyrillic, and \C is used in T2A, T2B and T2C
as well, so the name is not specific to hyperref.

>      Presumably these guys never use cyrillics for Russian
>      or Eastern European names in bibliographies.

I have just tried to follow the official LaTeX interface.
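Authors who hit the clash can usually avoid it by renaming the shorthand. This sketch keeps \C for the text encodings and gives the math shorthand a different, unused name. \CC is only a hypothetical choice, and amssymb is assumed for \mathbb:

  \documentclass{article}
  \usepackage{amssymb}
  \usepackage[unicode]{hyperref}
  % Leave \C to the encoding files (PU, T2A, ...) and use another
  % name for the math shorthand; \CC is a hypothetical choice.
  \newcommand*{\CC}{\mathbb{C}}
  \begin{document}
  Let $z \in \CC$.
  \end{document}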

> >>Also,
> >>  Is it possible to use different typefaces ?
> >
> >AFAIK you can use color or bold/italic for the whole string.
> And you intend working on providing support for this, right ?

Yes, it is on my to-do list, and there are other things to provide
(color, actions, ...). Thus I intend to rewrite the code that
organizes the bookmarks.
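Heiko Oberdiek later released the bookmark package as a new bookmark organization for hyperref, and it exposes these PDF 1.4 outline attributes. This minimal sketch assumes bookmark and xcolor are installed (xcolor only for the colour name). The bold and italic keys set the outline style flags, and color sets the outline colour. The bookmark titles are made up for illustration:

  \documentclass{article}
  \usepackage{xcolor}
  \usepackage{hyperref}
  \usepackage{bookmark}% load after hyperref
  \begin{document}
  \section{Introduction}
  % Extra outline entries pointing to page 1, with style and colour.
  \bookmark[page=1, level=1, bold]{A bold bookmark}
  \bookmark[page=1, level=1, italic, color=blue]{An italic blue bookmark}
  \end{document}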

> The Adobe document for the PDF 1.6 specs  shows what is needed
> for colours and faces (italic and/or bold) in bookmarks.

I know it.

> However, the same document actually has a logo-image in each
> of its own bookmarks!  How did they do that ?

This is done by your viewer application (Adobe Reader); the symbol
is used for all bookmarks, and there is nothing for it in the PDF file.
It changes its appearance if the page of the bookmark in question
is currently displayed.

> However, there are raised and lowered letters in the
> "Phonetic Extensions" area, and elsewhere.
> I've now made use of these, to produce raised superscripts in
> mathematics used for titles, etc. when it contains only:
>     a.  letters,   excluding  fqzCFQSVXYZ
> or  b.  digits 0-9
> or  c.  symbols  + - = ( )
> or  d.  punctuation , .  (i.e. comma or stop).
> Similarly for subscripts, using just the characters in  b. and/or c.
> The TeX coding to achieve this makes slight patches to some
> hyperref methods:
>   \HyPsd@@RemoveBraces      to retain markers of bracings
>   \HyPsd@CatcodeWarning     to retain ^ and _
>   \HyPsd@ConvertToUnicode   to allow some extra post-processing
>                             before converting to Unicode
> as well as adding new post-processing methods prior to
> using \HyPsd@ConvertToUnicode :
>   \raise@BracedSupscripts      handles ^{...}
>   \remove@falseBracePairs      removes any left-over brace markers
> and methods added via the \pdfstringdefPostHook :
>   \replaceSupAst         ^* becomes just *
>   \replaceSupscript      handles non-braced ^
>   \replaceSubscript      handles non-braced _
> as well as many macro re-definitions, via   
> \pdfstringdefDisableCommands .

Very interesting. How stable is it? Do you want to provide
it as a package, or may I put it into hyperref?
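Without such patches, each title has to be handled by hand. This is a minimal sketch of the manual version for a single superscript. It assumes the unicode option and that hyperref's PU encoding provides \texttwosuperior (U+00B2), one of the few superscript digits with a standard LaTeX text command:

  \documentclass{article}
  \usepackage[unicode]{hyperref}
  \begin{document}
  % The bookmark shows "x²" (SUPERSCRIPT TWO) instead of "x2";
  % the empty group keeps the following space.
  \section{\texorpdfstring{Solving $x^2 = 4$}{Solving x\texttwosuperior{} = 4}}
  \end{document}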

Yours sincerely
  Heiko <oberdiek at uni-freiburg.de>

More information about the pdftex mailing list