[pdftex] Generating CJK in PDF

Petr Sojka sojka at informatics.muni.cz
Fri May 4 19:30:00 CEST 2001


On Fri, May 04, 2001 at 02:57:15PM +0200, Otfried Cheong wrote:
Hi,

> Currently it is not really possible to produce CJK PDF files with
> pdftex.  
I don't think so.
> Certainly one can use Werner's CJK package, or hlatex etc.,
> using several Type1 8bit fonts to cover a single 16bit font.  But the
> resulting PDF files are essentially encrypted - it is possible to view
> and print them, but one cannot copy and paste, or search for text in
> the file, because the viewer has no idea what the character codes mean.

It has been possible since Jul 2000: take your CMap resources, or Adobe's from
http://partners.adobe.com/asn/developer/technotes/acrobatpdf.html
and put them in the PDF file using the \pdffontattr command
(I don't know whether Thanh has already managed to put a note
about it into the manual :-( ):


From thanh at informatics.muni.cz Sat Jul  8 16:48 MET 2000
Subject: tounicode
To: sojka at anxur.fi.muni.cz (Petr Sojka),

I have added a new primitive, \pdffontattr, which makes it possible to add
/ToUnicode to the Font dict. Example of use:

==================== cut here ==========================
\font\f=ptmr8r\f % must *not* be a virtual font

% contents of the CMap object (copied from t1.pdf):
\immediate\pdfobj{%
/CIDInit /ProcSet findresource begin 12 dict begin begincmap /CIDSystemInfo <<
/Registry (NewsSerifEE-Roman+0) /Ordering (T1UV) /Supplement 0 >> def
/CMapName /NewsSerifEE-Roman+0 def
1 begincodespacerange <01> <0a> endcodespacerange
10 beginbfrange
<01> <01> <010D>
<02> <02> <0078>
<03> <03> <00E9>
<04> <04> <011B>
<05> <05> <0161>
<06> <06> <0159>
<07> <07> <017E>
<08> <08> <00FD>
<09> <09> <00E1>
<0a> <0a> <00ED>
endbfrange
endcmap CMapName currentdict /CMap defineresource pop end end}

\pdffontattr\f{/ToUnicode \the\pdflastobj\space 0 R} % add /ToUnicode
==================== cut here ==========================

I don't know of any font-handling macro package that supports it, so you
should specify \pdffontattr manually for every "cut&pasteable&searchable"
raw font, or write macros for it.
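
Such a macro might look like the sketch below. It only wraps the two steps
of Thanh's example (create the CMap object, then attach it to the font) and
is an illustration, not tested code. The name \addtounicode and its three
arguments are hypothetical; the Registry/Ordering/CMapName values are
generic placeholders modelled on the ToUnicode example in the PDF Reference,
not values taken from any real font. It passes the CMap to \pdfobj directly,
as the example above does; later pdfTeX manuals document a "stream" keyword
for \pdfobj, so check which form your pdfTeX version expects.

```tex
% \addtounicode is a hypothetical helper name, not a pdfTeX primitive.
% #1 = font control sequence (must be a raw, non-virtual font)
% #2 = number of bfrange entries given in #3
% #3 = the bfrange entries, e.g. <01> <01> <010D>
\def\addtounicode#1#2#3{%
  % create the CMap object, following the example above
  \immediate\pdfobj{%
    /CIDInit /ProcSet findresource begin 12 dict begin begincmap
    /CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def
    /CMapName /Adobe-Identity-UCS def
    /CMapType 2 def
    1 begincodespacerange <00> <FF> endcodespacerange
    #2 beginbfrange #3 endbfrange
    endcmap CMapName currentdict /CMap defineresource pop end end}%
  % point the font dictionary at the object just created
  \pdffontattr#1{/ToUnicode \the\pdflastobj\space 0 R}%
}

% usage: two of the Czech letters from the example above
\font\f=ptmr8r
\addtounicode\f{2}{<01> <01> <010D> <03> <03> <00E9>}
```

Each raw font still needs its own call, since \pdffontattr applies to one
font at a time.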

> Here is a possible design: Extend the syntax of font map files with
> the same subsetting implemented by ttf2tfm:
> 
> ntukai@<subsetting spec file>@ ntukai.ttf

\pdffontattr seems more flexible.

> There is a much simpler route to enabling copy-and-paste and text
> searching, at least in theory: one can add a "ToUnicode" character map
> to each of the subfonts.  Viewers that correctly implement the PDF
> specification should then be able to provide search and copy.  But
> does this really work in practice?

Yes, searching does work, but cut&paste works on the Windows platform only
(tested under Win2000 with Czech fonts; not all Czech characters
are in AdobeStandardEncoding, so we need it), as there is no
clipboard equivalent under X Windows AFAIK :-(.

Hope this helps.

--ps

> Xpdf certainly does not support
> it.  Does it work using Acrobat Reader on Chinese Windows, or Hangul
> Windows?  Does Acrobat Reader for Linux actually have any support for
> it? 


