[XeTeX] Japanese characters in PDF do not match those in source file

Andrew A. Adams aaa at meiji.ac.jp
Fri Aug 20 06:06:27 CEST 2010


I recently upgraded from Fedora Core 10 to Fedora Core 13. I'm now getting
very strange behaviour when I process LaTeX files containing Japanese text
with xelatex. I've created a minimal source file that demonstrates the
problem: the Unicode characters in the input file are not the ones that
appear in the output. It's possible that I'm somehow getting Chinese
characters in place of the Japanese ones in my original file. I create my
files in XEmacs with the buffer encoding set to UTF-8, and I use a script to
run xelatex with my default options:

xelatex -interaction=nonstopmode -output-driver="xdvipdfmx -p a4 -V5" "$1.tex" \
  && acroread -tempFile "$1.pdf"
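
One way to check which characters the source file really contains is to dump its code points and compare them with what the PDF shows. The following is only a sketch, assuming a POSIX shell with iconv and od available; the function name codepoints is made up for illustration and is not part of XeTeX or xdvipdfmx.

```shell
#!/bin/sh
# codepoints: hypothetical helper (not part of XeTeX or xdvipdfmx).
# Reads UTF-8 text on stdin and prints one code point per line.
codepoints() {
    iconv -f UTF-8 -t UTF-32BE |   # fixed width: 4 bytes per character
        od -An -v -tx1 |           # hex bytes, no offsets, no "*" folding
        tr -d ' \n' |              # join everything into one hex string
        fold -w 8 |                # 8 hex digits = one code point
        sed 's/^/U+/'
    echo                           # terminate the last line
}
```

For example, `printf '日本' | codepoints` should list U+000065e5 and U+0000672c, the code points of those two kanji. Running the same check on the Japanese text in the .tex file shows what xelatex is actually being given.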

Attached are the sample tex file, the resulting output file, the log file 
from manual xelatex processing and the output from manual xdvipdfmx 
processing.
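
The manual processing can be reproduced in two stages, so that the xelatex log and the xdvipdfmx messages are captured separately. This is a sketch under assumptions: the function name two_stage and the DRY_RUN switch are invented for illustration, and -no-pdf is assumed as the way to make xelatex stop at the .xdv file. The xdvipdfmx options are the ones that appear in the attached xdvipdfmx output.

```shell
#!/bin/sh
# two_stage: hypothetical helper name; DRY_RUN is an illustrative switch.
# Stage 1 stops at the .xdv file; stage 2 converts it to PDF verbosely.
two_stage() {
    base=${1%.tex}           # accept "Kanji_Test" or "Kanji_Test.tex"
    run=${DRY_RUN:+echo}     # DRY_RUN set: print the commands, do not run them
    # -no-pdf (assumed here) makes xelatex leave the .xdv output in place.
    $run xelatex -interaction=nonstopmode -no-pdf "$base.tex" || return 1
    # Options copied from the attached xdvipdfmx output.
    $run xdvipdfmx -p a4 -V5 -vv "$base.xdv"
}
```

Calling `DRY_RUN=1 two_stage Kanji_Test.tex` only prints the two commands, which is a cheap way to confirm the options before running them for real.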

Sorry if this is an xdvipdfmx problem and I should be sending my queries
there instead.


-- 
Professor Andrew A Adams                      aaa at meiji.ac.jp
Professor at Graduate School of Business Administration,  and
Deputy Director of the Centre for Business Information Ethics
Meiji University, Tokyo, Japan       http://www.a-cubed.info/

-------------- next part --------------
This is XeTeXk, Version 3.141592-2.2-0.996 (Web2C 7.5.6) (format=xelatex 2010.8.20)  20 AUG 2010 12:46
entering extended mode
 %&-line parsing enabled.
**Kanji_Test.tex
(./Kanji_Test.tex
LaTeX2e <2005/12/01>
Babel <v3.8h> and hyphenation patterns for english, usenglishmax, dumylang, noh
yphenation, arabic, basque, bulgarian, coptic, welsh, czech, slovak, german, ng
erman, danish, esperanto, spanish, catalan, galician, estonian, farsi, finnish,
 french, greek, monogreek, ancientgreek, croatian, hungarian, interlingua, ibyc
us, indonesian, icelandic, italian, latin, mongolian, dutch, norsk, polish, por
tuguese, pinyin, romanian, russian, slovenian, uppersorbian, serbian, swedish, 
turkish, ukenglish, ukrainian, loaded.
(/usr/share/texmf/tex/latex/base/article.cls
Document Class: article 2005/09/16 v1.4f Standard LaTeX document class
(/usr/share/texmf/tex/latex/base/size10.clo
File: size10.clo 2005/09/16 v1.4f Standard LaTeX file (size option)
)
\c@part=\count79
\c@section=\count80
\c@subsection=\count81
\c@subsubsection=\count82
\c@paragraph=\count83
\c@subparagraph=\count84
\c@figure=\count85
\c@table=\count86
\abovecaptionskip=\skip41
\belowcaptionskip=\skip42
\bibindent=\dimen102
)
(/usr/share/texmf/tex/xelatex/fontspec/fontspec.sty
Package: fontspec 2006/12/24 v1.13 Advanced font selection for XeLaTeX

(/usr/share/texmf/tex/generic/ifxetex/ifxetex.sty
Package: ifxetex 2006/08/21 v0.3 Provides ifxetex conditional
)
\c@zf@newff=\count87
\c@zf@index=\count88
\c@zf@script=\count89
\c@zf@language=\count90

(/usr/share/texmf/tex/latex/tools/calc.sty
Package: calc 2005/08/06 v4.2 Infix arithmetic (KKT,FJ)
\calc@Acount=\count91
\calc@Bcount=\count92
\calc@Adimen=\dimen103
\calc@Bdimen=\dimen104
\calc@Askip=\skip43
\calc@Bskip=\skip44
LaTeX Info: Redefining \setlength on input line 75.
LaTeX Info: Redefining \addtolength on input line 76.
\calc@Ccount=\count93
\calc@Cskip=\skip45
)
(/usr/share/texmf/tex/latex/xkeyval/xkeyval.sty
Package: xkeyval 2006/11/18 v2.5f package option processing (HA)

(/usr/share/texmf/tex/latex/xkeyval/xkeyval.tex
\XKV@toks=\toks14
\XKV@depth=\count94
File: xkeyval.tex 2006/11/18 v2.5f key=value parser (HA)

(/usr/share/texmf/tex/latex/xkeyval/keyval.tex)))
LaTeX Info: Redefining \itshape on input line 1050.
LaTeX Info: Redefining \slshape on input line 1053.
LaTeX Info: Redefining \scshape on input line 1056.
LaTeX Info: Redefining \upshape on input line 1059.

fontspec.cfg loaded.
(/usr/share/texmf/tex/xelatex/fontspec/fontspec.cfg))
(/usr/share/texmf/tex/latex/base/inputenc.sty
Package: inputenc 2006/05/05 v1.1b Input encoding file
\inpenc@prehook=\toks15
\inpenc@posthook=\toks16

(/usr/share/texmf/tex/latex/base/utf8.def
File: utf8.def 2006/03/30 v1.1i UTF-8 support for inputenc
Now handling font encoding OML ...
... no UTF-8 mapping file for font encoding OML
Now handling font encoding T1 ...
... processing UTF-8 mapping file for font encodingT1

(/usr/share/texmf/tex/latex/base/t1enc.dfu
File: t1enc.dfu 2006/03/30 v1.1i UTF-8 support for inputenc
   defining Unicode char U+00A1 (decimal 161)
   defining Unicode char U+00A3 (decimal 163)
   defining Unicode char U+00AB (decimal 171)
   defining Unicode char U+00BB (decimal 187)
   defining Unicode char U+00BF (decimal 191)
   defining Unicode char U+00C0 (decimal 192)
   defining Unicode char U+00C1 (decimal 193)
   defining Unicode char U+00C2 (decimal 194)
   defining Unicode char U+00C3 (decimal 195)
   defining Unicode char U+00C4 (decimal 196)
   defining Unicode char U+00C5 (decimal 197)
   defining Unicode char U+00C6 (decimal 198)
   defining Unicode char U+00C7 (decimal 199)
   defining Unicode char U+00C8 (decimal 200)
   defining Unicode char U+00C9 (decimal 201)
   defining Unicode char U+00CA (decimal 202)
   defining Unicode char U+00CB (decimal 203)
   defining Unicode char U+00CC (decimal 204)
   defining Unicode char U+00CD (decimal 205)
   defining Unicode char U+00CE (decimal 206)
   defining Unicode char U+00CF (decimal 207)
   defining Unicode char U+00D0 (decimal 208)
   defining Unicode char U+00D1 (decimal 209)
   defining Unicode char U+00D2 (decimal 210)
   defining Unicode char U+00D3 (decimal 211)
   defining Unicode char U+00D4 (decimal 212)
   defining Unicode char U+00D5 (decimal 213)
   defining Unicode char U+00D6 (decimal 214)
   defining Unicode char U+00D8 (decimal 216)
   defining Unicode char U+00D9 (decimal 217)
   defining Unicode char U+00DA (decimal 218)
   defining Unicode char U+00DB (decimal 219)
   defining Unicode char U+00DC (decimal 220)
   defining Unicode char U+00DD (decimal 221)
   defining Unicode char U+00DE (decimal 222)
   defining Unicode char U+00DF (decimal 223)
   defining Unicode char U+00E0 (decimal 224)
   defining Unicode char U+00E1 (decimal 225)
   defining Unicode char U+00E2 (decimal 226)
   defining Unicode char U+00E3 (decimal 227)
   defining Unicode char U+00E4 (decimal 228)
   defining Unicode char U+00E5 (decimal 229)
   defining Unicode char U+00E6 (decimal 230)
   defining Unicode char U+00E7 (decimal 231)
   defining Unicode char U+00E8 (decimal 232)
   defining Unicode char U+00E9 (decimal 233)
   defining Unicode char U+00EA (decimal 234)
   defining Unicode char U+00EB (decimal 235)
   defining Unicode char U+00EC (decimal 236)
   defining Unicode char U+00ED (decimal 237)
   defining Unicode char U+00EE (decimal 238)
   defining Unicode char U+00EF (decimal 239)
   defining Unicode char U+00F0 (decimal 240)
   defining Unicode char U+00F1 (decimal 241)
   defining Unicode char U+00F2 (decimal 242)
   defining Unicode char U+00F3 (decimal 243)
   defining Unicode char U+00F4 (decimal 244)
   defining Unicode char U+00F5 (decimal 245)
   defining Unicode char U+00F6 (decimal 246)
   defining Unicode char U+00F8 (decimal 248)
   defining Unicode char U+00F9 (decimal 249)
   defining Unicode char U+00FA (decimal 250)
   defining Unicode char U+00FB (decimal 251)
   defining Unicode char U+00FC (decimal 252)
   defining Unicode char U+00FD (decimal 253)
   defining Unicode char U+00FE (decimal 254)
   defining Unicode char U+00FF (decimal 255)
   defining Unicode char U+0102 (decimal 258)
   defining Unicode char U+0103 (decimal 259)
   defining Unicode char U+0104 (decimal 260)
   defining Unicode char U+0105 (decimal 261)
   defining Unicode char U+0106 (decimal 262)
   defining Unicode char U+0107 (decimal 263)
   defining Unicode char U+010C (decimal 268)
   defining Unicode char U+010D (decimal 269)
   defining Unicode char U+010E (decimal 270)
   defining Unicode char U+010F (decimal 271)
   defining Unicode char U+0110 (decimal 272)
   defining Unicode char U+0111 (decimal 273)
   defining Unicode char U+0118 (decimal 280)
   defining Unicode char U+0119 (decimal 281)
   defining Unicode char U+011A (decimal 282)
   defining Unicode char U+011B (decimal 283)
   defining Unicode char U+011E (decimal 286)
   defining Unicode char U+011F (decimal 287)
   defining Unicode char U+0130 (decimal 304)
   defining Unicode char U+0131 (decimal 305)
   defining Unicode char U+0132 (decimal 306)
   defining Unicode char U+0133 (decimal 307)
   defining Unicode char U+0139 (decimal 313)
   defining Unicode char U+013A (decimal 314)
   defining Unicode char U+013D (decimal 317)
   defining Unicode char U+013E (decimal 318)
   defining Unicode char U+0141 (decimal 321)
   defining Unicode char U+0142 (decimal 322)
   defining Unicode char U+0143 (decimal 323)
   defining Unicode char U+0144 (decimal 324)
   defining Unicode char U+0147 (decimal 327)
   defining Unicode char U+0148 (decimal 328)
   defining Unicode char U+014A (decimal 330)
   defining Unicode char U+014B (decimal 331)
   defining Unicode char U+0150 (decimal 336)
   defining Unicode char U+0151 (decimal 337)
   defining Unicode char U+0152 (decimal 338)
   defining Unicode char U+0153 (decimal 339)
   defining Unicode char U+0154 (decimal 340)
   defining Unicode char U+0155 (decimal 341)
   defining Unicode char U+0158 (decimal 344)
   defining Unicode char U+0159 (decimal 345)
   defining Unicode char U+015A (decimal 346)
   defining Unicode char U+015B (decimal 347)
   defining Unicode char U+015E (decimal 350)
   defining Unicode char U+015F (decimal 351)
   defining Unicode char U+0160 (decimal 352)
   defining Unicode char U+0161 (decimal 353)
   defining Unicode char U+0162 (decimal 354)
   defining Unicode char U+0163 (decimal 355)
   defining Unicode char U+0164 (decimal 356)
   defining Unicode char U+0165 (decimal 357)
   defining Unicode char U+016E (decimal 366)
   defining Unicode char U+016F (decimal 367)
   defining Unicode char U+0170 (decimal 368)
   defining Unicode char U+0171 (decimal 369)
   defining Unicode char U+0178 (decimal 376)
   defining Unicode char U+0179 (decimal 377)
   defining Unicode char U+017A (decimal 378)
   defining Unicode char U+017B (decimal 379)
   defining Unicode char U+017C (decimal 380)
   defining Unicode char U+017D (decimal 381)
   defining Unicode char U+017E (decimal 382)
   defining Unicode char U+200C (decimal 8204)
   defining Unicode char U+2013 (decimal 8211)
   defining Unicode char U+2014 (decimal 8212)
   defining Unicode char U+2018 (decimal 8216)
   defining Unicode char U+2019 (decimal 8217)
   defining Unicode char U+201A (decimal 8218)
   defining Unicode char U+201C (decimal 8220)
   defining Unicode char U+201D (decimal 8221)
   defining Unicode char U+201E (decimal 8222)
   defining Unicode char U+2030 (decimal 8240)
   defining Unicode char U+2031 (decimal 8241)
   defining Unicode char U+2039 (decimal 8249)
   defining Unicode char U+203A (decimal 8250)
   defining Unicode char U+2423 (decimal 9251)
)
Now handling font encoding OT1 ...
... processing UTF-8 mapping file for font encodingOT1

(/usr/share/texmf/tex/latex/base/ot1enc.dfu
File: ot1enc.dfu 2006/03/30 v1.1i UTF-8 support for inputenc
   defining Unicode char U+00A1 (decimal 161)
   defining Unicode char U+00A3 (decimal 163)
   defining Unicode char U+00B8 (decimal 184)
   defining Unicode char U+00BF (decimal 191)
   defining Unicode char U+00C5 (decimal 197)
   defining Unicode char U+00C6 (decimal 198)
   defining Unicode char U+00D8 (decimal 216)
   defining Unicode char U+00DF (decimal 223)
   defining Unicode char U+00E6 (decimal 230)
   defining Unicode char U+00EC (decimal 236)
   defining Unicode char U+00ED (decimal 237)
   defining Unicode char U+00EE (decimal 238)
   defining Unicode char U+00EF (decimal 239)
   defining Unicode char U+00F8 (decimal 248)
   defining Unicode char U+0131 (decimal 305)
   defining Unicode char U+0141 (decimal 321)
   defining Unicode char U+0142 (decimal 322)
   defining Unicode char U+0152 (decimal 338)
   defining Unicode char U+0153 (decimal 339)
   defining Unicode char U+2013 (decimal 8211)
   defining Unicode char U+2014 (decimal 8212)
   defining Unicode char U+2018 (decimal 8216)
   defining Unicode char U+2019 (decimal 8217)
   defining Unicode char U+201C (decimal 8220)
   defining Unicode char U+201D (decimal 8221)
)
Now handling font encoding OMS ...
... processing UTF-8 mapping file for font encodingOMS

(/usr/share/texmf/tex/latex/base/omsenc.dfu
File: omsenc.dfu 2006/03/30 v1.1i UTF-8 support for inputenc
   defining Unicode char U+00A7 (decimal 167)
   defining Unicode char U+00B6 (decimal 182)
   defining Unicode char U+00B7 (decimal 183)
   defining Unicode char U+2020 (decimal 8224)
   defining Unicode char U+2021 (decimal 8225)
   defining Unicode char U+2022 (decimal 8226)
)
Now handling font encoding OMX ...
... no UTF-8 mapping file for font encoding OMX
Now handling font encoding U ...
... no UTF-8 mapping file for font encoding U
   defining Unicode char U+00A9 (decimal 169)
   defining Unicode char U+00AA (decimal 170)
   defining Unicode char U+00AE (decimal 174)
   defining Unicode char U+00BA (decimal 186)
   defining Unicode char U+02C6 (decimal 710)
   defining Unicode char U+02DC (decimal 732)
   defining Unicode char U+200C (decimal 8204)
   defining Unicode char U+2026 (decimal 8230)
   defining Unicode char U+2122 (decimal 8482)
   defining Unicode char U+2423 (decimal 9251)
)) (./Kanji_Test.aux)
\openout1 = `Kanji_Test.aux'.

LaTeX Font Info:    Checking defaults for OML/cmm/m/it on input line 8.
LaTeX Font Info:    ... okay on input line 8.
LaTeX Font Info:    Checking defaults for T1/cmr/m/n on input line 8.
LaTeX Font Info:    ... okay on input line 8.
LaTeX Font Info:    Checking defaults for OT1/cmr/m/n on input line 8.
LaTeX Font Info:    ... okay on input line 8.
LaTeX Font Info:    Checking defaults for OMS/cmsy/m/n on input line 8.
LaTeX Font Info:    ... okay on input line 8.
LaTeX Font Info:    Checking defaults for OMX/cmex/m/n on input line 8.
LaTeX Font Info:    ... okay on input line 8.
LaTeX Font Info:    Checking defaults for U/cmr/m/n on input line 8.
LaTeX Font Info:    ... okay on input line 8.
\symlegacymaths=\mathgroup4
LaTeX Font Info:    Overwriting symbol font `legacymaths' in version `bold'
(Font)                  OT1/cmr/m/n --> OT1/cmr/bx/n on input line 8.
LaTeX Font Info:    Redeclaring math accent \acute on input line 8.
LaTeX Font Info:    Redeclaring math accent \grave on input line 8.
LaTeX Font Info:    Redeclaring math accent \ddot on input line 8.
LaTeX Font Info:    Redeclaring math accent \tilde on input line 8.
LaTeX Font Info:    Redeclaring math accent \bar on input line 8.
LaTeX Font Info:    Redeclaring math accent \breve on input line 8.
LaTeX Font Info:    Redeclaring math accent \check on input line 8.
LaTeX Font Info:    Redeclaring math accent \hat on input line 8.
LaTeX Font Info:    Redeclaring math accent \dot on input line 8.
LaTeX Font Info:    Redeclaring math accent \mathring on input line 8.
LaTeX Font Info:    Redeclaring math symbol \colon on input line 8.
LaTeX Font Info:    Redeclaring math symbol \Gamma on input line 8.
LaTeX Font Info:    Redeclaring math symbol \Delta on input line 8.
LaTeX Font Info:    Redeclaring math symbol \Theta on input line 8.
LaTeX Font Info:    Redeclaring math symbol \Lambda on input line 8.
LaTeX Font Info:    Redeclaring math symbol \Xi on input line 8.
LaTeX Font Info:    Redeclaring math symbol \Pi on input line 8.
LaTeX Font Info:    Redeclaring math symbol \Sigma on input line 8.
LaTeX Font Info:    Redeclaring math symbol \Upsilon on input line 8.
LaTeX Font Info:    Redeclaring math symbol \Phi on input line 8.
LaTeX Font Info:    Redeclaring math symbol \Psi on input line 8.
LaTeX Font Info:    Redeclaring math symbol \Omega on input line 8.
LaTeX Font Info:    Redeclaring math symbol \mathdollar on input line 8.
LaTeX Font Info:    Redeclaring symbol font `operators' on input line 8.
LaTeX Font Info:    Encoding `OT1' has changed to `U' for symbol font
(Font)              `operators' in the math version `normal' on input line 8.
LaTeX Font Info:    Overwriting symbol font `operators' in version `normal'
(Font)                  OT1/cmr/m/n --> U/cmr/m/n on input line 8.
LaTeX Font Info:    Encoding `OT1' has changed to `U' for symbol font
(Font)              `operators' in the math version `bold' on input line 8.
LaTeX Font Info:    Overwriting symbol font `operators' in version `bold'
(Font)                  OT1/cmr/bx/n --> U/cmr/m/n on input line 8.
LaTeX Font Info:    Overwriting symbol font `operators' in version `normal'
(Font)                  U/cmr/m/n --> U/cmr/m/n on input line 8.
LaTeX Font Info:    Overwriting math alphabet `\mathrm' in version `normal'
(Font)                  U/cmr/m/n --> U/cmr/m/n on input line 8.
LaTeX Font Info:    Overwriting math alphabet `\mathit' in version `normal'
(Font)                  OT1/cmr/m/it --> U/cmr/m/it on input line 8.
LaTeX Font Info:    Overwriting math alphabet `\mathbf' in version `normal'
(Font)                  OT1/cmr/bx/n --> U/cmr/bx/n on input line 8.
LaTeX Font Info:    Overwriting math alphabet `\mathsf' in version `normal'
(Font)                  OT1/cmss/m/n --> U/cmss/m/n on input line 8.
LaTeX Font Info:    Overwriting math alphabet `\mathtt' in version `normal'
(Font)                  OT1/cmtt/m/n --> U/cmtt/m/n on input line 8.
LaTeX Font Info:    Overwriting symbol font `operators' in version `bold'
(Font)                  U/cmr/m/n --> U/cmr/bx/n on input line 8.
LaTeX Font Info:    Overwriting math alphabet `\mathrm' in version `bold'
(Font)                  U/cmr/m/n --> U/cmr/bx/n on input line 8.
LaTeX Font Info:    Overwriting math alphabet `\mathit' in version `bold'
(Font)                  OT1/cmr/bx/it --> U/cmr/bx/it on input line 8.
LaTeX Font Info:    Overwriting math alphabet `\mathsf' in version `bold'
(Font)                  OT1/cmss/bx/n --> U/cmss/bx/n on input line 8.
LaTeX Font Info:    Overwriting math alphabet `\mathtt' in version `bold'
(Font)                  OT1/cmtt/m/n --> U/cmtt/bx/n on input line 8.
\c@zf@famc@IPAMincho=\count95
Package fontspec Info: Defining font family for "IPAMincho" with options [] on 
input line 12.
Package fontspec Info: Could not resolve font IPAMincho/B (it might not exist) 
on input line 12.
Package fontspec Info: Could not resolve font IPAMincho/I (it might not exist) 
on input line 12.
Package fontspec Info: Could not resolve font IPAMincho/BI (it might not exist)
 on input line 12.
 [1

]
(./Kanji_Test.aux) ) 
Here is how much of TeX's memory you used:
 2117 strings out of 191470
 41652 string characters out of 1925444
 123237 words of memory out of 1500000
 5336 multiletter control sequences out of 10000+200000
 4261 words of font info for 18 fonts, out of 1200000 for 2000
 605 hyphenation exceptions out of 8191
 28i,4n,28p,223b,150s stack positions out of 5000i,500n,6000p,200000b,15000s

Output written on Kanji_Test.xdv (1 page, 516 bytes).
-------------- next part --------------
xdvipdfmx -p a4 -V5  -vv Kanji_Test.xdv
xdvipdfmx -p a4 -V5 -vv Kanji_Test.xdv
DVI Comment:  XeTeX output 2010.08.20:1246
Kanji_Test.xdv -> Kanji_Test.pdf
[1<cmbx12 at 14.35pt(TFM:cmbx12[/usr/share/texmf/fonts/tfm/public/cm/cmbx12.tfm])
pdf_font>> Simple font "cmbx12" enc_id=<builtin,-1> opened at font_id=<cmbx12,0>.
><IPAMincho(IPAMincho:Regular)@9.96pt<NATIVE-FONTMAP:IPAMincho/H>
fontmap: IPAMincho/H -> /usr/share/fonts/ttf/japanese/ipam.ttf(Identity-H)

pdf_font>> Input encoding "Identity-H" requires at least 2 bytes.
pdf_font>> The -m <00> option will be assumed for "/usr/share/fonts/ttf/japanese/ipam.ttf".

** NOTICE: This document contains a `Preview & Print only' licensed font **
(CID:IPAMincho)
pdf_font>> Type0 font "/usr/share/fonts/ttf/japanese/ipam.ttf" cmap_id=<Identity-H,0> opened at font_id=<IPAMincho/H,1>.
><cmr10 at 9.96pt(TFM:cmr10[/usr/share/texmf/fonts/tfm/public/cm/cmr10.tfm])
pdf_font>> Simple font "cmr10" enc_id=<builtin,-1> opened at font_id=<cmr10,2>.
>](cmbx12[CMBX12][built-in][Type1][9 glyphs][1003 bytes])(cmr10[CMR10][built-in][Type1][2 glyphs][328 bytes])
otf_cmap>> Creating ToUnicode CMap for "/usr/share/fonts/ttf/japanese/ipam.ttf"...

** WARNING ** Invalid CMap mapping entry. (ignored)
(CID:/usr/share/fonts/ttf/japanese/ipam.ttf[VMHGDV+IPAMincho][CIDFontType2][13 glyphs (Max CID: 3528)][19922 bytes])
Compression eliminated approximately 17652 bytes
8514 bytes written
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Kanji_Test.pdf
Type: application/pdf
Size: 8513 bytes
Desc: Kanji_Test.pdf
URL: <http://tug.org/pipermail/xetex/attachments/20100820/bd6e125f/attachment-0001.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Kanji_Test.tex
Type: application/x-tex
Size: 497 bytes
Desc: Kanji_Test.tex
URL: <http://tug.org/pipermail/xetex/attachments/20100820/bd6e125f/attachment-0001.tex>

