[texhax] Japanese Fonts

pierre.mackay pierre.mackay at comcast.net
Wed Apr 12 20:29:29 CEST 2006

Chris Bourke wrote:

>I've been experimenting with including a bit of Japanese in my TeX
>documents.  I'm using Windows XP along with Microsoft's Input Method
>Editor to write.
>Under MiKTeX I am able to compile JIS-encoded examples without any
>problems.  However, I am unable to compile with UTF-8 or Unicode.
>I've been using Notepad to write, which only supports UTF-8 and Unicode.
>I know that Emacs supports JIS, but I don't want to go through the trouble
>of installing and using it (I primarily use WinEdt, but it is unable to
>handle anything other than ASCII).
>Does anyone have any suggestions?  Are there any barebones programs that
>will support JIS encoding?  Is there any way to get LaTeX (especially
>pdfLaTeX) to understand Unicode and/or UTF-8?

The enclosed macros are a model, rather than a complete auxiliary 
package. They illustrate a simple and economical way to parse UTF-8 
sequences from 0x0000 to 0xFFFF (and could be extended to 0x1FFFF). I 
have filled out only a few of the 3-byte sequences, but the model should 
be clear, and I doubt there will ever be a need for a single package 
that covers all of the range.
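
For readers who want to see the arithmetic such a parser has to perform, 
here is a minimal sketch in Python rather than TeX. It is not the code in 
utf8.tex. The function name decode_utf8_bmp is made up for illustration, 
and the sketch handles only the 1-, 2- and 3-byte forms that cover 0x0000 
to 0xFFFF.

```python
def decode_utf8_bmp(data: bytes) -> list:
    """Decode 1-, 2- and 3-byte UTF-8 sequences into code points.

    Hypothetical helper for illustration only. It does not reject
    overlong forms or surrogates, which a strict decoder would.
    """
    codepoints = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                  # 0xxxxxxx: ASCII, no continuation
            value, extra = lead, 0
        elif 0xC0 <= lead < 0xE0:        # 110xxxxx: one continuation byte
            value, extra = lead & 0x1F, 1
        elif 0xE0 <= lead < 0xF0:        # 1110xxxx: two continuation bytes
            value, extra = lead & 0x0F, 2
        else:                            # stray continuation or 4-byte lead
            raise ValueError(f"unsupported lead byte 0x{lead:02X}")
        if i + extra >= len(data):
            raise ValueError("truncated sequence")
        for cont in data[i + 1 : i + 1 + extra]:
            if (cont & 0xC0) != 0x80:    # continuation bytes are 10xxxxxx
                raise ValueError(f"bad continuation byte 0x{cont:02X}")
            value = (value << 6) | (cont & 0x3F)
        codepoints.append(value)
        i += 1 + extra
    return codepoints
```

The lead byte alone tells the decoder how many bytes follow, which is why 
a small set of lead-byte cases is enough to drive the whole parse.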

The UTF-8 parser returns the Unicode page in \unipgid, and suggests how 
this can be translated into the name of a TFM file. The codepoint is 
returned in \unichar, which I show as recast into a \chardef so that it 
can be inserted directly into the DVI. This is only one possible 
approach. It depends on what your printer driver expects. The two values 
could be dumped in sequence into the DVI to form a UCS-2 16-bit value, 
or used as a two-step index into a comprehensive OpenType font. My 
preference is to keep within the framework of traditional 8-bit TFM and 
VF technology, in order to demonstrate that there is no need to insert 
incompatible extensions into Donald Knuth's TeX engine, not even the 
ones he left stubs for.
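
The page/codepoint split described above can be sketched as follows, 
again in Python rather than TeX. Everything named here is an assumption 
made for illustration: the functions split_codepoint, tfm_name and 
ucs2_bytes, and in particular the "uni" file-name prefix, do not come 
from utf8.tex.

```python
def split_codepoint(codepoint: int) -> tuple:
    """Return (page, char): the high and low bytes of a 16-bit code point."""
    if not 0 <= codepoint <= 0xFFFF:
        raise ValueError("only 0x0000 to 0xFFFF is handled in this sketch")
    return codepoint >> 8, codepoint & 0xFF


def tfm_name(page: int, prefix: str = "uni") -> str:
    """Hypothetical naming scheme: one 256-character TFM per Unicode page."""
    return f"{prefix}{page:02x}"


def ucs2_bytes(page: int, char: int) -> bytes:
    """The two values written in sequence, forming a big-endian UCS-2 unit."""
    return bytes([page, char])


# Usage: U+65E5 lives on page 0x65 at position 0xE5.
page, char = split_codepoint(0x65E5)
font = tfm_name(page)
```

Because each page holds exactly 256 positions, the low byte fits in an 
ordinary 8-bit character code, which is what lets the approach stay 
within 8-bit TFM and VF files.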

I am not a LaTeX user, but I believe this package should get along well 
with LaTeX. Unless LaTeX has taken to using a large number of active 
characters in the 0xC0--0xE0 range, there is no risk other than the 
minuscule possibility of conflict with a control sequence that is 
already preempted by LaTeX. That could easily be fixed because this code 
is so simple.

I hope this may be of some interest to you because it is important to 
start answering the growing prejudice that trip-tested 8-bit TeX "cannot 
handle" things like Unicode input.

Pierre MacKay

-------------- next part --------------
A non-text attachment was scrubbed...
Name: utf8.tex
Type: text/x-tex
Size: 7105 bytes
Desc: not available
Url : http://tug.org/pipermail/texhax/attachments/20060412/0882c620/utf8-0001.bin
