[texhax] Japanese Fonts
pierre.mackay at comcast.net
Wed Apr 12 20:29:29 CEST 2006
Chris Bourke wrote:
>I've been experimenting with including a bit of Japanese in my TeX
>documents. I'm using Windows XP along with Microsoft's Input method
>editor to write.
>Under MiKTeX I am able to compile JIS encoded examples without any
>problems. However, I am unable to compile with UTF-8 or Unicode.
>I've been using Notepad to write, which only supports UTF-8 and Unicode.
>I know that Emacs supports JIS, but I don't want to go through the trouble
>of installing and using it (I primarily use WinEdt but it is unable to
>handle anything other than ASCII).
>Does anyone have any suggestions? Are there any barebones programs that
>will support JIS encoding? Is there any way to get LaTeX (especially
>pdflatex) to understand Unicode and/or UTF-8?
The enclosed macros are a model, rather than a complete auxiliary
package. They illustrate a simple and economical way to parse UTF-8
sequences from 0x0000 to 0xFFFF (and could be extended to 0x1FFFF). I
have filled out only a few of the 3-byte sequences, but the model should
be clear, and I doubt there will ever be a need for a single package
that covers all of the range.
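
For readers without the attachment, the byte arithmetic behind such a parser can be sketched in a few lines of Python. This is only an illustration of the technique, not a transcription of the enclosed macros; the function name decode_utf8_bmp is made up for the example, and it assumes input limited to 1-, 2- and 3-byte sequences (0x0000 to 0xFFFF).

```python
def decode_utf8_bmp(data):
    """Decode 1-, 2- and 3-byte UTF-8 sequences (U+0000..U+FFFF) by hand.

    Illustrative sketch only: it checks continuation bytes but does not
    reject overlong encodings or surrogate code points.
    """
    codepoints = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                # 0xxxxxxx: plain ASCII
            cp, length = lead, 1
        elif 0xC0 <= lead < 0xE0:      # 110xxxxx 10xxxxxx
            cp, length = lead & 0x1F, 2
        elif 0xE0 <= lead < 0xF0:      # 1110xxxx 10xxxxxx 10xxxxxx
            cp, length = lead & 0x0F, 3
        else:                          # stray continuation byte or 4-byte lead
            raise ValueError("unsupported lead byte 0x%02X" % lead)
        if i + length > len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        for byte in data[i + 1:i + length]:
            if byte & 0xC0 != 0x80:    # continuation bytes look like 10xxxxxx
                raise ValueError("bad continuation byte 0x%02X" % byte)
            cp = (cp << 6) | (byte & 0x3F)
        codepoints.append(cp)
        i += length
    return codepoints


if __name__ == "__main__":
    # The bytes E6 97 A5 are the UTF-8 form of U+65E5.
    print([hex(cp) for cp in decode_utf8_bmp(b"\xe6\x97\xa5")])
```

The lead byte alone decides how many continuation bytes follow, which is why a macro-based parser only needs to treat the lead bytes specially.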
The UTF-8 parser returns the Unicode page in \unipgid, and suggests how
this can be translated into the name of a TFM file. The codepoint is
returned in \unichar, which I show as recast into a \chardef so that it
can be inserted directly into the DVI. This is only one possible
approach. It depends on what your printer driver expects. The two values
could be dumped in sequence into the DVI to form a UCS-2 16-bit value,
or used as a two-step index into a comprehensive OpenType font. My
preference is to keep within the framework of traditional 8-bit TFM and
VF technology, in order to demonstrate that there is no need to insert
incompatible extensions into Donald Knuth's TeX engine, not even the
ones he left stubs for.
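
The split into a page and an 8-bit character can be sketched the same way. The Python below is a hedged illustration, not the attached code: the helper names and the "unifont" prefix for the TFM name are assumptions invented for the example, and the page and character values merely play the roles that \unipgid and \unichar play in the macros.

```python
def split_codepoint(cp):
    """Split a 16-bit code point into (page, char): high byte and low byte."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("code point outside 0x0000..0xFFFF")
    return cp >> 8, cp & 0xFF


def tfm_name(page, prefix="unifont"):
    """Build a hypothetical per-page TFM file name, e.g. 'unifont65'.

    The naming scheme is an assumption for illustration only.
    """
    return "%s%02x" % (prefix, page)


def ucs2_bytes(page, char):
    """Emit the two values in sequence as a big-endian UCS-2 16-bit value."""
    return bytes([page, char])


if __name__ == "__main__":
    page, char = split_codepoint(0x65E5)
    # The page selects one 256-glyph font; the char is the slot within it.
    print(tfm_name(page), char, ucs2_bytes(page, char).hex())
```

Because each page holds at most 256 characters, every page fits in one traditional 8-bit TFM, which is the point of staying within the existing TFM and VF framework.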
I am not a LaTeX user, but I believe this package should get along well
with LaTeX. Unless LaTeX has taken to using a large number of active
characters in the 0xC0--0xE0 range, there is no risk other than the
minuscule possibility of conflict with a control sequence that is
already preempted by LaTeX. That could easily be fixed because this code
is so simple.
I hope this may be of some interest to you because it is important to
start answering the growing prejudice that trip-tested 8-bit TeX "cannot
handle" things like Unicode input.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 7105 bytes
Desc: not available
Url : http://tug.org/pipermail/texhax/attachments/20060412/0882c620/utf8-0001.bin