[XeTeX] Polyglossia: Support for romanization of CJK

Andy Lin kiryen at gmail.com
Wed Jun 22 11:35:16 CEST 2011


I'd forgotten about the state of Japanese romanization. I'm just so
used to seeing some form of mangled Hepburn because people don't
bother typing macrons. But this type of fragmentation is mostly
acceptable (that is, people can still understand each other despite
the variation) and doesn't necessarily affect hyphenation.

On Mon, Jun 20, 2011 at 22:46, Mike "Pomax" Kamermans
<pomax at nihongoresources.com> wrote:
> Thus, for German audiences, you would use something like "dzjanai" to make
> sure that even if the meaning is unknown, the correct pronunciation is
> conveyed.

This refers to a situation where you want to demonstrate the
pronunciation. That takes you away from romanization, the primary
purpose of which is to map the script/phonemes. With pronunciation,
you're venturing into very muddy waters. Consider this article
discussing the pronunciation of Toronto place names:
http://www.metronews.ca/toronto/local/article/811715--tarawna-where-the-streets-have-mispronounced-names
Its transcriptions are wildly inconsistent and not what you would
expect to be hyphenating.

Or take the example of Chinese romanization. I can't think of a
language off-hand that uses q in the same way (although I remember
reading about one). But people use "q" nonetheless when they talk
about romanization. They don't use "ch" or "tch" or "cz", even though
one of those might be closer in pronunciation in their language.
That's not to say that a pronunciation guide or a newspaper might not
choose to transcribe it in such a way, but that is not romanization,
that is
transcription. At that point, you might as well hyphenate it using the
target language's patterns because that's essentially what it's
transformed into.

My original thought was that if romanization is dependent on the
target language and not the source language, then it becomes useless
for people who only know the source language and not the target
language. Romanization is supposed to provide scholars access to a
language which they would otherwise be unable to describe due to a
writing system that they can't use. It doesn't make sense to then
restrict the scholarship to a particular target language through the
use of a unique romanization system. Or, to use an analogy, if I'm
going to rip a CD for my friends, I'm going to rip it to mp3, not to
ogg, because I'm the only person who knows what an ogg file is. If my
ultimate goal is to ensure broad dissemination and adoption, I'm not
going to base my decision on my preferences but on what's most
accepted.

I don't disagree that variation exists in the wild. But conceptually,
there's something wrong about different hyphenation patterns for
romanization based on surrounding text.

-Andy

