[XeTeX] \hyphenation{} and combining diacritics

maxwell maxwell at umiacs.umd.edu
Fri Jul 8 23:55:35 CEST 2011


On Fri, 8 Jul 2011 15:50:07 -0500, Joshua and Amy <josh.ruthamy at gmail.com>
wrote:
> So, I guess I was foolish to hope that Google has figured out how to return
> results that have non-identical but equivalent strings?

I'm sure Google has figured this out, and some programs do an automatic
conversion to composed or decomposed form.  But I wouldn't be surprised if
some programmers' editors, for example, don't do that (for some purposes,
such as search-and-replace, the difference might be important), and maybe
some other programs don't either.

> I hope it's not too off-topic for this list, but can you point me to any
> good resources on normalization (is there a straightforward automation for
> someone who doesn't do scripting? am I supposed to use decomposed
> characters?)?

You can use either composed or decomposed characters for most purposes,
although as I say some programs do an automatic (and possibly invisible)
conversion.  

There's a general article on this issue here:
   http://en.wikipedia.org/wiki/Unicode_equivalence
I know of library functions in Python that do the conversion; I'm sure
they exist in Perl too.  But I'm not aware of a general program (like
iconv) that does it.  (I think there's a hack with iconv that allows it to
create decomposed forms, but that is not a bidirectional conversion.) 
Maybe someone else on this list knows of tools that do that.  (What OS are
you working on?)
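
For the Python route, here is a minimal sketch, assuming the standard
library's unicodedata module (Python 3).  The helper names to_composed and
to_decomposed are made up for illustration; the only real library call is
unicodedata.normalize, where "NFC" gives the composed form and "NFD" the
decomposed form.

```python
import unicodedata


def to_composed(text):
    """Return text in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)


def to_decomposed(text):
    """Return text in Unicode Normalization Form D (decomposed)."""
    return unicodedata.normalize("NFD", text)


# The same visible character, written two ways.
composed = "\u00e9"       # e-acute as a single code point
decomposed = "e\u0301"    # plain "e" followed by COMBINING ACUTE ACCENT

# The raw strings differ, which is why a naive search can miss one of them.
print(composed == decomposed)

# After normalizing both sides to the same form, they compare equal.
print(to_composed(decomposed) == composed)
print(to_decomposed(composed) == decomposed)
```

Normalizing both the search pattern and the text to the same form before
comparing is one way around the search-and-replace problem mentioned above.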

   Mike Maxwell

