[XeTeX] \hyphenation{} and combining diacritics

maxwell maxwell at umiacs.umd.edu
Fri Jul 8 23:55:35 CEST 2011

On Fri, 8 Jul 2011 15:50:07 -0500, Joshua and Amy <josh.ruthamy at gmail.com>
> So, I guess I was foolish to hope that Google has figured out how to
> results that have non-identical but equivalent strings?

I'm sure Google has figured this out, and some programs do an automatic
conversion to composed or decomposed form.  But I wouldn't be surprised if
some programmers' editors, for example, don't do that (for some purposes,
such as search-and-replace, the difference might be important), and maybe
some other programs don't either.

> I hope it's not too off-topic for this list, but can you point me to any
> good resources on normalization (is there a straightforward automation
> for someone who doesn't do scripting? am I supposed to use decomposed
> characters?)?

You can use either composed or decomposed characters for most purposes,
although as I say some programs do an automatic (and possibly invisible)
conversion.

There's a general article on this issue here:

I know of library functions in Python that do the conversion; I'm sure
they exist in Perl too.  But I'm not aware of a general program (like
iconv) that does it.  (I think there's a hack with iconv that allows it to
create decomposed forms, but that is not a bidirectional conversion.) 
Maybe someone else on this list knows of tools that do that.  (What OS are
you working on?)
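
For what it's worth, here is a minimal sketch of the Python route, using
the standard library's unicodedata.normalize.  The helper names
(to_composed, to_decomposed) and the sample strings are just for
illustration, and I'm assuming Python 3, where strings are Unicode.  NFC
gives the composed form and NFD the decomposed form.

```python
import unicodedata

# Two spellings of the same visible character "é":
composed = "\u00e9"      # single code point, LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def to_composed(text):
    """Return text in NFC (canonical composed form). Hypothetical helper name."""
    return unicodedata.normalize("NFC", text)


def to_decomposed(text):
    """Return text in NFD (canonical decomposed form). Hypothetical helper name."""
    return unicodedata.normalize("NFD", text)


# A plain comparison treats the two spellings as different strings.
print(composed == decomposed)                  # False

# After normalizing both sides to the same form, they compare equal.
print(to_composed(decomposed) == composed)     # True
print(to_decomposed(composed) == decomposed)   # True

# The number of code points differs between the two forms.
print(len(to_composed(decomposed)), len(to_decomposed(composed)))  # 1 2
```

Unlike the iconv hack, this goes in both directions.  The same idea
applies to searching: normalize both the text and the search string to
one form before comparing.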

   Mike Maxwell

