<font size="2"><font face="verdana,sans-serif">Unicode normalization was discussed on this list a couple of months ago. Phil Taylor provided a small program to do the job, and other utilities were mentioned. XeTeX also has a built-in command that normalizes Unicode input before TeX processes it. Try this in your preamble:<br>
<br>% Normalize any residual Unicode combining accents (1 = NFC, composed form),<br>% and report any missing characters in the log and on the terminal:<br>\XeTeXinputnormalization=1<br>\tracinglostchars=1<br>\tracingonline=1<br><br>Dominik<br></font></font>
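<br>For anyone who wants to normalize the source files themselves rather than rely on XeTeX, here is a minimal sketch in Python using the standard library's unicodedata module. It is not Phil Taylor's program, and the helper names to_nfc and to_nfd are made up for illustration. It uses the two encodings of a-breve from the original question to show that NFC yields the single code point and NFD the base letter plus combining mark.<br>

```python
import unicodedata

# The two encodings from the original question.
composed = "\u0103"      # LATIN SMALL LETTER A WITH BREVE
decomposed = "a\u0306"   # LATIN SMALL LETTER A + COMBINING BREVE


def to_nfc(text: str) -> str:
    """Return text in Normalization Form C (precomposed where possible)."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text: str) -> str:
    """Return text in Normalization Form D (base letters + combining marks)."""
    return unicodedata.normalize("NFD", text)


if __name__ == "__main__":
    # The raw strings differ, which is why search and hyphenation patterns miss.
    print(composed == decomposed)
    # After normalizing both to the same form, they compare equal.
    print(to_nfc(composed) == to_nfc(decomposed))
    # Show the code points of the decomposed form.
    print([f"U+{ord(ch):04X}" for ch in to_nfd(composed)])
```

Whichever form you pick, the point is to use the same one for both the texts and the hyphenation patterns, so that each character has only one possible encoding.<br>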
<br><br><div class="gmail_quote">On 8 July 2011 22:50, Joshua and Amy <span dir="ltr">&lt;<a href="mailto:josh.ruthamy@gmail.com">josh.ruthamy@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
So, I guess I was foolish to hope that Google has figured out how to return results that have non-identical but equivalent strings?<br><br>I hope it's not too off-topic for this list, but can you point me to any good resources on normalization (is there a straightforward automation for someone who doesn't do scripting? am I supposed to use decomposed characters?)?<br>
<br>Thanks.<br><br>Josh<div><div></div><div class="h5"><br><br><div class="gmail_quote">On Fri, Jul 8, 2011 at 3:11 PM, maxwell <span dir="ltr">&lt;<a href="mailto:maxwell@umiacs.umd.edu" target="_blank">maxwell@umiacs.umd.edu</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Fri, 8 Jul 2011 15:00:42 -0500, Joshua and Amy &lt;<a href="mailto:josh.ruthamy@gmail.com" target="_blank">josh.ruthamy@gmail.com</a>&gt; wrote:<br>
<div>> I'm creating some hyphenation rules for Jarai texts that I'm<br>
> interlinearizing. Here's the problem: In various texts, a complex character<br>
> such as LATIN SMALL LETTER A WITH BREVE might be encoded as a single code<br>
> point (U+0103) or as a combination of code points (LATIN SMALL LETTER A:<br>
> U+0061 plus COMBINING BREVE: U+0306).<br>
<br>
</div>Can't (shouldn't!) you pass your texts through a Unicode normalization<br>
process? Otherwise search on them might not work either, depending on how<br>
smart your search tool is.<br>
<br>
Mike Maxwell<br>
<br>
<br>
--------------------------------------------------<br>
Subscriptions, Archive, and List information, etc.:<br>
<a href="http://tug.org/mailman/listinfo/xetex" target="_blank">http://tug.org/mailman/listinfo/xetex</a><br>
</blockquote></div><br>
</div></div><br><br>
<br></blockquote></div><br>