<font size="2"><font face="verdana,sans-serif">Unicode normalization was discussed on this list a couple of months ago. Phil Taylor provided a small program to do the job, and other utilities were mentioned. There's also a command within XeTeX that normalizes Unicode input before TeX digests it. Try this in your header:<br>


<br>% Normalize any residual Unicode combining accents (1 = NFC, the composed form),<br>% and report any characters missing from the font, on the terminal as well as in the log:<br>\XeTeXinputnormalization=1<br>\tracinglostchars=1<br>\tracingonline=1<br><br>Dominik<br></font></font>
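<br>As a rough illustration of what normalization does to the two encodings mentioned in the quoted question, here is a minimal Python sketch. It is independent of XeTeX and is not the program Phil Taylor posted; it uses only the standard <code>unicodedata</code> module, and the helper names <code>to_nfc</code> and <code>to_nfd</code> are made up for this example.<br>

```python
import unicodedata

# The two encodings of "a with breve" from the quoted question.
precomposed = "\u0103"   # LATIN SMALL LETTER A WITH BREVE (one code point)
decomposed = "a\u0306"   # LATIN SMALL LETTER A + COMBINING BREVE (two code points)


def to_nfc(text: str) -> str:
    """Return text in Normalization Form C (composed). Hypothetical helper."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text: str) -> str:
    """Return text in Normalization Form D (decomposed). Hypothetical helper."""
    return unicodedata.normalize("NFD", text)


if __name__ == "__main__":
    # The raw strings differ, so a naive search or hyphenation pattern
    # written for one form will not match the other.
    print(precomposed == decomposed)          # False
    # After normalizing both sides to the same form, they compare equal.
    print(to_nfc(decomposed) == precomposed)  # True
    print(to_nfd(precomposed) == decomposed)  # True
```

<br>Setting <code>\XeTeXinputnormalization=1</code> asks XeTeX for the composed form, which corresponds to <code>to_nfc</code> in this sketch.<br>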

<br><br><div class="gmail_quote">On 8 July 2011 22:50, Joshua and Amy <span dir="ltr">&lt;<a href="mailto:josh.ruthamy@gmail.com">josh.ruthamy@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


So, I guess I was foolish to hope that Google has figured out how to return results that have non-identical but equivalent strings?<br><br>I hope it's not too off-topic for this list, but can you point me to any good resources on normalization (is there a straightforward automation for someone who doesn't do scripting? am I supposed to use decomposed characters?)?<br>


<br>Thanks.<br><br>Josh<div><div></div><div class="h5"><br><br><div class="gmail_quote">On Fri, Jul 8, 2011 at 3:11 PM, maxwell <span dir="ltr">&lt;<a href="mailto:maxwell@umiacs.umd.edu" target="_blank">maxwell@umiacs.umd.edu</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


On Fri, 8 Jul 2011 15:00:42 -0500, Joshua and Amy &lt;<a href="mailto:josh.ruthamy@gmail.com" target="_blank">josh.ruthamy@gmail.com</a>&gt; wrote:<br>

<div>&gt; I'm creating some hyphenation rules for Jarai texts that I'm<br>
&gt; interlinearizing. Here's the problem: In various texts, a complex character<br>
&gt; such as LATIN SMALL LETTER A WITH BREVE might be encoded as a single code<br>
&gt; point (U+0103) or as a combination of code points (LATIN SMALL LETTER A:<br>
&gt; U+0061 plus COMBINING BREVE: U+0306).<br>

<br>

</div>Can't (shouldn't!) you pass your texts through a Unicode normalization<br>

process? Otherwise searching them might not work either, depending on how<br>

smart your search tool is.<br>

<br>

   Mike Maxwell<br>

<br>

<br>

--------------------------------------------------<br>

Subscriptions, Archive, and List information, etc.:<br>

  <a href="http://tug.org/mailman/listinfo/xetex" target="_blank">http://tug.org/mailman/listinfo/xetex</a><br>

</blockquote></div><br>

</div></div><br><br>

<br>


<br></blockquote></div><br>