[XeTeX] Strange hyphenation with polyglossia in French
Philip Taylor (Webmaster, Ret'd)
P.Taylor at Rhul.Ac.Uk
Wed Oct 20 09:47:32 CEST 2010
Khaled Hosny wrote:
> On Wed, Oct 20, 2010 at 12:21:12AM +0200, Mojca Miklavec wrote:
>> Arthur also reminded me that one might want to treat scedilla and
>> scommaaccent as equivalent characters for Romanian,
> Lately, I've been told that Romanians are now strongly against this
> scedilla=scommaaccent thing, regarding it as a legacy artifact, and that
> continuing to support it is causing more harm than good.
I don't think it's up to us (the TeX/XeTeX/LuaTeX/hyphenation community)
to police such things: if Unicode implements this equivalence relationship
then so should we. Part of the specification currently reads:
> In Turkish and Romanian, a cedilla and a comma below sometimes replace one another
> depending on the font style, as shown in example 4 in Figure 7-1. The form with the cedilla
> is preferred in Turkish, and the form with the comma below is preferred in Romanian. The
> characters with explicit commas below are provided to permit the distinction from characters
> with a cedilla. Legacy encodings for these characters contain only a single form of each
> of these characters. ISO/IEC 8859-2 maps these to the form with the cedilla, while ISO/IEC
> 8859-16 maps them to the form with the comma below. Migrating Romanian 8-bit data to
> Unicode should be done with care.
> In general, characters with cedillas or ogoneks below are subject to variable typographical
> usage, depending on the availability and quality of fonts used, the technology, and the geographic
> area. Various hooks, commas, and squiggles may be substituted for the nominal
> forms of these diacritics below, and even the directions of the hooks may be reversed.
> Implementers should become familiar with particular typographical traditions before
> assuming that characters are missing or are wrongly represented in the code charts in the
> Unicode Standard.
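
To make the legacy-encoding point in the quoted passage concrete, here is a
minimal Python sketch. It relies only on the standard library codecs
`iso8859_2` and `iso8859_16` and on `unicodedata`. The helper names
`romanian_preferred` and `same_under_nfkd` and the table
`RO_CEDILLA_TO_COMMA` are made up for illustration and come from no TeX or
hyphenation package. The sketch shows that the same 8-bit byte decodes to the
cedilla form under ISO/IEC 8859-2 and to the comma-below form under ISO/IEC
8859-16, and that Unicode normalization by itself does not fold one form into
the other, so any folding has to be an explicit, language-specific step.

```python
import unicodedata

# The same legacy byte decodes to different code points depending on the codec.
raw = b"\xba"
cedilla = raw.decode("iso8859_2")       # U+015F, s with cedilla
comma_below = raw.decode("iso8859_16")  # U+0219, s with comma below

# Hypothetical mapping (an assumption for illustration only): cedilla forms
# to comma-below forms, meant for text already known to be Romanian.
RO_CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
})


def romanian_preferred(text):
    """Return text with cedilla forms replaced by comma-below forms."""
    return text.translate(RO_CEDILLA_TO_COMMA)


def same_under_nfkd(a, b):
    """Report whether two strings are identical after NFKD normalization."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


if __name__ == "__main__":
    print(hex(ord(cedilla)), hex(ord(comma_below)))
    print(same_under_nfkd(cedilla, comma_below))
    print(romanian_preferred("Timi\u015foara") == "Timi\u0219oara")
```

The mapping is deliberately one-directional and opt-in: applying it to
Turkish text would be wrong, since the quoted passage says the cedilla form is
the preferred one there.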