[tex-hyphen] Apostrophe

Petr Sojka sojka at fi.muni.cz
Mon Jun 16 18:30:18 CEST 2008


On Mon, Jun 16, 2008 at 04:51:35PM +0100, Jonathan Kew wrote:
Jonathan (and others),

> >>IMO, where some patterns have traditionally included the  
> >>apostrophe (x27),
> >>we should probably provide duplicate patterns with U+2019 as well.
> >
> >Any little/tiny chance to use some other way to achieve the same? It
> >seems like yet-another-hack to me, that will prevent us from direct
> >conversion to 8-bit patterns.
> >
> >1.) create a list of equivalent characters
> >
> >2.)
> >a) parse the contents of \patterns and, if some character in a pattern
> >belongs to that list, duplicate the pattern before it's passed to TeX
> 
> It ought to be possible to do this, I guess, but it's fairly painful  
> as TeX macro programming. (For LuaTeX it could no doubt be done much  
> more easily in Lua, but that doesn't help XeTeX.)
> 
> >b) extend the engine (only XeTeX/LuaTeX in that case) in some way to
> >accept hints that some characters are equivalent during hyphenation. I
> >guess that \lccode does exactly that, but I'm not sure what will
> >happen if I set lccode of "adiaeresis" to lccode of "a" for example,
> >when I want to use some macro to do uppercasing/lowercasing of words
> >for me.
> 
> Or to take the specific example of the apostrophe, we could set  
> \lccode"2019="27 (or vice versa, depending which way we want to write  
> the patterns). But then if someone applies \lowercase to a run of  
> text that includes the ’ character, they'll be surprised to see it  
> changed to '.
> 
> The trouble is that \lccode is overloaded, being used for multiple  
> purposes that may not always want the same set of mappings. I suppose  
> if we had a separate \hyphequiv table, that would help -- but you're  
> not getting a new feature like that in time for the TL2008 release!
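
To make that side effect concrete, here is a minimal sketch, assuming
XeTeX or LuaTeX with the plain format and a UTF-8 source file. The
sample word is only an illustration:

```tex
% Sketch: XeTeX or LuaTeX, plain format, UTF-8 source file.
\lccode"2019="27              % map U+2019 to U+0027, as proposed for hyphenation
\lowercase{\message{DON’T}}   % the same table drives \lowercase: the log shows don't
\bye
```

The \lccode assignment was meant for hyphenation, but \lowercase
consults the same table, so the typographic apostrophe comes out as
the ASCII one.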

A \hyphequiv table might be the right way of doing that (it was suggested
some 13 years ago) to make patterns independent of font encodings.
Anyway, to unify character positions for hyphenation only, without
affecting lowercasing, one can use e-TeX's \savinghyphcodes primitive:
see sec. 3.10 of http://www.tug.org/teTeX/texmf-dist/doc/etex/base/etex-man.pdf
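
A minimal sketch of how that could look when building a format,
assuming iniTeX in e-TeX extended mode with the plain macros already
loaded. \examplelang and the single pattern are hypothetical, and I
have not checked whether a given XeTeX version stores hyphenation
codes for code points above 255:

```tex
% Sketch only: iniTeX, e-TeX extended mode, plain macros loaded.
\newlanguage\examplelang  % hypothetical language number for this example
\begingroup
  \language=\examplelang
  \lccode"27="27          % the apostrophe must count as a letter inside \patterns
  \lccode"2019="27        % for hyphenation only: U+2019 behaves like U+0027
  \savinghyphcodes=1      % store the current \lccode values with these patterns
  \patterns{1'a}          % illustrative pattern, not from any real pattern file
\endgroup                 % local \lccode changes end here; \lowercase is unaffected
```

The \lccode changes are local to the group, while the patterns and
the hyphenation codes saved with them persist, so the patterns need
to be written with U+0027 only.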

Jonathan, you do have e-TeX in XeTeX, right?

--ps
 
> >I would really prefer not to introduce new hacks in patterns.
> >Apostrophe represents a single character, so it should be left as a
> >single character in patterns (assuming that we leave it there), only
> >TeX might see it in a different way.
> 
> The correct Unicode character to use would be U+2019, I think, so we  
> could simply use that in the patterns and ignore U+0027. The trouble  
> is that there are sure to be users who have U+0027 in their text, and  
> expect this to behave the same way; in order to support both the  
> "best practice" and the "ASCII-like" encoding of the data, we need  
> two versions of the patterns. That's not really a "hack in patterns",  
> IMO, it's a concession to the fact that real-life data will not  
> always be encoded in the purest and best Unicode Way, and it may be  
> helpful to try and support these "variant spellings" where possible.
> 
> JK

