# [luatex] No hyphenation when \lccodes zero with explicit hyphens

Hans Hagen pragma at wxs.nl
Sun May 15 13:20:01 CEST 2016

On 5/14/2016 6:07 PM, Ulrike Fischer wrote:
> Am Sat, 14 May 2016 16:35:32 +0200 schrieb Hans Hagen:
>
>>>
>>> Your version is too old. But I still see the issue in the newest
>>> luatex from TL2016 pretest. So I would say it has not been fixed.
>>
>> luatex has \hjcode for this (it initializes from lccodes but from then
>> on works with hjcodes thereby untangling character casing from hyphenation)
>
> Changing the \hjcode enables the hyphenation (like changing the
> \lccode) but this doesn't answer the question if it is deliberate
> that there is no global hyphenation point after the hyphen.
>
> Do I have to set all \hjcodes to be able to get a break after a
> hyphen?

there are several things happening:

(1) The hyphen char is only looked at when there is a valid (in terms of
subtype, language and hjcode) character to the left of it; this all
relates to the split in stages, language and hyphenation properties
carried with glyphs, language specific hyphens etc. This is how luatex
works.

(2) The word start is determined by checks against node types.

(3) The same is true for word ends.

So, this is why

\parindent0pt \hsize=1.1cm
12-34-56 \par
12-34-\hbox{56} \par
12-34-\vrule width 1em height 1.5ex \par
12-\hbox{34}-56 \par
12-\vrule width 1em height 1.5ex-56 \par
\hjcode`\1=`\1 \hjcode`\2=`\2 \hjcode`\3=`\3 \hjcode`\4=`\4
\vskip.5cm
12-34-56 \par
12-34-\hbox{56} \par
12-34-\vrule width 1em height 1.5ex \par
12-\hbox{34}-56 \par
12-\vrule width 1em height 1.5ex-56 \par

gives different results than in pdftex.
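
To compare the engines directly, the example can be wrapped into one file that runs under both. This is only a sketch: the file name and the \testlines macro are made up for illustration, and the \ifdefined guard assumes a format with the e-TeX extensions enabled (the default for the pdftex and luatex formats in TeX Live). Under pdftex the \hjcode assignments are skipped, so both halves are typeset with the same settings.

```tex
% hyphen-test.tex (hypothetical name): run with "luatex hyphen-test"
% and with "pdftex hyphen-test", then compare the line breaks.
\parindent0pt \hsize=1.1cm

% The five test paragraphs from the example, collected in a helper
% macro (\testlines is an illustrative name, not an engine primitive).
\def\testlines{%
  12-34-56 \par
  12-34-\hbox{56} \par
  12-34-\vrule width 1em height 1.5ex \par
  12-\hbox{34}-56 \par
  12-\vrule width 1em height 1.5ex-56 \par}

% First pass: engine defaults.
\testlines

\vskip.5cm

% Second pass: give the digits 1..4 an hjcode, but only where the
% primitive exists (luatex); pdftex skips this branch.
\ifdefined\hjcode
  \hjcode`\1=`\1 \hjcode`\2=`\2 \hjcode`\3=`\3 \hjcode`\4=`\4
\fi
\testlines

\bye
```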

Now, (1) is what luatex does, so that's unlikely to change (there are
some more subtle differences anyway and we don't aim at complete
compatibility).

The (2) and (3) conditions are currently a bit strict and can be made to
include e.g. boxes and rules. I played with it and the easiest solution
is just an option: be more like pdftex in that respect, but provide a
flag to be strict (1=start of word, 2=end of word, 3=both). This seems
to work ok (at least at my end, but I didn't check extensively).

But as we are in the code freeze stage this will not happen in 0.95 (I
need to think of a proper name for a variable anyway).
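
For readers on a later engine: the LuaTeX reference manual documents a \hyphenationbounds parameter with values 0 (not strict), 1 (strict start), 2 (strict end) and 3 (both). That matches the flag described here, but the mail does not name the variable, so treating the two as the same is an assumption. A minimal sketch that only uses the parameter when the engine provides it:

```tex
% Assumption: \hyphenationbounds is the flag discussed in this mail.
% It is not available in 0.95, hence the \ifdefined guard.
\parindent0pt \hsize=1.1cm

\ifdefined\hyphenationbounds
  % Same hjcode setup as in the example, so the digits count as
  % valid characters for the hyphenator.
  \hjcode`\1=`\1 \hjcode`\2=`\2 \hjcode`\3=`\3 \hjcode`\4=`\4

  % Not strict: boxes and rules may delimit a word (more like pdftex).
  \hyphenationbounds=0
  12-\hbox{34}-56 \par

  % Strict at both the start and the end of a word.
  \hyphenationbounds=3
  12-\hbox{34}-56 \par
\else
  \message{This engine has no \string\hyphenationbounds; nothing to compare.}
\fi

\bye
```

Setting the value before each paragraph keeps the two cases apart; comparing the two paragraphs in the output shows whether the box is treated as a word boundary.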

Hans
