<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">David Carlisle wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAEW6iOgGq7+g=fTfnBMUxbE+ptn9Xcm3tH6JOTdq=PYhGKaZgQ@mail.gmail.com">

      <pre class="moz-quote-pre" wrap="">The hyphenation in luatex is different in many ways to that of classic
tex even when they use the same \pattern data so it's not that
surprising that you get different results for some constructs.

Here the main issue is that apart from some legacy compatibility
luatex has moved away from the (frankly weird) reliance on lowercase
codes for determining which characters take part in hyphenation. So
luatex sees the whole construct as a single word and looks it up using
the patterns, pdftex sees the * as a word boundary and doesn't start
the next word until after it sees some white space so skips most of
this.

If you set the lccode of * to itself then you get the same result as luatex</pre>

    </blockquote>

    <br>

    Ah yes, <i>mea culpa</i>: it is \lccode, not \catcode, that
    matters here (see the quotation below).  But I would, with respect,
    beg to differ with your assertion that TeX has a "frankly weird"
    reliance on \lccodes. When TeX was written, every last byte counted,
    and if overloading the significance of \lccode left room for other,
    more important features, then I would suggest that the decision was
    a wise one at the time.<br>

    <br>

    <blockquote type="cite"> TeX looks for potentially hyphenatable
      words by searching ahead from each glue item that is not in a math
      formula. The search bypasses characters whose \lccode is zero, or
      ligatures that begin with such characters; it also bypasses
      whatsits and implicit kern items, i.e., kerns that were inserted
      by TeX itself because of information stored with the font. If the
      search finds a character with nonzero \lccode, or if it finds a
      ligature that begins with such a character, that character is
      called the starting letter. But if any other type of item occurs
      before a suitable starting letter is found, hyphenation is
      abandoned (until after the next glue item). Thus, a box or rule or
      mark, or a kern that was explicitly inserted by \kern or \/, must
      not intervene between glue and a hyphenatable word. If the
      starting letter is not lowercase (i.e., if it doesn’t equal its
      own \lccode), hyphenation is abandoned unless \uchyph is
      positive.<br>
      <br>
      If a suitable starting letter is found, let it be in font f.
      Hyphenation is abandoned unless the \hyphenchar of f is a number
      between 0 and 255, inclusive. If this test is passed, TeX
      continues to scan forward until coming to something that’s not one
      of the following three “admissible items”: (1) a character in font
      f whose \lccode is nonzero; (2) a ligature formed entirely from
      characters of type (1); (3) an implicit kern. The first
      inadmissible item terminates this part of the process; the trial
      word consists of all the letters found in admissible items. Notice
      that all of these letters are in font f.</blockquote>
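    <br>
    To make David's last point concrete, here is a minimal plain TeX
    sketch, offered as an illustration rather than a tested recipe. The
    test word "hyphenation*hyphenation" is my own invention, not the
    construct from the original report, and what the log shows will
    depend on the engine and on the patterns (and any hyphenation codes
    saved with them) in the format:<br>
    <pre>% Hypothetical test word; not the construct from the original report.
% While the \lccode of * is zero, the * is not an admissible item,
% so the trial word ends at the * (see the quoted passage above).
\showhyphens{hyphenation*hyphenation}

% Give * a nonzero \lccode equal to itself, as David suggests,
% so that it no longer terminates the trial word.
\lccode`\*=`\*
\showhyphens{hyphenation*hyphenation}

\bye</pre>
    Comparing the two entries in the log should show whether the part
    after the * is now considered for hyphenation.<br>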

    <br>

    -- <br>

    <i>** Phil.</i><br>

  </body>

</html>