[luatex] Hash tokens meaning

Arthur Reutenauer arthur.reutenauer at normalesup.org
Tue May 28 13:01:39 CEST 2013


> Well, it's not clear because I am not trying to *do* anything.

  That's what I'm saying, it's a problem.

>                                                                I am
> trying to analyze what TeX produces (dumping the contents of
> tex.hashtokens() by the way).

  Oh, so that was it ...  Well, I can say with confidence that there
were probably only three to five people in the world who had any chance
of understanding what you meant by "hash tokens", in your original
email, and none of them is contributing to this discussion (but one of
them definitely is subscribed to this list ;-)

  So part of what I said earlier doesn't apply: you really are looking
at TeX's hash table.  This is different again, and happens at a very low
level.  The documentation for that is in tex.web, and in the change files
for the different extensions.  You're not really doing yourself a favour
by starting with LuaTeX; better to start with Knuth's TeX, in my
opinion.  Its source code, along with the comments, is actually
published as a book (TeX: The Program).
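To give a concrete feel for what "hash table" means here, the following Python sketch models the hash function that Knuth's tex.web uses for multi-letter control-sequence names. It doubles the running value, adds the next character code, and reduces modulo the prime 1777. This is a sketch based on my reading of tex.web, not LuaTeX's code. Modern engines use larger tables, TeX82 stores single-character names separately, and the collision chaining TeX performs is ignored. `tex_hash` is a hypothetical name.

```python
HASH_PRIME = 1777  # the prime used in Knuth's tex.web (hash_size = 2100)

def tex_hash(name: str) -> int:
    """Bucket for a multi-letter control-sequence name, tex.web style."""
    data = name.encode("utf-8")  # assumption: same bytes as TeX82 for ASCII names
    if not data:
        raise ValueError("control-sequence names here are non-empty")
    h = data[0]
    for byte in data[1:]:
        # double, add the next character code, reduce modulo the prime
        h = (h + h + byte) % HASH_PRIME
    return h

# Names taken from the dump quoted later in this message;
# collisions between different names are not handled here.
for name in ("sffamily", "pdftrue", "pagecolor"):
    print(name, tex_hash(name))
```

A dump of the table therefore lists names in bucket order, not in the order LaTeX defined them.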

> So I am only looking at the tokens produced by TeX, feeding a LaTeX
> file: I know what LaTeX does, but since it uses TeX as an engine, I
> wanted to know what TeX does with my document structure (labels,
> chapters, floats, bibliographies, ...).

  Which is absurd: shouldn't you look at the source code of *LaTeX*
first, before looking at a dump of TeX's memory?  It's almost like you
want to be confused.

> As before, I just dump to file what tex.hashtokens() contains. I can
> attach the file if needed.

  Yes, obviously, we need the source file.  Did you really imagine that
we could say anything substantial about random bits of TeX's memory
without knowing what the input was?

> ===BEGIN===
> sffamily
> ^A
> tracingoutput
>                     <=== THIS IS A TAB
> ^H
> ^K
> macc@palette
> ^M
> ^L
> ^N
> @currdir
> makesm@sh
> pdftrue
> ?\textless
> @@MP:P:curveto
> ^Y
> ^[
> ^Z
> luatexUroot
> !
>                     <=== THIS IS A SPACE
> ====END====

  OK, so you meant white space.  "Blank" is indeed a misleading word for
these strings.  Yes, there may be white space.  Why does it bother
you?  "\ " actually is a pretty common user-level command of TeX.
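The lines shown as ^A, ^H, ^[ and so on in the dump are presumably single-character control-sequence names made of ASCII control characters, which the viewer displayed in caret notation. TeX itself prints such characters as ^^ followed by the character 64 positions higher (or ^^? for DEL). The Python sketch below applies that rule so that the TAB and space entries become visible as \^^I and "\ ". Both function names are hypothetical, and characters outside ASCII are left untouched.

```python
def caret(ch: str) -> str:
    """Render one character in TeX's ^^ notation if it is an ASCII control."""
    code = ord(ch)
    if code < 32:
        return "^^" + chr(code + 64)  # e.g. 0x09 (TAB) -> ^^I, 0x1B -> ^^[
    if code == 127:
        return "^^?"                  # DEL: 127 - 64 is '?'
    return ch

def show_cs(name: str) -> str:
    """Make a dumped control-sequence name readable, with its backslash."""
    return "\\" + "".join(caret(c) for c in name)

for raw in ("sffamily", "\x01", "\t", "\x1b", " "):
    print(repr(show_cs(raw)))
```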

> ===BEGIN===
> pagecolor
> ￿,                 <=== THIS IS A WEIRD ONE
> skipemptyMPgraphictrue
> 
> ====END====
> 
> With an hex editor, I find that the second line is EF BF BF 2C.

  This is perfectly valid UTF-8: it's the byte sequence for two
characters, U+FFFF and U+002C.  The former is a Unicode noncharacter,
which is not supposed to be used in files (not to be confused with
U+FFFD, the character that usually replaces invalid input), and the
latter is simply a comma.
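The decoding can be checked mechanically. The Python sketch below decodes the four bytes with the standard UTF-8 codec. It also spells out the bit arithmetic for the three-byte sequence EF BF BF, taking 4 payload bits from the lead byte and 6 from each continuation byte. The helper names are illustrative only.

```python
def code_points(raw: bytes) -> list[str]:
    """Decode UTF-8 bytes and list each character as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in raw.decode("utf-8")]

def decode3(b0: int, b1: int, b2: int) -> int:
    """Code point of a three-byte UTF-8 sequence 1110xxxx 10xxxxxx 10xxxxxx."""
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)

print(code_points(bytes.fromhex("EF BF BF 2C")))  # ['U+FFFF', 'U+002C']
print(hex(decode3(0xEF, 0xBF, 0xBF)))             # 0xffff
```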

> It seems to me that TeX is using a very low level encoding, which I
> find again weird (or wrong, in the sense that I don't know how to
> correctly dump the tokens).

  You may well have dumped the tokens correctly; there is a lot of
low-level stuff in TeX.  What's surprising to me is that you find it weird!

> Yes, I imagined it was related to the Narnian way of encoding fonts,
> but I don't know how it encodes it (I found a document by Rahtz on
> TUG, but I see no mention of "<>").

  Look again, then.  The long string you quoted (<5><6> etc.) clearly is
the fifth argument to \DeclareFontShape, one of the standard NFSS
commands.  It's part of LaTeX2e and is documented in several places,
for example the LaTeX Companion, or, for a free resource,
doc/latex/base/fntguide.pdf in most TeX distributions.
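To illustrate how such a string is read, here is a deliberately simplified Python parser for an NFSS size specification. Each group of <size> tags is followed by a font name. The gen* size function builds the name by appending the size. This is a sketch, not NFSS's own code: it ignores size ranges such as <-6> and every size function other than the default and gen*. The sample spec is made up in the style of an .fd file, and `parse_size_spec` is a hypothetical name.

```python
import re

# One entry: one or more <size> tags, then the text up to the next "<".
ENTRY = re.compile(r"((?:<[^>]*>)+)([^<]*)")
SIZE = re.compile(r"<([^>]*)>")

def parse_size_spec(spec: str) -> dict[str, str]:
    """Map each size in a (simplified) NFSS size spec to a font name."""
    spec = re.sub(r"[\s%]+", "", spec)  # drop layout and .fd line-end %s
    table: dict[str, str] = {}
    for sizes, body in ENTRY.findall(spec):
        for size in SIZE.findall(sizes):
            if body.startswith("gen*"):
                # gen*: font name is the base name with the size appended
                table[size] = body[len("gen*"):] + size
            else:
                table[size] = body
    return table

spec = "<5><6><7>gen*cmr <10.95>cmr10 <14.4>cmr12"
print(parse_size_spec(spec))
```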

>>    See above about Narnia.  Seriously, I have no idea what you mean.  The
>> vast majority of control sequences are expandable, meaning that their
>> low-level meaning usually is something else than what you type.  That is
>> of course the whole point of a macro language; there would be little
>> advantage to defining commands such as \label, if all "\label{something}"
>> did was to print "\label{something}" on the page.  It might be that in
>> the course of being expanded, \label{something} produces a control
>> sequence \r@something; I do not know the details of how it's implemented,
>> and have no inclination to look into it.
> 
>> I fail to see how it's interesting anyway.
> 
> 
> You don't have this interest, it's ok, but I really do! I like to
> know how something works! ;)

  You're missing the point.  Producing \r@something is *one of the many*
things that happen when you type \label{something}; it's probably the
control sequence whose name is most obviously related to the label you
created, but there is nothing special about that particular control
sequence.  That's why I remarked that it's not an interesting fact, and
you probably wouldn't have noticed it had it not been for your biased
approach of looking at static memory dumps.

  Far more interesting are the different commands defined by LaTeX when
\label is called; look for "ltxref.dtx" in latex.ltx.  The letter "r"
(in \r@something) is introduced in a macro called \newlabel (line 3881
of my copy of latex.ltx), and the "@" in \@newl@bel, one line above it.
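The name-building step can be modelled in a few lines of Python. \label{something} writes a \newlabel line to the .aux file. When that file is read back, \newlabel passes the prefix "r" to \@newl@bel, which defines \r@something. The toy model below is a sketch, not LaTeX's code. It uses a dict as a hypothetical stand-in for TeX's table of names, and the label value {1.2}{7} is invented for the example.

```python
# Toy stand-in for TeX's table of defined control sequences.
hash_table: dict[str, str] = {}

def newl_bel(prefix: str, key: str, value: str) -> None:
    # Models \@newl@bel: globally define \<prefix>@<key>.  The real macro
    # also records a warning if the label was already defined.
    hash_table[f"{prefix}@{key}"] = value

def newlabel(key: str, value: str) -> None:
    # Models \newlabel, i.e. \@newl@bel with the prefix "r".
    newl_bel("r", key, value)

# \label{something} writes a line such as \newlabel{something}{{1.2}{7}}
# to the .aux file; reading that file back on the next run amounts to:
newlabel("something", "{1.2}{7}")
print(sorted(hash_table))  # ['r@something']
```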

	Arthur

