[luatex] Hash tokens meaning
Sensei
senseiwa at gmail.com
Tue May 28 15:04:24 CEST 2013
On 5/28/13 1:01 PM, Arthur Reutenauer wrote:
>> I am trying to analyze what TeX produces (dumping the contents of
>> tex.hashtokens() by the way).
>
> Oh, so that was it ... Well, I can say with confidence that there
> were probably only three to five people in the world who had any chance
> of understanding what you meant by "hash tokens", in your original
> email, and none of them is contributing to this discussion (but one of
> them definitely is subscribed to this list ;-)
I assumed it was clear, since I posted to the luatex list asking about
hash tokens. Obviously I was misleading everyone! :)
> So part of what I said earlier doesn't apply, you really are looking
> at TeX's hash table. This is yet different, and happens at a very low
> level. The documentation for that is in tex.web, and the change files
> for the different extensions. You're not really doing yourself a favour
> by starting with LuaTeX; better to start with Knuth's TeX, in my
> opinion. Its source code, along with the comments, actually is
> published as a book.
Good to know. I thought I could start by getting a low-level
impression, the same way I do when I list the symbols in an object
file and then take a look at the disassembly.
>> So I am only looking at the tokens produced by TeX, feeding a LaTeX
>> file: I know what LaTeX does, but since it uses TeX as an engine, I
>> wanted to know what TeX does with my document structure (labels,
>> chapters, floats, bibliographies, ...).
>
> Which is absurd: shouldn't you look at the source code of *LaTeX*
> first, before looking at a dump of TeX's memory? It's almost like you
> want to be confused.
That is not my purpose. And by the way, yes, I do look at memory when
trying to figure out how a piece of software works (it's part of my
job), especially when I have to assume that I don't have the source code.
>> As before, I just dump to file what tex.hashtokens() contains. I can
>> attach the file if needed.
>
> Yes, obviously, we need the source file. Did you really imagine that
> we could say anything substantial about random bits of TeX's memory
> without knowing what the input was?
I attached parts of it, since the symbol table is 70K.
>
>> ===BEGIN===
>> sffamily
>> ^A
>> tracingoutput
>> <=== THIS IS A TAB
>> ^H
>> ^K
>> macc@palette
>> ^M
>> ^L
>> ^N
>> @currdir
>> makesm@sh
>> pdftrue
>> ?\textless
>> @@MP:P:curveto
>> ^Y
>> ^[
>> ^Z
>> luatexUroot
>> !
>> <=== THIS IS A SPACE
>> ====END====
>
> OK, so you meant white space. Blank is indeed a misleading word to
> call these strings. Yes, there may be white space. Why does it bother
> you? "\ " actually is a pretty common user-level command of TeX.
Because it's new to me. I thought of the backslash as an escape, as in
C: a sort of protection for the next character, as in \% (the same way
\" works in C).
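Single-character names like the tab and the space above are easier to
spot if the dump script makes them visible before writing them out. A
minimal sketch in Python, assuming the hash entries have already been
read into a list of strings; the function name visible() and the <SP>
marker are made up for illustration and are not part of LuaTeX:

```python
def visible(name: str) -> str:
    """Render a hash entry so control characters and spaces can be seen."""
    out = []
    for ch in name:
        cp = ord(ch)
        if cp < 0x20:
            # Caret notation: 0x01 -> ^A, 0x09 (tab) -> ^I, 0x1B -> ^[
            out.append("^" + chr(cp + 0x40))
        elif cp == 0x7F:
            out.append("^?")
        elif ch == " ":
            out.append("<SP>")  # hypothetical marker for a space
        else:
            out.append(ch)
    return "".join(out)


# Invented sample entries, similar to the ones in the dump above.
entries = ["sffamily", "\x01", "\t", " ", "tracingoutput"]
for entry in entries:
    print(visible(entry))
```

With this, a tab shows up as ^I and a lone space as <SP>, so neither
looks like an empty line in the dump.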
>> ===BEGIN===
>> pagecolor
>> , <=== THIS IS A WEIRD ONE
>> skipemptyMPgraphictrue
>>
>> ====END====
>>
>> With a hex editor, I find that the second line is EF BF BF 2C.
>
> This is perfectly valid UTF-8, it's the byte sequence for two
> characters: U+FFFF and U+002C. The former is not supposed to be used in
> files, and usually appears as a replacement of an invalid character, and
> the latter is simply a comma.
Yes, I saw that once I opened the hash file with a hex editor. I knew
that TeX didn't support Unicode, and I thought that lualatex translated
its input for TeX, which then produced the output. So a Unicode string
was unexpected, and I thought I had messed up my dump code.
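A quick way to double-check the byte sequence outside of a hex editor
is to decode it. This is only a sketch in Python; the four bytes are
the ones quoted above, everything else is illustration:

```python
# Bytes reported by the hex editor for the "weird" hash entry.
raw = bytes.fromhex("EF BF BF 2C")

# Strict UTF-8 decoding succeeds, so the sequence is well-formed.
text = raw.decode("utf-8")

# List each character as a U+XXXX code point.
code_points = [f"U+{ord(ch):04X}" for ch in text]
print(code_points)  # ['U+FFFF', 'U+002C']
```

U+FFFF is a Unicode noncharacter. It is not the same code point as the
replacement character U+FFFD, whose UTF-8 bytes are EF BF BD.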
>> It seems to me that TeX is using a very low level encoding, which I
>> find again weird (or wrong, in the sense that I don't know how to
>> correctly dump the tokens).
>
> You may have dumped the tokens correctly, there is a lot of low-level
> stuff in TeX. What's surprising to me is that you find it weird!
Pardon me, but I'm used to writing code in C, assembly, C++, or whatever
other programming language (mainly those three, in that order). TeX is
very, very different.
>> Yes, I imagined it was related to the Narnian way of encoding fonts,
>> but I don't know how it encodes it (I found a document by Rahtz on
>> TUG, but I see no mention of "<>").
>
> Look again, then. The long string you quoted (<5><6> etc.) clearly is
> the fifth argument to \DeclareFontShape, one of the standard NFSS
> commands. It's part of LaTeX2e and is documented in several places,
> for example the LaTeX Companion, or, for a free resource,
> doc/latex/base/fntguide.pdf in most TeX distributions.
Good!
>> You don't have this interest, it's ok, but I really do! I like to
>> know how something works! ;)
>
> You're missing the point. Producing \r@something is *one of the many*
> things that happens when you type \label{something}; it's probably the
> control sequence whose name is most obviously related to the label you
> created, but there is nothing special about that particular control
> sequence. That's why I remarked that it's not an interesting fact, and
> you probably wouldn't have noticed it, had it not been for your biased
> approach of looking at static memory dumps.
>
> Far more interesting are the different commands defined by LaTeX when
> \label is called, look for "ltxref.dtx" in latex.ltx. The letter "r"
> (in \r@something) is introduced in a macro called \newlabel (line 3881
> of my copy of latex.ltx), and "@" in \@newl@bel, one line above it.
That is awesome, I now have a place to start!
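From the dump side, this suggests a simple way to list the labels of a
document: look for hash entries that start with r@. A rough sketch in
Python, assuming the entries are available as a list of strings; the
helper name and the sample entries are invented, and other packages may
well define unrelated names with the same prefix:

```python
def labels_from_hash(entries):
    """Return the label keys found in a list of hash entry names.

    Assumption: each \\label{key} leaves a control sequence named r@key.
    """
    prefix = "r@"
    return sorted(
        name[len(prefix):]
        for name in entries
        if name.startswith(prefix) and len(name) > len(prefix)
    )


# Invented sample, not taken from the real 70K dump.
sample = ["pagecolor", "r@sec:intro", "sffamily", "r@fig:plot", "@currdir"]
print(labels_from_hash(sample))  # ['fig:plot', 'sec:intro']
```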
Anyway, at some point there *is* a static version of the code somewhere,
otherwise there would be no output. Yes, I am biased by my job and
education, but I find it hard to grasp the opposition to this approach.
You look "top down", I use the "bottom up" approach :)