[luatex] Hash tokens meaning

Sensei senseiwa at gmail.com
Tue May 28 08:53:56 CEST 2013


On 5/27/13 5:15 PM, Arthur Reutenauer wrote:
>    That's your problem here.  What are you trying to do, and what were
> the errors you encountered?  So far as I can see you're not actually
> doing anything real, only asking vague questions, the purpose of which
> is not clear at all.

Well, it's not clear because I am not trying to *do* anything. I am 
trying to analyze what TeX produces (dumping the contents of 
tex.hashtokens() by the way).

So I am only looking at the tokens TeX produces when I feed it a LaTeX 
file: I know what LaTeX does, but since it uses TeX as its engine, I 
wanted to know what TeX does with my document structure (labels, 
chapters, floats, bibliographies, ...).

>    Not only is there no such document, but if it did it would probably
> not help you very much.  The problem with the word "cheatsheet" is that
> the information it contains is only valuable to you if you actually know
> something beforehand.  It doesn't really help you to cheat, more to
> quickly find something that temporarily slipped your mind, or that you
> learned a long time ago and need to brush up on.
>
>    Now, being well understood that the answers probably won't help you at
> all, there is no reason why we shouldn't try and answer your questions,
> to the extent that they make sense:

Thanks! :)


>> I am trying to grasp the meaning of the hash tokens I find in a latex
>> document. I am a novice TeX user, so please bear with my (almost surely)
>> stupid questions: I've only used LaTeX, and never looked into the pit :)
>
>    First of all, I believe that what you're calling "hash token" is a
> control sequence, i. e., any sequence of characters starting with
> backslash (\).  It's not necessarily always backslash, by the way, but
> usually is.  Is that what you mean?

Probably, except that tex.hashtokens() contains no leading "\" for many 
commands (for example endequation), while others do, such as \T1\.-\i.


>> - First of all, I find many tokens that contain @ or ! or ^, for example
>> "tagsleft@false", "ps@headings", "\T1\^-\i", or "!!stringa": do these
>> characters have a special meaning?
>
>    Not necessarily.  The commercial at sign (@) is often used in LaTeX2e
> packages by a special convention, that dictates that its use in control
> sequence names is reserved to developers writing packages, and that
> users should not type them in their documents.  This enables developers
> to be reasonably certain that their internal commands won't be
> overwritten by users at typesetting time, and is enforced at a technical
> level (by changing @'s category code); but it is no more than a
> convention, and it's not uncommon for users to change the category code
> themselves at times (the LaTeX kernel even provides commands to switch
> back and forth).

I understand. So an average LaTeX user will not normally change the 
category code of @, and I can treat this convention as reliable in 
practice.
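
For illustration, here is a minimal Python sketch of why the category
code of @ matters. It is only a simplified model of how TeX reads a
control sequence name: it ignores space skipping, ^^ notation and the
other category codes. The function name and the two letter sets are
hypothetical and not part of TeX or LaTeX.

```python
import string


def read_control_sequence(text, letters):
    """Return the control sequence name at the start of `text`.

    Simplified model: after the escape character, the name is either a
    maximal run of "letter" characters or exactly one non-letter.
    """
    if not text.startswith("\\"):
        raise ValueError("text must start with a backslash")
    if len(text) == 1:
        return ""
    if text[1] not in letters:
        return text[1]  # control symbol, e.g. \^ or \!
    end = 1
    while end < len(text) and text[end] in letters:
        end += 1
    return text[1:end]


user_letters = set(string.ascii_letters)   # @ is not a letter
package_letters = user_letters | {"@"}     # @ treated as a letter

print(read_control_sequence(r"\ps@headings", user_letters))     # ps
print(read_control_sequence(r"\ps@headings", package_letters))  # ps@headings
```

With the user-level letter set the name stops at "ps", so a name such as
ps@headings cannot be typed by accident in a document.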

>    The command \^ takes one argument and puts a circumflex accent over
> it.  The caret character is only visual and mnemonic.
>
>    I don't know where you took "!!stringa" from, but it seems to be a
> similar case to @, using the exclamation mark to ensure that the control
> sequence won't be overwritten by users (as ! is usually not legal in
> control sequences).

As before, I just dump to file what tex.hashtokens() contains. I can 
attach the file if needed.

>> - Why do I have so many "blank" tokens? At the end of this you'll find a
>> sample from a dump file viewed from emacs, so an encoding of special
>> characters is visible.
>
>    I don't know what you mean.  In the file you attached, there's only
> one blank line, and since you're not telling us how you produced the
> file exactly, it could be anything.

This is how I am dumping the strings:

      \directlua{
         local f = io.open("hash.txt", "w")
         for name in pairs(tex.hashtokens()) do
             f:write(name .. "\string\n")
         end
         f:close()}

Maybe I used a misleading word. I am referring to lines that contain 
non-printable characters, for example:

===BEGIN===
sffamily
^A
tracingoutput
                     <=== THIS IS A TAB
^H
^K
macc@palette
^M
^L
^N
@currdir
makesm@sh
pdftrue
?\textless
@@MP:P:curveto
^Y
^[
^Z
luatexUroot
!
                     <=== THIS IS A SPACE
====END====
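
For reference, entries such as ^A or ^H are how emacs displays a single
byte below 0x20 (caret notation). A minimal Python sketch of that
display convention follows; the function name is hypothetical, and it
assumes each dumped name has already been read as a string.

```python
def caret_notation(name):
    """Render control characters the way emacs shows them (^A, ^H, ...)."""
    out = []
    for ch in name:
        code = ord(ch)
        if code < 0x20:
            out.append("^" + chr(code + 0x40))  # 0x01 -> ^A, 0x08 -> ^H
        elif code == 0x7F:
            out.append("^?")                    # DEL
        else:
            out.append(ch)
    return "".join(out)


print(caret_notation("\x01"))       # ^A
print(caret_notation("\x08"))       # ^H
print(caret_notation("\x1b"))       # ^[
print(caret_notation("sffamily"))   # sffamily
```

Read this way, ^H is just a name consisting of the single byte 0x08, and
the "blank" lines are names consisting of a single tab or space.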


Control sequences such as ^H seem very strange to me, and make me 
suspect that something is "wrong" in how I create the tokens file. 
Moreover, something seems to be interfering with the encoding, à la UTF; 
for example I find

===BEGIN===
pagecolor
￿,                 <=== THIS IS A WEIRD ONE
skipemptyMPgraphictrue 

====END====

With a hex editor, I find that the second line is EF BF BF 2C.

It seems to me that TeX is using a very low level encoding, which I find 
again weird (or wrong, in the sense that I don't know how to correctly 
dump the tokens).
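
A quick way to check what those four bytes are is to decode them as
UTF-8. This is a minimal Python sketch; the function name is
hypothetical, and it only assumes the bytes are UTF-8, which the
EF BF BF pattern suggests.

```python
def code_points(hex_dump):
    """Decode a hex dump as UTF-8 and list the code points it contains."""
    text = bytes.fromhex(hex_dump).decode("utf-8")
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points("EF BF BF 2C"))  # ['U+FFFF', 'U+002C']
```

So the line is the code point U+FFFF (a Unicode noncharacter) followed
by an ordinary comma, rather than a corrupted byte sequence.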

>> -Are tokens as
>> "<5><6><7><8><9>gen*cmbx<10><10.95>cmbx10<12><14.4><17.28><20.74><24.88>cmbx12"
>> real, or is it some kind of "encoding" that I see?
>
>    They're not real, of course, they're imaginary tokens from the land of
> Narnia.  Ah, the Narnian tokens!  Many a night I've spent hunting down
> those that, like sheep, had gone astray, and turned them back to the
> right path.
>
>    (Or, they could be related to the NFSS.)


Yes, I imagined it was related to the Narnian way of encoding fonts, but 
I don't know how it encodes it (I found a document by Rahtz on TUG, but 
I see no mention of "<>").
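
For illustration, a minimal Python sketch that splits such a name into
groups of <size> entries and the text that follows them. It assumes the
string is an NFSS size specification of the kind given to
\DeclareFontShape, which is an assumption on my part; the function name
is hypothetical.

```python
import re


def parse_size_spec(spec):
    """Split a size specification into (sizes, font_info) pairs."""
    pairs = []
    # One or more <...> groups, followed by the text up to the next "<".
    for match in re.finditer(r"((?:<[^<>]*>)+)([^<>]*)", spec):
        sizes = re.findall(r"<([^<>]*)>", match.group(1))
        pairs.append((sizes, match.group(2)))
    return pairs


spec = ("<5><6><7><8><9>gen*cmbx<10><10.95>cmbx10"
        "<12><14.4><17.28><20.74><24.88>cmbx12")
for sizes, info in parse_size_spec(spec):
    print(sizes, "->", info)
```

Under that assumption the name reads as three groups: sizes 5 to 9 map
to "gen*cmbx", sizes 10 and 10.95 to "cmbx10", and the larger sizes to
"cmbx12".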


>> - Are these tokens portable? I mean, will a friend of mine on Windows
>> get the very same tokens from the same document?
>
>    Assuming again that you mean control sequences by "tokens", yes,
> they're extremely portable, if they come from the LaTeX kernel.  If
> they've been defined by a package, your correspondent needs to have the
> same package, of course, but there usually are very few incompatibilities
> between OSes.  XeTeX, and to some extent LuaTeX, have changed that
> partly.

That is good news, I like portability :)

>> - All \label{something} seem to be processed as token "r@something". Is
>> this real, or just my imagination?
>
>    See above about Narnia.  Seriously, I have no idea what you mean.  The
> vast majority of control sequences are expandable, meaning that their
> low-level meaning usually is something else than what you type.  That is
> of course the whole point of a macro language; there would be little
> advantage to defining commands such as \label, if all "\label{something}"
> did was to print "\label{something}" on the page.  It might be that in
> the course of being expanded, \label{something} produces a control
> sequence \r@something; I do not know the details of how it's implemented,
> and have no inclination to look into it.

> I fail to see how it's interesting anyway.


You may not be interested, and that's fine, but I really am! I like to 
know how something works! ;)
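
For illustration, a minimal Python sketch that pulls the label names out
of a list of dumped names. It assumes that every name starting with
"r@" corresponds to a \label, which is my reading of the dump rather
than something confirmed in this thread; the function name is
hypothetical.

```python
def labels_from_names(names, prefix="r@"):
    """Return the sorted label keys found among dumped names."""
    return sorted(
        name[len(prefix):]
        for name in names
        if name.startswith(prefix) and len(name) > len(prefix)
    )


sample = ["sffamily", "r@something", "r@fig:one", "ps@headings"]
print(labels_from_names(sample))  # ['fig:one', 'something']
```

To run it on the real dump, the names can be read from hash.txt one per
line and passed to the same function.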


>> Again, I am sorry to ask such trivial questions, but I didn't find
>> anything anywhere on the internals of TeX!
>
>    Read the TeXbook.

I will, I promise. And I am curious about these internals.


Thanks!





