[luatex] Hash tokens meaning
Sensei
senseiwa at gmail.com
Tue May 28 08:53:56 CEST 2013
On 5/27/13 5:15 PM, Arthur Reutenauer wrote:
> That's your problem here. What are you trying to do, and what were
> the errors you encountered? So far as I can see you're not actually
> doing anything real, only asking vague questions, the purpose of which
> is not clear at all.
Well, it's not clear because I am not trying to *do* anything. I am
trying to analyze what TeX produces (dumping the contents of
tex.hashtokens() by the way).
So I am only looking at the tokens produced by TeX, feeding a LaTeX
file: I know what LaTeX does, but since it uses TeX as an engine, I
wanted to know what TeX does with my document structure (labels,
chapters, floats, bibliographies, ...).
> Not only is there no such document, but if it did it would probably
> not help you very much. The problem with the word "cheatsheet" is that
> the information it contains is only valuable to you if you actually know
> something beforehand. It doesn't really help you to cheat, more to
> quickly find something that temporarily slipped your mind, or that you
> learned a long time ago and need to brush up on.
>
> Now, being well understood that the answers probably won't help you at
> all, there is no reason why we shouldn't try and answer your questions,
> to the extent that they make sense:
Thanks! :)
>> I am trying to grasp the meaning of the hash tokens I find in a latex
>> document. I am a novice TeX user, so please bear with my (almost surely)
>> stupid questions: I've only used LaTeX, and never looked into the pit :)
>
> First of all, I believe that what you're calling "hash token" is a
> control sequence, i.e., any sequence of characters starting with
> backslash (\). It's not necessarily always backslash, by the way, but
> usually is. Is that what you mean?
Probably, except that tex.hashtokens() contains no leading "\" for many
commands (for example endequation), while others do contain one (for
example \T1\.-\i).
>> - First of all, I find many tokens that contain @ or ! or ^, for example
>> "tagsleft@false", "ps@headings", "\T1\^-\i", or "!!stringa": do these
>> characters have a special meaning?
>
> Not necessarily. The commercial at sign (@) is often used in LaTeX2e
> packages by a special convention, that dictates that its use in control
> sequence names is reserved to developers writing packages, and that
> users should not type them in their documents. This enables developers
> to be reasonably certain that their internal commands won't be
> overwritten by users at typesetting time, and is enforced at a technical
> level (by changing @'s category code); but it is no more than a
> convention, and it's not uncommon for users to change the category code
> themselves at times (the LaTeX kernel even provides commands to switch
> back and forth).
I understand. This means that an average LaTeX user won't even try to
change the meaning of @, and this behavior is almost completely well
defined.
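
A minimal sketch of that convention, assuming a plain article document; the macro names \my@internal and \publiccommand are made up for illustration:

```latex
\documentclass{article}

% In a document preamble @ has category code 12 ("other"), so it cannot
% be part of a multi-letter control sequence name.
% \makeatletter switches it to category code 11 ("letter").
\makeatletter
\newcommand{\my@internal}{internal text} % hypothetical internal macro
\newcommand{\publiccommand}{\my@internal} % hypothetical user-level macro
\makeatother % switch @ back to category code 12

\begin{document}
\publiccommand
\end{document}
```

After \makeatother, typing \my@internal in the document body would be read as \my followed by the characters @internal, which is why users cannot easily overwrite such names by accident.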
> The command \^ takes one argument and puts a circumflex accent over
> it. The caret character is only visual and mnemonic.
>
> I don't know where you took "!!stringa" from, but it seems to be a
> similar case to @, using the exclamation mark to ensure that the control
> sequence won't be overwritten by users (as ! is usually not legal in
> control sequences).
As before, I just dump to file what tex.hashtokens() contains. I can
attach the file if needed.
>> - Why do I have so many "blank" tokens? At the end of this you'll find a
>> sample from a dump file viewed from emacs, so an encoding of special
>> characters is visible.
>
> I don't know what you mean. In the file you attached, there's only
> one blank line, and since you're not telling us how you produced the
> file exactly, it could be anything.
This is how I am dumping the strings:
\directlua{
  local f = io.open("hash.txt", "w")
  for name in pairs(tex.hashtokens()) do
    f:write(name .. "\string\n")
  end
  f:close()
}
Maybe I used a misleading word. I am referring to lines containing
non-printable characters, for example:
===BEGIN===
sffamily
^A
tracingoutput
<=== THIS IS A TAB
^H
^K
macc@palette
^M
^L
^N
@currdir
makesm@sh
pdftrue
?\textless
@@MP:P:curveto
^Y
^[
^Z
luatexUroot
!
<=== THIS IS A SPACE
====END====
Control sequences such as ^H seem very strange to me, and make me suspect
that something is "wrong" in how I create the tokens file. Moreover,
something seems to be interfering with the encoding, à la UTF-8; for
example I find
===BEGIN===
pagecolor
, <=== THIS IS A WEIRD ONE
skipemptyMPgraphictrue
====END====
With a hex editor, I find that the second line is EF BF BF 2C.
It seems to me that TeX is using a very low level encoding, which I find
again weird (or wrong, in the sense that I don't know how to correctly
dump the tokens).
>> - Are tokens such as
>> "<5><6><7><8><9>gen*cmbx<10><10.95>cmbx10<12><14.4><17.28><20.74><24.88>cmbx12"
>> real, or is it some kind of "encoding" that I see?
>
> They're not real, of course, they're imaginary tokens from the land of
> Narnia. Ah, the Narnian tokens! Many a night I've spent hunting down
> those that, like sheep, had gone astray, and turned them back to the
> right path.
>
> (Or, they could be related to the NFSS.)
Yes, I imagined it was related to the Narnian way of encoding fonts, but
I don't know how it encodes it (I found a document by Rahtz on TUG, but
I see no mention of "<>").
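
For reference, the string has the shape of the size specification that NFSS takes as the fifth argument of \DeclareFontShape. A hedged sketch, assuming the standard Computer Modern bold extended fonts (cmbx5 to cmbx12) are installed; the family name mycm is hypothetical, and this example does not show how such an argument ends up as a hash entry:

```latex
\documentclass{article}

% Hypothetical family name "mycm", declared only for this illustration.
\DeclareFontFamily{OT1}{mycm}{}

% Fifth argument = size specification. Each <n> is a size in points.
% "gen*cmbx" builds the font name from the prefix plus the size
% (cmbx5 ... cmbx9); the remaining sizes map to cmbx10 or cmbx12.
\DeclareFontShape{OT1}{mycm}{bx}{n}{<5><6><7><8><9>gen*cmbx<10><10.95>cmbx10<12><14.4><17.28><20.74><24.88>cmbx12}{}

\begin{document}
{\fontencoding{OT1}\fontfamily{mycm}\fontseries{bx}\fontshape{n}\selectfont
Bold extended text.}
\end{document}
```

Read this way, the angle brackets are not an encoding of the name but a list of sizes, each group followed by the font to load for those sizes.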
>> - Are these tokens portable? I mean, a friend of mine on windows will
>> produce the very same tokens with the same document?
>
> Assuming again that you mean control sequences by "tokens", yes,
> they're extremely portable, if they come from the LaTeX kernel. If
> they've been defined by a package, your correspondent needs to have the
> same package, of course, but there usually are very few incompatibilities
> between OSes. XeTeX, and to some extent LuaTeX, have changed that
> partly.
That is good news, I like portability :)
>> - All \label{something} seem to be processed as token "r@something". Is
>> this real, or just my imagination?
>
> See above about Narnia. Seriously, I have no idea what you mean. The
> vast majority of control sequences are expandable, meaning that their
> low-level meaning usually is something else than what you type. That is
> of course the whole point of a macro language; there would be little
> advantage to defining commands such as \label, if all "\label{something}"
> did was to print "\label{something}" on the page. It might be that in
> the course of being expanded, \label{something} produces a control
> sequence \r@something; I do not know the details of how it's implemented,
> and have no inclination to look into it.
> I fail to see how it's interesting anyway.
You don't have this interest, it's ok, but I really do! I like to know
how something works! ;)
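
One way to check this, sketched under the assumption of a standard article class without hyperref: \label writes a \newlabel line to the .aux file, and when that file is read on the next run, \newlabel defines a macro whose name is r@ followed by the label key. The \typeout line below is only a probe added for illustration:

```latex
\documentclass{article}
\begin{document}
\section{Intro}\label{something}

% \csname builds the control sequence \r@something without needing
% \makeatletter; \meaning then reports its current definition.
% First run: there is no .aux file yet, so this prints "\relax".
% Second run: it prints a macro holding the section and page numbers
% (the exact fields depend on the kernel version and loaded packages).
\typeout{r@something = \expandafter\meaning\csname r@something\endcsname}
\end{document}
```

Running the file twice and comparing the two terminal lines shows that the hash entry comes from the .aux file read at \begin{document}, not from \label itself.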
>> Again, I am sorry to ask such trivial questions, but I didn't find
>> anything anywhere on the internals of TeX!
>
> Read the TeXbook.
I will, I promise. And I am curious about these internals.
Thanks!