# [tex-live] fwd: \delta in overfull \hboxes screws up xterm

David Kastrup dak at gnu.org
Fri Jul 16 10:44:56 CEST 2004

Vladimir Volovich <vvv at vsu.ru> writes:

> David Kastrup wrote:
>
> > Vladimir Volovich <vvv at vsu.ru> writes:
> > > "DK" == David Kastrup writes:
> > >
> > >  DK> Sorry, but when TeX is running inside of a TeX shell
> > >  DK> (writing into a pipe or pseudo terminal) there are valid
> > >  DK> reasons for wanting 8bit throughput.  So while we don't
> > >  DK> want a _default_ print-through setting probably, we at
> > >  DK> least want an option to have the terminal output go
> > >  DK> through.
> > >
> > > BTW, is it that hard to automatically translate ^^-notation to
> > > 8-bit notation inside AUC-TeX?
> >
> > The problem is that it is wrong.  If I have something like
> >
> >   You can use \verb+^^I+ to produce a TAB character.
> >
> > then this translation should not happen.  If I do this sort of
> > translation from the log file, then I don't get a string from the
> > error message context suitable for searching in the source text,
> > since it is completely unknown whether ^^I or a TAB character was
> > actually typed.  I would then have to convert each ^^ occurrence
> > into a regular expression like \(?:\^\^I\|<TAB>\).  And then we have
> > different ^^ conventions to cater for in connection with Omega and
> > stuff.
>
>
> > > The user *did not* input the character corresponding to ^^J, but
> > > the 6 characters comprising the string "\delta". The fact that
> > > someone \mathcode-ed the character ^^J to mean the same thing as
> > > the \mathchardef named \delta should not have an effect on the
> > > printing of the control sequence "\delta".
> >
> > There is no such control sequence.  This is the output from an
> > overfull hbox message.  It contains only nameless characters.
> > Whether they were produced by \mathcode, \char, \delta, whatever is
> > not known anymore.
>
> You argue that it's not possible to "go back" to \delta once it
> has reached TeX's stomach (which is quite true); I see no major
> difference between this "loss of connection to original input" and
> the same loss of connection to original input which occurs in
> e.g. \verb+^^I+: once TeX has read the input and converted it to
> tokens, the connection to the original input is lost.

Well, and your point is?  The error context in TeX's error messages
is straight from the input buffer, _before_ conversion to tokens.
And it is the veracity of those error messages that I am talking
about all the time.  Tokenization has absolutely nothing to do with it.
Nor does the internal list representation.

The problem is that TeX uses the same output routines to make error
message contexts "readable" as it uses to make material passed through
or produced by TeX readable.
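For reference, the printing rule in question comes from tex.web's string-pool initialization: in stock TeX82, a character code below 32 or above 126 counts as unprintable and is shown in ^^ notation, whether it sits in an input-buffer context line or in the contents of an overfull box. The Python below is a minimal sketch of that rule, not TeX's own code; the function names are hypothetical, and a tcx file (as used by web2c-based TeX) can declare further codes printable, which the sketch ignores.

```python
def tex_printable(code: int) -> str:
    """Printable form of one character code under the stock TeX82 rule.

    Assumption: only codes 32..126 print as themselves; a tcx file can
    declare further codes printable, which this sketch ignores.
    """
    if not 0 <= code <= 255:
        raise ValueError("TeX82 character codes are 0..255")
    if 32 <= code <= 126:
        return chr(code)                  # printable ASCII passes through
    if code < 64:
        return "^^" + chr(code + 64)      # 0..31 -> ^^@ .. ^^_ (9 -> ^^I, 10 -> ^^J)
    if code < 128:
        return "^^" + chr(code - 64)      # only 127 gets here -> ^^?
    return "^^" + format(code, "02x")     # 128..255 -> two lowercase hex digits


def render_context(line: bytes) -> str:
    """Render a raw input line the way TeX would echo it in an error context."""
    return "".join(tex_printable(b) for b in line)


if __name__ == "__main__":
    print(render_context(b"\\verb+\t+"))              # \verb+^^I+
    print(render_context("Grüße".encode("latin-1")))  # Gr^^fc^^dfe
```

Under this rule a TAB typed in the source and the three characters ^^I typed literally print identically, which is the ambiguity that makes searching the source from a context line awkward.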

> You also wrote:
>
> > It has been an uphill battle all the way to butcher TeX
> > implementations and locales into putting out ü as ü on the
> > terminal.
>
> If one uses an inputenc-like approach, then the document may have
> contained the character ü in some input encoding (it may even have
> been represented by several bytes in the case of utf-8 input encoding),
> and once this character reaches TeX's stomach, the ü ends up as a
> completely different thing compared to what it may have been in the
> original document.

Error message contexts don't pass TeX's stomach.  They don't even
pass its tokenizer.

> And TeX may output to the terminal in two "fundamentally" different
> situations:
>
> * unprocessed text (in the original input encoding) as it appears in
>   the document, when TeX shows error context lines
>
> * text from the stomach, e.g. when printing "overfull hbox" messages
>
> There is nothing which can guarantee that the encodings of the
> characters in the input encoding (in error context lines) match the
> encodings of characters in TeX's internal representation.

Who said it was?

> So, even if TeX outputs ü in 8-bit form to the terminal, it will
> ABSOLUTELY not help AUC-TeX, because this character may have
> absolutely no relation to the letter ü.

I am talking about error message contexts.  The important thing is
that the error context corresponds with the source, regardless of what
input encoding may be active.  The input encoding is not even
involved at the level I am talking about.  I am not sure, though, what
effect using a latin2 tcx file or similar would have on those error
contexts, and it is not the recommended LaTeX way, anyhow.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum