[OS X TeX] Invisible character

Jonathan Kew jonathan_kew at sil.org
Fri Jun 23 10:50:13 CEST 2006


On 23 Jun 2006, at 2:15 am, Ross Moore wrote:

> Hi Jonathan,
>
> I know this thread has gone a long way since this message, but ...

Indeed it has, but I'm going to comment once more, as I think some
corrections are needed.

>> Another option would be to redefine the bullet so that it  
>> disappears. For example,
>>
>>   \catcode`\•=\active \def•{}
>>
>> will do this, by making the bullet character a macro that expands  
>> to nothing.
>
> Why make it a macro ?
>
> Using  pdfeTeX  the character is ignored naturally,

This is not strictly accurate, I think. pdfetex does not "naturally  
ignore" the character; it will by default treat it as a normal  
character to be printed in the current font. It may *appear* to  
ignore it, if you're using the default Computer Modern fonts, simply  
because these fonts only support character codes 0..127. But if you  
check the .log file, you'll see messages about a "Missing  
character" (unless you've turned off \tracinglostchars); and if you  
change to a font that supports 256 characters rather than 128, even  
without any special input- or font-encoding packages, it'll appear in  
the output.
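
A minimal plain TeX sketch of this behaviour, assuming an 8-bit engine
such as pdftex and using \char"A5 as a stand-in for whatever byte the
bullet really is in your file:

```tex
% Plain TeX sketch; run with pdftex (an 8-bit engine).
\tracinglostchars=1   % log any character the current font lacks
\font\cm=cmr10 \cm    % Computer Modern Roman: slots 0..127 only
A\char"A5 B           % slot "A5 is not in cmr10, so nothing is typeset for it
\bye
```

The output should show only the A and the B, and the .log file should
contain a "Missing character" line for the slot that cmr10 lacks.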

The OP was originally relying on exactly this type of behavior -- a  
character code that was "invisible" in the output, because it was not  
present in the font -- but something in his setup must have changed  
(choice of font, encoding-related packages, etc), such that the  
character started to show up. Not necessarily as "itself", depending  
on encodings, but as *something* unwanted, at least.

Actually, another issue that we have ignored so far is the encoding  
of the input file. We talk of • in the input, but all TeX cares about  
is the byte value that it sees. I guess this is likely to be either  
0xA5 ("bullet" in MacRoman) or 0x95 (Windows CP1252), though there  
are of course other possibilities, including 0xE1 or 0xB7 ("middle  
dot" in MacRoman and CP1252 respectively). But what all these have in  
common is that the byte codes are >= 128 and therefore are missing  
characters when using CM fonts.

(It could even be <0xE2, 0x80, 0xA2>, "bullet" in UTF-8, but to use  
that with (non-Xe)LaTeX, loading an input-encoding package would be  
pretty much a requirement; messing with \catcode`\• etc will no  
longer work because • is not a single byte code.)
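
To take the file encoding out of the picture, the byte can be named by
its number rather than typed literally. This is only a sketch for 8-bit
engines, and the two byte values are the guesses made above, not
something I know about the OP's file:

```tex
% Sketch for 8-bit engines (TeX, pdfTeX); not for UTF-8 input.
\catcode"A5=\active  % 0xA5: bullet in MacRoman
\def^^a5{}
\catcode"95=\active  % 0x95: bullet in CP1252
\def^^95{}
```

Here ^^a5 and ^^95 are TeX's notation for the single bytes 0xA5 and
0x95 (the hex digits must be lowercase), so the definitions apply to
whichever of those bytes actually occurs in the input.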


> Another possibility is to set it to:    \catcode`\•= 15  (invalid  
> character).
> Now TeX will stop with a warning:
>
> ! Text line contains an invalid character.
> l.32 •
>       ••   •••
> ?
>
> This is a pretty strong reminder that you've forgotten to do  
> something.

Yes, but the OP specifically wanted to be able to leave placeholders  
in the source and have them disappear from the output.

> Alternatively, you could try:
>      \catcode`\•= 14  (comment character)
> which makes the •  act in the same way as  % .
>
> This now lets you write comments after the  •  to remind yourself  
> of the
> kind of data that needs to be inserted; e.g.,
>
> \catcode`\•= 14
> \begin{tabular}{lcrc}
> • left-aligned text goes here
> &• centered-text goes here
> &• right-aligned text goes here
> &• more centered-text goes here
> \end{tabular}

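True; or simply use % as the placeholder, as someone suggested during
this thread. But any comment character also swallows the rest of the
line, including any & and \\ that follow it, so this will fail if he
ever uses a "compact layout" along the lines of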

   \begin{tabular}{lcrc}
     • & • & • & • \\
     • & • & • & • \\
     • & • & • & • \\
   \end{tabular}

as a template.

Actually, TeX has a catcode that will cause an input character to be  
ignored (without skipping the rest of the line as a comment): just set

   \catcode`\•=9

and it will be silently dropped. I didn't suggest this mainly because  
I consider it a much more obscure approach than making the code  
\active (catcode 13) and then defining it as desired. The active  
character with an explicit definition also makes it easy to vary the  
behavior according to current needs, with options such as

   \def•{}  % just disappear
   \def•{\ignorespaces}  % cause any following spaces to also be ignored
   \def•{{\bf MISSING!}}  % printed to get proof-reader's attention
   \def•{\errmessage{placeholder}}  % halt with an error message

and to see at a glance which is in use.
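
Put together with the compact template from earlier, a complete test
file might look like the sketch below. It assumes pdfLaTeX and assumes
the placeholder is the byte 0xA5; I have written it as ^^a5 so that the
example does not depend on how this message is encoded, whereas a real
source file would contain the literal character.

```tex
% Sketch for pdfLaTeX; assumes the placeholder byte is 0xA5 (MacRoman bullet).
\documentclass{article}
\catcode"A5=\active
\def^^a5{}  % swap in any of the definitions above
\begin{document}
\begin{tabular}{lcrc}
  ^^a5 & ^^a5 & ^^a5 & ^^a5 \\
  ^^a5 & ^^a5 & ^^a5 & ^^a5 \\
\end{tabular}
\end{document}
```

Replacing the \catcode and \def lines with the single line
\catcode"A5=9 should give the same empty cells by the "ignored
character" route instead.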

> There is another point that needs to be considered here.
>
> If you tried leaving the •  totally unspecified, then beware of
> what happens when you change processing engine.
> For example, XeTeX would not see • as a benign character, to be
> ignored upon input, but would place the • character itself into
> the output.

It's not a question of processing engine. Standard (pdf)TeX does not  
see any of the "bullet" codes mentioned above as being a "benign  
character, to be ignored"; it sees them as category "other", to be  
printed. They only "vanish" if missing from the current font.
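
If in doubt, you can ask TeX what category a given byte currently has.
A small plain TeX sketch, again assuming pdftex and taking 0xA5 as the
byte of interest:

```tex
% Plain TeX sketch; run with pdftex.
\message{catcode of byte "A5 is \the\catcode"A5}
\bye
```

With an unmodified plain format I would expect this to report 12
("other"); a format or package that has touched the upper half of the
8-bit range may report something different.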

> This suggests that perhaps XeTeX might allow an extra catcode value
> that declares a character to be ignored on input, for compatibility
> with what can be achieved with other engines such as eTeX and TeX
> itself.

See above: catcode 9 (ignored) already does exactly this, in XeTeX as
in standard TeX.

JK

