[texhax] Blank first page problem (how to remove?)
Pierre MacKay
pierre.mackay at comcast.net
Tue Jun 7 02:01:07 CEST 2011
>>>> No, there is certainly NOT a font character corresponding with
>>>> this byte sequence.
Clarification: the three-byte UTF-8 sequence <EF BB BF> decodes to
U+FEFF, which is treated (and deplored) as a zero-width no-break space
if it simply can't be avoided. But in UTF-8 it can always be avoided,
because a byte-oriented encoding has no byte order to announce.
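
For anyone who wants to strip the mark from a source file before TeX
sees it, here is a minimal sketch in Python. The function name
strip_utf8_bom is my own invention for illustration, not part of any
TeX tool; the only library piece relied on is the standard constant
codecs.BOM_UTF8, which holds the three bytes EF BB BF.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 mark (EF BB BF), if present.

    Hypothetical helper: only a mark at the very start is removed;
    any U+FEFF later in the data is left alone.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + b"\\documentclass{article}\n"
    print(strip_utf8_bom(sample))
```

Reading the file in binary mode, passing the bytes through this
function, and writing them back should be enough in the simple case.
Python's "utf-8-sig" codec does much the same job when decoding text.
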
The value 0xFEFF at the head of a 16- or 32-bit stream indicates that
the stream was big-endian, and the corollary is that 0xFFFE indicates
that the stream was little-endian. (I would see that as one colossal
reason to avoid the use of 16- or 32-bit streams, even when CJK text
might suggest a specious efficiency.) U+FFFE is specifically identified
as "not-a-character" in Unicode 5.0, so that if it appears, it gives you
only the historic information that the stream being processed began as a
little-endian stream, an old horror wished on us by the Intel 80xx series
chips. "eHplI a mrtpaep dnia nI MBP .C" 0xFFFE must be removed
altogether, and it might have been better if 0xFEFF had been subjected
to the same fate.
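
To make the byte-order business concrete, here is a rough sketch in
Python of how a reader might guess the encoding from the first few
bytes. The function name byte_order_from_bom and the strings it returns
are hypothetical; the codecs.BOM_* constants are standard. It is only a
heuristic: UTF-16 little-endian text that happens to begin with U+0000
is indistinguishable from a UTF-32 little-endian mark.

```python
import codecs


def byte_order_from_bom(data: bytes) -> str:
    """Guess encoding and byte order from a leading mark (hypothetical helper)."""
    # Test the 32-bit marks first: the UTF-32 little-endian mark
    # (FF FE 00 00) begins with the UTF-16 little-endian mark (FF FE).
    if data.startswith(codecs.BOM_UTF32_BE):   # 00 00 FE FF
        return "UTF-32 big-endian"
    if data.startswith(codecs.BOM_UTF32_LE):   # FF FE 00 00
        return "UTF-32 little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "UTF-16 big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "UTF-16 little-endian"
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "UTF-8 (mark carries no byte-order information)"
    return "no mark found"


if __name__ == "__main__":
    print(byte_order_from_bom(b"\xfe\xff\x00A"))
    print(byte_order_from_bom(b"\xff\xfeA\x00"))
```

The same two bytes, read in the wrong order, turn U+FEFF into the
not-a-character U+FFFE, which is how a reader can tell that it has to
swap.
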
UTF-8 avoids that problem entirely. Long live UTF-8.
Pierre MacKay