[texhax] Blank first page problem (how to remove?)
Pierre MacKay
pierre.mackay at comcast.net
Mon Jun 6 05:27:18 CEST 2011
On 06/05/2011 01:30 PM, Reinhard Kotucha wrote:
> On 2011-06-05 at 14:54:34 -0400, Thomas Schneider wrote:
>
> > > > There are three bogus bytes at the very beginning of the file:
> > ...
> >
> > > So it seems Notepad in Windows has done some formatting of the file
> > > which I didn't notice.
> >
> > That's yet another reason to add to the pile to avoid Windows. Under
> > Unix with vim you would have seen those characters.
>
> Are you sure? It's only *one* character and I doubt that there is a
> font which has a glyph for it.
>
> Regards,
> Reinhard
>
>
I quote from page 105 of the _Unicode 5.0 standard_:
Because the UTF-8 encoding scheme already deals in ordered byte
sequences, the UTF-8 encoding scheme is trivial. The
byte ordering is completely defined by the UTF-8 code unit itself.
While there is obviously no need for a byte order (= big-endian vs.
little-endian) signature when using UTF-8, there are occasions
when processes convert UTF-16 or UTF-32 data containing a byte order
mark into UTF-8. When represented in UTF-8, the byte order
mark turns into the byte sequence <EF BB BF>. Its usage at the
beginning of a UTF-8 data stream is neither required nor
recommended by the Unicode standard.
Notepad has, in typical Microsoft fashion, "made it better for you" by
including a totally unnecessary byte sequence that properly designed
software would have left out.
No, there is certainly NOT a font character corresponding to this byte
sequence.
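For anyone who wants to check a file for these three bytes and drop them,
here is a minimal sketch in Python. The function name strip_utf8_bom and
the sample bytes are my own illustration and not anything from the
original file; codecs.BOM_UTF8 is the standard library constant holding
<EF BB BF>.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):  # b'\xef\xbb\xbf'
        return data[len(codecs.BOM_UTF8):]
    return data


# The single character U+FEFF becomes exactly these three bytes in UTF-8.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# Hypothetical file contents, as Notepad might have saved them.
sample = codecs.BOM_UTF8 + b"\\documentclass{article}\n"
cleaned = strip_utf8_bom(sample)
```

When reading text instead of raw bytes, decoding with Python's "utf-8-sig"
codec should have the same effect, since it skips a leading byte order
mark if one is there.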
The specifications for UTF-8 are absolutely brilliant, and are followed
by all Unix/Linux applications that I have encountered. Perhaps
someday Microsoft will enter the 21st century too.
Pierre MacKay