<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#ffffff">
On 06/05/2011 01:30 PM, Reinhard Kotucha wrote:
<blockquote cite="mid:19947.59266.555275.877787@zaphod.ms25.net"
type="cite">
<pre wrap="">On 2011-06-05 at 14:54:34 -0400, Thomas Schneider wrote:
> > > There are three bogus bytes at the very beginning of the file:
> ...
>
> > So it seems Notepad in Windows has done some formatting of the file
> > which I didn't notice.
>
> That's yet another reason to add to the pile to avoid Windows. Under
> Unix with vim you would have seen those characters.
Are you sure? It's only *one* character and I doubt that there is a
font which has a glyph for it.
Regards,
Reinhard
</pre>
</blockquote>
I quote from page 105 of <u>The Unicode Standard, Version 5.0</u>:<br>
<br>
<blockquote>Because the UTF-8 encoding form already deals in ordered byte
sequences, the UTF-8 encoding scheme is trivial. The
byte ordering is completely defined by the UTF-8 code unit sequence itself.<br>
<br>
While there is obviously no need for a byte order [i.e., big-endian vs.
little-endian] signature when using UTF-8, there are occasions
when processes convert UTF-16 or UTF-32 data containing a byte order
mark into UTF-8. When represented in UTF-8, the byte order
mark turns into the byte sequence &lt;EF BB BF&gt;. Its usage at the
beginning of a UTF-8 data stream is neither required nor
recommended by the Unicode Standard.</blockquote>
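To make the quoted passage concrete, here is a small Python sketch (my own illustration, not taken from the standard) of a process that converts UTF-16 data carrying a byte order mark into UTF-8. It assumes Python 3.8 or later for bytes.hex with a separator, and the variable names are hypothetical. Whichever endianness the UTF-16 input used, the mark comes out as the same three bytes EF BB BF, because UTF-8 itself has no byte order to signal:<br>

```python
import codecs

# A UTF-16 little-endian stream that starts with a byte order mark (FF FE),
# followed by the letter "A" (41 00).
utf16le_data = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")

# Decode with the explicit-endianness codec so U+FEFF is kept as a character
# (the generic "utf-16" codec would consume the mark instead).
text = utf16le_data.decode("utf-16-le")

# Re-encode as UTF-8: the mark survives as the three bytes EF BB BF.
utf8_data = text.encode("utf-8")
print(utf8_data.hex(" "))  # ef bb bf 41

# A big-endian input (mark FE FF) produces exactly the same UTF-8 bytes.
utf16be_data = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")
print(utf16be_data.decode("utf-16-be").encode("utf-8") == utf8_data)  # True
```

Decoding with the generic "utf-16" codec instead would strip the mark, so the UTF-8 output would then not begin with EF BB BF.<br>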
<br>
Notepad has, in typical Microsoft behavior, "made it better for you" by
including a totally unnecessary byte sequence that properly designed
software would have left out.<br>
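<br>
The difference between a file saved with and without the signature can be reproduced with Python's standard codecs: "utf-8" writes only the encoded text, while "utf-8-sig" prepends EF BB BF. Treating Notepad's UTF-8 save as equivalent to "utf-8-sig" is an assumption based on what was observed in this thread, not something I have checked against Notepad itself:<br>

```python
text = "hello"

# Plain UTF-8: just the encoded characters.
plain = text.encode("utf-8")

# "utf-8-sig": the same bytes, preceded by the UTF-8 byte order mark.
signed = text.encode("utf-8-sig")

print(plain.hex(" "))            # 68 65 6c 6c 6f
print(signed.hex(" "))           # ef bb bf 68 65 6c 6c 6f
print(len(signed) - len(plain))  # 3
```

The three extra bytes are the whole difference; the text itself is encoded identically.<br>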
<br>
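No, there is certainly NOT a visible glyph corresponding to this
byte sequence. Its three bytes decode to the single character U+FEFF
(ZERO WIDTH NO-BREAK SPACE), a format character that renders as nothing.<br>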
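<br>
On Reinhard's point that it is only one character: the Python sketch below decodes the three bytes, confirms they form the single code point U+FEFF (general category Cf, a format character), and shows two ways to get rid of it. The helper name strip_utf8_bom is hypothetical; "utf-8-sig" is the standard-library codec that drops a leading mark when decoding:<br>

```python
import codecs
import unicodedata


def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: return data without a leading EF BB BF, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A file body as Notepad reportedly saved it: mark + text.
raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

text = raw.decode("utf-8")  # plain UTF-8 keeps the mark as a character
first = text[0]

print(len(raw), len(text))        # 8 6
print(f"U+{ord(first):04X}")      # U+FEFF
print(unicodedata.name(first))    # ZERO WIDTH NO-BREAK SPACE
print(unicodedata.category(first))  # Cf

# Two ways to remove it: at the byte level, or while decoding.
print(strip_utf8_bom(raw))        # b'hello'
print(raw.decode("utf-8-sig"))    # hello
```

Decoding with "utf-8-sig" is harmless for files without the mark, since the codec only removes EF BB BF when it is actually present.<br>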
<br>
The specifications for UTF-8 are absolutely brilliant and are followed
by all Unix/Linux applications that I have encountered. Perhaps
someday Microsoft will enter the 21st century too.<br>
<br>
Pierre MacKay<br>
<br>
<br>
</body>
</html>