[OS X TeX] Re: Some Encoding & Keyboard Questions
bph at gmx.info
Fri Feb 3 17:19:50 CET 2006
Herbert Schulz said the following on 3.2.2006 15:54 Uhr:
> [...] 2) If I have my default file encoding set to UTF-8 how does
> TeXShop know that a certain file is not in UTF-8 when it reads it? If
> I open a MacOSRoman (my actual default - just because) file a dialog
> box comes up saying it isn't UTF-8 and will be read in as MacOSRoman.
> Is there some sort of BOM at the start of a UTF-8 file that
> distinguishes it from other (indistinguishable by TeXShop) formats?
If I may be so frank, I will just quote Richard Koch from a private mail
to me concerning the handling of text encoding in TeXShop:
Richard Koch said the following on 16.1.2006 20:44 Uhr:
> I suspect that TeXShop is working correctly. Let me explain how TeXShop
> knows that a file is opened with the wrong encoding.
> Two kinds of encoding are available in TeXShop and other programs. The
> first kind is an encoding with 256 possible characters. Each byte in such
> a file is a legal character, but the encoding determines which unicode
> character corresponds to each byte.
> Thus 0xa2 means one thing if you use MacOSRoman and another thing
> if you use ISO Latin 9, but it is a legal entry in both encodings.
> If you create a file with ISO Latin 9 and save it, and then load it as
> MacOSRoman, some of the characters will be wrong. But TeXShop won't know
> that because the file is legal in both encodings.
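A minimal Python sketch of the point above, using Python's own codec names (`iso8859_15`, `latin_1`, `mac_roman`) as stand-ins for the TeXShop encodings. The byte 0xA4 is chosen for illustration rather than 0xa2, since in Python's tables at least 0xa2 appears to decode to the same cent sign in both MacOSRoman and ISO Latin 9. Every decode succeeds, which is why no program can detect the mismatch:

```python
# One byte, three single-byte encodings.
# Codec names are Python's, used here as stand-ins for TeXShop's encodings.
raw = bytes([0xA4])

as_latin9 = raw.decode("iso8859_15")   # ISO Latin 9
as_latin1 = raw.decode("latin_1")      # ISO Latin 1
as_macroman = raw.decode("mac_roman")  # MacOSRoman

# Every decode succeeds, so nothing signals that the wrong encoding was used.
for name, text in [
    ("ISO Latin 9", as_latin9),
    ("ISO Latin 1", as_latin1),
    ("MacOSRoman", as_macroman),
]:
    print(f"{name}: U+{ord(text):04X} {text}")
```

The three results are three different Unicode characters, yet none of the calls raises an error.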
> However UTF-8 is different. It is a file format in which the standard
> 128 ASCII characters are encoded as usual, but the remaining Unicode
> characters are coded in a special way which takes 2 or more bytes.
> Moreover, a random stream of bytes will usually not be a legal UTF-8
> file.
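To make this concrete, here is a small Python sketch (Python's `utf_8` codec is used for illustration; the helper name `is_valid_utf8` is hypothetical). It shows ASCII staying one byte, other characters taking two or three bytes, and a file saved in a single-byte encoding failing the UTF-8 validity check:

```python
# ASCII characters keep their single-byte values in UTF-8.
ascii_bytes = "A".encode("utf_8")          # 1 byte
# Characters beyond ASCII take two or more bytes.
e_acute_bytes = "\u00e9".encode("utf_8")   # e with acute accent, 2 bytes
euro_bytes = "\u20ac".encode("utf_8")      # Euro sign, 3 bytes


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 without error (hypothetical helper)."""
    try:
        data.decode("utf_8")
        return True
    except UnicodeDecodeError:
        return False


# "cafe" with an accented e, saved in a single-byte encoding:
# the lone 0xE9 byte is not a legal UTF-8 sequence.
latin1_file = "caf\u00e9".encode("latin_1")
utf8_file = "caf\u00e9".encode("utf_8")

print(len(ascii_bytes), len(e_acute_bytes), len(euro_bytes))
print(is_valid_utf8(latin1_file), is_valid_utf8(utf8_file))
```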
> Here is how TeXShop works: Internally it uses Unicode. When it comes
> time to write out the file, the internal representation is converted to
> a string using an encoding. (This is necessary even if the encoding is a
> Unicode encoding, because the Unicode standard doesn't specify a
> particular way of writing Unicode to disk. So UTF-8 is one possible
> Unicode encoding, but not the only one.)
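As an illustration of "one possible Unicode encoding, but not the only one", the following Python sketch writes the same single character with several Unicode encodings. The codec names are Python's; nothing here is taken from TeXShop itself:

```python
# One Unicode character (the Euro sign), several on-disk byte forms.
text = "\u20ac"

encoded = {
    codec: text.encode(codec)
    for codec in ("utf_8", "utf_16_le", "utf_16_be", "utf_32_le")
}

for codec, data in encoded.items():
    print(f"{codec:10} {len(data)} bytes: {data.hex()}")
```

The character is the same in each case; only the byte sequence written to disk differs.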
> What happens if there is a Unicode character in the text which is not
> in the particular encoding chosen? Apple's routines contain a parameter
> that indicates whether this should create an error or if instead the
> character should just be ignored or converted to something else. I
> choose "ignore or convert to something else." So if you type, say, a
> Euro symbol, but the encoding doesn't support it, then TeXShop will
> still write out the file.
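The same choice exists in Python as the `errors` argument to `str.encode`, which serves here as an analogy only; it is not Apple's API and not TeXShop's actual code. The sketch below tries to write a Euro sign in ISO Latin 1, which has no such character:

```python
text = "Price: 5 \u20ac"  # ends with a Euro sign, which ISO Latin 1 lacks

# Strict mode: the unsupported character is an error and nothing is written.
try:
    text.encode("latin_1")  # errors="strict" is the default
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Lossy modes: the file still gets written, minus the character.
ignored = text.encode("latin_1", errors="ignore")    # character dropped
replaced = text.encode("latin_1", errors="replace")  # character becomes "?"

print(strict_failed, ignored, replaced)
```

The lossy modes correspond to the "ignore or convert to something else" behaviour described above: saving always succeeds, at the cost of silently losing the character.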
> There is somewhat similar code when you read text from disk. Apple's
> routines require that an encoding be specified, and then the file is
> converted into Apple's internal Unicode form and displayed in the editor.
> But this time there is another problem. Suppose the encoding is UTF-8
> and the file isn't legal UTF-8. Then when Apple's code reads the file,
> it suddenly says "wait, this doesn't make sense." In that case, it stops
> reading and returns an error to TeXShop. TeXShop then puts up the dialog
> you have reported and reads the file again in MacOSRoman. (Every file is
> a legal MacOSRoman file.)
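The read-then-fall-back behaviour described above can be sketched in Python roughly as follows. The function `read_text` and its parameter names are hypothetical, and the real TeXShop code uses Apple's routines rather than anything shown here:

```python
def read_text(data: bytes, preferred: str = "utf_8", fallback: str = "mac_roman"):
    """Decode with the preferred encoding; if the bytes are not legal in it,
    decode again with the fallback. Returns (text, encoding_used).

    Hypothetical helper illustrating the idea, not TeXShop's implementation.
    """
    try:
        return data.decode(preferred), preferred
    except UnicodeDecodeError:
        # This is the point at which an editor would show its warning dialog.
        return data.decode(fallback), fallback


macroman_file = "caf\u00e9".encode("mac_roman")  # not legal UTF-8
utf8_file = "caf\u00e9".encode("utf_8")          # legal UTF-8

print(read_text(macroman_file))
print(read_text(utf8_file))
```

The fallback can only be triggered by an encoding such as UTF-8 that rejects some byte sequences; with two single-byte encodings the first decode never fails, so the function would never notice a mismatch.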
> Now I think you understand. If you write out a file as ISO Latin 9 and
> read it back as, say, ISO Latin 1, the file will be legal ISO Latin 1,
> so TeXShop doesn't report a problem. But some of the characters will be
> wrong.