[OS X TeX] Input encoding question

Maxwell, Adam R adam.maxwell at pnl.gov
Fri Feb 20 19:49:27 CET 2009


On 02/20/09 01:44, "Jonathan Kew" <jonathan at jfkew.plus.com> wrote:

> On 20 Feb 2009, at 06:06, Richard Koch wrote:
> 
>> With the current default encoding or Latin 1 or most other
>> encodings, files always open and ascii always works great, and the
>> only trouble you'll run into is that a few characters may not be
>> what you expect.
> 
> With a UTF-8 default, "ascii always works great" too. And if a file
> can't be interpreted as valid UTF-8, you can fall back to a default
> 8-bit encoding *and warn the user to check the non-ASCII characters*,
> which is better than blindly opening a file as MacRoman when it might
> equally well be Latin-1 (or vice versa).

FWIW, this sounds like the approach we took with BibDesk:  UTF-8 is now the
default in preferences, which is compatible with the previous default
(ASCII) because any pure-ASCII file is also valid UTF-8, and a fallback
encoding is /never/ applied when double-clicking a file in Finder.  Instead,
you get a dire warning if the file's encoding doesn't appear to match your
default.

Adding references to an existing file via drag-and-drop or an external URL
will guess at the encoding, though, since there's no other way to set it.
In that case, we first look for a Unicode BOM; if that fails, we try UTF-8;
if that fails, we give up and use MacRoman.  The first two have essentially
no danger of misinterpretation, but MacRoman is dangerous as a guess because
it's gapless (every byte value maps to some character, so decoding never
fails).  That makes it a good fallback encoding, but it also means it will
silently produce wrong characters when your data is in a different encoding
(e.g. Latin 1).
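
To make the order of checks concrete, here is a minimal sketch of that
cascade in Python.  It is not BibDesk's actual code (BibDesk is written in
Objective-C); the function name guess_decode and the "guessed" flag are
hypothetical, chosen only for illustration.

```python
import codecs

# BOMs to look for. UTF-32 is checked before UTF-16 because the UTF-32-LE
# BOM (FF FE 00 00) starts with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_decode(data: bytes):
    """Return (text, encoding_name, guessed).

    guessed is True only when the MacRoman fallback was used, i.e. when
    the caller should warn the user to check the non-ASCII characters.
    """
    # Step 1: a Unicode BOM identifies the encoding unambiguously.
    for bom, name in _BOMS:
        if data.startswith(bom):
            try:
                return data.decode(name), name, False
            except UnicodeDecodeError:
                break  # BOM present but the data is malformed; keep trying
    # Step 2: strict UTF-8. Text in a legacy 8-bit encoding that contains
    # non-ASCII bytes is very unlikely to be valid UTF-8 by accident.
    try:
        return data.decode("utf-8"), "utf-8", False
    except UnicodeDecodeError:
        pass
    # Step 3: MacRoman never raises, because every byte value is assigned,
    # so this step cannot detect that the data was really Latin 1.
    return data.decode("mac_roman"), "mac_roman", True
```

Feeding it the Latin 1 bytes b"caf\xe9" shows the silent failure: the call
returns without any error and reports mac_roman, but the last character is
not the intended one.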

> 
> Of course, if the file comes with (internal or external) metadata that
> tells you its encoding, that's a different matter altogether.

On Leopard, this is stored in the extended attribute com.apple.TextEncoding,
which NSString in Cocoa recognizes.  That's incorporated into the guessing
scheme I mentioned above as well.
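
For anyone inspecting that attribute outside Cocoa, here is a small Python
sketch of parsing its value.  It assumes the value has the form
"name;number", for example "utf-8;134217984", where the number is the
CFStringEncoding constant (0x08000100 for UTF-8); that layout is an
assumption based on files saved by TextEdit, not something stated above.
The helper name parse_text_encoding_xattr is hypothetical, and actually
reading the attribute from disk is left out.

```python
def parse_text_encoding_xattr(raw: bytes):
    """Split a com.apple.TextEncoding value into (name, cf_encoding).

    Assumes the "name;number" layout described above. Either part is
    returned as None when it is missing or cannot be parsed.
    """
    text = raw.decode("ascii", errors="replace").strip()
    name, _, number = text.partition(";")
    name = name.strip() or None
    try:
        cf_encoding = int(number)
    except ValueError:
        cf_encoding = None
    return name, cf_encoding
```

A value that parses cleanly can be tried before the BOM and UTF-8 checks,
since explicit metadata is more trustworthy than a guess.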

-- 
Adam