[texworks] Use of BOM in XeTeX and TeXworks
Manfred Lotz
ml_news at posteo.de
Tue Jul 16 05:58:12 CEST 2019
Hi Reinhard,
On Tue, 16 Jul 2019 00:31:16 +0200
Reinhard Kotucha <reinhard.kotucha at web.de> wrote:
> On 2019-07-15 at 21:43:50 +0200, Manfred Lotz wrote:
>
> > Nowadays, usage of BOM for UTF-8 is neither required nor
> > recommended.
>
> Hi Manfred,
> UTF-8 BOMs were never required.
Exactly. I should have said 'was never required'.
> UTF-8 is a prefix encoding and was
> designed to be able to synchronize with a character stream which has
> no concept of "beginning of file". Suppose that you paste something
> from another file. Neither can one expect that this file has a UTF-8
> BOM at all nor can one expect that any program supports it.
>
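To illustrate Reinhard's point about synchronization: in UTF-8 every continuation byte matches the bit pattern 10xxxxxx, and lead bytes never do. A reader that starts at an arbitrary byte offset can therefore skip forward to the next character boundary without any marker at the start of the file. Below is a minimal Python sketch; the function names are hypothetical and only for illustration.

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx."""
    return byte & 0xC0 == 0x80


def resync(data: bytes, pos: int) -> int:
    """Return the first index >= pos at which a UTF-8 character starts.

    Lead bytes and continuation bytes are distinguishable, so a reader
    dropped into the middle of a stream only has to skip continuation bytes.
    """
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


text = "Grüße".encode("utf-8")       # b'Gr\xc3\xbc\xc3\x9fe'
start = resync(text, 3)              # index 3 is the continuation byte \xbc
print(text[start:].decode("utf-8"))  # ße
```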
A BOM isn't really necessary for UTF-8. When authors upload a package to
CTAN and we detect a UTF-8 file with a BOM, we usually ask them to remove
the BOM, especially from README files. It doesn't happen very often, but
it does happen from time to time.
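For anyone who needs to do this, removing a UTF-8 BOM only means dropping the three leading bytes EF BB BF. The following is a minimal Python sketch that assumes the file is small enough to read into memory. The helper name `strip_utf8_bom` is hypothetical; `codecs.BOM_UTF8` is the standard-library constant for those three bytes.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM (EF BB BF) from a file in place.

    Returns True if a BOM was found and removed, False otherwise.
    """
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        path.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False
```

When reading rather than rewriting, Python's `utf-8-sig` codec also skips a leading BOM if one is present.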
> > Does it mean that when you set up the TeXworks encoding to be UTF-8,
> > and subsequently load an ISO-8859-1 file, TeXworks doesn't recognize
> > the proper encoding?
> >
> > I know it doesn't help you, but vim, for instance, recognizes the
> > encoding of a loaded file even though I have configured the default
> > encoding as UTF-8.
>
> This can't work reliably at all. There is no way to determine which
> 8-bit encoding is being used. I suppose that vim is using ISO-8859-1
> as a fallback encoding if the file contains invalid UTF-8 characters.
>
> An 8-bit encoding can only be determined by heuristics or a priori
> knowledge. There are dependencies between languages and encodings.
> AFAIK Mozilla provided a program which tries to determine an encoding
> using such heuristics. I don't know how reliable it is, especially if
> files are small.
>
Of course, vim uses some sort of heuristic (I think Emacs does something
similar), so it can fail. But most of the time it works fine.
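Reinhard's guess about the fallback (try UTF-8 first, use ISO-8859-1 if the bytes are not valid UTF-8) can be sketched as follows. This illustrates that guess only; it is not vim's actual code, whose behaviour is controlled by its 'fileencodings' option. The function name is hypothetical.

```python
def decode_with_fallback(data: bytes) -> tuple[str, str]:
    """Decode bytes as UTF-8, falling back to ISO-8859-1 on failure.

    Returns the decoded text and the name of the encoding that was used.
    ISO-8859-1 maps every byte to a code point, so the fallback never
    raises, but it cannot tell Latin-1 apart from other 8-bit encodings
    such as Windows-1252 or ISO-8859-15.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode("iso-8859-1"), "iso-8859-1"
```

Any 8-bit file that happens to be valid UTF-8 (for instance, pure ASCII) is reported as UTF-8, and any non-UTF-8 file is assumed to be Latin-1, which is why this kind of detection can guess wrong.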
> Phil, the best solution is to convert all the nasty 8-bit files to
> UTF-8 using iconv.
>
This is surely the best advice, because TeXworks doesn't try to detect
the encoding (I just tested it).
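With iconv, the conversion is `iconv -f ISO-8859-1 -t UTF-8 old.tex > new.tex`. If iconv isn't at hand, roughly the same thing can be sketched in Python. This sketch assumes you already know the source encoding, since it cannot be detected reliably. The function name and its default are hypothetical choices for illustration.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "iso-8859-1") -> None:
    """Re-encode a text file from a known 8-bit encoding to UTF-8 without a BOM.

    Works on raw bytes so that line endings are left exactly as they were.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
```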
--
Manfred