[texworks] Use of BOM in XeTeX and TeXworks

Manfred Lotz ml_news at posteo.de
Tue Jul 16 05:58:12 CEST 2019


Hi Reinhard,

On Tue, 16 Jul 2019 00:31:16 +0200
Reinhard Kotucha <reinhard.kotucha at web.de> wrote:

> On 2019-07-15 at 21:43:50 +0200, Manfred Lotz wrote:
> 
>  > Nowadays, usage of BOM for UTF-8 is neither required nor
>  > recommended.  
> 
> Hi Manfred,
> UTF-8 BOMs were never required. 

Exactly. I should have said 'was never required'.

>  UTF-8 is a prefix encoding and was
> designed to be able to synchronize with a character stream which has
> no concept of "beginning of file".  Suppose that you paste something
> from another file.  Neither can one expect that this file has a UTF-8
> BOM at all nor can one expect that any program supports it.
> 

A BOM for UTF-8 isn't really necessary. When authors upload a package to
CTAN and we detect UTF-8 with a BOM, we usually ask them to remove the
BOM, especially from README files. It doesn't happen very often, but it
happens from time to time.
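
For illustration, checking for and removing a UTF-8 BOM takes only a few
lines. This is a minimal sketch in Python, not the check CTAN actually
runs; the function name strip_utf8_bom is made up for this example.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Working on raw bytes means the rest of the file is left untouched,
whatever it contains.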

>  > Does it mean that when you set up texworks encoding to be UTF-8, and
>  > subsequently load an ISO-8859-1 file texworks doesn't recognize the
>  > proper encoding?
>  >
>  > I know it doesn't help you but vim, for instance, recognizes the
>  > encoding of a loaded file although I have configured the default
>  > encoding as UTF-8  
> 
> This can't work reliably at all.  There is no way to determine which
> 8-bit encoding is being used.  I suppose that vim is using ISO-8859-1
> as a fallback encoding if the file contains invalid UTF-8 characters.
> 
> An 8-bit encoding can only be determined by heuristics or a priori
> knowledge.  There are dependencies between languages and encodings.
> AFAIK Mozilla provided a program which tries to determine an encoding
> using such heuristics.  I don't know how reliable it is, especially if
> files are small.
> 

Of course, vim uses some sort of heuristic (I think emacs does something
similar), and thus it can fail. But most of the time it works fine.
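
The fallback Reinhard describes can be sketched roughly as follows. This
is only an illustration of the idea in Python, not vim's actual
implementation, and decode_with_fallback is a made-up name. It assumes
that anything that is not valid UTF-8 is ISO-8859-1, which is exactly
the guess that can go wrong.

```python
from typing import Tuple


def decode_with_fallback(data: bytes) -> Tuple[str, str]:
    """Try strict UTF-8 first, then fall back to ISO-8859-1.

    Returns the decoded text and the name of the encoding that was used.
    Hypothetical helper; the fallback encoding is an assumption.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Every byte value is valid in ISO-8859-1, so this never fails,
        # but the result is wrong if the file used another 8-bit encoding.
        return data.decode("iso-8859-1"), "iso-8859-1"
```

A file in, say, ISO-8859-15 or Windows-1252 would be decoded without any
error here, just with a few wrong characters.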


> Phil, the best solution is to convert all the nasty 8-bit files to
> UTF-8 using iconv.
> 

This is surely the best advice because texworks doesn't try to detect
the encoding (just tested).
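
With iconv the conversion is something like
"iconv -f ISO-8859-1 -t UTF-8 old.tex > new.tex" (file names are
placeholders). Where iconv is not at hand, a rough equivalent in Python
could look like the sketch below. The function name and the default
source encoding are assumptions; you still have to know the real source
encoding beforehand.

```python
from pathlib import Path


def convert_file_to_utf8(src, dst, source_encoding="iso-8859-1"):
    """Re-encode the file src from source_encoding to UTF-8 and write dst.

    Hypothetical helper. The source encoding must be known a priori;
    nothing here tries to detect it.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    # Write bytes directly so that line endings are not translated
    # and no BOM is added.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Writing to a separate output file keeps the original around in case the
assumed source encoding turns out to be wrong.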


-- 
Manfred



