[texworks] Use of BOM in XeTeX and TeXworks
Reinhard Kotucha
reinhard.kotucha at web.de
Tue Jul 16 00:31:16 CEST 2019
On 2019-07-15 at 21:43:50 +0200, Manfred Lotz wrote:
> Nowadays, usage of BOM for UTF-8 is neither required nor recommended.
Hi Manfred,
UTF-8 BOMs were never required. UTF-8 is a self-synchronizing prefix
encoding, designed so that a reader can synchronize with a character
stream which has no concept of "beginning of file". Suppose that you
paste something from another file. You can expect neither that this
file has a UTF-8 BOM nor that every program supports one.
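
To illustrate the synchronization property: UTF-8 continuation bytes
always have the bit pattern 10xxxxxx, so a reader dropped at an
arbitrary byte offset can find the next character boundary without any
marker at the start of the stream. A minimal Python sketch (the
function name next_char_start is made up for this example):

```python
def next_char_start(data: bytes, pos: int) -> int:
    """Return the index of the first character boundary at or after pos."""
    # Continuation bytes look like 10xxxxxx; skip them until a lead byte
    # (or the end of the data) is reached.
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


sample = "Größe €5".encode("utf-8")

# Offset 3 is in the middle of the two-byte sequence for "ö".
start = next_char_start(sample, 3)
resynced = sample[start:].decode("utf-8")
```

Decoding from the returned offset succeeds even though reading started
in the middle of a character, and no BOM is involved.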
> Does it mean that when you setup texworks encoding to be UTF-8, and
> subsequently load an ISO-8859-1 file texworks doesn't recognize the
> proper encoding?
>
> I know it doesn't help you but vim, for instance, recognizes the
> encoding of a loaded file although I have configured the default
> encoding as UTF-8
This can't work reliably at all. There is no way to determine which
8-bit encoding is being used. I suppose that vim is using ISO-8859-1
as a fallback encoding if the file contains byte sequences which are
not valid UTF-8.
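
The fallback strategy supposed above can be sketched in Python. This
is only an illustration of the idea, not a description of what vim
actually does; decode_with_fallback is a hypothetical helper.

```python
from typing import Tuple


def decode_with_fallback(raw: bytes) -> Tuple[str, str]:
    """Try strict UTF-8 first, then fall back to ISO-8859-1."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value to a character, so this
        # decode cannot fail. That is why it "works" as a fallback,
        # and also why it proves nothing about the real encoding.
        return raw.decode("iso-8859-1"), "iso-8859-1"


utf8_result = decode_with_fallback("Müller".encode("utf-8"))
latin1_result = decode_with_fallback("Müller".encode("iso-8859-1"))
```

The fallback branch accepts any 8-bit file, whether or not it was
really written in ISO-8859-1.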
An 8-bit encoding can only be determined by heuristics or a priori
knowledge. There are dependencies between languages and encodings.
AFAIK Mozilla provided a program which tries to determine an encoding
using such heuristics. I don't know how reliable it is, especially if
files are small.
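
A small Python example of why the bytes alone are not enough: the same
byte is valid in several 8-bit encodings and simply means something
different in each. The helper name decode_all is made up for this
sketch.

```python
def decode_all(raw: bytes, encodings):
    """Decode the same bytes under each candidate 8-bit encoding."""
    return {name: raw.decode(name) for name in encodings}


# Byte 0xA4 is the currency sign in ISO-8859-1 and the euro sign in
# ISO-8859-15. Both decodes succeed, so nothing flags the wrong one.
candidates = decode_all(b"10 \xa4", ["iso-8859-1", "iso-8859-15"])
```

Both results are legal text, so only knowledge about the language or
the origin of the file can tell which one was intended.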
Phil, the best solution is to convert all the nasty 8-bit files to
UTF-8 using iconv.
Some years ago I compiled iconv for Windows:
http://ms25.ddns.net/w32/iconv/iconv.zip
The ZIP file has to be extracted into a directory which is in PATH.
Usage:
iconv --from-code=ISO-8859-1 --to-code=UTF-8 [--output=<outfile>] <infile>
If you omit --output, standard iconv writes the converted text to
standard output instead of a file. In any case, make a backup of the
input file before converting.
iconv supports zillions of encodings, try iconv --list.
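
If iconv is not at hand, the same conversion can be sketched in
Python. This is a rough equivalent under stated assumptions, not a
replacement for iconv: the function name convert_to_utf8 and the
".bak" suffix are choices made for this example, and the source
encoding must be known in advance.

```python
import shutil
from pathlib import Path


def convert_to_utf8(path, source_encoding="iso-8859-1"):
    """Rewrite a file as UTF-8 in place, keeping a .bak copy."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    # Make the backup first, so the original bytes survive a failure.
    shutil.copy2(path, backup)
    # Work on bytes to leave line endings untouched.
    text = backup.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))
    return backup
```

Calling convert_to_utf8("chapter1.tex") (a hypothetical file name)
would leave the UTF-8 version in chapter1.tex and the original bytes in
chapter1.tex.bak.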
Regards,
Reinhard
--
------------------------------------------------------------------
Reinhard Kotucha Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover mailto:reinhard.kotucha at web.de
------------------------------------------------------------------