[OS X TeX] Input encoding question
Peter Dyballa
Peter_Dyballa at Web.DE
Fri Feb 20 14:29:12 CET 2009
On 20.02.2009 at 10:44, Jonathan Kew wrote:
> On 20 Feb 2009, at 07:49, Peter Dyballa wrote:
>
>> Just because *some* software can handle it, that's not reason
>> enough. Files grow larger because some (statistically: nearly all)
>> characters are represented by more than one byte,
>
> No, the vast majority of characters in real-world TeX files would
> be represented by 1 byte in UTF-8, because they are ASCII
> characters -- either English content or markup.
And therefore UTF-8 is generally not the best recommendation: 7- or
8-bit encodings are good enough for this.
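To put numbers on the size argument, here is a small sketch that encodes the same mostly-ASCII LaTeX snippet in ISO-8859-1 and in UTF-8. The German sample sentence is invented for illustration. Each ASCII character costs one byte in both encodings. Each Latin-1 letter such as ü, ö or ß costs one byte in ISO-8859-1 and two in UTF-8, so the size difference equals the number of non-ASCII characters.

```python
# Compare the encoded size of one mostly-ASCII LaTeX snippet in a one-byte
# encoding (ISO-8859-1) and in UTF-8. The snippet is a made-up example.
snippet = "\\section{Einführung}\nDie Größe des Fensters ist \\emph{variabel}.\n"

latin1_bytes = snippet.encode("latin-1")
utf8_bytes = snippet.encode("utf-8")

# Characters outside 7-bit ASCII; here all of them lie in the Latin-1 range.
non_ascii = sum(1 for ch in snippet if ord(ch) > 127)

print(f"characters: {len(snippet)}")
print(f"ISO-8859-1: {len(latin1_bytes)} bytes")
print(f"UTF-8:      {len(utf8_bytes)} bytes")
print(f"non-ASCII characters: {non_ascii}")

# Every Latin-1 letter above U+007F takes exactly two bytes in UTF-8,
# so the overhead is one byte per non-ASCII character.
assert len(utf8_bytes) - len(latin1_bytes) == non_ascii
```

For a file that is almost entirely markup and English text, the relative overhead stays small. It grows with the share of accented letters.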
>
>> And LaTeX and ConTeXt are mostly 8-bit applications with a 7-bit
>> core.
>
> The "core" is 8-bit since TeX 3.0; I don't think we need be
> concerned about the old 7-bit version.
Well, take math: it is most obviously Unicode, yet math fonts are pure
7-bit. And then there is LICR, the LaTeX Internal Character
Representation. An 8-bit input character like ï, regardless of its
code position in any particular encoding, becomes \"\i: 7-bit
(internally four bytes in memory), nothing more. TeX 3 has learned to
operate on 8-bit fonts, which are not real but "virtual." And TeX 3
has learned to apply a text or math input encoding to a TeX file's
contents to transform them into LICR, i.e. objects TeX can process.
(Something similar could already be done before TeX 3, but only on a
local basis, and in many different local variants.)
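The idea of mapping many input encodings onto one LICR can be sketched as follows. This is a hypothetical, much simplified model, not LaTeX's actual inputenc machinery: the `to_licr` function and the three-entry `LICR` table are made up for illustration. It decodes raw bytes with a given encoding and replaces each non-ASCII character by its LICR command. As a result, the byte 0xEF in ISO-8859-1, 0x95 in Mac OS Roman, 0x8B in code page 437 and the two bytes 0xC3 0xAF in UTF-8 all end up as the same four characters \"\i.

```python
# Hypothetical, much simplified model of an input-encoding layer:
# raw bytes -> Unicode character -> LICR command.
# Not the real LaTeX inputenc code; the table is a tiny illustrative subset.
LICR = {
    "ï": r'\"\i',
    "é": r"\'e",
    "ü": r'\"u',
}


def to_licr(raw: bytes, encoding: str) -> str:
    """Decode raw input bytes, then replace each non-ASCII character by its
    LICR command; ASCII characters pass through unchanged."""
    out = []
    for ch in raw.decode(encoding):
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.append(LICR[ch])  # raises KeyError for characters not in the table
    return "".join(out)


# The same letter ï sits at different positions in different encodings.
samples = {
    "latin-1": b"\xef",      # ISO-8859-1
    "mac_roman": b"\x95",    # Mac OS Roman
    "cp437": b"\x8b",        # IBM PC code page 437
    "utf-8": b"\xc3\xaf",    # two bytes in UTF-8
}

for enc, raw in samples.items():
    print(f"{enc:10} {raw!r:12} -> {to_licr(raw, enc)}")
```

A real implementation also has to keep a command such as \i from merging with letters that follow it. This sketch ignores that and only shows the many-encodings-to-one-representation step.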

Using the old 7-bit TeX codes like \"\i has the great advantage that
they mean the same thing in many, many input encodings, including UTF-x.
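A short check of that advantage, assuming the input encodings in question are ASCII-compatible (the list below is an arbitrary choice): the LICR spelling \"\i encodes to identical bytes everywhere, whereas the literal ï does not.

```python
# The LICR spelling of ï is pure 7-bit ASCII, so every ASCII-compatible
# encoding turns it into the same four bytes.
licr = r'\"\i'
encodings = ["ascii", "latin-1", "mac_roman", "cp437", "utf-8"]

licr_bytes = {enc: licr.encode(enc) for enc in encodings}

# The literal character ï, by contrast, has a different byte sequence in
# each 8-bit encoding and in UTF-8 (plain ASCII cannot encode it at all).
raw_bytes = {enc: "ï".encode(enc) for enc in encodings if enc != "ascii"}

for enc in encodings:
    print(f"{enc:10} LICR {licr_bytes[enc]!r:12} literal {raw_bytes.get(enc)!r}")

assert len(set(licr_bytes.values())) == 1   # one byte sequence for all
assert len(set(raw_bytes.values())) == 4    # four different ones
```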
--
Greetings
Pete
Never be led astray onto the path of virtue
More information about the macostex-archives mailing list