[l2h] Re: Guillemets and OE
Alan J. Flavell <flavell@a5.ph.gla.ac.uk>
Wed, 15 Sep 1999 12:38:54 +0100 (BST)
On Thu, 9 Sep 1999, Ross Moore wrote:
> > > 2. More than one year ago I asked why \OE was not translated as &#140;
> > > and \oe as &#156; . Ross's answer was that these characters were not in
> > > the HTML standard.
These numerical character references are explicitly _undefined_. (They
are not illegal, but they _are_ undefined - which leads to confusion
because formal HTML validators do not, indeed cannot, reject them).
What _would_ be technically feasible, would be to advertise the
document as being in a Windows encoding, and to include 8-bit
characters (_not_ numerical character references) with these values.
BUT the WWW standards certainly do not mandate that clients must
accept this proprietary character coding. Indeed, unless and until
Windows-1252 is registered at IANA, it would seem to be technically
improper to reference this coding in a charset attribute. I'm
certainly not recommending this - just noting that it exists as a
technically well-defined option.
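
To illustrate the distinction drawn above, here is a minimal Python sketch. It assumes that Python's "cp1252" codec is an acceptable stand-in for the Windows encoding under discussion; the variable names are made up for the example. The same two numbers give letters when read as 8-bit Windows-1252 bytes, but control characters when read as code points, which is how a numerical character reference is interpreted.

```python
import unicodedata

# Decimal 140 and 156 as raw 8-bit values in a document that is
# advertised as Windows-1252 (Python calls this codec "cp1252").
raw = bytes([140, 156])
as_cp1252 = raw.decode("cp1252")

# The same numbers taken as code points, which is what a numerical
# character reference with these values would denote.
as_code_points = chr(140) + chr(156)
categories = [unicodedata.category(c) for c in as_code_points]

# Code points of the decoded letters: OE and oe live at U+0152 and U+0153.
print([f"U+{ord(c):04X}" for c in as_cp1252])
# Category "Cc" marks a control character, not a letter.
print(categories)
```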
> > That is true: There is æ and Æ but no œ ;-(
Which is curious, as the ae and AE characters are technically not
ligatures, in spite of their entity name! This whole area is a
minefield!!
By the way, the guillemets (like << and >>) are in iso-8859-1, as
someone already remarked: it is the _single_ angle-quotes that are
elsewhere (U+2039 and 203A, decimal 8249 and 8250).
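
The repertoire claim can be checked mechanically. The following is only a sketch: `in_latin1` is a hypothetical helper, and it treats "can be encoded with Python's iso-8859-1 codec" as equivalent to "is in iso-8859-1".

```python
def in_latin1(ch: str) -> bool:
    """Return True if the character can be encoded in iso-8859-1."""
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


double_quotes = "\u00ab\u00bb"   # the guillemets
single_quotes = "\u2039\u203a"   # the single angle-quotes

for ch in double_quotes + single_quotes:
    # Show the code point in hex and decimal, and whether Latin-1 has it.
    print(f"U+{ord(ch):04X}", ord(ch), in_latin1(ch))
```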
> You could get them using -html_version 4.0,unicode
> This works, but again it is up to the browser to give the right characters.
The Russian Apache server documentation shows various effective ways
of on-the-fly mapping of character codings according to what the
client can accept. However, I don't seriously propose this as part of
latex2html, I'm merely calling attention to it for interest's sake.
> > Nothing you will ever be able to deal with when producing mixed image/text
> > pages...
Not so! As Ross already knows...
> (Unfortunately, NS4 stuffs up so badly that the whole page is mangled.)
Sadly true.
> Netscape 4.5 certainly recognises Œ and œ
> though it may depend on the stated charset.
The way to pacify Netscape is to compose everything in us-ascii, and
then pretend to Netscape that it's utf-8. Then it understands not
only the Latin-1 &entity; representations, but also as many of the
&#number; representations as it's going to. Which indeed includes
your two test cases. See my notes at
http://ppewww.ph.gla.ac.uk/~flavell/charset/quick.en.html#cons
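
A rough sketch of that approach in Python follows. It assumes the text is available as a Unicode string before it is written out; `to_ascii_html` is a hypothetical helper, not anything latex2html provides. Note that it emits &#number; references for everything outside us-ascii (it does not produce named entities), and it does not escape markup characters such as & or <.

```python
def to_ascii_html(text: str) -> bytes:
    """Encode text as pure us-ascii, replacing every other character
    with a decimal numerical character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace")


# Two words containing OE and oe, written with escapes so that this
# example is itself pure ASCII.
body = to_ascii_html("\u0152uvre, c\u0153ur")
print(body)

# Because the bytes are pure us-ascii, they are also valid utf-8 and
# decode to the same text, so labelling the document as utf-8 is harmless.
same_either_way = body.decode("utf-8") == body.decode("ascii")

# The label itself; whether it goes in a META element or in the HTTP
# header is a deployment choice.
meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
```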
But of course this only works if you have appropriate fonts. MS's
free multi-wotsit "web font pack" can be recommended for WinPCs, and
presumably the corresponding one for Macs. For X/unix I don't myself
have any advice to offer, sorry.
(If it's any use, I have a unicode test repertoire using the &#number;
notations, starting at http://ppewww.ph.gla.ac.uk/~flavell/unicode/ ).
J.Korpela also called attention to a splendid information page at
http://www.eki.ee/letter/
If you don't play the trick with the utf-8 advertisement, then there
are a few &#bignumber; references that Netscape 4 manages to get
right: I _suspect_ they are the characters that are in the
Windows-1252 range 128-159 decimal. So you might be able to get your
oe/OE that way too, but I haven't tested it, and it wouldn't help
when you wanted Greek letters, mathematical symbols etc.
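
For anyone wanting to test that suspicion, a small sketch that lists which Unicode code point each value in the 128-159 range stands for under Windows-1252. It relies on Python's "cp1252" codec as the mapping table, which is an assumption about what the browser does internally; `cp1252_c1_map` is a hypothetical name.

```python
def cp1252_c1_map() -> dict:
    """Map each assigned Windows-1252 value in 128..159 to the Unicode
    code point of the character it represents."""
    mapping = {}
    for value in range(128, 160):
        try:
            ch = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 table leaves this position unassigned.
            continue
        mapping[value] = ord(ch)
    return mapping


table = cp1252_c1_map()
for value, code_point in sorted(table.items()):
    # Left: the Windows-range number; right: the well-defined reference.
    print(f"&#{value};  ->  &#{code_point};  (U+{code_point:04X})")
```

The right-hand column gives the references that are actually defined, so a converter could emit those instead of the Windows-range numbers.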
cheers