[l2h] Re: Guillemets and OE

Alan J. Flavell <flavell@a5.ph.gla.ac.uk>
Wed, 15 Sep 1999 12:38:54 +0100 (BST)


On Thu, 9 Sep 1999, Ross Moore wrote:

> > > 2. More than one year ago I asked why \OE was not translated as &#140;
> > > and \oe as &#156; . Ross's answer was that these characters were not in
> > > HTML standard.

These numerical character references are explicitly _undefined_. (They
are not illegal, but they _are_ undefined - which leads to confusion
because formal HTML validators do not, indeed cannot, reject them).

What _would_ be technically feasible is to advertise the document as
being in a Windows encoding, and to include 8-bit characters (_not_
numerical character references) with these values.
BUT the WWW standards certainly do not mandate that clients must
accept this proprietary character coding.  Indeed, unless and until
Windows-1252 is registered at IANA, it would seem to be technically
improper to reference this coding in a charset attribute.   I'm 
surely not recommending this - just stating that it exists as a
technically well-defined option.
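
To make the byte-level option concrete, here is a minimal Python sketch. It
assumes Python's "cp1252" codec as a stand-in for the Windows encoding
discussed above; it is an illustration only, not anything latex2html does.

```python
# OE and oe, written as Unicode escapes so this source file stays us-ascii.
text = "\u0152 \u0153"

# As 8-bit Windows-1252 data the two letters become byte values 140 and 156.
raw = text.encode("cp1252")
print(list(raw))

# The same bytes read as iso-8859-1 land on C1 control positions, not letters.
misread = raw.decode("iso-8859-1")
print([hex(ord(c)) for c in misread])

# iso-8859-1 itself has no OE/oe at all, so encoding to it fails.
try:
    text.encode("iso-8859-1")
    latin1_has_oe = True
except UnicodeEncodeError:
    latin1_has_oe = False
print(latin1_has_oe)
```

The bytes only mean OE/oe if the client really interprets them as
Windows-1252, which is exactly the dependency described above.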

> > That is true: There is &aelig; and &AElig; but no &oelig; ;-(

Which is curious, as the ae and AE characters are technically not
ligatures, in spite of their entity name!  This whole area is a
minefield!!

By the way, the guillemets (like << and >>) are in iso-8859-1, as
someone already remarked: it is the _single_ angle-quotes that are
elsewhere (U+2039 and U+203A, decimal 8249 and 8250).
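
A quick way to check which of these quote characters fit into iso-8859-1 is
sketched below in Python. The helper name in_latin1 is made up for this
example.

```python
def in_latin1(ch: str) -> bool:
    """Return True if the character can be encoded in iso-8859-1."""
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

# Double guillemets (U+00AB, U+00BB) versus single angle-quotes (U+2039, U+203A).
for ch in ("\u00ab", "\u00bb", "\u2039", "\u203a"):
    print(f"U+{ord(ch):04X} (decimal {ord(ch)}): in iso-8859-1 = {in_latin1(ch)}")
```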

> You could get them using   -html_version 4.0,unicode  
> This works, but again it is up to the browser to give the right characters.

The Russian Apache server documentation shows various effective ways
of on-the-fly mapping of character codings according to what the
client can accept.  However, I don't seriously propose this as part of
latex2html, I'm merely calling attention to it for interest's sake.

> > Nothing you will ever be able to deal with when producing mixed image/text
> > pages...

Not so!  As Ross already knows...

> (Unfortunately, NS4 stuffs up so badly that the whole page is mangled.)

Sadly true.

> Netscape 4.5 certainly recognises  &#338; and  &#339; 
> though it may depend on the stated charset.

The way to pacify Netscape is to compose everything in us-ascii, and
then pretend to Netscape that it's utf-8.  Then it understands not
only the Latin-1 &entity; representations, but also as many of the
&#number; representations as it's going to.  Which indeed includes
your two test cases.  See my notes at
http://ppewww.ph.gla.ac.uk/~flavell/charset/quick.en.html#cons
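
As a rough sketch of that approach in Python (the function name
to_ascii_html is hypothetical, and this is not how latex2html itself works):
the text is reduced to pure us-ascii, with every other character written as
a &#number; reference, and the result is then served with a utf-8 label.

```python
def to_ascii_html(text: str) -> bytes:
    """Encode text as us-ascii bytes, using &#number; for anything else.

    Note: this does not escape markup characters such as & or <.
    """
    return text.encode("ascii", errors="xmlcharrefreplace")

body = to_ascii_html("\u0152uvre, c\u0153ur")
print(body)

# Every us-ascii byte sequence is also valid utf-8, so the label is not a lie
# at the byte level. The header line is shown for illustration only.
assert body.decode("utf-8") == body.decode("ascii")
print("Content-Type: text/html; charset=utf-8")
```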

But of course this only works if you have appropriate fonts.  MS's
free multi-wotsit "web font pack" can be recommended for WinPCs, and
presumably the corresponding one for Macs.  For X/unix I don't myself
have any advice to offer, sorry.

(If it's any use, I have a unicode test repertoire using the &#number;
notations, starting at http://ppewww.ph.gla.ac.uk/~flavell/unicode/ ).

J. Korpela also called attention to a splendid information page at
http://www.eki.ee/letter/

If you don't play the trick with the utf-8 advertisement, then there
are a few &#bignumber; references that Netscape 4 manages to get
right: I _suspect_ they are the characters that are in the
Windows-1252 range 128-159 decimal.  So you might be able to get your
oe/OE that way too, but I haven't tested it, and it wouldn't help
when you wanted Greek letters, mathematical symbols etc.
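
For anyone who has documents containing references in that 128-159 range,
the following Python sketch rewrites them to the Unicode code points that
Windows-1252 assigns to the same values. It is an assumption-laden
illustration: the function name fix_c1_references is invented here, Python's
"cp1252" codec is taken as the definition of Windows-1252, and nothing like
this has been tested against Netscape 4.

```python
import re

def fix_c1_references(html: str) -> str:
    """Rewrite &#128;..&#159; to the code points Windows-1252 gives them."""
    def repl(match):
        n = int(match.group(1))
        if not 128 <= n <= 159:
            return match.group(0)      # outside the range: leave untouched
        try:
            ch = bytes([n]).decode("cp1252")
        except UnicodeDecodeError:
            return match.group(0)      # value unassigned in Windows-1252
        return f"&#{ord(ch)};"
    return re.sub(r"&#(\d+);", repl, html)

print(fix_c1_references("&#140;uvre and c&#156;ur, caf&#233;"))
```

As the paragraph above says, this covers only that small range; Greek
letters and mathematical symbols still need their real Unicode numbers.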

cheers