[l2h] Confused about Unicode support

Alan J. Flavell <flavell@a5.ph.gla.ac.uk>
Sat, 3 Jul 1999 11:28:51 +0100 (BST)


On Sat, 3 Jul 1999, Ross MOORE wrote:

> This does the following:
>  1. creates the entity name  e.g.  Aogon
>  2. tries to find this in the current $CHARSET and gets the &#<num>;

We're all agreed now that step 2 is technically wrong, even though it
could give an impression of working in older browsers.
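
A small illustration of why step 2 goes wrong, written as a Python sketch rather than anything taken from l2h itself. It assumes the entity is Aogon and the charset is iso-8859-2; the variable names are made up for the example. The number in &#<num>; names a Unicode (ISO 10646) code point, so reusing the character's position in an 8-bit charset yields a different character.

```python
import html

# Byte 0xA1 (decimal 161) means different things in different 8-bit charsets.
byte = bytes([0xA1])
in_latin2 = byte.decode("iso-8859-2")   # A-ogonek, U+0104
in_latin1 = byte.decode("iso-8859-1")   # inverted exclamation mark, U+00A1

# A numeric character reference names a Unicode code point,
# whatever charset the document is served in.
wrong_ref = "&#%d;" % byte[0]           # charset position reused as the number
right_ref = "&#%d;" % ord(in_latin2)    # the Unicode number for A-ogonek

print(wrong_ref, html.unescape(wrong_ref))
print(right_ref, html.unescape(right_ref))
```

A browser that wrongly interprets the number in the page's charset would appear to "work" with the first form, which is the impression described above.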

> It looks to me as though step 2 is wrong.
> Perhaps the entity should be searched for in just iso-8859-1
> and/or iso-10646 listings ?
> That is an easy-enough change to make.

This is a hard call to make, w.r.t. Netscape particularly.  Because it
doesn't understand most non-Latin-1 entity names, there might be some
benefit in representing them as &#number; - BUT when
charset=iso-8859-2 etc. then it isn't going to render the &#bignumber;
entities either, in general.  (It seems to render them OK if they
happen to be in the Windows 8-bit repertoire, but otherwise not).
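
To make the "Windows 8-bit repertoire" remark concrete, here is a hedged Python sketch. It assumes that repertoire means code page 1252, which is a guess on the illustrator's part, and the helper name in_windows_repertoire is hypothetical. It only tests membership in the code page; it says nothing definite about what any given Netscape version renders.

```python
def in_windows_repertoire(codepoint):
    """Return True if the code point has a slot in cp1252 (assumed repertoire)."""
    try:
        chr(codepoint).encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False

# S-caron (U+0160) has a cp1252 slot; A-ogonek (U+0104) does not.
for cp in (0x0160, 0x0104):
    print("&#%d;" % cp, in_windows_repertoire(cp))
```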

A reader seeing &entityname; displayed literally, versus an empty box
standing for &#bignumber; - I don't know which is worse :-(

> Another (perhaps better) possibility is to:
> 
>  1.  look first in iso-8859-1 ; if found, use  &#<num>;
>  2.  look in $CHARSET ; use  \<octal-num> if found
> 	unless $CHARSET =~/unicode|utf/;

That looks good to me.
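
The two-step lookup quoted above can be sketched roughly as follows. This is Python for illustration, not l2h's actual Perl; the table ENTITY_CODEPOINTS and the function render_entity are hypothetical names, and the table holds only three sample entities. The sketch returns None when neither step applies, since the fallback for that case is not settled in this thread, and it lowercases the charset name before the unicode|utf test, which the quoted regex does not do.

```python
import re

# Hypothetical sample table: entity name -> Unicode code point.
ENTITY_CODEPOINTS = {"eacute": 0x00E9, "Aogon": 0x0104, "alpha": 0x03B1}

def render_entity(name, charset):
    """Sketch of the proposed lookup; returns None if neither step applies."""
    cp = ENTITY_CODEPOINTS.get(name)
    if cp is None:
        return None
    ch = chr(cp)

    # Step 1: if the character is in iso-8859-1, use &#<num>;
    try:
        ch.encode("iso-8859-1")
        return "&#%d;" % cp
    except UnicodeEncodeError:
        pass

    # Step 2: otherwise look in the charset and emit \<octal-num>,
    # unless the charset is a Unicode encoding.
    if not re.search(r"unicode|utf", charset.lower()):
        try:
            raw = ch.encode(charset)
        except (UnicodeEncodeError, LookupError):
            raw = b""
        if len(raw) == 1:
            return "\\%03o" % raw[0]

    return None

print(render_entity("eacute", "iso-8859-2"))
print(render_entity("Aogon", "iso-8859-2"))
print(render_entity("Aogon", "utf-8"))
```

Note that step 1 is safe for any charset because the Latin-1 positions coincide with the Unicode code points, so the &#<num>; produced there is a correct Unicode number.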

> > still think that those should be Unicode numbers (regardless of selected
> > charset),
> > at least then they are displayed correctly.
> 
> Are they ? 
> My tests reveal this, only when  utf-8  is given as the charset.

MSIE does this fine (when properly set up with fonts etc.) and has
been doing so since around version 3.01, as well as several minority
browsers.

Netscape is the chief problem, up till now.  Roll on NS5 ;-)

Opera doesn't try to support Unicode, so I think we can leave it out
of this discussion, along with other browsers with that kind of
character code support (they'd have to make do with l2h's older
options, using images).

I don't believe that, at this stage, we ought to be considering an
option to l2h that creates wrong HTML in order to give an impression
of working on wrong browsers.  [I don't think anyone actually was
proposing that, I'm just arguing defensively ;-) ]

best regards