[l2h] Debian Bug#355728: latex2html: please use typographic quotation marks

David Nebauer davidnebauer at switch.com.au
Sat Jul 29 15:47:53 CEST 2006


I've been testing the advice given by Ross Moore regarding curly 
quotation marks (2006/04/18) and "passing through" raw unicode (2003/07/06):

-------------------------------------------------------------------------
Curly quotation marks (2006/04/18):
===================================

$USE_CURLY_QUOTES = 1;
set this in an initialization file.
Also set the following
$USE_UTF=1;

OR

execute the job with options such as:
latex2html -split 0 -html_version 4.0,latin1,unicode,utf8 myfile.tex

4.0     = satisfy HTML 4.0 recommendations (4.1 might work for HTML 4.01)
latin1  = input encoding
unicode = use unicode code-points in the output
utf8    = use byte-sequences, rather than entity numbers (or names)
          whenever appropriate.

Raw unicode (2003/07/06):
=========================

You may need to specify on the commandline something like:

  latex2html -html_version 4.0,unicode  ...other-options...  <filename>
or
  latex2html -html_version 4.0,unicode,utf8  ......
or even
  latex2html -html_version 4.0,unicode,unicode  ......

Basically, the problem will be that you do *not* want LaTeX2HTML
to assign special meaning to upper-8-bit codes and translate them
into something else.
-------------------------------------------------------------------------


In my testing I had three goals:
  1. Output single quote marks as curly characters,
  2. Output double quote marks as curly characters, and
  3. Output raw unicode as unicode, e.g., —äß (em dash, a-umlaut and 
     scharfes S).

Here are the results of my testing (display in monospace to align columns):


   initialisation file        html-version options    single  double  raw
   variable(s)                                        quotes  quotes  unicode
   -------------------------  ----------------------  ------  ------  -------
1.                                                    `'      ``''    rubbish
2. USE_CURLY_QUOTES                                   `'      “”      rubbish
3. USE_CURLY_QUOTES  USE_UTF                          ** ERROR **
4. USE_CURLY_QUOTES           latin1,unicode,utf8     `'      “”      rubbish
5. USE_CURLY_QUOTES           latin1,unicode,unicode  `'      “”      unicode
6. USE_CURLY_QUOTES  USE_UTF  latin1,unicode,utf8     `'      “”      rubbish
7. USE_CURLY_QUOTES  USE_UTF  latin1,unicode,unicode  ** ERROR **
8.                   USE_UTF                          ** ERROR **
9.                   USE_UTF  latin1,unicode,utf8     `'      ``''    rubbish
A.                   USE_UTF  latin1,unicode,unicode  ** ERROR **
B.                            latin1,unicode,utf8     `'      ``''    rubbish
C.                            latin1,unicode,unicode  `'      ``''    unicode

* Runs that errored terminated prematurely with the message: "Undefined 
subroutine &main::convert_to_utf8 called at /usr/bin/latex2html line 
7462."  The latex2html version is '2002-2-1 (1.71)'.

In case this email's encoding gets screwed up in transmission, the runs 
that resulted in curly double quotes were 2, 4, 5 and 6.
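
For anyone who wants to repeat these runs without judging the output by 
eye, here is a minimal sketch of a checker for the three goals. It is not 
part of latex2html: the name check_goals, the sample characters and the 
demo string are my own assumptions, and the demo string is an invented 
stand-in shaped like the row for run 5, not real latex2html output. It 
also assumes the generated page has already been decoded to text.

```python
import html

# Code points for the three goals (illustrative choices, not from latex2html).
SINGLE_CURLY = {"\u2018", "\u2019"}   # left/right single quotation marks
DOUBLE_CURLY = {"\u201c", "\u201d"}   # left/right double quotation marks
SAMPLE_RAW = "\u2014\u00e4\u00df"     # em dash, a-umlaut, sharp s


def check_goals(page: str) -> dict:
    """Report which of the three goals a decoded HTML page appears to meet."""
    # Quotes count whether they arrive as characters or as entities
    # such as &#8220; or &ldquo;, so unescape before looking for them.
    text = html.unescape(page)
    return {
        "single_curly": all(ch in text for ch in SINGLE_CURLY),
        "double_curly": all(ch in text for ch in DOUBLE_CURLY),
        # "Raw" means the characters themselves survived, so look at the
        # page before entities are unescaped.
        "raw_unicode": all(ch in page for ch in SAMPLE_RAW),
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a page shaped like run 5.
    demo = "<p>`single' \u201cdouble\u201d \u2014\u00e4\u00df</p>"
    print(check_goals(demo))
```

One caveat: U+2019 doubles as the typographic apostrophe, so a page full 
of apostrophes could satisfy half of the single-quote test for the wrong 
reason; requiring both the opening and closing marks reduces that risk 
but does not remove it.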

Some observations/conclusions:
 - No method gave curly single quotes.
 - The only method that output curly double quotes was the init file 
variable "USE_CURLY_QUOTES".
 - The only method that output raw unicode was html_version options 
"unicode,unicode".
 - The init file variable USE_UTF caused a fatal error unless 'utf8' was 
included as a 'html_version' option.

I'm curious to know two things.  Firstly, is there a way to get curly 
single quote output from latex2html?  Secondly, I couldn't find 
documentation anywhere on USE_CURLY_QUOTES and USE_UTF after checking 
the manual, perldoc, man and info files.  Are there any other such 
undocumented variables and, if so, where can I read up on them?

Regards,
David.


