[l2h] LaTeXCleaner

Hassan Monzavi <hmonzavi@scoter.pharmacy.ualberta.ca>
Wed, 17 Mar 1999 11:00:03 -0700 (MST)


# LaTeX and HTML are both markup languages, but they differ in how their
# browsers interpret and present them.  Some ideas that are critical in one
# language are not very important in the other.  In addition, the latex2html
# converter is still under development and is not yet perfect.  So I think
# it's not a bad idea to clean our tex files before passing them to
# latex2html.
# The following is a script for the sed editor that helps to do some of this
# cleanup.  I am not an expert in LaTeX, not very good at sed, and have worked
# with latex2html for less than one week.  So please feel free to modify the
# code and share your ideas and experiences to improve this script and make it
# smarter.
# If you are interested, you can save the whole thing as, say, "LaTeXCleaner"
# and then use it like:
# cat <tex input file> | sed -f LaTeXCleaner > <tex output file>
# <tex input file> stands for your original tex file and
# <tex output file> stands for the name you choose for the cleaned tex file.
#
#                                          Hassan Monzavi, 14 March 1999.
#
# This part replaces every slide environment with a section, by eliminating
# the \end{slide}s, the \begin{slide}s and the \begin{center} after them.
# It also tries to make a title for the section.  Still a long way to go!
s/\\documentclass.*{seminar}/\\documentclass[10pt]{article}/
/\\begin{slide}/{
N
s/\n/ /
}
/\\begin{slide}/{
N
s/\n/ /
}
s/\\begin{slide} .*/&\}/g
s/\\begin{slide} /\\section\{/g
s/[\\]\{1,\}\}/\}/g
/\\end{center}/{
N
s/\n/ /
}
s/\\end{center} \\end{slide}//g
s/\\end{slide}//g
#
# This part eliminates the font size commands.  The same font size does not
# give the same effect in a dvi or ps viewer as it does in an HTML browser.
# I believe it's a good idea to use one font size for the text and maybe a
# slightly different size (say, smaller) for captions in tables and figures.
s/\\Huge{}//g
s/\\huge{}//g
s/\\LARGE{}//g
s/\\Large{}//g
s/\\large{}//g
s/\\normalsize{}//g
s/\\small{}//g
s/\\footnotesize{}//g
s/\\scriptsize{}//g
s/\\tiny{}//g
s/\\caption{/&\\small{}/
s/\\end{table}/\\normalsize{} &/
s/\\end{figure}/\\normalsize{} &/
#
# This part eliminates all centering and creates a center environment for
# tables and figures.
s/\\begin{center}//
s/\\end{center}//
s/\\centering//
s/\\begin{table}\[.*\]/\\begin{table}/g
s/\\begin{figure}\[.*\]/\\begin{figure}/g
s/\\begin{table}/& \\begin{center}/g
s/\\begin{figure}/& \\begin{center}/g
s/\\end{table}/\\end{center} &/g
s/\\end{figure}/\\end{center} &/g
#
# The pifont package is not supported by the latex2html converter, and this
# part eliminates uses of these fonts.  However, I tried to support "degree",
# which is probably the most used Pisymbol.
s/\\usepackage{pifont}//g
s/\\Pisymbol{psy}{176}/\\footnotesize{}\$\^o\$\\normalsize{}/g
s/\\Pisymbol{psy}{[0-9]\{1,\}}//g
#
# This section converts "dinglist" to "itemize".  Using dinglist causes the
# whole list to be converted to a picture, and the font size in this picture
# might differ from the font used in other parts of the document, which is a
# source of inconsistency in the HTML document.  So it's a good idea to do
# this conversion.
s/{dinglist}{[0-9]\{1,\}}/{itemize}/g
s/{dinglist}/{itemize}/g
#
# The hhline package is not supported by the latex2html converter, and this
# part replaces uses of \hhline with \hline.
s/\\usepackage{hhline}//g
s/\\hhline{.*}/\\hline/g
# \cfrac is not supported by the latex2html converter, and this part replaces
# it with \frac[1.3pt].  I agree this part is somewhat useless!
s/\\cfrac{/\\frac[1.3pt]{/g
#
# This part readjusts the width of .ps figures included with \epsfig to 65mm.
# This size is not bad for a letter or even an A4 page.
/\\epsfig{fi.*\.ps,width=/{
s/width=.*}/width=65mm}/
}
# 
# The oldgerm fonts are not supported by the latex2html converter, and this
# part eliminates the commands that belong to them.
s/\\usepackage{oldgerm}//
s/\\gothfamily{}//g
s/\\frankfamily{}//g
s/\\swabfamily{}//g
s/\\rmfamily{}//g
#
# This part eliminates the pagebreak commands.
s/\\pagebreak//
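# This eliminates blank lines from the tex file.  Many of the commands above
# can leave such lines behind.  Note that blank lines also separate paragraphs
# in LaTeX, so paragraph breaks in the input are removed as well.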
/^$/d