ltx2texi.el (was Texinfo/Info/HTML version)

Ulrik Vieth TWG-TDS@SHSU.edu
Mon, 29 Jan 1996 10:36:05 +0100


Once again, here's an improved version of ltx2texi.el to convert
the LaTeX version of the TDS draft into Texinfo (and from there
to Info and HTML).  [One intermediate version went only to Karl
since I didn't want to bother you all.]

This version now does about 98% of the conversion automatically
(including the header, nodes and menus).  In the end, it leaves
you in a buffer containing the converted file for further editing
if necessary.  You'll still have to save the file manually, but
that's probably better than always trying to save it, no matter
what may have gone wrong in the conversion process.

Since it is customary for Elisp files to follow a certain template 
for header lines, this file now carries a copyright notice (unlike
tdsguide.cls).  I've inserted `TeX Users Group' as the copyright
holder, just as it says in the TDS draft.  I hope that'll be all right.
The `Maintainer:' line now says `TWG-TDS', which is supposed to
mean ``whoever is currently editing the TDS draft and needs to hack
the converter to fix some problem or to achieve a certain effect''.
I don't intend to do any more about it.  I'll leave it to others
now to improve it or develop it further, if needed.  

Cheers, Ulrik.

P.S. Joachim: If you like, you might put it up in tds/misc as well.
Either simply as ltx2texi.el in tds/misc, or else in a subdirectory
ltx2texi-0.5 if you prefer to record it by version number like
you did for tdsguide.cls.  Any earlier versions aren't worth
archiving.  I haven't kept them myself as I was developing it.


;;; ltx2texi.el --- convert LaTeX to Texinfo (and Info and HTML)

;; Copyright (C) 1996 TeX Users Group

;; Author: Ulrik Vieth <vieth@thphy.uni-duesseldorf.de>
;; Maintainer: TWG-TDS
;; Version: 0.5
;; Keywords: TDS, LaTeX, Texinfo, Info, HTML, tex, maint

;;; This file is *not* part of GNU Emacs.

;;; Commentary:

;; Description:
;;
;; This file provides a limited LaTeX-to-Texinfo conversion function
;; `ltx2texi-convert' which is primarily intended to convert the LaTeX
;; source of the TDS draft document `tds.tex' that uses a couple of
;; special markup tags defined in the document class `tdsguide.cls'.
;; It is definitely *not* suitable to be used as a general-purpose
;; LaTeX-to-Texinfo converter and is not intended to be used as such.
;;

;; Usage:
;;
;; 0. load or autoload this file, e.g. (load-library "ltx2texi")
;;
;; 1. go to a directory containing `tds.tex', e.g. in dired mode
;;
;; 2. run M-x ltx2texi-convert -- this does all the conversion and
;;    leaves you in a buffer `tds.texi' containing the converted file
;; 
;; 3. edit the buffer if necessary and save it -- you will be asked
;;    for a file name (which should be `tds.texi', of course)
;; 
;; 4. run M-x makeinfo-buffer or invoke makeinfo and/or texi2html
;;    from the shell -- don't try texi2dvi, it may work all right,
;;    but it's not recommended since it will only cause confusion
;;    with the DVI file created from the original LaTeX source!
;;

;; Bugs:
;;
;; * whitespace (esp. blank lines) before and after environments
;;   will be inconsistent (should be fixed in the LaTeX version!)
;;
;; * whitespace (esp. indentation) in tdsSummary environments
;;   will be inconsistent (should be fixed in the LaTeX version!)
;;
;; * labels and references are assumed to coincide with names
;;   of @nodes -- this is not always true, e.g. for MF vs. \MF{}
;;   
;; * the last comma in the list of contributors should be a period
;;
;; * the list of contributors should be handled in a better way
;;   (currently: LaTeX tabbing -> @quotation -> HTML <blockquote>)
;;
;; * the contents of the top node may get lost in the texi2html
;;   conversion under some circumstances, or else it may end up
;;   in the wrong split file -- this seems to be a texi2html bug
;;   

;; History:
;;
;; v 0.0 -- 1996/01/23  UV  created
;; v 0.1 -- 1996/01/24  UV  first rough version, posted to twg-tds
;; v 0.2 -- 1996/01/24  UV  added some commentary and doc strings
;; v 0.3 -- 1996/01/25  UV  modularized code, handle header and trailer,
;;                          call texinfo routines for @nodes and @menus
;; v 0.4 -- 1996/01/27  UV  slightly touched-up and some doc fixes,
;;                          improved handling of legalnotice header 
;; v 0.5 -- 1996/01/28  UV  more documentation added, code freeze
;;


;;; Code:

(require 'texinfo)	; needed to update @nodes and @menus

;; file name variables

(defvar ltx2texi-source-file "tds.tex"
  "File name of TDS LaTeX source to be converted.")

(defvar ltx2texi-target-file "tds.texi"
  "File name of TDS Texinfo source to be created.")

(defvar ltx2texi-filename "tds.info"
  "File name of Info file to be inserted in Texinfo header.")

;; translation tables

(defvar ltx2texi-logos-alist
  '(("\\TeX{}"       . "TeX")		; no need to use "@TeX{}"
    ("{\\TeX}"       . "TeX")		; when only doing Info
    ("{\\LaTeX}"     . "LaTeX")
    ("{\\LaTeXe}"    . "LaTeX2e")
    ("{\\AmS}"       . "AMS")
    ("{\\AMSTeX}"    . "AMS-TeX")
    ("\\MF{}"        . "METAFONT")
    ("\\MP{}"        . "MetaPost")
    ("\\BibTeX{}"    . "BibTeX")
    ("{\\iniTeX}"    . "INITEX")
    ("{\\iniMF}"     . "INIMF")
    ("{\\iniMP}"     . "INIMP")
    ("{\\PS}"        . "PostScript")
    ("{\\copyright}" . "@copyright{}")
    )
  "List of TeX logos and their replacement text after conversion.")

(defvar ltx2texi-logos-regexp-1 "\\(\\\\[A-Za-z]+{}\\)"
  "Regexp for TeX logos to be converted using `ltx2texi-logos-alist'.")

(defvar ltx2texi-logos-regexp-2 "\\({\\\\[^}]+}\\)"
  "Regexp for TeX logos to be converted using `ltx2texi-logos-alist'.")

;;

(defvar ltx2texi-tags-alist
  '(("\\emphasis"    . "@emph")
    ("\\citetitle"   . "@cite")
    ("\\literal"     . "@file")
    ("\\replaceable" . "@var")
    ("\\command"     . "@code")			; defined, but not used
    ;; ("\\application" . "@r")			; unnecessary, but ...
    ;; ("\\abbr"        . "@sc")		; unnecessary, but ...
    )
  "List of markup tags and their replacement text after conversion.")

(defvar ltx2texi-tags-regexp "\\(\\\\[a-z]+\\)"
  "Regexp for markup tags to be converted using `ltx2texi-tags-alist'.")

;;

(defvar ltx2texi-env-alist
  '(("\\begin{ttdisplay}"           . "@example")
    ("\\end{ttdisplay}"             . "@end example")
    ("\\begin{tdsSummary}"          . "@example")
    ("\\end{tdsSummary}"            . "@end example")
    ("\\begin{enumerate}"           . "@enumerate")
    ("\\end{enumerate}"             . "@end enumerate")
    ("\\begin{enumerate-squeeze}"   . "@enumerate")
    ("\\end{enumerate-squeeze}"     . "@end enumerate")
    ("\\begin{itemize}"             . "@itemize @bullet")
    ("\\end{itemize}"               . "@end itemize")
    ("\\begin{itemize-squeeze}"     . "@itemize @bullet")
    ("\\end{itemize-squeeze}"       . "@end itemize")
    ("\\begin{description}"         . "@table @samp")
    ("\\end{description}"           . "@end table")
    ("\\begin{description-squeeze}" . "@table @samp")
    ("\\end{description-squeeze}"   . "@end table")
    ("\\begin{legalnotice}"         . "@titlepage")	; special hack
    ("\\end{legalnotice}"           . "@end titlepage")
    ("\\begin{tabbing}"             . "@quotation")	; special hack
    ("\\end{tabbing}"               . "@end quotation")
    )
  "List of environments and their replacement text after conversion.")

(defvar ltx2texi-env-regexp "\\(\\\\\\(begin\\|end\\){[^}]+}\\)"
  "Regexp for environments to be converted using `ltx2texi-env-alist'.")


;;; some utility functions

(defun ltx2texi-string-replace (x-string x-replace)
  "Searches for occurrences of X-STRING, replacing them by X-REPLACE."
  (save-excursion
    (while (search-forward x-string nil t)
      (replace-match x-replace t t)))) 		; use fixed case!

(defun ltx2texi-regexp-replace (x-regexp x-replace)
  "Searches for occurrences of X-REGEXP, replacing them by X-REPLACE."
  (save-excursion
    (while (re-search-forward x-regexp nil t)
      (replace-match x-replace t nil)))) 	; use fixed case!


(defun ltx2texi-alist-replace (x-regexp x-alist)
  "Searches for occurrences of X-REGEXP, replacing them using X-ALIST.
If no match is found in X-ALIST, leaves the original text unchanged."
  (save-excursion
    (let (x-match 
	  x-replace)
      (while (re-search-forward x-regexp nil t)
	(setq x-match (match-string 1))
	(setq x-replace (or (cdr (assoc x-match x-alist)) x-match))
	(replace-match x-replace t t)))))	; use fixed case!
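
;; Illustration only, not part of the original v0.5 code: a minimal
;; sketch of how the alist-driven replacement above behaves.  The name
;; `ltx2texi-example-logos' is hypothetical.  Logos that are listed in
;; `ltx2texi-logos-alist' are replaced, anything else that matches the
;; regexp is put back unchanged.

(defun ltx2texi-example-logos (text)
  "Return TEXT with TeX logos replaced using `ltx2texi-logos-alist'.
Works in a temporary buffer, so the current buffer is not touched."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (ltx2texi-alist-replace ltx2texi-logos-regexp-1 ltx2texi-logos-alist)
    (ltx2texi-alist-replace ltx2texi-logos-regexp-2 ltx2texi-logos-alist)
    (buffer-string)))

;; (ltx2texi-example-logos "\\TeX{} and {\\LaTeX} and \\foo{}")
;;   => "TeX and LaTeX and \\foo{}"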


;;; the main conversion function

(defun ltx2texi-convert ()
  "Have a try at converting LaTeX to Texinfo.  Good luck!"
  (interactive)

  ;; get a buffer to operate on and insert the LaTeX source
  (set-buffer (get-buffer-create ltx2texi-target-file))
  (erase-buffer)
  (insert-file-contents-literally ltx2texi-source-file)
  
  ;; tab characters can mess up tdsSummary environments,
  ;; so get rid of them as soon as possible
  (untabify (point-min) (point-max))
  (goto-char (point-min))
  
  ;; do the conversion steps for the text body
  (ltx2texi-do-simple-tags)	; mostly general
  (ltx2texi-do-fancy-logos)	; mostly specific to TDS
  (ltx2texi-do-sectioning)	; mostly general
  (ltx2texi-do-markup-tags)	; mostly specific to TDS
  (ltx2texi-do-environments)	; partly specific to TDS
  
  ;; do the conversion for header and trailer
  (ltx2texi-do-header)		; partly specific to TDS
  (ltx2texi-do-trailer)		; partly specific to TDS
  
  ;; standard Texinfo functions
  (texinfo-every-node-update)
  (texinfo-all-menus-update)
  (texinfo-master-menu nil)
  
  ;; all that's left to do is saving the buffer to a file
  ;; -- we simply select the buffer and leave saving it to
  ;; the user in case some manual intervention is needed
  (switch-to-buffer ltx2texi-target-file)
  )
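
;; Illustration only, not part of the original v0.5 code: a sketch of
;; running the conversion on a source file that is not in the current
;; directory.  The name `ltx2texi-example-convert-file' is hypothetical.
;; It relies only on `ltx2texi-source-file' being a `defvar', so that
;; it can be rebound temporarily to an absolute file name.

(defun ltx2texi-example-convert-file (source)
  "Run `ltx2texi-convert' on the LaTeX file SOURCE.
As with `ltx2texi-convert', saving the result is left to the user."
  (interactive "fLaTeX source file: ")
  (let ((ltx2texi-source-file (expand-file-name source)))
    (ltx2texi-convert)))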


;;; various steps of the conversion process

(defun ltx2texi-do-simple-tags ()
  "First step of \\[ltx2texi-convert].  Not useable by itself."

  ;; literal `@' -- should come before anything else, since it's
  ;; the Texinfo control character.
  (ltx2texi-regexp-replace "\\([^\\\\]\\)@" "\\1@@")
  
  ;; fancy spacing -- should come early before we have many `@'
  
  ;; "\@" -- space factor corrections before sentence end `.'
  (ltx2texi-regexp-replace "\\\\@\\." "@.")
  ;; "\ " -- control space after `.' in the middle of sentences
  (ltx2texi-regexp-replace "\\.\\\\\\([ \n]+\\)" ".@:\\1")
  ;; "\ " -- control space used otherwise
  (ltx2texi-regexp-replace "\\\\\\([ \n]+\\)" "\\1")

  ;; "\," -- thin space used with dimensions like "dpi" or "pt"
  (ltx2texi-regexp-replace "\\\\,\\([a-z]+\\)" "@dmn{\\1}")

  ;; special TeX characters that needn't be quoted in Texinfo:
  (ltx2texi-string-replace "\\_" "_")
  (ltx2texi-string-replace "\\&" "&")
  (ltx2texi-string-replace "\\%" "%")

  ;; special TeX characters that we prefer to transliterate:
  (ltx2texi-regexp-replace "\\\\slash[ ]*" "/")

  ;; we could translate $...$ into @math{...}, but why bother
  ;; when we can transliterate it easily?
  (ltx2texi-string-replace "$" "")
  (ltx2texi-string-replace "\\pm" "+-")

  ;; we could use @w{word1 word2} (this is handled elsewhere),
  ;; but why bother when it'll get lost in texi2html anyway?
  (ltx2texi-string-replace "~" " ")
  )
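
;; Illustration only, not part of the original v0.5 code: a small
;; sketch showing the effect of the first step on a sample string.
;; The name `ltx2texi-example-simple-tags' is hypothetical.

(defun ltx2texi-example-simple-tags (text)
  "Return TEXT after applying `ltx2texi-do-simple-tags' to it.
Works in a temporary buffer, so the current buffer is not touched."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (ltx2texi-do-simple-tags)
    (buffer-string)))

;; (ltx2texi-example-simple-tags "user@host 300\\,dpi 10\\%")
;;   => "user@@host 300@dmn{dpi} 10%"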


(defun ltx2texi-do-fancy-logos ()
  "Second step of \\[ltx2texi-convert].  Not useable by itself."

  ;; fancy TeX logos -- these are used in arguments of sections,
  ;; so we have to do them early before doing sectioning commands.
  (ltx2texi-alist-replace ltx2texi-logos-regexp-1 ltx2texi-logos-alist)
  (ltx2texi-alist-replace ltx2texi-logos-regexp-2 ltx2texi-logos-alist)
  
  ;; acronyms -- these are also used in arguments of sections.
  ;; There's no need to use @sc markup, just upcase the argument.
  (save-excursion
    (while (re-search-forward "\\\\abbr{\\([^}]+\\)}" nil t)
      (replace-match (upcase (match-string 1)) nil t)))
  
  ;; applications -- similar to acronyms, so done here as well.
  ;; There's no need to use @r markup either, it's the default!
  (ltx2texi-regexp-replace "\\\\application{\\([^}]+\\)}" "\\1")
  )


(defun ltx2texi-do-sectioning ()
  "Third step of \\[ltx2texi-convert].  Not useable by itself."

  ;; first do @chapter and @appendix by narrowing
  (save-excursion
    (save-restriction
      (narrow-to-region 
       (point-min) (search-forward "\\appendix" nil t))
      (goto-char (point-min))
      (ltx2texi-regexp-replace 
       "\\\\section{\\([^}]+\\)}[ ]*" "@node \\1\n@chapter \\1\n")
      ))
  (save-excursion
    (save-restriction
      (narrow-to-region 
       (search-forward "\\appendix" nil t) (point-max))
      (ltx2texi-regexp-replace 
       "\\\\section{\\([^}]+\\)}[ ]*" "@node \\1\n@appendix \\1\n")
      ))
  
  ;; @section and @subsection are just shifted a level up
  (ltx2texi-regexp-replace 
   "\\\\subsection{\\([^}]+\\)}[ ]*"    "@node \\1\n@section \\1\n")
  (ltx2texi-regexp-replace 
   "\\\\subsubsection{\\([^}]+\\)}[ ]*" "@node \\1\n@subsection \\1\n")
  
  ;; now we no longer need \appendix as a marker
  (ltx2texi-regexp-replace "\\\\appendix[ ]*\n" "")

  ;; \newpage can go as well
  (ltx2texi-regexp-replace "%?\\\\newpage[ ]*\n" "")
  
  ;; \labels are redundant since we have @nodes
  (ltx2texi-regexp-replace "\\\\label{sec:\\([^}]+\\)}[ ]*\n" "")

  ;; \refs now refer to @nodes instead of \labels
  (ltx2texi-regexp-replace "\\\\ref{sec:\\([^}]+\\)}" "@ref{\\1}")

  ;; this might be redundant in Info as well -- not quite sure! 
  ;; (ltx2texi-regexp-replace "\\(Appendix\\|Section\\)[~ ]" "")
  )
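
;; Illustration only, not part of the original v0.5 code: a sketch of
;; the sectioning step on a tiny made-up document.  The name
;; `ltx2texi-example-sectioning' is hypothetical.  Note that the sample
;; must contain \appendix, since the narrowing above depends on it.
;; A \section before \appendix should come out as @chapter, one after
;; it as @appendix, and \subsection as @section.

(defun ltx2texi-example-sectioning ()
  "Return the result of `ltx2texi-do-sectioning' on a sample text."
  (with-temp-buffer
    (insert "\\section{Intro}\n"
	    "\\subsection{Sub}\n"
	    "\\appendix\n"
	    "\\section{Extra}\n")
    (goto-char (point-min))
    (ltx2texi-do-sectioning)
    (buffer-string)))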


(defun ltx2texi-do-markup-tags ()
  "Fourth step of \\[ltx2texi-convert].  Not usable by itself."

  ;; special tags -- for \CTAN, we must use fixed-case replace!
  (ltx2texi-string-replace "\\texmf{}" "@file{texmf}")
  (ltx2texi-string-replace "\\CTAN:"   "@file{@var{CTAN}:}")
  
  ;; convert simple tags without expanding their arguments:
  ;; \emphasis, \citetitle, \literal, \replaceable
  (ltx2texi-alist-replace ltx2texi-tags-regexp ltx2texi-tags-alist)
  
  ;; \systemitem -- a silly tag with an extra argument that
  ;; isn't printed.  It is used exactly once!
  (ltx2texi-regexp-replace
   "\\\\systemitem{\\([^}]+\\)}{\\([^}]+\\)}" "@file{\\2}")
  
  ;; \path -- here we can't avoid shuffling the argument
  (ltx2texi-regexp-replace "\\\\path|\\([^|]+\\)|" "@file{\\1}") 
  
  ;; After turning \replaceable into @var above we now have to
  ;; turn @var{...} into @file{@var{...}} to get quotation marks
  ;; around file names consistent.  (Read: those extra quotation
  ;; marks inserted automatically by makeinfo in the @file tag.)

  ;; For simplicity we first do the change everywhere and then
  ;; undo it again inside `ttdisplay' environments, where we
  ;; can leave @var by itself as @file isn't used there anyway.

  (ltx2texi-regexp-replace "@var{\\([^}]+\\)}" "@file{@var{\\1}}")
  (save-excursion
    (while (search-forward "\\begin{ttdisplay}" nil t)			     		   
      (save-restriction
	(narrow-to-region
	 (point) (search-forward "\\end{ttdisplay}" nil t))
	(goto-char (point-min))
	(ltx2texi-regexp-replace "@file{@var{\\([^}]+\\)}}" "@var{\\1}")
	)))
  
  ;; eliminate redundant quotation marks around @file
  (ltx2texi-regexp-replace "``\\(@file{[^}]+}\\)''" "\\1")
  (ltx2texi-regexp-replace "`\\(@file{[^}]+}\\)'" "\\1")
  
  ;; ... and combine multiple @file{}s in one line
  (ltx2texi-regexp-replace 
   "@file{\\(.*\\)}@file{\\(.*\\)}@file{\\(.*\\)}" "@file{\\1\\2\\3}")
  (ltx2texi-regexp-replace 
   "@file{\\(.*\\)}@file{\\(.*\\)}" "@file{\\1\\2}")

  ;; ... also simplify cases of nested @file{}s
  (ltx2texi-regexp-replace 
   "@file{\\(.*\\)@file{\\(.*\\)}\\(.*\\)}" "@file{\\1\\2\\3}")

  ;; literal `~' -- if it hasn't been converted to space earlier,
  ;; we can now do the conversion to @w{word1 word2} without
  ;; running the risk of confusing the regexp matcher somewhere.
  ;; Unfortunately @w will get lost again in the HTML conversion
  ;; because &nbsp; or &#160; are not yet standard HTML tags.
  (ltx2texi-regexp-replace 
   "\\([A-Za-z]+\\)~\\([A-Za-z]+\\)" "@w{\\1 \\2}")
  )
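
;; Illustration only, not part of the original v0.5 code: a sketch of
;; the markup step on a sample string using two of the simpler tags.
;; The name `ltx2texi-example-markup-tags' is hypothetical.

(defun ltx2texi-example-markup-tags (text)
  "Return TEXT after applying `ltx2texi-do-markup-tags' to it.
Works in a temporary buffer, so the current buffer is not touched."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (ltx2texi-do-markup-tags)
    (buffer-string)))

;; (ltx2texi-example-markup-tags "\\emphasis{must} use \\path|texmf/fonts|")
;;   => "@emph{must} use @file{texmf/fonts}"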


(defun ltx2texi-do-environments ()
  "Fifth step of \\[ltx2texi-convert].  Not useable by itself."
  
  ;; convert \begin and \end of environments
  (ltx2texi-alist-replace ltx2texi-env-regexp ltx2texi-env-alist)

  ;; convert \items
  (ltx2texi-string-replace "\\item"  "@item")

  ;; insert newlines after description items where appropriate
  (ltx2texi-regexp-replace
   "@item\\[\\([^]]+\\)\\][ ]*\n" "@item \\1\n")
  (ltx2texi-regexp-replace
   "@item\\[\\([^]]+\\)\\][ ]*"   "@item \\1\n")

  ;; insert newlines after @item @file, also replace @item @file
  ;; by @item @samp -- this is done because it may look nicer
  ;; in the HTML version, but it depends on the style sheet used
  (ltx2texi-regexp-replace 
   "@item @file\\([^,\n]*\\),[ ]*" "@item @samp\\1,\n")
  )
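
;; Illustration only, not part of the original v0.5 code: a sketch of
;; the environment step on a plain itemize list.  The name
;; `ltx2texi-example-environments' is hypothetical.

(defun ltx2texi-example-environments ()
  "Return the result of `ltx2texi-do-environments' on a sample list."
  (with-temp-buffer
    (insert "\\begin{itemize}\n"
	    "\\item one\n"
	    "\\item two\n"
	    "\\end{itemize}\n")
    (goto-char (point-min))
    (ltx2texi-do-environments)
    (buffer-string)))

;; (ltx2texi-example-environments)
;;   => "@itemize @bullet\n@item one\n@item two\n@end itemize\n"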


;;; handling the header and trailer

(defun ltx2texi-do-header ()
  "Convert LaTeX header to Texinfo.  Used in \\[ltx2texi-convert]."
  (let (title-string
	author-string
	version-string)

    ;; collect information
    (save-excursion
      (re-search-forward "\\\\tdsVersion{\\(.*\\)}" nil t)
      (setq version-string (match-string 1))
      (re-search-forward "\\\\title{\\(.*\\)}" nil t)
      (setq title-string (match-string 1))
      (re-search-forward "\\\\author{\\(.*\\)}" nil t)
      (setq author-string (match-string 1))
      )

    ;; discard information lines
    (ltx2texi-regexp-replace "\\\\title{.*}[ ]*\n"         "")
    (ltx2texi-regexp-replace "\\\\author{.*}[ ]*\n"        "")
    (ltx2texi-regexp-replace "\\\\tdsVersion{.*}[ ]*\n\n"  "")
    
    ;; discard pre-title lines
    (ltx2texi-regexp-replace "%&latex[ ]*\n"               "")
    (ltx2texi-regexp-replace "\\\\NeedsTeXFormat.*\n"      "")
    
    ;; convert \documentclass to \input texinfo
    (ltx2texi-regexp-replace "\\\\documentclass.*\n\n"  "\\\\input texinfo\n")

    ;; insert Texinfo header lines
    (save-excursion
      (goto-char (search-forward "texinfo\n" nil t))
      (insert "@setfilename " ltx2texi-filename "\n")
      (insert "@settitle " title-string "\n\n")
      (insert "@set version " version-string "\n\n")
      )
 
    ;; discard \begin{document} and \maketitle
    (ltx2texi-regexp-replace "\\\\begin{document}[ ]*\n\n" "")
    (ltx2texi-regexp-replace "\\\\maketitle[ ]*\n\n"       "")

    ;; copy contents of `legalnotice' environments
    (save-excursion
      (let ((begin (search-forward "@titlepage\n\n" nil t))
	    (end (progn
		   (search-forward "@end titlepage" nil t)
		   (match-beginning 0))))
	(copy-region-as-kill begin end)
	))

    ;; insert stuff for title page before `legalnotice' environment
    ;; -- \begin{legalnotice} has been converted to @titlepage
    (save-excursion
      (goto-char (search-forward "@titlepage\n" nil t))
      (insert "@title " title-string "\n")
      (insert "@subtitle Version @value{version}\n")
      (insert "@author " author-string "\n\n")
      (insert "@page\n@vskip 0pt plus 1filll\n")
      )

    ;; insert stuff for @node Top and master menu after `legalnotice'
    ;; -- \end{legalnotice} has been converted to @end titlepage
    (save-excursion
      (goto-char (search-forward "@end titlepage\n\n" nil t))
      (insert "@ifinfo\n")
      (insert "@node Top\n@top " title-string "\n\n")
      
      ;; insert contents of `legalnotice' copied above
      (yank)	; should include a "\n\n" at the end

      (insert "@menu\n@end menu\n")
      (insert "@end ifinfo\n\n")
      )

    ;; discard \tableofcontents -- @contents belongs in the trailer
    (ltx2texi-regexp-replace "\\\\tableofcontents[ ]*\n\n" "")
    ))

(defun ltx2texi-do-trailer ()
  "Convert LaTeX trailer to Texinfo.  Used in \\[ltx2texi-convert]."
  
  ;; The list of contributors is a LaTeX tabbing environment, which 
  ;; is difficult to convert -- we convert it to a normal paragraph
  ;; inside a @quotation environment, so it gets indented a little.

  ;; Conversion of the tabbing environment is already done elsewhere,
  ;; so we just have to remove some redundant tags.

  ;; discard alignment preamble consisting of lines starting with
  ;; \hspace{0.25\linewidth}
  (ltx2texi-regexp-replace "\\\\hspace.*\n" "")

  ;; convert "\>" and "\\" in tabbing environment to comma
  (ltx2texi-regexp-replace "[ ]+\\\\>[ ]*" ", ")

  ;; In the earlier conversion of "\ ", "\\" followed by newline
  ;; was already converted to "\", so we just have to convert
  ;; the remaining instances of "\" followed by space or newline.
  (ltx2texi-regexp-replace "[ ]+\\\\[ ]*" ", ")

  ;; convert \end{document} to @contents and @bye
  (ltx2texi-string-replace "\\end{document}" "@contents\n@bye\n")
  )

;;; ltx2texi.el ends here