Google Summer of Code and TUG

TUG was not involved in SoC in 2009 or 2010 (whether we'll apply in 2011 remains to be seen), but in 2008, three students participated in SoC with TUG. You can see their projects and the code they produced, as well as TUG's announcement for 2008.

All organizational SoC-related discussions for TUG happen on the summer-of-code@tug.org mailing list; feel free to subscribe or peruse the archives.

Project ideas

  1. Dublin Core metadata and TeX
  2. Hyperlinked syntax highlighting for TeX
  3. New document templates for LaTeX
  4. LaTeX3 microkernel


1. Dublin Core metadata and TeX

The full text of the proposal is available; a summary follows.

Project summary:

The Dublin Core Metadata Initiative is an open organization engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. They have developed an abstract framework for metadata and several machine-readable representations of metadata statements, among them in the Resource Description Framework (RDF).

One large user of RDF metadata is Adobe, creator of the PDF file format. Adobe's eXtensible Metadata Platform (XMP) allows PDF creators to embed arbitrary metadata into a PDF file. This metadata is visible to Adobe applications and a growing number of other search and archive tools, including Mac OS X's Spotlight. XMP is implemented in an XML representation of RDF.

The key deliverables for this project would be

  1. an implementation of the Dublin Core Abstract Model in TeX
  2. methods to export metadata from the abstract model to external files in various formats, most importantly RDF+XML, and possibly also DC-TEXT and N3
  3. in the case of pdflatex, automatic embedding of XMP packets into the product PDF with a default minimum of the XMP expression of the Z39.88 OpenURL COinS fields, both for the document's own metadata and for all references cited and external hyperlinks.
  4. a user-friendly interface for making metadata statements
  5. in the absence of specific author declarations in a pdflatex document, automatic embedding of as much metadata as can be detected
  6. methods for package authors to declare new metadata element sets and vocabularies, in order for authors to write metadata specific to their field of interest. Personally, I am thinking of Learning Object Metadata; however the mapping of LOM to Dublin Core is problematic.
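
As an illustration of deliverables 2 and 4 only, the following is a minimal sketch of what an author-facing interface with RDF+XML export might look like. The macro name \dcmeta, the internal names \dc@file and \dc@hash, the output file name \jobname.rdf, and the sample values are all assumptions made for this example; none of them come from the proposal. The sketch writes simple Dublin Core element statements straight to a file. It does not implement the Abstract Model and does not escape XML special characters.

```latex
\documentclass{article}

\makeatletter
% Output stream for the exported metadata (the file name is an assumption).
\newwrite\dc@file
\immediate\openout\dc@file=\jobname.rdf\relax

% A '#' with category code 12, so that \write emits a single '#'.
\begingroup
  \catcode`\#=12
  \gdef\dc@hash{#}%
\endgroup

% XML header and the opening RDF elements.
\immediate\write\dc@file{<?xml version="1.0" encoding="UTF-8"?>}
\immediate\write\dc@file{<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns\dc@hash"
  xmlns:dc="http://purl.org/dc/elements/1.1/">}
\immediate\write\dc@file{\space<rdf:Description rdf:about="">}

% Hypothetical interface: \dcmeta{<element>}{<value>} records one
% statement about the document itself. The value is written unexpanded
% and is NOT escaped for XML.
\newcommand*{\dcmeta}[2]{%
  \immediate\write\dc@file{\space\space<dc:#1>\unexpanded{#2}</dc:#1>}}

% Close the elements and the file at the end of the document.
\AtEndDocument{%
  \immediate\write\dc@file{\space</rdf:Description>}%
  \immediate\write\dc@file{</rdf:RDF>}%
  \immediate\closeout\dc@file}
\makeatother

% Placeholder values, for illustration only.
\dcmeta{title}{An example title}
\dcmeta{creator}{A. N. Author}
\dcmeta{language}{en}

\begin{document}
Body text.
\end{document}
```

A real implementation would store the statements in an internal model first and generate each export format from that model, so that the same declarations could also feed the XMP packet described in deliverable 3.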

Project mentors would be Peter Flynn and Matthew Leingang.


2. Hyperlinked syntax highlighting for TeX code

It is now common, when listing code on a web page, to provide syntax highlighting. Indeed, this is done on Google Code.

This project is to provide syntax highlighting for TeX code – both documents and macros – with an extra feature. Each highlighted command is also a hyperlink that offers tooltip help and, when clicked, brings up further documentation.

Two of the leading syntax highlighters are

This project has three parts. The first is providing enhanced syntax highlighting for TeX code. The second is creating a commands database. The third is linking together the first and second parts.

Depending on difficulties encountered this project might be too large. If so, we'd expect the student to do just a part of it.

Project mentor would be Jonathan Fine.


3. New document templates for LaTeX

Lamport provided LaTeX with a number of document templates (book, article, etc.). Even today, a large percentage of LaTeX documents use these, resulting in a recognizable "LaTeX-ey look."

Authors also have classes available for specialized uses, such as a journal, a particular conference, or a thesis from a particular university. But these classes are quite often adapted from Lamport's templates, and continue the look. Some existing classes do provide a substantially different look, such as koma-script's scr* and the French Mathematical Society's smfart; surveying them would be a useful part of this project.

This project is to provide alternative templates for broad usage. These may also be suitable for books and articles, or may be suitable for other purposes. Ideally they would come with guidance for potential users as to circumstances suggesting their use (e.g., for books largely without mathematics, or for automatically-generated texts).
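
To make the idea concrete, here is a minimal sketch of how an alternative template could be packaged as a class file. The class name altarticle, the version string, and every design choice in it are assumptions for illustration. For brevity the sketch loads the standard article class and overrides two visible features; a template produced by this project would more likely be designed independently of Lamport's classes.

```latex
% altarticle.cls (hypothetical name): a sketch, not a finished design.
\NeedsTeXFormat{LaTeX2e}
\ProvidesClass{altarticle}[2011/01/14 v0.1 Sketch of an alternative article layout]

% Hand all class options on to the underlying class.
\DeclareOption*{\PassOptionsToClass{\CurrentOption}{article}}
\ProcessOptions\relax
\LoadClass{article}

% Sans-serif, flush-left section headings instead of the default serif bold.
\renewcommand\section{\@startsection{section}{1}{\z@}%
  {-3.5ex \@plus -1ex \@minus -.2ex}%
  {2.3ex \@plus .2ex}%
  {\normalfont\sffamily\large\bfseries\raggedright}}

% Separate paragraphs by vertical space rather than indentation.
\setlength{\parindent}{0pt}
\setlength{\parskip}{0.5\baselineskip plus 2pt}
```

A document would select it with \documentclass{altarticle}. The accompanying guidance could then state, for example, that this layout is aimed at short texts largely without mathematics.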

Project mentor would be Jim Hefferon.


4. Initial LaTeX3 microkernel

The LaTeX typesetting system has for many years meant LaTeX2e. Recent developments on the successor system, LaTeX3, have focussed mainly on a new low-level programming system for TeX. As this low-level work reaches maturity, applying the new coding ideas to higher level work is becoming possible.

The aim of the LaTeX3 “microkernel” project is to begin examining how the low-level system can provide a stand-alone kernel that typesets simple LaTeX2e-like documents without being loaded on top of the current LaTeX2e kernel. As an initial target, the basic document

\documentclass{minimal}
\begin{document}
\emph{Hello World!}
\end{document}

would be used as a test case.

The microkernel described here is not intended to be a complete implementation of the LaTeX2e kernel (latex.ltx). There are a number of as-yet unanswered questions concerning the user interface for LaTeX3. The project aims are to build a base stand-alone kernel, which can then be extended slowly to implement more features (most probably from latex.ltx, but possibly taken from LaTeX add-on packages). Additions such as basic sectioning commands and environments (lists, alignment, etc.) would be obvious steps to undertake after successfully producing a system capable of working with the test document. Certain areas can also be ruled out: the New Font Selection Scheme, complex output routines and floating content are all beyond the scope of the project.
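
As a rough indication of the starting point, the sketch below processes the test document without the LaTeX2e kernel by loading the LaTeX3 programming layer on top of plain TeX. It assumes a current expl3 distribution that provides expl3-generic.tex and an e-TeX-enabled plain format such as pdftex. All definitions are illustrative assumptions, not the project's design: class loading is ignored, environments are not grouped, and \emph borrows plain TeX's \it. A real microkernel would presumably be built as its own format rather than on plain TeX.

```latex
% hello-l3.tex: process with pdftex (plain TeX format), not pdflatex.
\input expl3-generic
\ExplSyntaxOn
% Class loading is not implemented in this sketch: the argument is discarded.
\cs_set_protected:Npn \documentclass #1 { }
% \begin{foo} calls \foo and \end{foo} calls \endfoo, as in LaTeX2e,
% but without grouping or environment-name checks.
\cs_set_protected:Npn \begin #1 { \use:c {#1} }
\cs_set_protected:Npn \end #1 { \use:c { end #1 } }
\cs_set_protected:Npn \document { }
% Finish the paragraph, flush the page, then call the primitive \end,
% which expl3 makes available as \tex_end:D.
\cs_set_protected:Npn \enddocument
  { \tex_par:D \tex_vfill:D \tex_penalty:D -20000 \tex_end:D }
% Emphasis borrows plain TeX's \it because font selection is out of scope.
\cs_set_protected:Npn \emph #1
  { \group_begin: \it #1 \/ \group_end: }
\ExplSyntaxOff

\documentclass{minimal}
\begin{document}
\emph{Hello World!}
\end{document}
```

Sectioning commands and list environments could then be added one at a time on top of this base, each written with the same programming layer.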

Project mentor would be Joseph Wright.



$Date: 2011/01/14 23:09:10 $