This year TUG is participating in Google's Summer of Code program as a “mentoring organization”.
All organizational SoC-related discussions for TUG happen on the summer-of-code@tug.org mailing list; feel free to subscribe.
This page has all the SoC-related information for TUG: our project ideas, a suggested project application template, information on submitting more ideas for the list, and a few general links.
The first three projects listed here were funded to go forward. The final proposals are listed on TUG's summary page at Google.
The most recent variants of TeX (XeTeX, LuaTeX) allow direct UTF-8 input and are therefore thought of as ‘Unicode-enabled’. Nevertheless, they lack many capabilities that Unicode requires. For example, none of the various TeX extensions (also known as ‘engines’) handles combining characters correctly: in some cases, sequences including combining characters render correctly because the current font supports them, but no provision is made for general handling of Unicode combining characters. At the same time, TeX has an outstanding tradition of handling complicated diacritic marks (using the \accent primitive), which predates Unicode by almost 15 years, yet there has been almost no attempt to relate it to Unicode. An interesting addition to TeX would make that relation explicit while implementing the processes specified by Unicode (in particular normalization, which defines canonical transformations between fully composed and fully decomposed character sequences).
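To make the normalization point concrete, here is a minimal sketch in Python, using only the standard `unicodedata` module. It shows the canonical composed (NFC) and decomposed (NFD) forms of the same text, and then maps a few combining marks onto the familiar TeX accent macros. The `TEX_ACCENTS` table and the `to_tex_accents` helper are illustrative assumptions, not part of any TeX engine or of the proposed project; a real implementation would live inside the engine and cover far more marks.

```python
import unicodedata

# One precomposed code point versus base letter + combining mark.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two strings differ code point by code point, but are canonically equivalent.
assert composed != decomposed
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed

# Hypothetical, deliberately tiny table: combining mark -> TeX accent macro name.
TEX_ACCENTS = {
    "\u0300": "`",   # grave
    "\u0301": "'",   # acute
    "\u0302": "^",   # circumflex
    "\u0303": "~",   # tilde
    "\u0308": '"',   # diaeresis
}

def to_tex_accents(text):
    """Decompose text (NFD) and wrap each base character in TeX accent macros.

    Combining marks missing from TEX_ACCENTS are passed through unchanged.
    """
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in TEX_ACCENTS and out:
            # Attach the accent to the most recently emitted item.
            out[-1] = "\\%s{%s}" % (TEX_ACCENTS[ch], out[-1])
        else:
            out.append(ch)
    return "".join(out)

print(to_tex_accents("caf\u00e9"))   # prints caf\'{e}
```

Because the helper normalizes to NFD first, it produces the same output whether the input arrives precomposed or as a sequence of combining characters, which is the kind of font-independent handling described above.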
Another example is bidirectional typesetting: although experiments in mixing right-to-left with left-to-right text in TeX were made as early as 1987 (thus again predating Unicode), little effort has been made to make TeX compliant with the Unicode bidi algorithm. A third example is line breaking: TeX's hyphenation algorithm is widely regarded as very good, but it does not take Unicode line breaking properties into account.
Implementing these properties could be done in a number of ways, using either or both of the newest TeX engines (XeTeX or LuaTeX, as already mentioned) and would give rise to a much more fully Unicode-compliant extension of TeX.
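As a rough illustration of the bidirectional case, the sketch below (Python, standard library only) looks up the Unicode bidirectional class of each character and groups consecutive characters that share a class. This is only the lookup that the Unicode bidi algorithm starts from, not the algorithm itself: resolving embedding levels and reordering runs is the real work. The function name `bidi_class_runs` is an assumption made for this example. Python's `unicodedata` module does not expose line breaking properties, so those are not shown.

```python
import unicodedata
from itertools import groupby

def bidi_class_runs(text):
    """Group consecutive characters by their Unicode bidirectional class.

    Returns a list of (bidi_class, substring) pairs, in logical order.
    """
    return [
        (bidi_class, "".join(chars))
        for bidi_class, chars in groupby(text, key=unicodedata.bidirectional)
    ]

# Latin letters are class L, Hebrew letters are R, spaces are WS,
# and European digits are EN.
sample = "TeX \u05d0\u05d1 42"
for bidi_class, run in bidi_class_runs(sample):
    print(bidi_class, repr(run))
```

An engine-level implementation would have to carry this class information through paragraph building, since the visual order of runs depends on where lines are broken.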
Proposed by the potential student; mentor would be Eric Muller.
texshow (sources, ConTeXt command list) is a web-based tool for displaying ConTeXt documentation. It is in need of significant work:
Proposed by the potential student; mentors would be Hans Hagen and Taco Hoekwater.
MathTran is a public web service that translates TeX-notation mathematics into high-quality bitmap images, primarily for inclusion on web pages. The development of MathTran was funded by JISC and The Open University. At present MathTran serves about 30,000 images a day. MathTran is similar in many ways to Google Charts.
There is already some JavaScript code that helps us use MathTran on web pages, but there is not nearly enough, and what we have is somewhat scrappy and dependent on the browser: see http://www.mathtran.org/js/mathtran_img.js and http://www.mathtran.org/js/mathtran.js.
Much more could be done with better JavaScript, and that's why the mathtran-javascript project has been set up.
This project is a good opportunity for someone who is interested in writing high-quality low-level JavaScript. No knowledge of TeX is required, although an interest in mathematics or related areas would be helpful.
Mentor would be Jonathan Fine. Continuing information available on Jonathan's MathTran blog.
Implement a C and/or Lua binding of the TeX Live package management system. Details.
Proposed by the potential student; mentors would be Norbert Preining and Karl Berry.
Improve Dublin Core metadata support in TeX. This requires significant experience in programming TeX macros, although changes in the TeX implementation should not be needed. Details.
Mentors would be Matthew Leingang and Peter Flynn.
Student proposals are submitted via the Google Summer of Code web site. When proposing your project, please make sure you include the following information. However, before submitting anything, it is wise to talk with the existing maintainers and other developers first, to establish contact, be sure there are no unexpected questions or issues, etc.
We are not necessarily expecting each of these questions to be individually answered in a proposal, but the information should be present.
When we read this section of your proposal, we will be trying to figure out how well you understand what needs to be done. We're more likely to accept proposals from students who show us that they know what needs to be done.
What will you be working on, and how long will each part of the work take? What objective results will be visible at each stage? How will you know if you are ahead or behind schedule? If you are unable to complete the project, are the results from part-way through still useful? How?
How will everybody know whether things are on-track at the halfway evaluation point?
Please mention any periods during the summer when you won't actually be available to work on the project (though remember, the Summer of Code project is expected to be your main activity).
(Thanks to the GNU Project for allowing us to use their text as a starting point.)
More ideas are certainly welcome (the sooner the better, though of course not after the student application deadline of March 31). There are plenty of good possibilities in the TeX world. However, it is critical that ideas be accompanied by a willing mentor (and ideally a backup mentor) who can commit to the nontrivial amount of time needed to work with the student, pinging them to be sure there are no stoppers, etc.
In addition, please make sure that the description of your idea contains enough information (perhaps in the form of pointers to other information or mailing lists) for students to be able to research the feasibility of their implementing the idea.
Send project ideas to summer-of-code@tug.org, and feel free to subscribe to that list.