[tug-summer-of-code] A couple of project proposals
Scott Pakin
scott at pakin.org
Mon Feb 16 01:41:45 CET 2009
Karl Berry wrote:
> Think of them as "project ideas", not proposals. We're not applying for
> a grant -- in general, there is no need to go into great detail. Part
> of what makes a GSoC project successful is the student being invested in
> learning about the area. If they can't be bothered to flesh out a
> project proposal, it's highly unlikely they'll succeed in writing any
> useful code.
>
> To sum up, your original paragraph written to the list, plus some of the
> subsequent details, looked pretty reasonable to me as an "idea".
All right, then what do you (plural, meaning everyone on this list)
think of the following "idea"?
The LaTeX typesetting system (http://www.latex-project.org/)
provides commands for typesetting thousands of different symbols
needed to prepare documents in the fields of linguistics,
mathematics, music, engineering, physics, and many others. A
challenge for someone writing a document is to find the LaTeX name
for a given glyph. Currently, the best solution is to refer to
the Comprehensive LaTeX Symbol List
(http://www.ctan.org/tex-archive/info/symbols/comprehensive/), a
collection of symbol tables organized into ad hoc categories and
indexed by LaTeX symbol name. The problem with this approach is
that different users associate different names with the same glyph,
making searching difficult. Consider, for example, trying to find
the LaTeX name for a circle with a dot in the middle. An
astronomer may search for "sun"; a mathematician may search for
"circumference"; a linguist may search for "click consonant"; a
mapmaker may search for "city center"; someone writing about
alchemy may search for "gold". In fact, an entire Wikipedia page
is devoted to listing the various meanings for this symbol
(http://en.wikipedia.org/wiki/Circle_with_a_point_at_its_centre).
Non-English speakers are at a further disadvantage because most
LaTeX symbols are named by English speakers.
We believe that a great aid to LaTeX users would be a Web-based
symbol-search tool based on handwriting recognition. That is, we
imagine a Web page at which a user could *draw* a symbol and then be
shown a list of the LaTeX symbols (commands and rendered output) that best
match the user's drawing. A student would need to evaluate the
numerous options for recognizing hand-drawn symbols, find a
suitable internal representation for the thousands of LaTeX
symbols and associated metadata, construct a suitable user
interface to interact with the symbol recognizer, and ensure that
the resulting software is maintainable, especially given the
frequency with which new symbols are added to LaTeX.
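To make the pieces above a little more concrete, here is a minimal
sketch in Python of one possible internal representation and a toy
matcher. Everything in it is an assumption for illustration only: the
names SymbolEntry, normalize, distance, and rank are hypothetical, the
three-point templates are made up, and the nearest-template comparison
merely stands in for whichever recognition technique a student ends up
choosing.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class SymbolEntry:
    """Hypothetical record for one LaTeX symbol and its metadata."""
    command: str                                  # LaTeX command name
    package: str = ""                             # providing package ("" = base LaTeX)
    aliases: list = field(default_factory=list)   # names users might search for
    template: list = field(default_factory=list)  # reference stroke as (x, y) points


def normalize(points):
    """Scale a stroke into the unit square so position and size are ignored."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [((x - min_x) / span, (y - min_y) / span) for x, y in points]


def distance(a, b):
    """Mean point-to-point distance between two strokes of equal length."""
    return sum(hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def rank(drawing, symbols):
    """Return the symbols ordered from best to worst match for the drawing."""
    drawn = normalize(drawing)
    scored = [(distance(drawn, normalize(s.template)), s) for s in symbols]
    return [s for _, s in sorted(scored, key=lambda pair: pair[0])]


# Toy data: screen coordinates, with y growing downward.
SYMBOLS = [
    SymbolEntry(r"\vee", aliases=["logical or", "join"],
                template=[(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]),
    SymbolEntry(r"\wedge", aliases=["logical and", "meet"],
                template=[(0.0, 1.0), (0.5, 0.0), (1.0, 1.0)]),
]

# A user draws a "V" somewhere on the canvas, at an arbitrary size.
best = rank([(10, 10), (20, 30), (30, 10)], SYMBOLS)[0]
print(best.command, best.aliases)
```

A real tool would need to resample strokes to a common number of points,
handle multi-stroke symbols, and load its entries from data generated
alongside the symbol list rather than from a hand-written table.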
This is no doubt a challenging idea to implement. However, it is
bound to be an exciting, rewarding experience because of the
abundance of technologies involved: TeX/LaTeX, handwriting recognition,
various Web technologies, and probably multiple programming
languages. There is much to learn, and TUG is eager to mentor a
student with the interest and abilities to pull off a
handwriting-to-LaTeX-symbol project.
-- Scott