[tug-summer-of-code] A couple of project proposals

Scott Pakin scott at pakin.org
Mon Feb 16 01:41:45 CET 2009


Karl Berry wrote:
> Think of them as "project ideas", not proposals.  We're not applying for
> a grant -- in general, there is no need to go into great detail.  Part
> of what makes a GSoC project successful is the student being invested in
> learning about the area.  If they can't be bothered to flesh out a
> project proposal, it's highly unlikely they'll succeed in writing any
> useful code.
>
> To sum up, your original paragraph written to the list, plus some of the
> subsequent details, looked pretty reasonable to me as an "idea".

All right, then what do you (plural, meaning everyone on this list)
think of the following "idea"?

     The LaTeX typesetting system (http://www.latex-project.org/)
     provides commands for typesetting thousands of different symbols
     needed to prepare documents in the fields of linguistics,
     mathematics, music, engineering, physics, and many others.  A
     challenge for someone writing a document is to find the LaTeX name
     for a given glyph.  Currently, the best solution is to refer to
     the Comprehensive LaTeX Symbol List
     (http://www.ctan.org/tex-archive/info/symbols/comprehensive/), a
     collection of symbol tables organized into ad hoc categories and
     indexed by LaTeX symbol name.  The problem with this approach is
     that different users associate different names with the same glyph,
     making searching difficult.  Consider, for example, trying to find
     the LaTeX name for a circle with a dot in the middle.  An
     astronomer may search for "sun"; a mathematician may search for
     "circumference"; a linguist may search for "click consonant"; a
     mapmaker may search for "city center"; someone writing about
     alchemy may search for "gold".  In fact, an entire Wikipedia page
     is devoted to listing the various meanings for this symbol
     (http://en.wikipedia.org/wiki/Circle_with_a_point_at_its_centre).
     Non-English speakers are at a further disadvantage because most
     LaTeX symbols are named by English speakers.

     We believe that a great aid to LaTeX users would be a Web-based
     symbol-search tool based on handwriting recognition.  That is, we
     imagine a Web page at which a user could *draw* a symbol and then
     be shown a list of the LaTeX symbols (commands and rendered
     output) that best match the drawing.  A student would need to
     evaluate the numerous options for recognizing hand-drawn symbols,
     find a suitable internal representation for the thousands of
     LaTeX symbols and associated metadata, construct a suitable user
     interface to interact with the symbol recognizer, and ensure that
     the resulting software is maintainable, especially given the
     frequency with which new symbols are added to LaTeX.

     This is no doubt a challenging idea to implement.  However, it is
     bound to be an exciting, rewarding experience because of the
     abundance of technologies involved: TeX/LaTeX, handwriting
     recognition, various Web technologies, and probably multiple
     programming languages.  There is much to learn, and TUG is eager
     to mentor a student with the interest and abilities to pull off a
     handwriting-to-LaTeX-symbol project.
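
To make the "internal representation ... and associated metadata" part
of the idea a bit more concrete, here is a minimal sketch in Python.  It
is purely illustrative: the record layout and the names SymbolEntry and
find_by_alias are assumptions made up for this example, not part of any
existing tool.  Only \odot (the circled dot) and the alternative names
are taken from the text above.  The sketch covers the metadata side
only; the recognizer that matches a drawing is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolEntry:
    """One symbol plus searchable metadata (hypothetical layout)."""
    command: str   # LaTeX command, e.g. r"\odot"
    package: str   # defining package ("" means no extra package needed)
    aliases: set = field(default_factory=set)  # names users might think of


def find_by_alias(entries, query):
    """Return the commands whose alias set includes the query exactly,
    ignoring case and surrounding whitespace."""
    q = query.strip().lower()
    return [e.command for e in entries if q in e.aliases]


# Example data: the circled dot, with the names mentioned in the idea text.
SYMBOLS = [
    SymbolEntry(r"\odot", "", {"sun", "circumference", "click consonant",
                               "city center", "gold"}),
]

matches_sun = find_by_alias(SYMBOLS, "Sun")
matches_gold = find_by_alias(SYMBOLS, "gold")
matches_moon = find_by_alias(SYMBOLS, "moon")
```

A drawing-based recognizer could return the same kind of record, so the
user sees the command, the package that provides it, and the names
other fields use for the glyph.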

-- Scott