[tug-summer-of-code] A couple of project proposals

Tue Jun 30 20:04:57 CEST 2009

On Tue, Jun 30, 2009 at 16:10, Daniel Kirsch wrote:
>
>> I trained a few symbols on your site and noticed that many of the
>> symbols are accented letters, just because there are so many of them:
>> \acute{a}, \acute{b}, \acute{c}, ..., \acute{z}, \hat{a}, \hat{b},
>> ..., \hat{z}, etc.  Maybe you could bias the training requests towards
>> some of the more obscure or hard-to-name symbols?
>
> I have now integrated the training into the searching, which makes more
> sense anyway.

Thanks. That makes much more sense.

> The old training is still available and will always
> offer the symbols with the least samples to be trained.

It would make sense to ask the user if he wants to find:
a) a standalone symbol (\alpha, \times, \int, ...)
b) or an accent

It makes no sense to treat a caron or macron or acute accent ... on
each and every letter separately.

In case a) you may offer an empty box, and in case b) you draw an
imaginary symbol (could be a gray circle) in the background, so that
the user can place an accent on it. The user is probably only
interested in knowing the command for placing the accent. I don't think
that you really need to recognize the letter along with the accent. It
makes the task more difficult and it leaves you with many more choices
at the end.

Even if you cannot recognize the symbol properly, you can still
display "caron", "circumflex", "tilde" and "macron" on the same page
and the approximate information will still be useful. If you need to
recognize the letter along with it, not only will the recognition be
less accurate, but say that you would recognize the letter as being
"o", "O", "a", "q" or "Q" -> you would be left with 20 possible
outcomes (4 accents x 5 letters), which would make it difficult to
display all the possible choices.
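
To make the arithmetic concrete, here is a small sketch that counts the
candidates in both setups. The accent and letter lists are just the
examples from above, and the function name is made up for illustration:

```python
from itertools import product

# Example candidates taken from the paragraph above; purely illustrative.
ACCENTS = ["caron", "circumflex", "tilde", "macron"]
LETTERS = ["o", "O", "a", "q", "Q"]


def candidate_outcomes(accents, letters=None):
    """Return the list of results the page would have to display."""
    if letters is None:
        # Accent-only recognition: one entry per accent.
        return list(accents)
    # Accent plus letter: every combination is a separate outcome.
    return [f"{accent} on {letter}" for accent, letter in product(accents, letters)]


print(len(candidate_outcomes(ACCENTS)))           # accent only: 4
print(len(candidate_outcomes(ACCENTS, LETTERS)))  # accent and letter: 20
```

With accent-only recognition the result list stays as short as the
number of accents, no matter how many base letters exist.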

>> Have you already trained the program on the typeset versions of the
>> symbols, or do you require handwritten input?
>
> Everything is based on handwritten input. That was a performance
> decision. I have experimented with analysis of image data and found
> it to be too slow (in Ruby with RMagick at least).

I don't know what technology you use in the background, so I don't
understand what exactly makes the process slow. You should be able to
extract at least the "main strokes/features" (something like a graph
showing where lines are connected or disconnected) from typeset data.
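
As a rough illustration of the kind of feature I mean (a sketch only: I
don't know your setup, and the tiny bitmaps and the function name are
invented), counting the connected pieces of ink in a rendered glyph is
one such feature and is cheap to compute:

```python
from collections import deque


def count_components(bitmap):
    """Count 4-connected groups of '#' pixels in a list of equal-length strings."""
    rows, cols = len(bitmap), len(bitmap[0])
    seen = set()
    components = 0
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != "#" or (r, c) in seen:
                continue
            # Found an unvisited ink pixel: flood-fill its whole component.
            components += 1
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and bitmap[ny][nx] == "#" and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
    return components


# A crude "i": a dot and a separate stem.
LETTER_I = [".#.", "...", ".#.", ".#.", ".#."]
# A crude "l": a single stem.
LETTER_L = [".#.", ".#.", ".#."]

print(count_components(LETTER_I), count_components(LETTER_L))  # 2 1
```

This is far from a full stroke graph, but it shows that "connected or
disconnected" can be read off a small bitmap without heavy image
processing.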

>> Once again, good job!  I hope you manage to get the program trained on
>> lots of symbols in the near future.
>
> I would really like to support the Comprehensive List of LaTeX Symbols,
> but as already noted I am not very experienced with LaTeX. The system
> should work with any kind of hand-drawn symbol, but right now my
> problem is that I don't know how to get all these symbols rendered for
> the web. I am using MathTeX (http://www.forkosh.com/mathtex.html) to
> render the symbols.

You should not worry. This step is easier than any other you have made
so far. Since the number of symbols is limited, you could simply
pre-generate them all (for example, create DVI or PDF and then convert
to PNG, using dvipng for DVI or ghostscript for PDF).
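
A minimal sketch of such a pre-generation step, assuming the latex and
dvipng command-line tools are installed. The symbol list, file names,
output directory and resolution are placeholders, and symbols that need
extra packages would need matching \usepackage lines in the template.
The script only builds the sources and command lines; nothing is
executed here:

```python
from pathlib import Path

# Hypothetical sample; the real list would come from your symbol table.
SYMBOLS = [r"\alpha", r"\times", r"\int"]

# Minimal one-symbol document; \pagestyle{empty} suppresses the page number.
TEMPLATE = r"""\documentclass{article}
\pagestyle{empty}
\begin{document}
$%s$
\end{document}
"""


def build_jobs(symbols, outdir):
    """Return one dict per symbol with its LaTeX source and the two commands."""
    jobs = []
    for index, symbol in enumerate(symbols):
        stem = f"sym_{index:03d}"  # placeholder naming scheme
        tex = outdir / f"{stem}.tex"
        dvi = outdir / f"{stem}.dvi"
        png = outdir / f"{stem}.png"
        jobs.append({
            "symbol": symbol,
            "tex_path": tex,
            "source": TEMPLATE % symbol,
            # Step 1: typeset the symbol into a DVI file.
            "latex": ["latex", "-interaction=nonstopmode",
                      f"-output-directory={outdir}", str(tex)],
            # Step 2: crop tightly and rasterize at 150 dpi.
            "dvipng": ["dvipng", "-T", "tight", "-D", "150",
                       "-o", str(png), str(dvi)],
        })
    return jobs


for job in build_jobs(SYMBOLS, Path("out")):
    print(" ".join(job["latex"]))
    print(" ".join(job["dvipng"]))
```

To actually produce the images you would write each job's source to its
tex_path and run both commands, for example with subprocess.run, once
per symbol.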

Mojca