[tug-summer-of-code] Project ideas

Sun Mar 8 13:02:04 CET 2009

On Sun, Mar 8, 2009 at 10:01 AM, Taco Hoekwater <taco at elvenkind.com> wrote:

>
> Hi,
>
> Scott Pakin wrote:
>
>> Karl Berry wrote:
>>
>>>    I also have an idea to propose. In short, the idea is parallel
>>>    version of TeX. My plan is to use OpenMP for parallelization. It
>>>    should make TeX run faster without much more complication in the
>>>    code base. Please give me some thoughts.
>>>
>>> The basic TeX code is monolithic and relies almost entirely on global
>>> variables.  There have been some discussions of parallelizing certain
>>> small parts of the code, such as the output routine, but even this is
>>> fraught with extreme difficulty.  I am not sure what can be usefully
>>> parallelized.
>>>
>>
>> While it's true that TeX is monolithic and relies almost entirely on
>> global variables, the real problem for parallelism as I see it is
>> that TeX implements an extremely sequential state machine.  Each
>> character must be read and processed before proceeding with the next
>> character.  This is because earlier processing can affect later
>> processing (cf.  \catcode).  One can't even process pages in
>> parallel because it is not known in advance where the page breaks
>> will wind up.
>>
>
> After a lot of talk and discussion with lots of people, I have
> collected some ideas where parallelization seems possible,
> at least in theory (I am talking about pdftex|luatex. None of
> this applies to tex82, and I don't know enough about xetex).
>
>
> A list of ideas, more or less in descending order of potential
> gain and increasing order of implementation complexity:
>
> * The backend can run in a separate thread. This is the big one,
> I estimate that up to a 20% speedup is possible. All other
> points below will score lower than this except for specially
> crafted documents (and a lot lower, like maybe 2-3% gain in
> 'good' cases).
>
> * Some types of images require conversion or extensive parsing
> (like inclusion of pdf pages). These could be prefetched by a
> separate thread.
>
> * Font subsetting at the end of the run could happen in parallel
> for multiple fonts.
>
> * Processing can usually continue after a \font\cs assignment. And even
> if the font turns out to be needed immediately, at least the guard
> code can be simple: you only have to block the \cs and \fontdimen
> access to it.
>
> * The paragraph builder typically does two runs over a paragraph:
> first without hyphenation, then with. That second run is not always
> needed, but it could be done in a second thread in any case, thus
> saving time when it was indeed necessary.
>
> * Input reading can continue during the display math list
> conversion, because the result of that conversion is not needed
> until the next display or \par or one of a few primitives
> (\unskip, \prevgraf, and maybe a few more) is encountered.
>
> * The same may be true for input reading during paragraph breaking,
> but there the list of blocking events is much longer.
>
>
> The problem: I fear that for someone not intimate with tex-like
> engines and their complex build systems, the simplest of these
> (image prefetching or font subsetting) will take the full gsoc period,
> and may not even get done in that time. Worse still, we could end up
> with some patch that only works on linux, nowhere else. I definately
> cannot mentor such a project because it would in all likelihood eat
> up so much time that I could do nothing else in the period.
>
> Best wishes,
> Taco
>
>
> Thank you for your thoughts. It seems to be not worth the effort since the
best case is only 20% speed up.

Kittipat
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://tug.org/pipermail/summer-of-code/attachments/20090308/91840263/attachment.html