# [texworks] Improving Syntax Highlighting

Chris Jefferson chris at bubblescope.net
Tue May 8 16:48:32 CEST 2012

On 08/05/12 11:42, Stefan Löffler wrote:
> Hi,
>
> On 2012-05-06 12:50, Chris Jefferson wrote:
>
>> This implies a number of limitations. The big one is no multi-line
>> user regular expressions, sorry. Specific multi-line things can be
>> custom written in C++ obviously.
>>
>> The things I would most like, in order of preference, are:
>>
>> 1) Matching of maths ( both  and  ).
>> 2) Ability to highlight specific \begin{x} ... \end{x} sections.
>> 3) Highlighting of parts of regular expressions (for example, in
>> \textbf{XYZ}, make the XYZ bold).
> What would be nice here would be some form of delimiter matching. E.g.,
> correctly match something like \section{A {B} C}. This doesn't work with
> reg-exps alone, but I recently found that Gtk-source-view
> (http://projects.gnome.org/gtksourceview/documentation.html) can do it.
> As I understand it, it includes the possibility to give two regular
> expressions: one for the beginning, and one for the end of the
> to-be-matched string. Since I guess something like that will be needed
> for \begin/\end section matching anyway, I thought I'd mention this.
> To that end, I guess we should think about supporting some more
> sophisticated configuration files in the long run (e.g., XML based).

this.

Perhaps rather than regular expressions, some kind of latex-aware
tokeniser might be a better approach.

For example, given something like:

I like \textbf{Lots of $x$ and $y$ and \textit{z} }

This would be tokenised into something like the following (note: I would
need to go and look at what proper latex tokenisation looks like!)

'I' 'like' '\textbf' '{' 'Lots' 'of' '$' 'x' '$' 'and' '$' 'y' '$' 'and'
'\textit' '{' 'z' '}' '}'
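
As a rough illustration of the token list above, here is a minimal Python sketch. The pattern `TOKEN_RE` and the function `tokenise` are hypothetical names made up for this example, and the rules are a simplification chosen to reproduce the list above, not real TeX tokenisation (which works on category codes).

```python
import re

# Hypothetical, simplified token rules (NOT real TeX tokenisation).
TOKEN_RE = re.compile(r"""
    \\[A-Za-z]+     # control word such as \textbf
  | \\.             # control symbol such as \{ or \$
  | [{}$]           # group delimiters and math shift
  | [^\s\\{}$]+     # run of ordinary characters (a "word")
""", re.VERBOSE)


def tokenise(line):
    """Split one line of LaTeX source into a flat list of token strings."""
    return TOKEN_RE.findall(line)


tokens = tokenise(r"I like \textbf{Lots of $x$ and $y$ and \textit{z} }")
print(tokens)
```

Whitespace is simply dropped here, which is enough for deciding on formatting but would need revisiting if token positions in the editor buffer matter.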

Then make a stack of the current state, and as we scan along we 'push'
and 'pop' things on and off this stack. That would handle nested
expressions nicely, and would (I believe) make things like not
highlighting inside a verbatim easier.
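
The push/pop scan could look something like the Python sketch below. Everything in it is an assumption for illustration: the function name `annotate`, the context names, and the rule that a `{` is named after the command immediately before it. It does no error recovery for unbalanced input.

```python
def annotate(tokens):
    """Return (token, context_stack) pairs for the ordinary tokens.

    Assumed rules: '{' pushes the name of the command just before it
    (or 'group' if there was none), '}' pops, and '$' toggles 'math'.
    """
    stack = []
    pending = None  # command seen immediately before the current token
    out = []
    for tok in tokens:
        if tok.startswith("\\") and tok[1:].isalpha():
            pending = tok[1:]
            continue
        if tok == "{":
            stack.append(pending or "group")
        elif tok == "}":
            if stack:
                stack.pop()
        elif tok == "$":
            if stack and stack[-1] == "math":
                stack.pop()
            else:
                stack.append("math")
        else:
            out.append((tok, tuple(stack)))
        pending = None
    return out


# Token list taken from the example above.
tokens = ["I", "like", r"\textbf", "{", "Lots", "of", "$", "x", "$", "and",
          "$", "y", "$", "and", r"\textit", "{", "z", "}", "}"]
annotated = annotate(tokens)
for tok, ctx in annotated:
    print(tok, ctx)
```

With this, 'x' is seen inside ('textbf', 'math') and 'z' inside ('textbf', 'textit'). A verbatim context could presumably be handled by checking the top of the stack and skipping all formatting while it is there.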

In this mode, rather than giving a regular expression, you would state
how you wanted the text inside (for example) a textbf, math mode, or a
tabular to be formatted. You could also state how classes of tokens
(numbers, {}, \commands) were coloured.
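
A configuration along these lines might be sketched in Python as below. The table names, style keys and colours are all invented for illustration and do not correspond to any existing TeXworks setting; the rule that token classes override context styles is likewise just one possible choice.

```python
import re

# Hypothetical per-context formatting: how text inside each context looks.
CONTEXT_STYLES = {
    "textbf": {"bold": True},
    "textit": {"italic": True},
    "math": {"colour": "darkgreen"},
}

# Hypothetical per-token-class formatting: numbers, braces, \commands.
TOKEN_CLASS_STYLES = [
    (re.compile(r"\d+(\.\d+)?"), {"colour": "blue"}),
    (re.compile(r"[{}]"), {"colour": "grey"}),
    (re.compile(r"\\[A-Za-z]+"), {"colour": "darkred"}),
]


def style_for(token, stack):
    """Merge context styles (outermost first), then apply the token class."""
    style = {}
    for ctx in stack:
        style.update(CONTEXT_STYLES.get(ctx, {}))
    for pattern, class_style in TOKEN_CLASS_STYLES:
        if pattern.fullmatch(token):
            style.update(class_style)
            break
    return style


print(style_for("z", ("textbf", "textit")))
print(style_for("42", ("math",)))
print(style_for(r"\alpha", ()))
```

Here nested contexts accumulate, so text inside a textit inside a textbf comes out both bold and italic.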

The biggest problem with this is that it would be totally different to
what came before, and would be very latex-dependent. I (for example)
don't know what is up in the world of luatex, and other tex variants.

I might have a play with this, and see what it looks like and how the
code looks.