[OS X TeX] Regular Expression needed...
Peter Dyballa
Peter_Dyballa at Web.DE
Mon Nov 8 01:04:18 CET 2010
On 08.11.2010 at 00:06, Herbert Schulz wrote:
> Can someone supply a regex that will find repeated words (e.g.,
> repeated repeated) in a file? This is for use with TeXShop's OgreKit
> Find. It would also be nice to be able to have a replace regex to
> leave only one of the repeats.
This isn't that easy...
You're searching for a non-word constituent, followed by at least one
word constituent, followed by one non-word constituent, followed by a
repetition of the captured word. This can be described as:
\W\(\w+\)\W\1
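As a rough illustration, the search expression can be tried out in
Python's re module. This is only a sketch: Python writes the group as
( ) instead of \( \), it says nothing about which form OgreKit
accepts, and the sample sentence is made up.

```python
import re

# Same expression as above, in Python's syntax: plain ( ) for the group.
find_repeat = re.compile(r"\W(\w+)\W\1")

sample = "This is is a test."  # made-up sample text
match = find_repeat.search(sample)

print(repr(match.group(0)))  # whole match, including the leading non-word character
print(repr(match.group(1)))  # the captured word
```

The match starts at the blank before the first "is", so the leading
non-word constituent is part of what gets found.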
Presumably \W and \w could be replaced by explicit character classes.
When we replace one of the repeated words, we have to think of the
non-word constituent between them, or of the one before the first
("original") word. So we could try:
\W\(\w+\)\(\W\)\1 -> \2\1
The non-word constituent before the first word is dropped. The first
word is saved, and so is the non-word constituent that follows it.
These two are then used, in reverse order, as the substitution. Or:
\(\W\)\(\w+\)\W\2 -> \1\2
which could be simplified, as Einstein wants it, to:
\(\W\w+\)\1 -> \1
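A small sketch to compare the three search-and-replace pairs, again
in Python's re module as a stand-in (an assumption; OgreKit's syntax
may differ). The groups are rewritten from \( \) to ( ), and the
sample text is made up.

```python
import re

sample = "at the end end."  # made-up sample text

# Each pair is (search expression, replacement), rewritten from \( \) to ( ).
variants = [
    (r"\W(\w+)(\W)\1", r"\2\1"),
    (r"(\W)(\w+)\W\2", r"\1\2"),
    (r"(\W\w+)\1", r"\1"),
]

results = [re.sub(search, replace, sample) for search, replace in variants]
for result in results:
    print(result)
```

On this sample all three pairs leave a single "end" behind, which is
why the shortest one is enough here.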
It should work... (although I don't know if that's OgreKit's dialect)
It won't work with " word, word.": here two non-word constituents
separate the repeated word from its first appearance. It will work
with " end end.". It will fail with " The the ", because the
back-reference has to match the same case. In the latter case
character classes might work.
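The three cases can be checked with a short sketch in Python's re
module (a stand-in, not OgreKit). The case-insensitive flag used at
the end is a swapped-in alternative to hand-written character
classes and is an assumption, not something from the original
question.

```python
import re

search = r"(\W\w+)\1"
samples = [" word, word.", " end end.", " The the "]

# Case-sensitive, as the expression is written above.
strict = [bool(re.search(search, s)) for s in samples]

# Assumption: case-insensitive matching instead of explicit character classes.
relaxed = re.sub(search, r"\1", " The the ", flags=re.IGNORECASE)

print(strict)
print(repr(relaxed))
```

Only " end end." is found in the case-sensitive run. With the flag
set, the first spelling ("The") is the one that is kept.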
It can also fail at the beginning of the line! (Maybe at its end as
well.)
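A quick check of both line ends, sketched in Python's re module. The
\b variant at the bottom is not from this post; it is one possible
workaround and assumes the engine supports \b word boundaries. The
sample lines are made up.

```python
import re

search = r"(\W\w+)\1"

# No character precedes the first word, so the leading \W has nothing to match.
at_start = re.search(search, "end end of the line")

# Nothing follows the second word; the expression does not need anything there.
at_end = re.search(search, "the line's end end")

# Assumption, not from the post: word boundaries instead of a leading \W.
with_boundaries = re.sub(r"\b(\w+)\W\1\b", r"\1", "end end of the line")

print(at_start)
print(repr(at_end.group(0)))
print(with_boundaries)
```

In this sketch the repeat at the start of the line is missed, while
the one at the end is found, since none of the expressions asks for
a character after the second word.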
--
Greetings
Pete
Email is a wonderful thing for people whose role in life is to be on
top of things. But not for me; my role is to be on the bottom of
things. What I do takes long hours of studying and uninterruptible
concentration.
– Donald Knuth