[OS X TeX] [OT] Need Perl Regex for...

Alan Munn amunn at msu.edu
Sat Sep 20 23:01:35 CEST 2008


At 3:41 PM -0500 9/20/08, Herbert Schulz wrote:
>On Sep 20, 2008, at 1:17 PM, Alan Munn wrote:
>
>>At 12:11 PM -0500 9/20/08, Herbert Schulz wrote:
>>>Howdy,
>>>
>>>Suppose I have a sentence like
>>>
>>>Here are some words <fnameA>.<fnameB>.<ext> and more text afterward.
>>>
>>>where <fnameA> and <fnameB> may have spaces/tabs in them and you 
>>>may assume <ext> has no spaces/tabs. Can one of you Perl experts 
>>>out there (I know you're there!) give me a Perl regex that would 
>>>pick out only the <fnameA>.<fnameB>.<ext> part of the line? Can it 
>>>be generalized to include multiple <fname> sections separated by 
>>>`.'?
>>
>>Unless the "Here are some words" part is some sort of fixed string 
>>that you could identify, I don't think there's any way to 
>>distinguish a word that is part of the "Here are some words" part 
>>from a word that is part of <fnameA>,  if <fnameA> is allowed to 
>>contain spaces.
>>
>>I.e. if fnameA = My file.ext
>>
>>how can you tell whether "My" in  "Here are some words My file.ext 
>>" belongs to the filename or not?
>>
>>If spaces are prohibited, then the regex
>>
>>(?:\S*?\.)+\w{3}
>>
>>will pick out sequences of <fnameA>.<fnameB>.ext for arbitrary 
>>numbers of <fname> assuming ext is always 3 characters.  But I 
>>don't see a way around the spaces problem. (But I'm prepared to be 
>>amazed by someone else's answer!)
>>
>>Alan
>
>
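For anyone who wants to try the no-spaces pattern above, here is a minimal sketch, written with (?:...) as the non-capturing group. It uses Python's re module only because it is easy to run; the constructs involved (\S, \w, lazy *?, {3}) behave the same way in Perl. The function name and the sample line are made up for illustration.

```python
import re

# No-spaces pattern: one or more "<name>." pieces, then a 3-character extension.
NO_SPACE_NAME = re.compile(r"(?:\S*?\.)+\w{3}")

def find_filenames(line):
    """Return every space-free dotted name that ends in a 3-character extension."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return NO_SPACE_NAME.findall(line)

print(find_filenames("Here are some words fnameA.fnameB.tex and more text afterward."))
```

The sentence-final period is not picked up because nothing follows it that could serve as a three-character extension.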
>Howdy,
>
>You can assume that the first part is fixed, so that isn't the real 
>problem. Also, the second part, with its leading `.', is optional 
>and may repeat: e.g., <fnameA>.<ext>, <fnameA>.<fnameB>.<ext>, 
><fnameA>.<fnameB>.<fnameC>.<ext>, etc. The final part is NOT fixed 
>and may not exist in some situations.

Now it looks like you've changed your original request.  But assuming 
I understand what you want, the following will do, where DELIM is 
whatever the fixed first part is.  After evaluating the regex, the 
first capture group (\1, or $1 in Perl code) will contain the list of 
filenames of the sort FnameA.FnameB.ext, for arbitrary numbers of 
Fname parts (separated by .), with each filename.ext separated by a 
comma, a comma and a space, or just a space.  Filenames themselves 
can have spaces.

DELIM ((?:(?:.*?\.)+\w{3}[, ]*)+)
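
As a rough check of the DELIM pattern, here is a small sketch in Python (the pattern syntax is the same in Perl). The function name, the sample line, and the choice to trim trailing separators are my own assumptions, not part of the request.

```python
import re

def extract_filenames(line, delim):
    """Return the filename list that follows the fixed prefix `delim`, or None."""
    # Same shape as the DELIM pattern; re.escape guards any regex
    # metacharacters that might appear in the fixed prefix.
    pattern = re.escape(delim) + r" ((?:(?:.*?\.)+\w{3}[, ]*)+)"
    match = re.search(pattern, line)
    if match is None:
        return None
    # Group 1 can end with a separator consumed by [, ]*, so trim it.
    return match.group(1).rstrip(", ")

line = "Here are some words My file.part two.tex, other.pdf and more text afterward."
print(extract_filenames(line, "Here are some words"))
```

One caveat: because . also matches spaces, the pattern can run on into the trailing text if that text happens to contain a period immediately followed by three word characters.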

>
>Good Luck,

Sometimes your automatic text produces funny results.  I'm not sure 
I need the luck here!

Alan


