[nzlug] Hard regexp question
Martin D Kealey
martin at kurahaupo.gen.nz
Mon Aug 6 16:40:09 NZST 2007
On Sun, 5 Aug 2007, Nick 'Zaf' Clifford wrote:
> So now we need some way to say:
> anything followed by two backslashes and a quote, or anything but a
> backslash and a quote.
> Which is
> (\\\\|[^\\])"
>
> So now, if my reasoning is right, the following regex:
> /".*[^ ].*(\\\\|[^\\])"/
> Will correctly match the following strings:
> "Hello world" (the whole string)
> "Hello horse," the maniac whispered. "What's your name?" (match the
> Hello and the What's your name separately as two matches)
> "In programming, we escape \"strings\" like this" (match the whole
> string containing quotes)
> "Now we can end strings with \\" (Match the whole string)
>
> *Runs off to try it out*
>
> Whoo hoo! It worked!
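It will also match
"Here is a quote with a backslash in front \\\"
where the final quote is escaped, so the string is never actually closed; and
if not greedy, it will therefore stop short and fail to match the whole of
"Here is a string with a backslash in front \\\"a string\""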
What you really need is to check that *every* character is not a backslash,
or is a backslash followed by another character:
/"
(
( \\. | [^"\\] )*
( \\. | [^\s"\\] )
( \\. | [^"\\] )*
)
"/x
Or another way: check that the content starts with any number of spaces,
followed by something that is neither a space nor the closing quote. Then check
that the entire thing starts and ends with quotes and contains only
backslash-escaped characters and other characters that aren't quotes.
/"
(
(?= \s* [^ "] )
( \\. | [^"] )+
)
"/x
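Note that this version doesn't check that each unescaped character is NOT a
backslash. That is only safe when the input is well formed: if the engine fails
to find the closing quote, it will backtrack into the alternation, re-match a
backslash as [^"], and accept the escaped quote after it as the closing one.
(A lone trailing backslash at the end of input is no problem, because we insist
that the match finishes with a double quote.) So in general you'd probably want
to write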
/"
(
(?= \s* [^ "] )
( \\. | [^"\\] )+
)
"/x
Note the trailing /x, which lets the pattern be spread over several lines with
whitespace ignored, and makes the whole thing a bit less of a write-only
language (hopefully!).
-Martin