I was trying to have a multiline regex match some xml to verify that it had been updated correctly. I chose regex just because I thought it would be simpler than using an xml parsing library (other than SimpleXml, and I wasn't sure I liked the idea of using SimpleXml in the SimpleXml tests...).
For example, I was trying to match xml like this:
<root>
  <node>test1</node>
  <node>test2</node>
</root>
with a regex like this:
Regex.IsMatch(xmlString, "<node>test1</node>.*<node>test2</node>", RegexOptions.Multiline);
I expected it to match, but it didn't. I tried all kinds of variations, throwing in end-of-line and start-of-line matchers, etc., and nothing worked until I found this:
Regex.IsMatch(xmlString, @"<node>test1</node>.*\s*<node>test2</node>", RegexOptions.Multiline);
For giggles I tried it with .*.* but that doesn't work. The only pattern I found that worked was .*\s* and I really don't understand why. So if you can explain why, I'd love to hear it!
update:
Thanks commenters!
Turns out there were 3 things I thought I understood about regex that I didn't:
#1: As explained on regexlib.com \s matches any white-space character including \n and \r. So that's actually all I needed. No .* required, and no Multiline option required.
#2: Multiline doesn't change the behavior of .* to make it match newlines like I thought. It only affects ^ and $, making them match at the start and end of each line, as explained on MSDN.
#3: Singleline is the option that changes the behavior of .* to make it match \n.
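To see the three points side by side, here is a minimal sketch that runs each pattern against the same input. The class name, the method names, and the exact sample string (with "\n" line endings and two-space indentation) are assumptions made for illustration, not the original test code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOptionsDemo
{
    // Assumed sample input: the XML from the post with "\n" line endings.
    public const string Xml =
        "<root>\n  <node>test1</node>\n  <node>test2</node>\n</root>";

    // Multiline only changes ^ and $, so '.' still stops at "\n".
    public static bool DotStarMultiline() =>
        Regex.IsMatch(Xml, "<node>test1</node>.*<node>test2</node>",
                      RegexOptions.Multiline);

    // ".*" matches nothing here; "\s*" consumes the newline and indentation.
    public static bool DotStarThenWhitespace() =>
        Regex.IsMatch(Xml, @"<node>test1</node>.*\s*<node>test2</node>",
                      RegexOptions.Multiline);

    // Singleline makes '.' match "\n" as well.
    public static bool DotStarSingleline() =>
        Regex.IsMatch(Xml, "<node>test1</node>.*<node>test2</node>",
                      RegexOptions.Singleline);

    // "\s" matches "\n" and "\r" without any options.
    public static bool WhitespaceOnly() =>
        Regex.IsMatch(Xml, @"<node>test1</node>\s*<node>test2</node>");

    public static void Main()
    {
        Console.WriteLine(DotStarMultiline());       // False
        Console.WriteLine(DotStarThenWhitespace());  // True
        Console.WriteLine(DotStarSingleline());      // True
        Console.WriteLine(WhitespaceOnly());         // True
    }
}
```

This also accounts for the .*\s* mystery: the .* part contributes nothing across the line break, and the \s* part does all the work.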
So, the final regex I needed was simply:
Regex.IsMatch(xmlString, @"<node>test1</node>\s*<node>test2</node>");
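If the same check is needed for several node values, the pattern can be wrapped in a small helper. The sketch below is hypothetical: XmlAssertions and NodesAreAdjacent are names invented for this example, and it assumes the element is always called node, as in the post. Regex.Escape keeps characters such as '.' in the values from being read as regex syntax.

```csharp
using System;
using System.Text.RegularExpressions;

public static class XmlAssertions
{
    // Hypothetical helper: returns true when <node>first</node> is followed by
    // <node>second</node> with nothing but whitespace (including "\r" and "\n")
    // between them.
    public static bool NodesAreAdjacent(string xml, string first, string second)
    {
        string pattern =
            "<node>" + Regex.Escape(first) + @"</node>\s*" +
            "<node>" + Regex.Escape(second) + "</node>";
        return Regex.IsMatch(xml, pattern);
    }
}
```

A call such as XmlAssertions.NodesAreAdjacent(xmlString, "test1", "test2") then behaves like the final regex above, whether the XML uses "\n" or "\r\n" line endings.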
'.' will not match a newline, but '\s' will (regardless of Multiline mode)
See http://regexlib.com/CheatSheet.aspx
You meant "RegexOptions.Singleline."