Thursday, January 26, 2012

Weird .NET Regex

I was working on a test for SimpleXml and encountered a really weird regex behavior.

I was trying to have a multiline regex match some xml to verify that it had been updated correctly.  I chose regex just because I thought it would be simpler than using an xml parsing library (other than SimpleXml, and I wasn't sure I liked the idea of using SimpleXml in the SimpleXml tests...).

For example, I was trying to match xml like this:
<root>
  <node>test1</node>
  <node>test2</node>
</root>
with a regex like this:
Regex.IsMatch(xmlString, "<node>test1</node>.*<node>test2</node>", RegexOptions.Multiline);
I expected it to match, but it didn't.  I tried all kinds of variations, throwing in end-of-line and start-of-line matchers, etc., and nothing worked until I found this:
Regex.IsMatch(xmlString, @"<node>test1</node>.*\s*<node>test2</node>", RegexOptions.Multiline);
For giggles I tried it with .*.* but that didn't work.  The only pattern I found that worked was .*\s* and I really didn't understand why.  So if you can explain why, I'd love to hear it!

update:
Thanks commenters!

Turns out there were 3 things I thought I understood about regex that I didn't:
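#1: As explained on regexlib.com, \s matches any white-space character, including \n and \r.  So that's actually all I needed.  No .* required, and no Multiline option required.  It's also why .*\s* worked: the \s* was swallowing the newline and the indentation, while the .* was contributing nothing useful.
#2: Multiline doesn't change the behavior of . to make it match newlines like I thought.  It only affects $ and ^ (they match at the start and end of each line instead of the whole string), as explained on MSDN.
#3: Singleline is the option that changes the behavior of . (and therefore .*) to make it match \n.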

So, the final regex I needed was simply:
Regex.IsMatch(xmlString, @"<node>test1</node>\s*<node>test2</node>");
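
To see the three points side by side, here's a minimal sketch that runs each pattern against the sample xml from the top of the post.  The class name RegexNewlineDemo and its helper methods are made up for illustration, and I'm assuming the xml uses plain \n line endings.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class RegexNewlineDemo
{
    // The sample xml from the post, assuming \n line endings.
    public const string Xml =
        "<root>\n  <node>test1</node>\n  <node>test2</node>\n</root>";

    public static bool Matches(string pattern, RegexOptions options)
    {
        return Regex.IsMatch(Xml, pattern, options);
    }

    public static void PrintResults()
    {
        // Multiline only changes ^ and $, so '.' still stops at \n.
        Console.WriteLine(Matches(@"<node>test1</node>.*<node>test2</node>", RegexOptions.Multiline));    // False

        // Doubling up .* doesn't help, for the same reason.
        Console.WriteLine(Matches(@"<node>test1</node>.*.*<node>test2</node>", RegexOptions.Multiline));  // False

        // \s* is what actually consumes the newline and the indentation.
        Console.WriteLine(Matches(@"<node>test1</node>.*\s*<node>test2</node>", RegexOptions.Multiline)); // True

        // Singleline makes '.' match \n, so plain .* works.
        Console.WriteLine(Matches(@"<node>test1</node>.*<node>test2</node>", RegexOptions.Singleline));   // True

        // No options needed at all when \s* bridges the gap.
        Console.WriteLine(Matches(@"<node>test1</node>\s*<node>test2</node>", RegexOptions.None));        // True
    }
}
```

Calling RegexNewlineDemo.PrintResults() from any Main prints the results noted in the comments.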

Monday, January 23, 2012

SimpleXml

I released my very first Open Source project this weekend! It's called SimpleXml.  It's a tiny, single file, 180 line dynamic xml parsing library.  Really it's just a simple wrapper around XElement.

The source, issues, and docs are hosted on bitbucket: https://bitbucket.org/kberridge/simplexml/
And it's published to nuget: https://nuget.org/packages/SimpleXml

SimpleXml was inspired by PowerShell's xml support.  There have been a number of times I've wanted to do some small simple xml reading/writing job in C# and really wished I had the simplicity of PowerShell's xml api.  Now I do!

You can check out the bitbucket page for more examples, but here's a simple one:
dynamic x = "<root><child>myvalue</child></root>".AsSimpleXml();
Assert.AreEqual("myvalue", x.root.child);
It doesn't get much easier than that!
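
For anyone curious how a dynamic wrapper over XElement can work in general, here's a rough sketch of the idea.  This is not SimpleXml's actual source: DynamicXmlSketch and Parse are made-up names, and the behavior (return a string for leaf elements, another wrapper otherwise) is just an assumption chosen to make the example above work.

```csharp
using System.Dynamic;
using System.Xml.Linq;

// Hypothetical sketch of a dynamic xml wrapper; not the SimpleXml implementation.
public sealed class DynamicXmlSketch : DynamicObject
{
    private readonly XContainer _node;

    public DynamicXmlSketch(XContainer node)
    {
        _node = node;
    }

    // Wrap the whole document so the root element is reachable as a member.
    public static dynamic Parse(string xml)
    {
        return new DynamicXmlSketch(XDocument.Parse(xml));
    }

    // Called by the DLR for expressions like x.root or x.root.child.
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        XElement child = _node.Element(binder.Name);
        if (child == null)
        {
            // Returning false makes the runtime binder throw for unknown members.
            result = null;
            return false;
        }

        // Leaf elements come back as their text; anything else gets wrapped again.
        result = child.HasElements ? (object)new DynamicXmlSketch(child) : child.Value;
        return true;
    }
}
```

With that in place, DynamicXmlSketch.Parse("<root><child>myvalue</child></root>").root.child evaluates to "myvalue".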

Hopefully this will prove useful for someone, but my main motivation for creating it was just to have the experience of creating and open sourcing something simple from scratch.  It would be awesome to have the full experience of people forking the repo, and submitting pull requests too!

Saturday, January 14, 2012

CodeMash 2.0.1.2

Josh Schramm did a CodeMash recap.  And in the spirit of maximizing the value of your keystrokes (as presented by Scott Hanselman at CodeMash and blogged about by Jeff Atwood), I thought I'd do the same.

This year was my third CodeMash.  Every year I enjoy my time at CodeMash more than the last.  It was nearly 2x larger this year, but the "feel" of it didn't seem to change at all.

Precompiler
Vital Testing - Jim Weirich
Jim gave a good introductory talk on some of the elements of TDD that are hard.  He asked everyone to rate themselves in these categories, then focus on which categories they wanted to work on while TDDing some katas.

I really enjoyed his insight and perspective on TDD even though it was pretty basic.  But this session was still one of my favorites of the entire conference because Ben Lee spent it teaching me Erlang.  We did the Greed Dice Game kata and came up with this (nearly complete) solution in Erlang.  Erlang totally blew my mind and renewed my interest in functional languages.  I hadn't programmed in this style since college, so it was really awesome to get exposed to it again.

Day 1
Keynote
Keynote was good.  Ted Neward basically talked about being Pragmatic in how you approach building big systems.  He walked a fine line, saying that Enterprise needs to be simplified without ever saying that Enterprise is a bad word.  Also, he swore a lot.

Here are some of the recommendations I really liked:
  1. Resist the temptation of the familiar
  2. Reject the "Goal of Reuse"
  3. Befriend the uncomfortable truth
    1. be cynical
    2. question the assumptions 
    3. look for hidden costs
    4. investigate the implementations
  4. Eschew the "best practice"
  5. Embrace the "perennial gale of creative destruction" (AKA, you will have to learn new things)
  6. Context matters: create an evaluation function of your own for new tech
  7. Attend to goals

Inside the Microsoft Web Stack of Love - Scott Hanselman
Hanselman is an amazing presenter.  His room was overflowing 15 minutes before he was even scheduled to talk, but he kept everyone entertained by first typing funny stuff into notepad, and then playing YouTube videos.

He did a bunch of demos of a bunch of stuff and made one umbrella point: that MS wants to unify all the tools under ASP.NET and encourage devs to combine these tools as needed.  For example, create an app that uses MVC, WebForms, SignalR, and Web API all together.  I was most impressed with Web API, which as far as I could gather is just the new WCF REST stuff.  WCF is a really bad word in my office, because WCF was really awful.  We like to say it takes x time to write a WCF service, and 2x time to configure it.  The new Web API looks a lot like MVC, but without all the attributes!  So it's even cleaner!

Mastering Change With Mercurial - Kevin Berridge
I was very happy with how my talk went.  I probably spent 30+ hours preparing and practicing this talk on one of my favorite subjects: DVCS.  It was a combination of drawings, screenshots, and screen capture videos.  The most memorable part of it for me was how many questions I got.  People were very interested in Mercurial Queues in particular, which is a pretty complicated topic.  So I was glad I'd presented it in a way that obviously aroused people's curiosity enough to want to understand it better.

Functional Alchemy - Mark Rendle
Mark showed a bunch of different functional techniques implemented in C#.  The most memorable were a Memoize implementation, a .AsAsync extension method, and a clever Try Catch trick to DRY up catch blocks.  Just about everything he showed I intend to use at some point in our work projects.
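
To give a flavor of what a Memoize helper does, here's a minimal sketch.  It is not Mark's implementation: the class name FunctionalSketches is made up, it only handles single-argument functions, and it assumes non-null arguments and single-threaded use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; not the code shown in the talk.
public static class FunctionalSketches
{
    // Wraps a function so each distinct argument is only computed once.
    public static Func<TArg, TResult> Memoize<TArg, TResult>(this Func<TArg, TResult> func)
    {
        // The cache is captured by the returned lambda and lives as long as it does.
        // Dictionary is not thread-safe and rejects null keys.
        var cache = new Dictionary<TArg, TResult>();

        return arg =>
        {
            TResult value;
            if (!cache.TryGetValue(arg, out value))
            {
                value = func(arg);
                cache[arg] = value;
            }
            return value;
        };
    }
}
```

Usage looks like `Func<int, int> slowSquare = n => n * n; var fastSquare = slowSquare.Memoize();`, after which repeated calls with the same argument skip the underlying function.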

I was able to ask him after his talk if there were any performance concerns with depending on lambda expressions so heavily in C#.  His answer was fascinating.  He said in .NET 3.5, the cost could be non-trivial, but in .NET 4, you could practically wrap every expression in a lambda if you wanted to.

Effective Data Visualization - David Giard
Visualization is a concept I've been really excited about recently, but haven't started to dive into much yet.  This was a fun talk with lots of examples of different visualizations.  And it presented several of Edward Tufte's rules, such as Lie factor (size of the effect shown in the graphic / size of the effect in the data) and Data-ink ratio (data ink / total ink used in the graphic).

Day 2
Dealing with Information Overload - Scott Hanselman
I went to this talk just to be entertained, but I think I will actually get something out of it.  Scott recommended making a list of all your data "inputs" and ranking them in terms of priority to YOU.  Stuff like work email, home email, twitter, facebook, google reader, and even TV.

C# Stunt Coding - Bill Wagner
I learned a couple new things in this talk about .NET's Expression object.  And the first example literally applied to the code I had open in my lap at that very moment, which was such a wonderful coincidence!  It also made me realize I need to spend some time digging through the framework.  They've added so much new stuff in 3.5 and 4.0 and I never took the time to really study the additions, as I figured I'd discover them eventually.

Capability vs Suitability - Gary Bernhardt
I actually was sitting in Applied F# waiting for it to start when Corey Haines tweeted that Bernhardt's talk was so great everyone really needed to attend.  And if you know me, then you know Bernhardt is kind of my hero.  So it was taking all my will power to stay in the F# talk, even w/ how pumped up I was about functional from my previous Erlang experience.  So once Corey Haines piled on, my will power lost out and I switched to Bernhardt.

It was an interesting talk with some cool history.  I think the biggest takeaway for me was his discussion of Activity vs. Productivity.  If you type really fast, you are productive at making characters appear on the screen, but that doesn't necessarily mean you are Getting Things Done any faster than the next guy.  So, Activity != Getting Things Done.  And I suspect I suffer from that.

His point with the Activity thing was that when you see a lot of activity, like in Ruby, that doesn't mean they're really accomplishing a lot of practical work.  He went so far as to say Java is probably where the real work is getting done, while the Ruby people are running around, being active, making lots of noise.

His broader point was when you see all that activity, it's probably an indication that there is something new happening.  That they are pushing the capability boundaries.  And when you look at the history of our industry those expansions of capability are usually followed by contractions to suitability.

It was a pretty thought-provoking talk.  Not least because I think it's an oversimplification that I don't fully agree with, though I haven't been able to put my finger on why yet.

Conversation - With Everyone
I go to most of the sessions at CodeMash 'cause I'd feel guilty if I didn't.  But what makes CodeMash worth my time is really the conversations with so many people doing so many different things with so many different tools on so many different platforms.  It's like what I try to do at Burning River Devs * 1000.  And it's what renews my energy for the rest of the year to keep fighting all the technical, process, and people related battles that come with building software.