My main point of focus at work lately has been promoting maintainable code. One of the key tenets is readable code. The single responsibility principle and a low cyclomatic complexity are important, but if you are still using cryptic, prefixed, acronymed, and highly abbreviated identifiers, it is still going to be a chore for the reader to decipher. My slogan: "let's take the code out of source code".
I was just listening to Roy Osherove talk about regular expressions on .NET Rocks. A recurring theme brought up was how hard regular expressions are to deal with. Not necessarily creating them - you can do a lot by just knowing the basics - but dealing with them after they've been written. As they mentioned on the show, your source code ends up looking like a cartoon character swearing, which is the likely response you'll get from the poor maintenance developer that has to deal with it. Regular expressions are often referred to as a "write-only" language.
It got me thinking that this was a problem worth solving. Regular expressions are too powerful to ignore. For a certain set of problems, a regular expression can eliminate a LOT of potentially error-prone code. I cannot justify advocating avoiding regular expressions, no matter how much I value source readability. So what if we could make regular expressions readable?
Inspired by the Ayende's Rhino.Mocks syntax, I created a library that provides a better way to define regular expressions in your source code. The easiest way to describe it is to show it in action. Suppose we want to check for social security numbers. You might write code like this:
Regex socialSecurityNumberCheck = new Regex(@"^\d{3}-?\d{2}-?\d{4}$");
Using ReadableRex (not settled on the name yet...), it would look like:
Regex socialSecurityNumberCheck = new Regex(Pattern.With.AtBeginning
.Digit.Repeat.Exactly(3)
.Literal("-").Repeat.Optional
.Digit.Repeat.Exactly(2)
.Digit.Repeat.Exactly(4)
.AtEnd);
You could argue that the second example is actually harder to read, because the reader is bogged down with the details of how a social security number check is performed. It may be a bad example, because the algorithm for detecting a SSN is both well-known (in the US, at least) and unlikely to change. Consider a situation where the expected match is not well-known, and very likely to change: screen scraping HTML. In that case, being able to read through the algorithm, and easily identify which parts need to change becomes very important. To illustrate, I dug up some old code that was used to scrape basketball scores from espn.com. It's a good example of an ugly pattern that had to be maintainable, since the HTML layout could change at any time.
const string findGamesPattern = @"<div\s*class=""game""\s*id=""(?<gameID>\d+)-game""(?<content>.*?)<!--gameStatus\s*=\s*(?<gameState>\d+)-->";
Using ReadableRex:
Pattern findGamesPattern = Pattern.With.Literal(@"<div")
.WhiteSpace.Repeat.ZeroOrMore
.Literal(@"class=""game""").WhiteSpace.Repeat.ZeroOrMore.Literal(@"id=""")
.NamedGroup("gameId", Pattern.With.Digit.Repeat.OneOrMore)
.Literal(@"-game""")
.NamedGroup("content", Pattern.With.Anything.Repeat.Lazy.ZeroOrMore)
.Literal(@"<!--gameStatus")
.WhiteSpace.Repeat.ZeroOrMore.Literal("=").WhiteSpace.Repeat.ZeroOrMore
.NamedGroup("gameState", Pattern.With.Digit.Repeat.OneOrMore)
.Literal("-->");
I think this would be much easier to maintain. Note that this library doesn't actually perform an regular expression operations - it simply provides another way to define regular expression patterns. You still need to use the System.Text.RegularExpression.Regex object with the pattern you create. Since the Pattern type has an implicit conversion to System.String, so you can easily pass it to the the methods/constructors on Regex.
What do you think? Download the code or just the assembly DLL, give it a try, and tell me what you think. None of the method/property names are set in stone, so the syntax may change, but the approach will remain the same.
Powered by: newtelligence dasBlog 2.1.8209.14743
Special thanks to LosTechies.com
Site theme based on the essence design by Jelle Druyts
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.