Monday, June 11, 2012

Word Ladder in F#

Anthony Coble sent me his solution to this in Haskell before his Haskell talk at Burning River Developers.  The problem was to find a Word Ladder between two words.  A word ladder is a chain of words, from a start word to an end word, in which each word in the chain is one letter different from the word before it and is a real word.  The goal is to find the shortest ladder between the start and end word.  It's a neat problem, made even neater by the fact that it was invented by Lewis Carroll!  So I couldn't resist giving it a try in F#.  And fortunately, I don't know Haskell so I couldn't understand Anthony's solution, so I had to figure it out all on my own.

Speaking of which, this is a fun problem.  So if you want to give a try, you should stop reading here and come back after you've solved it!

I found this problem to be surprisingly difficult.  I kept coming up with ideas but inevitably they were over simplified and wouldn't work.  I was visualizing it as a tree.


It's an interesting problem because it has three different cases to consider.  It's normal for a problem to have two cases, like the first item in the list, and every other item.  But at least with the way I approached it, this problem had three cases: the first node (which has no parent node), the nodes in the first level of the tree (which all have the same parent), and every level after that (where there are many different parents).  This kept causing me to come up with ideas that only worked for the first level, or ideas that worked for one of those levels and not the others...

I knew that I needed a breadth first search but I was really struggling with how to implement it while also keeping track of the path to each node.  Usually a breadth first search effectively just loops over all the nodes in each level, and then calls the next level recursively.  But I need to know what the parent of each node is, and what that parent's parent is, etc up to the root.  My solution to this was to represent each node as a tuple containing the list of it's full path from root to that node, and the word of that node.  This is what I passed through the recursive calls, therefore I always knew the full path to each node.  This was simple, but has the downside of wasting memory since it duplicates a significant portion of the parent list on each node.

Another interesting element of this solution is that I prune words out of the tree that I've already seen, like the two red nodes in the picture above.  This means I don't have to worry about "infinite loops" (they're not really loops, but you get what I mean) in the graph, so I can search until I run out of nodes.  And its safe, because I need to find the shortest ladder, so if I've already encountered a word earlier in the graph, that same word can't be part of the solution later in the graph.

The git repo with my solution is here: https://github.com/kberridge/word-ladder-kata-fs
And here's the code:
let words = System.IO.File.ReadAllLines("four-char-dictionary.txt")
let isWordInDict word =
words |> Seq.exists (fun w -> w = word)
let filterToDict words =
words |> List.filter isWordInDict
let filterNotSeen (seen : Set<string>) words =
words |> List.filter (fun w -> not(seen.Contains(w)))
let findChildren (word : string) = [
let generateCandidates prefix char postfix =
['a'..'z'] |> List.filter (fun i -> i <> char)
|> List.map (fun l -> prefix + l.ToString() + postfix)
yield! generateCandidates "" (word.[0]) (word.[1..])
for i in 1..3 do
yield! generateCandidates (word.[0..(i-1)]) (word.[i]) (word.[(i+1)..])
]
let findValidUniqueChildren seen = findChildren >> filterNotSeen seen >> filterToDict
type node = string List * string // parent List * word
let rec buildNodes (words : string List) (parents : string List) (nodes : node List) =
match words with
| [] -> nodes
| w :: tail -> buildNodes tail parents ((parents, w) :: nodes)
let rec findLadderWorker (searchNodes : node List) (endword : string) (seen : Set<string>) =
let queuechildren word parents searchNodes =
let children = findValidUniqueChildren seen word
let newparents = word :: parents
let childnodes = buildNodes children newparents []
let newSearchNodes = searchNodes @ childnodes
findLadderWorker newSearchNodes endword (seen + (Set.ofList children))
let testnode (node : node) (searchNodes : node List) =
match node with
| parents, word when word = endword -> word :: parents |> List.rev
| parents, word -> queuechildren word parents searchNodes
match searchNodes with
| [] -> []
| node :: others -> testnode node others
let findLadder startword endword =
if startword = endword then
[endword]
else
let children = findValidUniqueChildren Set.empty startword
let seen = Set.ofList (startword :: children)
let startNodes = buildNodes children [startword] []
findLadderWorker startNodes endword seen
let args = System.Environment.GetCommandLineArgs()
if args.Length < 3 then // 3 is a hack to support both fsi and fsc
printfn "You forgot to include start and end word arguments"
else
let startword = args.[(args.Length - 2)]
let endword = args.[(args.Length - 1)]
let ladder = findLadder startword endword
printfn "%A" ladder
view raw word-ladder.fsx hosted with ❤ by GitHub


Here are some notes on some of the things I found interesting about this code:

  • The array slicing code in the findChildren function was fun.
  • And using the range operator in the generateCandidates ['a'..'z'] was fun too.
  • The findValidUniqueChildren function is an excellant example of something I've been struggling with in F#.  What parameters does this function take?  What does it return?  It's NOT easy to figure this out, is it?  This is also the first time I've used function composition for real!
  • Notice in the queuechildren method how I'm using the concatenate list operator: "let newSearchNodes = searchNodes @ childnodes"?  The Programming F# book says if you find yourself using this it probably means you're doing something in a non-idiomatic way...  I suppose I could have written buildNodes to append the nodes to a provided list, but that seemed awkward.
  • The match of findLadderWorker is pretty typical, but demonstrates another little pattern I keep finding.  For the second match, I created a function to call since it's longer than 1 line, so I had to make up a name.  I went with "testnode" which I don't really like, but I had to name it something!
Here's Anthony Coble's Haskell solution.  His is very different and shows a great example of a different way to approach the problem.


If you solve it, in any language, send me the link to your solution and I'll add it to the post!

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.