- Randomized depth-first search
- Functional Program Design in Scala
- Gist for fable.io/repl
- Online demo
- Gist with bits and bobs
First book I ever read on computer science was “Algorithms + Data Structures = Programming“ by Niklaus Wirth. For a very long time, my favorite one was “Introduction To Algorithms“ by Thomas Cormen. You can see the pattern, right?
It the era of JSON-over-HTTP neither algorithms nor data structures and considered necessary for programming. Actually, it is considered a success if someone choses the right data structure, not mentioning implementing it or, at least, understanding how it works. It is kind of understandable as it is no longer required to understand B+ Trees to use SQL database.
But it helps. And it is fun.
Breadth-First Search is tree traversal algorithm. Because tree is a graph without cycles, by just tracking visited nodes (and preventing cycles) we can use it to build spanning tree of any graph.
I touched the topic of spanning trees quite some time ago while writing about maze generation (which can be implemented as “random spanning tree”).
Breadth-First Search can be expressed with three steps:
- take first node from the queue, mark it as visited and return it
- append all unvisited child nodes to the end of the queue
- rinse and repeat
This is neither the most generic implementation of BFS nor the (functionally) purest. For example, both
HashSet are mutable data structures and it uses
while loop instead of recurrence, but in this case it is probably most pragmatic one. There is no problem with imperative code in F# (multi-paradigm, functioanl first) as long as it is properly encapsulated. You can read more about this on Eirik Tsarpalis’ blog.
You can see I’m a fan of micro DSLs which allow me express “the essence” without any technical noise. It makes implementation a little bit longer sometimes, but in my opinion, much more readable. There are multiple ways to make this implementation slighlty bit shorter (first step would be using
tap/tee function in few places), but they don’t improve readability so I’ll leave it as it is.
Let’s talk about this function then.
The types are properly inferred by F# compiler and there is no need to declare them explicitly but in this case it would be actually beneficial to do it (adding explicit type declaration on public interfaces is generally a good idea):
idof: 'node -> 'id: a function which will take the node and generate something which can be considered node’s id; if node is its own id it would be sufficient to use built-in
idfunction (which is
fun x -> x); please note, that it is actually ids which are stored in
HashSetand used to determine visited nodes;
fanout: 'node -> 'node seq(to be more precise
'node -> 'nodes when 'nodes :> seq<'node>): a function which will take a single node and return all adjacent nodes (in tree they would be just child nodes)
node: 'node: a starting node
Let’s test it with some simple tree defined as
Map<int, int list>.
I would use example from wikipedia:
Map.find (which is
int -> int list) matches almost exactly signature required by
'node -> 'node seq) but let’s make it a little bit more resilient by using
It will try to find
tree and return list of its children. If no children are found it will return empty list (
Now we can call it:
and… (drum roll):
Yay! Exactly as expented. You can add cycles (for example:
2, [5, 6, 1]) to test if cycles are properly eliminated.
NOTE: To run it in fable.io/repl you will need to polyfill
Queue, for example:
One of the examples of tree traversal algorithms is listing directories. I never really cared which tree traversal algorithm, DFS or BFS, is used for that. I was actually suprised that Unix
find works differently than PowerShell
find uses DFS while
gci uses BFS.
Let’s assume I want to do something like
Get-ChildItem (or like
find, but with BFS). How can I implement this using
First question is: what is the node? The node in this case is directory. Directory access in .NET can be done with
DirectoryInfo class from
System.IO namespace. What is
fanout function? This function will take take a directory and return list of child directories. Last thing left is
idof function returning something which can be used to uniquely identify directory. Let’s say full directory name is good enough in this case (although, something like “inode” or “files system object id” would be much better, as it would resolve potential sym-link cycles).
node defined as follows:
we can run it with:
…and get list of all child directories.
Let’s refactor this into a function:
path as input and returns sequence of folder names (
scanFolders: string -> string seq).
We can run it with, for example:
Please note, that this time we really needed to use
DirectoryInfo is an object without structural equality so
DirectoryInfo(".") != DirectoryInfo("."). To compare them we needed something else, something comparable and
DirectoryInfo.FullName look like a good-enough choice.
It would be actually safer to use slightly modified version of
fanout to suppress exceptions in directories which we have no access to:
Sometimes, apart from getting there (visiting nodes) it is important to know “how we got there” (keeping a history of decisions). We do not want to rewrite
bfs, we would like to reuse it. Let’s call this new method
trace. So far,
bfs is taking
fanout function (
fanout: 'node -> 'node seq) and returns a sequence of nodes (
This paragraph, describing transformation from
trace is probably the most convoluted one in this post. I tried to rewrite it many times but I couldn’t make it simpler (my fault). It is very important though, so please read multiple times until you think you understand what happens here.
trace to return not only a sequence of nodes (
'node seq) as
bfs does but also a history of actions which led to this node:
('node * 'action list) seq. Look again, instead of sequence of
'node we want a sequence of
'node * 'action list. We don’t know nor care what
'action is. We want it to be provided from
fanout. The new
fanout will not only return a sequence of child nodes (
'node -> 'node seq) but also an
action which caused this transition. New
fanout will be
'node -> ('node * 'action) seq - taking node and returning sequence of child nodes and actions leading to them. We will reuse
bfs by providing little adapter functions for
node and new
idof are relatively simple. New starting node is a node with empty action history while new
idof will just ignore action history (as how we got to given node does not affect node’s identity):
fanout is a little bit more complicated. The node passed to
bfs is no longer just node, but also contains action history (
'node * 'action list) so
fanout expected by
bfs should be
'node * 'action list -> ('node * 'action list) seq, but it isn’t. Our new
trace) is (as mentioned before)
'node -> ('node * 'action) seq. So, we will need another adapter:
Now, types match again:
'node -> ('node * 'action) seq while
'node * 'action list -> ('node * 'action list) seq.
Let me try to explain one more time:
fanout' will take a pair of
actions (leading to this node), it will use
node (action history is irrelevant at this point) returning sequence of nodes and actions (
n, a) and create new states with those new nodes and new action prepended to history (
fun (n, a) -> (n, a :: actions)).
trace function is:
Please note that new actions are prepended to action list (
a :: actions) so actions will be returned in reversed order. It is not a problem, it is just worth remembering.
Let’s run it using the
tree we used before.
The only question is what we want the
action to be in this case? There is no left or right, as the only decision is down. Let’s use source node as action, so the history of actions will be actually history of nodes.
We can run it now:
and get the all the nodes with traversal history:
We can of course process it further, prepending final node to history and then reversing order:
to transform it to paths:
This is still not finished thought:
bfs while actually it does not care if underlying algorithm is BFS. It would work absolutely perfectly with any other algorithm (DFS, for example). Let’s extract it then:
As you can see,
trace does not call
bfs directly anymore, it expects function of matching signature to be passed to it.
bfs is a perfect candidate, and works perfectly:
But now, we could use it with (hypothetical)
dfs by just changing it to:
Please note, that with one trivial adapter function (
idof'), one relatively simple adapter function (
fanout') and injectable traversal algorithm (
dfs or… whatever matches the signature) we have completely new functionality (tracing). Composite Reuse Principle, Single Responsibility Principle, Open/Closed Principle, Interface Segregation Principle and Dependency Inversion Principle. All of those in 5 lines of code.
The idea of doing Bloxorz solver comes from Functional Program Design in Scala course. It is a second week assignment. In does not explicitly mentions BFS as it tries to teach lazy evaluations rather than graph algorithms, but I find a great example how to use BFS.
Unfortunately, it is quite hard to explain how it works, so you will need to play Bloxorz yourself on, for example: http://www.bloxorz.org.uk/ (you will also need to enable Flash). If it doesn’t work just google “bloxorz” and you may find multiple sites allowing you to play. If it is year 2050, Flash no longer works, and nobody ported Bloxorz to ES2050, but YouTube still works you can check what it was about here:
If you have a feel how Bloxorz work we can start. We will address only the simplest scenarios: no weak squares, no block splitting and no dynamic environment. While weak squares could be actually quite easy to implement, block splitting and dynamic environment would bring a lot of complexity to solution.
As I already hinted we will use BFS to solve Bloxorz puzzles. Knowning that we need to establish what
node will be, what
idof should return and how
fanout should work.
The node is the state/position of the block. The child/adjacent nodes are the valid nodes which can be reached by moving block in any direction. Valid node is the node where both parts of the block are on valid squares. The solution is the path (history of actions) leading to target block position while target block position is block “standing” on top of the goal. Yeah, that’s why you need to play Bloxorz for a while to get a feeling what all those things mean.
Let’s start with a little bit of domain:
For functional purists: I have to admit that this domain model violates make illegal state unrepresentable principle. For example, we can define split
Bloxor while we said we are not handling them. We can also construct a
World when one or both (starting and target) positions are illegal. Although, I decided that domain design for Bloxor problem is not really a main topic here.
So, we have
Position which represents X and Y coordinates in Bloxor space. We have
Bloxor which is a pair of positions (as it consists of two pieces). We have
World which has starting position (
A), target position (
B) and a function answering the question: is given position legal (takes
Position and returns
Let’s define infinite
Please note that
IsValid always returns
true, which means any position is legal, we can roll our bloxor wherever we want. We will implement more complex worlds later but this one will allow us to have something to test our algorithm with.
Let’s define some functions around bloxor behaviour and state. First,
makeBloxor will allow us to create a standing bloxor at given position. This is a lot of text to create tuple of points
(p, p) but let’s make it official:
The other helper function would help us determine what is current bloxor orientation:
I assume you are familiar with active patterns, but even if you are not, please imagine three methods (
IsVertical) so similar that thay share single body… Well, it does not make it any clearer. I guess “Multi-Case Active Patterns” section on Chris Smith’s blog explains it better.
This active patterns deconstructs bloxor and recognizes its orientation: standing, horizontal or vertical. It may also throw an exception when bloxor is invalid (including split or just inverted). It requires some discipline how bloxor is constructed but as I said before, MISU principle is not a main concern here.
Next step is to define how bloxor moves. This is really messy function (all those
shiftY -2 -1 and
shiftX -1 -1) but describes all possible moves (
West) from all posible startings orientations (
IsVertical). It decides how bloxor coordinates should be adjusted (
let move = match bloxor, direction with ...) and returns new bloxor position (
The rules were slowly reconstructed from this picture:
We are ready to write solver function. The
solveWorld function will take
World is the problem to solve) and return
Move list option. We could alias
Move list as
Path and then returned type would be
Path option. It returns
option as it possible that solution cannot be found.
Let’s go slowly through DSL for this function:
isValid: bloxor position (deconstructed into
b) is valid when both pieces are valid;
isFinal: bloxor position (deconstructed into
b) is final when both pieces are in final position (meaning: we found the solution);
validMoves: try to move bloxor in all directions (
[North; South; East; West] |> Seq.map moveBloxor) creating pairs of new bloxor positions and move directions (note:
action) but filter out all invalid moves (
We are ready now solve bloxorz! The whole function is:
node: bloxor initial position at
idof: bloxor position unique identifier - it is just a string with coordinates of both pieces; because of structural equality in F# this is not really needed (built in
idwould be sufficient) but it wouldn’t work in Fable (
HashSetis not fully compatible);
fanout: it is actually
validMoves- for every bloxor position returns a sequence of new valid bloxor positions with appropriate moves (
Bloxor -> (Bloxor * Move) seq) which is exactly when we wanted for
'node -> ('node * 'action) seq).
So, the last expression is actually running it (
trace bfs idof fanout node) which returns a sequence of all paths, and finding first one (shortest!) leading to solutions (
Seq.tryFind/isFinal) and taking only the history part (
fun (_, actions) -> actions) and reverses it (
List.rev). If you’re fan of point-free style, this can be also expressed as:
Let’s try it first with inifinite world:
which is the correct answer: solution was found (
Some) and the path was
Let’s try a longer path, for example from
(0, 0) to
(9, 9). We expect it be 6 times
South then 6 times
East (or the other way around: 6 times
East then 6 times
which confirms our expectations.
As a last step, we can add ability to define more complex worlds. Let’s assume we would like to define world using ASCII, for example:
Note, space represents empty sqaure while
B are initial position and target position respectively.
We need to create a map of characters (
Map<int * int, char>) first:
This does not look too attractive as, in general, nested loops with indexes are not the prettiest constuct in functional languges. The general idea here is to iterate over lines (
world |> Seq.mapi) then over characters in line (
l |> Seq.mapi), flatten nested list (
Seq.collect) and create a map
(x, y) -> c (
Having such a map we can determine initial and target positions:
and define a function which will test if given position is valid:
Having these three bits of information we can construct a
Let’s put all those things together:
For OO fans I would like to emphasize, that you can think about
parseWorld as two constructors of two different objects implementing the same interface (
So, let’s try this example:
I can bet that’s exactly what we need (you can go to http://www.bloxorz.org.uk/ and use
918660 as stage code).
I wrote simple Fable app which shows results as animation:
- check it here
- or go to gist and paste it into fable.io/repl (you will be able to play with sources a little bit)
…you might ask. Good question. You need to watch this first:
We have solution space (two containers), we have allowed moves (fill, empty, transfer) and a goal (one container contains exactly 4 gallons). Well, this problem is identical to Bloxorz, isn’t it?
Let’s start with domain:
Jug: we have two jugs:
Action: possible actions are emptying a jug (
Empty), filling a jug (
Fill), or transferring water from one jug to another (
State: state of the system with current and maximum levels;
Let me try to draw it:
We can add some methods to
State to make working with this record a little bit easier:
Get: gets jug level;
Max: gets jug capacity;
Empty: empties given jug;
Fill: fills given jug;
Transfer: transfers water from one jug to another (until source jug is empty or target jug is full);
These three last methods (
Transfer) map directly to
Having all this we can attempt to save New York, and in principle, it is the same operation as solving Bloxorz:
We can identify visited states with
idof to avoid running in circles, we can transition to new states with
fanout by executing all the possible actions (
[Empty A; Empty B; Fill A; Fill B; Transfer (A, B); Transfer (B, A)]), and we can check if we reached our goal (
isDone). The last step is actually to scan solution space (
trace bfs idof fanout state), terminating search when appropriate (
Seq.tryFind) and reversing action history (as it is upside-down).
Hmm, let’s replay all actions to see if it really worked. To do that I will actually use some dreaded imperative code:
which generates readable description what has happened:
and, yes, in fact we end up with 4 gallons in jug B. New York is saved!
Please note, functional purists might say I should not use imperative loop with mutable variable, and use
Seq.scan |> Seq.iter instead. To be honest, I did that at first, but this imperative loop is actually much more readable even if not pure enough.
Let’s try something bigger:
Let’s see it it crashes, or hangs up when there is no solution:
Nope, it works just fine and returns
I really like the idea that three seemingly different problems (scanning folders, pushing Bloxorz through the hole and saving New York) are actually the same problem, and all of them can be solved with the same function and a little bit of “adaptation”.
I also really liked the idea that it is possible to present 4 (out of 5) SOLID principles (plus CRP) in 5 lines of code and the fact that they come quite naturally in FP. Oh irony! OO principles are intrinsic to FP.
Other than that? Well… I don’t think my day-to-day JSON-over-HTTP activities require a lot of “Saving New York”, but when they do, I’m ready :-)