Long Refactoring: Backtracking

Now that I have cleaned the loop up somewhat, we can continue with the process of refactoring the code. This is a continuation to the article series I started here.

This next step really moves beyond refactoring. I have identified that the intended backtracking was not actually implemented in the code. Instead, The whole board is wiped and restarted many times, causing a decent slowdown in execution. My refactoring process thus far has allowed me to understand the code well enough to attempt and improvement.

Lets get a before time hack:

time python tree_sudoku.py
...
real	0m8.289s
user	0m8.195s
sys	0m0.020s

Considering the problem set and computer power available, I think this should be almost instantaneous.  8 Seconds indicates a problem.

The Bug I identified earlier is that the retreat function does not reset the board to its original state. First I will re-enable the failing test:

diff --git a/test_tree_sudoku.py b/test_tree_sudoku.py
index 462b75b..c18de03 100755
--- a/test_tree_sudoku.py
+++ b/test_tree_sudoku.py
@@ -60,7 +60,7 @@ def test_advance():
     node.write(test_board)
     assert (test_board[0][3] == '9')
     back_node = node.retreat()
-    # assert (test_board[0][3] == '0')
+    assert (test_board[0][3] == '0')
     assert (node.value == "9")
     back_node.write(test_board)
     assert (test_board[0][2] == '3')

In order to return the old value, we need to record it. This is probably the first time we see the real benefit of extracting the write method: we change it once and get the change executed everywhere it is called.

diff --git a/tree_sudoku.py b/tree_sudoku.py
index de5c7fe..a161a99 100755
--- a/tree_sudoku.py
+++ b/tree_sudoku.py
@@ -202,6 +202,7 @@ class Tree_Node:
         return new_node
 
     def retreat(self):
+        self.board[self.row][self.col] = self.old_value
         node = self.last_node
         node.next_node = None
         return node
@@ -213,6 +214,8 @@ class Tree_Node:
         return self.value
 
     def write(self, board):
+        self.board = board
+        self.old_value = board[self.row][self.col]
         board[self.row][self.col] = self.value
 
     def check_solved(self, board):

Putting the board in the parameter list to remember it is a hack: this dependency should be in the constructor. However, you can see that the board is changed when we restart the loop. But maybe now it doesn’t have to be….lets see. If I make the following change:

diff --git a/tree_sudoku.py b/tree_sudoku.py
index a161a99..dadb004 100755
--- a/tree_sudoku.py
+++ b/tree_sudoku.py
@@ -39,9 +39,9 @@ class SudokuSolver:
         index = 0
         head_node = Tree_Node(None, index)
         curr_node = head_node
+        test_board = copy.deepcopy(original_board)
         while True:
             filler = head_node
-            test_board = copy.deepcopy(original_board)
             filler.write(test_board)
             while filler.next_node:
                 filler = filler.next_node

Things break. Specifically, the retreat function attempts to retreat beyond the start of the first node, and we get an exception. Hmmm.

OK, Before I tackle that, I see some simplification I can do. The if and the while statement around retreat are doing the same check. Remove the if.

diff --git a/tree_sudoku.py b/tree_sudoku.py
index a161a99..e5209c8 100755
--- a/tree_sudoku.py
+++ b/tree_sudoku.py
@@ -53,10 +53,9 @@ class SudokuSolver:
                 curr_node = curr_node.advance(test_board)
                 curr_node.check_solved(test_board)
             else:
-                if len(curr_node.possible_values) == 0:
-                    # backtrack
-                    while len(curr_node.possible_values) == 0:
-                        curr_node = curr_node.retreat()
+                # backtrack
+                while len(curr_node.possible_values) == 0:
+                    curr_node = curr_node.retreat()
                 curr_node.next()
         return self.build_solution_string(head_node)

OK, back to the looping. Note that the looping is based on the the array of possible_values set in the constructor of the Tree_Node. This holds the set of values that are not yet tried.

But it seems like they should get reset when we retreat. Retreat means we have Exhausted all the possibilities forward of the point that the node points to, and we are going to try the next available option in a previous node. The reason this works is we create a new node each time we Advance…this is wasteful, but not the problem at hand.

Why does Retreat stop when the deepcopy happens each time, but not when it happens once? This would be hard to debug interactively, but we can spew log information and see if it helps generates a hypothesis. Here are the first and last couple lines of output:

write  row = 0, col = 0 value = 9 
write  row = 0, col = 0 value = 9 
write  row = 0, col = 0 value = 8 
write  row = 0, col = 0 value = 8 
write  row = 0, col = 0 value = 7 
write  row = 0, col = 0 value = 7 
write  row = 0, col = 0 value = 6 
write  row = 0, col = 0 value = 6 
write  row = 0, col = 0 value = 5 
write  row = 0, col = 0 value = 5 
write  row = 0, col = 0 value = 5 
write  row = 0, col = 1 value = 9 

....

retreat row = 0, col = 1 value = 1 old value = 1
write  row = 0, col = 0 value = 4 
write  row = 0, col = 0 value = 4 
write  row = 0, col = 0 value = 3 
write  row = 0, col = 0 value = 3 
write  row = 0, col = 0 value = 2 
write  row = 0, col = 0 value = 2 
write  row = 0, col = 0 value = 1 
write  row = 0, col = 0 value = 1 
retreat row = 0, col = 0 value = 1 old value = 1

That first set of entries looks wrong: once it writes a value, it should immediately move to the next cell. Instead we see that both row and col stay the same.

I just looked back over the original code to see if I changed the semanitcs of row/col from what the longer version used: No. They always used the board_spot value attached to the node in the constructor. Here it is from the very first version, pre-refactoring:

test_board[int(current_board_filling_node.board_spot[0])][int(current_board_filling_node.board_spot[1])] = current_board_filling_node.value

So pre-calculating row and col is just an optimization, not a change in logic.

The check in the while loop that causes it to exit is :

while len(curr_node.possible_values) == 0:

This array is reduced in the call to check_solved.

    def check_solved(self, board):
        if board[self.row][self.col] != '0':
            self.value = board[self.row][self.col]
            self.possible_values = []

If that function is never called for the first node, it will still have a length left. That function gets called in the advance function. Ig gets called immediately after that function as well, which means we can probably remove one of the changes. Lets revert to the HEAD and do that clean up.

diff --git a/tree_sudoku.py b/tree_sudoku.py
index e5209c8..61a8d6a 100755
--- a/tree_sudoku.py
+++ b/tree_sudoku.py
@@ -51,7 +51,6 @@ class SudokuSolver:
                 if curr_node.index + 1 >= MAX:
                     break
                 curr_node = curr_node.advance(test_board)
-                curr_node.check_solved(test_board)
             else:
                 # backtrack
                 while len(curr_node.possible_values) == 0:

It works.

OK, back to the loop issue. That deep copy offends me. It should not be necessary if we are cleaning up after ourselves. But the thing that puzzles me is the loop:

            while filler.next_node:
                filler = filler.next_node
                filler.write(test_board)

The nodes it is looping through have values in them because we wrote to them last pass. In delete we removed all nodes past the high-water mark. So the first node we hit should be one that

But I just noticed another duplicate write…Let me remove that before progressing.

diff --git a/tree_sudoku.py b/tree_sudoku.py
index 61a8d6a..cb920ed 100755
--- a/tree_sudoku.py
+++ b/tree_sudoku.py
@@ -46,7 +46,6 @@ class SudokuSolver:
             while filler.next_node:
                 filler = filler.next_node
                 filler.write(test_board)
-            filler.write(test_board)
             if self.box_index.is_value_valid(test_board, curr_node):
                 if curr_node.index + 1 >= MAX:
                     break

OK. So lets looks at the line that throws the exception:

               # backtrack
                while len(curr_node.possible_values) == 0:
                    curr_node = curr_node.retreat()
                curr_node.next()

curr_node.next() pops the value out of the array of potential solutions.

The error seems to be when we go off the beginning of the list: this feels like it should be an error condition: we were given an unsolvable problem. I think, though, that something must be resetting the list

Maybe we can fix it by checking for None. I don’t like it, as I don’t know why we are going off the list, but maybe it is sufficient, for now, to not go off the list. It seems like the next() call should fail for the same reason…but it doesn’t.

diff --git a/tree_sudoku.py b/tree_sudoku.py
index cb920ed..939c1da 100755
--- a/tree_sudoku.py
+++ b/tree_sudoku.py
@@ -54,6 +54,9 @@ class SudokuSolver:
                 # backtrack
                 while len(curr_node.possible_values) == 0:
                     curr_node = curr_node.retreat()
+                    if curr_node is None:
+                        break
+
                 curr_node.next()
         return self.build_solution_string(head_node)

maybe the break statement is breaking out of multiple levels of the loop? But then our tests should fail. Lets try warning if we try to retreat past the beginning.

diff --git a/tree_sudoku.py b/tree_sudoku.py
index cb920ed..1cf2817 100755
--- a/tree_sudoku.py
+++ b/tree_sudoku.py
@@ -39,9 +39,9 @@ class SudokuSolver:
         index = 0
         head_node = Tree_Node(None, index)
         curr_node = head_node
+        test_board = copy.deepcopy(original_board)
         while True:
             filler = head_node
-            test_board = copy.deepcopy(original_board)
             filler.write(test_board)
             while filler.next_node:
                 filler = filler.next_node
@@ -201,6 +201,9 @@ class Tree_Node:
     def retreat(self):
         self.board[self.row][self.col] = self.old_value
         node = self.last_node
+        if node is None:
+            print("off the rails row = %d col = %d" % (self.row, self.col))
+            print (self.board)
         node.next_node = None
         return node

Yep, that reports:


off the rails row = 0 col = 0
[['2', '2', '3', '2', '2', '2', '6', '2', '0'], ['9', '0', '0', '3', '0', '5', '0', '0', '1'], ['0', '0', '1', '8', '0', '6', '4', '0', '0'], ['0', '0', '8', '1', '0', '2', '9', '0', '0'], ['7', '0', '0', '0', '0', '0', '0', '0', '8'], ['0', '0', '6', '7', '0', '8', '2', '0', '0'], ['0', '0', '2', '6', '0', '9', '5', '0', '0'], ['8', '0', '0', '2', '0', '3', '0', '0', '9'], ['0', '0', '5', '0', '1', '0', '3', '0', '0']]

Something about the dirty board is causing it to go off the rails. The puzzle starts out like this: 003020600900305. What should be clear is that the 0s did not get replaced in the places where there are 2s in the first row. Obviously, that many 2s in a row should not have gotten that far, either.

AHA….I was replacing the value every time write was called. But what needs to happen is to reset it to the board original value. While the Node gets deleted each time, if the board stays around, there is no way to reset it to its original value. The hacky way to enforce this is:

 
--- a/tree_sudoku.py
+++ b/tree_sudoku.py
@@ -191,6 +191,7 @@ class Tree_Node:
         self.next_node = None
         self.value = '9'
         self.index = index
+        self.old_value = None
 
     def advance(self, test_board):
         new_node = Tree_Node(self, self.index + 1)
@@ -212,7 +213,8 @@ class Tree_Node:
 
     def write(self, board):
         self.board = board
-        self.old_value = board[self.row][self.col]
+        if self.old_value is None:
+            self.old_value = board[self.row][self.col]
         board[self.row][self.col] = self.value
 
     def check_solved(self, board):

With that change, we can persist the change that we only deepcopy the board once.

If that is correct, then we can also do away with all the board filling done in the early stages of the algorithm. What happens if we remove all the filling, and just use curr_node to write the value it is holding?

--- a/tree_sudoku.py
+++ b/tree_sudoku.py
@@ -36,16 +36,12 @@ class SudokuSolver:
             self.solved_board_strings[key] = return_string
 
     def tree_to_solution_string(self, original_board):
         index = 0
         test_board = copy.deepcopy(original_board)
         head_node = Tree_Node(None, index)
         curr_node = head_node
         while True:
-            filler = head_node
-            filler.write(test_board)
-            while filler.next_node:
-                filler = filler.next_node
-                filler.write(test_board)
+            curr_node.write(test_board)
             if self.box_index.is_value_valid(test_board, curr_node):
                 if curr_node.index + 1 >= MAX:
                     break

It works. How long does it take:

real	0m0.776s
user	0m0.741s
sys	0m0.020s

Less than a second per run. We cut the time it takes to run this algorithm by a factor of 10. This makes sense now that we understand how much extra work it was doing on the first pass. Speeding up the time to test means you are less likely to get distracted on each cycle and drop out of the zone.

This article is a bit longer than I anticipated, because I fooled myself in my initial implementation of the retreat function. In retrospect, I could have written a unit test for retreat, and, if I had made it exhaustive, I might have found the problem. That is likely the case for all of the internal functions I wrote here: unit tests for inner functions tell you that they do what you expect.

Note that the code I produced here does not line up with the final code in the original article series. I seem to have lost that, and there are some pieces I do not exactly know what I did in some of the helper functions. I hope there is enough context to figure out what you would do if you were to continue on this refactoring yourself.

I have posted all of the steps of the refactoring to github.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.