While the main solving function is now cleaned up, I am not very happy with the code that checks the validation on each cell.
I am not going to go through each step here, as the pattern is well established now. Pull out of the function anything that is a distractor. Once you have the heart of the algorithm, clean it up. Then put things back together.
Here is what the is_value_valid function looks like before refactoring:
def is_value_valid(self, board, node): return self.value_valid(board, node.row, node.col) def value_valid(self, board, row_index, column_index): row = possible_values() column = possible_values() square = possible_values() for number in board[row_index]: if number == '0': continue if number in row: row.remove(number) else: return False for a_row in range(DIM): number = board[a_row][column_index] if number == '0': continue if number in column: column.remove(number) else: return False box_indexes = self.table[ self.find_box_of_index( str(row_index) + str(column_index))] for index in box_indexes: row = int(index[0]) column = int(index[1]) number = board[row][column] if number == '0': continue if number in square: square.remove(number) else: return False return True |
What is peering out at us here is three very similar checks to see if a number is duplicated in a row, a column, or a box.
Here is the end state of is_value_valid:
def is_value_valid(board, node): if not is_set_valid(board, node.row, node.col, column_generator): return False if not is_set_valid(board, node.row, node.col, row_generator): return False return is_set_valid(board, node.row, node.col, box_generator) |
The only thing that differs when checking a row, a column, or a box is the iterator used to figure out what value is next. We use the term set to to collect all three of those into a common abstraction: each one of those grouping contain a set of the digits 1 to 9 with no repeats. Here is the code for is_set_valid:
def is_set_valid(board, row_index, column_index, generator): box = possible_values() for (row, column) in generator(row_index, column_index): number = board[row][column] if number == '0': continue if number in box: box.remove(number) else: return False return True |
This is the code that was repeated in the original value_valid method function. Here are the three generators:
def column_generator(row, col): for i in range(0, DIM): yield (i, col) def row_generator(row, col): for i in range(0, DIM): yield (row, i) def box_generator(row, col): row_mod = row % BASIS start_row = row - row_mod col_mod = col % BASIS start_col = col - col_mod for i in range(0, BASIS): for j in range(0, BASIS): yield (start_row + i, start_col + j) |
Note that they have the same parameter list. This is an implementation of the strategy design pattern: vary the algorithm that is passed into the validation function.
I made this transformation in many little steps. Here are the comments in the git log for each of them.
- rename square to box
- extract method is_row_valid
- extract method is_col_valid
- move is_row/col_valid method calls to is_value_valid method
- rename value_valid is is_box_valid
- move method is_box_valid to top level function
- move is_value_valid to top level function
- replace BoxIndex table lookup with generator
- remove class BoxIndex
- remove unnecessary index variable
- replace board_index_table lookup with function index_to_row_col
- remove BoardIndexTable
- common parameter list for is_x_valid functions
- pass generator in to is_set_valid
- use generators for row and column validation
- remove class BoxIndex
- remove validators for row and column that now use common code
I was not able to see this path from the start. Instead, I had to first simplify the value_valid method until it only did one thing. This is an analogue to disassembling a machine before I can work on its innards: I had to make sure I kept track of the parts I was going to want when I got to reassembling it. As the process progresses, I started making a bunch of smaller functions, with the hope of organizing the code in a better way at the end.
The major code reductions happened in the middle of the process. The original code did a couple of table lookups to convert from coordinates to cells. These lookups made use of the BoardIndexTable and the BoxIndex. Since the lookups could be easily replaced with mathematical computations, removing them reduced the amount of code significantly, and made the overall flow of the program easier to understand…at least for me. I ended up adding test_index_to_row_col. This test allowed me to isolate the new behavior I was adding to perform the math.
def test_index_to_row_col(): (row, col) = tree_sudoku.index_to_row_col(0) assert (row == 0) assert (col == 0) (row, col) = tree_sudoku.index_to_row_col(80) assert (row == 8) assert (col == 8) |
In other code, it might be a major performance benefit to do a table lookup instead of a more complex computation. Again, the strategy pattern could help here to allow you to swap in which ever approach makes sense in a given situation.
In extracting the functions, I ended up with code that was significantly less object-based than I started with. This function decomposition we driven by the need to pull behavior apart that did not belong together. I think there is still an object model in there waiting to emerge, but the code needs further teasing apart before it can be put back together in a different way.