Long Refactoring: Resetting the class model

With the inclusion of the Generators, the main functions seem to be in the right shape. What seems off is the naming of things. For example, we have a couple functions that take a parameter named “board” but it turns out they mean different things. The names of classes do not seem to align with what they are doing.

Lets continue with the renaming of things and see where it leads us.

But first: we can remove the exceptions to the flake8 testing, and, with the removal of some white space, we can run with all the pep8 rules intact. Now, on to renaming.

I am not going to go through each commit, but will try to provide enough context so you could follow the logic in the commits in github.

This is the last article in my Long Refactoring series. You can read the first article here.


Lets start with a simple one: since the collection is called board_strings, lets make the singular match:

diff --git a/tree_sudoku.py b/tree_sudoku.py
index 95d1026..b4fa5bf 100755
--- a/tree_sudoku.py
+++ b/tree_sudoku.py
@@ -62,8 +62,8 @@ class SudokuSolver:
 
     def strings_to_board_dict(self, board_strings):
         return_dict = {}
-        for index, board in enumerate(board_strings):
-            return_dict[str(index)] = build_board(board)
+        for index, board_string in enumerate(board_strings):
+            return_dict[str(index)] = build_board(board_string)
         return return_dict
 
 
@@ -80,8 +80,8 @@ def print_board(board):
             print('-' * 21)
 
 
-def build_board(board):
-    rows = re.findall(r"\d{9}", board)
+def build_board(board_string):
+    rows = re.findall(r"\d{9}", board_string)
     board_list = []
     for row in rows:
         row_list = []

This lets us distinguish between the strings pulled out of the file via the csv parser and the board we generate later as a list of lists…the Pythonic way of making a multidimensional array.

Here I made a decision that I later changed: I decided I wanted a Board class, and I introduced it. I did that by just creating the class in one commit, and in the next started moving code to that class. This is because I identified that the SudokuSolver class was really managing the list of puzzles, and I wanted to separate the logic for solving a single puzzle from the puzzle-list management. The build_board function became the __init__ function for the Board class.

Here is where my refactoring took an interesting turn: I decided that as part of determining what the different parameters meant, I should annotate them with type. To do this, I turned to the MyPy utility, and also type annotations as handled by python. First I added a type checker to the tox.ini managed environment:

diff --git a/tox.ini b/tox.ini
index 1252034..a69ac26 100644
--- a/tox.ini
+++ b/tox.ini
@@ -4,13 +4,16 @@
 # and then run "tox" from this directory.
 
 [tox]
-envlist = pep8,py312
- 
+envlist = pep8,type,py312
 skipsdist = True
 
 
 [testenv:pep8]
 commands = flake8
+
+[testenv:type]
+deps = mypy
+commands = mypy tree_sudoku.py
 
 [flake8]
 filename= *.py

With this change, I could start annotating type information:

diff --git a/tree_sudoku.py b/tree_sudoku.py
index 4b2ac5b..b79e782 100755
--- a/tree_sudoku.py
+++ b/tree_sudoku.py
@@ -10,14 +10,15 @@ import re
 import copy
 import time
 
+from typing import List
 
 BASIS = 3
 DIM = BASIS * BASIS
 MAX = DIM * DIM
 
 
-def import_csv():
-    list_of_boards = []
+def import_csv()->List[str]:
+    list_of_boards: List[str] = []
     with open('sample_sudoku_board_inputs.csv', 'r') as file:
         reader = csv.reader(file)
         for row in reader:
@@ -26,11 +27,11 @@ def import_csv():
 
 
 class Board:
-    def __init__(self, board_string):
+    def __init__(self, board_string: str):
         rows = re.findall(r"\d{9}", board_string)
         self.board_list = []
         for row in rows:
-            row_list = []
+            row_list: List[str] = []
             row_list[:0] = row
             self.board_list.append(row_list)
 
@@ -77,7 +78,7 @@ class SudokuSolver:
         return return_dict
 
 
-def print_board(board):
+def print_board(board: Board):
     for index1, row in enumerate(board.board_list):
         if index1 == 0 or index1 == 3 or index1 == 6:
             print('-' * 21)

Continuing on with this change required reordering class definitions in the file so that classes were defined before they were referenced, something the type-checking required, even if Python does not.

I also finally did they Pythonic thing and put the main behavior in a function. I could have removed the timing functions at this point, but instead I just made their output readable.

--- a/tree_sudoku.py
+++ b/tree_sudoku.py
@@ -184,10 +184,17 @@ class Tree_Node:
             self.possible_values = []
 
 
-start = time.time()
-solver = SudokuSolver(import_csv())
-for key, solution in solver.solved_board_strings.items():
-    print(f"Board: {key}")
-    print_board(solver.strings_to_board_dict([solution])['0'])
-
-end = time.time()
+def main():
+    start = time.time()
+    solver = SudokuSolver(import_csv())
+    for key, solution in solver.solved_board_strings.items():
+        print(f"Board: {key}")
+        print_board(solver.strings_to_board_dict([solution])['0'])
+    end = time.time()
+    print("start time = %f" % start)
+    print("end   time = %f" % end)
+    print("duration = %f" % (end - start))
+
+
+if __name__ == '__main__':
+    main()

I ended up pulling all of the logic out of the SudokuSolver class, and removing the class. Logic I wanted to keep, like the solve method I moved to the Board class. I also moved build_solution_string to Tree_node. Not all of these moves were to their final locations. I later moved build_solution_string from Tree_node to Board. In the process, the parameter list suggested to me a couple things: the Tree_Node should take a pointer to the Board.board_list in its constructor, and the two classes were named wrong. I eventually renamed Board to Solver after I had removed the original SudokuSolver class. The Tree_Node class really was representing a cell in the sudoku puzzle, so I renamed Tree_Node to Cell.

With that rename done, I added a board parameter to the Cell constructor. This was actually one of the larges refactorings I performed that was not merely reordering functions. This change required the modification of the Cell constructor, and implied that the functions that were later setting the board member variable should not longer do so, and no members of Cell should then take board in as a parameter. This required modifying the test_advance code as well. Even still, I think it is an understandable commit. You could make the argument that this one should have been split into two commits, and I probably would concede.

I also noticed that the next method was poorly named, as it implied movement, the way that advance and retreat did. Since all that function does is pop items off a list, I named it pop.

To complete the refactoring, I moved the two functions is_value_valid and is_set_valid to the Cell class.

I will include the final state of the python file below. I don’t think I will spend anymore time refactoring this, as I think it is in an understandable state now.

The class model now has two classes: a Solver takes a puzzle and solves if by iterating Cell by Cell through the board, advancing when the state is potentially solvable, and retreating when it detects a violation of the constrains. Is it still a tree? I am not certain it really ever was, but it still follows the rules of descent and backtracking.

#!/usr/bin/python
 
# turn each csv row into a board
# find what values can go into what spot
# create a tree trying to put in each value
# if value can not be possible, end stem, go back up the tree
# return the branch when tree is 81 layers deep, the board is filled
import csv
import re
import copy
import time
 
from typing import List
 
BASIS = 3
DIM = BASIS * BASIS
MAX = DIM * DIM
 
 
def import_csv() -> List[str]:
    list_of_boards: List[str] = []
    with open('sample_sudoku_board_inputs.csv', 'r') as file:
        reader = csv.reader(file)
        for row in reader:
            list_of_boards.append(str(*row))
    return list_of_boards
 
 
class Solver:
    def __init__(self, board_string: str):
        rows = re.findall(r"\d{9}", board_string)
        self.board_list = []
        for row in rows:
            row_list: List[str] = []
            row_list[:0] = row
            self.board_list.append(row_list)
 
    def build_solution_string(self, head_cell):
        return_string = ''
        curr_cell = head_cell
        return_string += str(curr_cell.value)
        while (curr_cell.next_cell):
            curr_cell = curr_cell.next_cell
            return_string += str(curr_cell.value)
        return return_string
 
    def solve(self):
        test_board = copy.deepcopy(self.board_list)
        head_cell = Cell(test_board, None, 0)
        curr_cell = head_cell
        while True:
            curr_cell.write()
            if curr_cell.is_value_valid():
                if curr_cell.index + 1 >= MAX:
                    break
                curr_cell = curr_cell.advance()
            else:
                # backtrack
                while len(curr_cell.possible_values) == 0:
                    curr_cell = curr_cell.retreat()
                curr_cell.pop()
        return self.build_solution_string(head_cell)
 
 
class Cell:
    def __init__(self, board, last_cell, index):
        self.board = board
        self.possible_values = possible_values()
        self.value = self.possible_values.pop()
        (self.row, self.col) = index_to_row_col(index)
        self.last_cell = last_cell
        self.next_cell = None
        self.index = index
        self.old_value = None
 
    def advance(self):
        new_cell = Cell(self.board, self, self.index + 1)
        new_cell.check_solved()
        self.next_cell = new_cell
        return new_cell
 
    def retreat(self):
        self.board[self.row][self.col] = self.old_value
        cell = self.last_cell
        cell.next_cell = None
        return cell
 
    def pop(self):
        self.value = self.possible_values.pop()
 
    def __str__(self):
        return self.value
 
    def write(self):
        if self.old_value is None:
            self.old_value = self.board[self.row][self.col]
        self.board[self.row][self.col] = self.value
 
    def check_solved(self):
        if self.board[self.row][self.col] != '0':
            self.value = self.board[self.row][self.col]
            self.possible_values = []
 
    def is_set_valid(self, generator):
        box = possible_values()
        for (row, column) in generator(self.row, self.col):
            number = self.board[row][column]
            if number == '0':
                continue
            if number in box:
                box.remove(number)
            else:
                return False
        return True
 
    def is_value_valid(self):
        if not self.is_set_valid(column_generator):
            return False
        if not self.is_set_valid(row_generator):
            return False
        return self.is_set_valid(box_generator)
 
 
def strings_to_board_dict(board_strings):
    return_dict = {}
    for index, board_string in enumerate(board_strings):
        return_dict[str(index)] = Solver(board_string)
    return return_dict
 
 
def print_board(solver: Solver):
    for index1, row in enumerate(solver.board_list):
        if index1 == 0 or index1 == 3 or index1 == 6:
            print('-' * 21)
        for index, char in enumerate(row):
            print(char, '', end='')
            if index == 2 or index == 5:
                print('| ', end='')
        print('')
        if index1 == 8:
            print('-' * 21)
 
 
def possible_values():
    values = []
    for index in range(1, DIM + 1):
        values.append('%d' % index)
    return values
 
 
def column_generator(row, col):
    for i in range(0, DIM):
        yield (i, col)
 
 
def row_generator(row, col):
    for i in range(0, DIM):
        yield (row, i)
 
 
def box_generator(row, col):
    row_mod = row % BASIS
    start_row = row - row_mod
    col_mod = col % BASIS
    start_col = col - col_mod
    for i in range(0, BASIS):
        for j in range(0, BASIS):
            yield (start_row + i, start_col + j)
 
 
def index_to_row_col(index):
    col = int(index % DIM)
    row = int((index - col) / DIM)
    return (row, col)
 
 
def main():
    start = time.time()
    board_strings = import_csv()
    boards_dict = strings_to_board_dict(board_strings)
    solved_board_strings = dict()
    for key, value in boards_dict.items():
        return_string = value.solve()
        solved_board_strings[key] = return_string
 
    for key, solution in solved_board_strings.items():
        print(f"Board: {key}")
        print_board(strings_to_board_dict([solution])['0'])
    end = time.time()
    print("start time = %f" % start)
    print("end   time = %f" % end)
    print("duration = %f" % (end - start))
 
 
if __name__ == '__main__':
    main()

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.