With the inclusion of the Generators, the main functions seem to be in the right shape. What seems off is the naming of things. For example, we have a couple functions that take a parameter named “board” but it turns out they mean different things. The names of classes do not seem to align with what they are doing.
Lets continue with the renaming of things and see where it leads us.
But first: we can remove the exceptions to the flake8 testing, and, with the removal of some white space, we can run with all the pep8 rules intact. Now, on to renaming.
I am not going to go through each commit, but will try to provide enough context so you could follow the logic in the commits in github.
This is the last article in my Long Refactoring series. You can read the first article here.
Lets start with a simple one: since the collection is called board_strings, lets make the singular match:
diff --git a/tree_sudoku.py b/tree_sudoku.py index 95d1026..b4fa5bf 100755 --- a/tree_sudoku.py +++ b/tree_sudoku.py @@ -62,8 +62,8 @@ class SudokuSolver: def strings_to_board_dict(self, board_strings): return_dict = {} - for index, board in enumerate(board_strings): - return_dict[str(index)] = build_board(board) + for index, board_string in enumerate(board_strings): + return_dict[str(index)] = build_board(board_string) return return_dict @@ -80,8 +80,8 @@ def print_board(board): print('-' * 21) -def build_board(board): - rows = re.findall(r"\d{9}", board) +def build_board(board_string): + rows = re.findall(r"\d{9}", board_string) board_list = [] for row in rows: row_list = [] |
This lets us distinguish between the strings pulled out of the file via the csv parser and the board we generate later as a list of lists…the Pythonic way of making a multidimensional array.
Here I made a decision that I later changed: I decided I wanted a Board class, and I introduced it. I did that by just creating the class in one commit, and in the next started moving code to that class. This is because I identified that the SudokuSolver class was really managing the list of puzzles, and I wanted to separate the logic for solving a single puzzle from the puzzle-list management. The build_board function became the __init__ function for the Board class.
Here is where my refactoring took an interesting turn: I decided that as part of determining what the different parameters meant, I should annotate them with type. To do this, I turned to the MyPy utility, and also type annotations as handled by python. First I added a type checker to the tox.ini managed environment:
diff --git a/tox.ini b/tox.ini index 1252034..a69ac26 100644 --- a/tox.ini +++ b/tox.ini @@ -4,13 +4,16 @@ # and then run "tox" from this directory. [tox] -envlist = pep8,py312 - +envlist = pep8,type,py312 skipsdist = True [testenv:pep8] commands = flake8 + +[testenv:type] +deps = mypy +commands = mypy tree_sudoku.py [flake8] filename= *.py |
With this change, I could start annotating type information:
diff --git a/tree_sudoku.py b/tree_sudoku.py index 4b2ac5b..b79e782 100755 --- a/tree_sudoku.py +++ b/tree_sudoku.py @@ -10,14 +10,15 @@ import re import copy import time +from typing import List BASIS = 3 DIM = BASIS * BASIS MAX = DIM * DIM -def import_csv(): - list_of_boards = [] +def import_csv()->List[str]: + list_of_boards: List[str] = [] with open('sample_sudoku_board_inputs.csv', 'r') as file: reader = csv.reader(file) for row in reader: @@ -26,11 +27,11 @@ def import_csv(): class Board: - def __init__(self, board_string): + def __init__(self, board_string: str): rows = re.findall(r"\d{9}", board_string) self.board_list = [] for row in rows: - row_list = [] + row_list: List[str] = [] row_list[:0] = row self.board_list.append(row_list) @@ -77,7 +78,7 @@ class SudokuSolver: return return_dict -def print_board(board): +def print_board(board: Board): for index1, row in enumerate(board.board_list): if index1 == 0 or index1 == 3 or index1 == 6: print('-' * 21) |
Continuing on with this change required reordering class definitions in the file so that classes were defined before they were referenced, something the type-checking required, even if Python does not.
I also finally did they Pythonic thing and put the main behavior in a function. I could have removed the timing functions at this point, but instead I just made their output readable.
--- a/tree_sudoku.py +++ b/tree_sudoku.py @@ -184,10 +184,17 @@ class Tree_Node: self.possible_values = [] -start = time.time() -solver = SudokuSolver(import_csv()) -for key, solution in solver.solved_board_strings.items(): - print(f"Board: {key}") - print_board(solver.strings_to_board_dict([solution])['0']) - -end = time.time() +def main(): + start = time.time() + solver = SudokuSolver(import_csv()) + for key, solution in solver.solved_board_strings.items(): + print(f"Board: {key}") + print_board(solver.strings_to_board_dict([solution])['0']) + end = time.time() + print("start time = %f" % start) + print("end time = %f" % end) + print("duration = %f" % (end - start)) + + +if __name__ == '__main__': + main() |
I ended up pulling all of the logic out of the SudokuSolver class, and removing the class. Logic I wanted to keep, like the solve method I moved to the Board class. I also moved build_solution_string to Tree_node. Not all of these moves were to their final locations. I later moved build_solution_string from Tree_node to Board. In the process, the parameter list suggested to me a couple things: the Tree_Node should take a pointer to the Board.board_list in its constructor, and the two classes were named wrong. I eventually renamed Board to Solver after I had removed the original SudokuSolver class. The Tree_Node class really was representing a cell in the sudoku puzzle, so I renamed Tree_Node to Cell.
With that rename done, I added a board parameter to the Cell constructor. This was actually one of the larges refactorings I performed that was not merely reordering functions. This change required the modification of the Cell constructor, and implied that the functions that were later setting the board member variable should not longer do so, and no members of Cell should then take board in as a parameter. This required modifying the test_advance code as well. Even still, I think it is an understandable commit. You could make the argument that this one should have been split into two commits, and I probably would concede.
I also noticed that the next method was poorly named, as it implied movement, the way that advance and retreat did. Since all that function does is pop items off a list, I named it pop.
To complete the refactoring, I moved the two functions is_value_valid and is_set_valid to the Cell class.
I will include the final state of the python file below. I don’t think I will spend anymore time refactoring this, as I think it is in an understandable state now.
The class model now has two classes: a Solver takes a puzzle and solves if by iterating Cell by Cell through the board, advancing when the state is potentially solvable, and retreating when it detects a violation of the constrains. Is it still a tree? I am not certain it really ever was, but it still follows the rules of descent and backtracking.
#!/usr/bin/python # turn each csv row into a board # find what values can go into what spot # create a tree trying to put in each value # if value can not be possible, end stem, go back up the tree # return the branch when tree is 81 layers deep, the board is filled import csv import re import copy import time from typing import List BASIS: int = 3 DIM: int = BASIS * BASIS MAX: int = DIM * DIM def import_csv() -> List[str]: list_of_boards: List[str] = [] with open('sample_sudoku_board_inputs.csv', 'r') as file: reader = csv.reader(file) for row in reader: list_of_boards.append(str(*row)) return list_of_boards class Cell: def __init__(self, board, last_cell, index: int): self.board = board self.possible_values = possible_values() self.value = self.possible_values.pop() (self.row, self.col) = index_to_row_col(index) self.last_cell = last_cell self.next_cell = None self.index = index self.old_value = None def advance(self): new_cell = Cell(self.board, self, self.index + 1) new_cell.check_solved() self.next_cell = new_cell return new_cell def retreat(self): self.board[self.row][self.col] = self.old_value cell = self.last_cell cell.next_cell = None return cell def pop(self): self.value = self.possible_values.pop() def __str__(self): return self.value def write(self): if self.old_value is None: self.old_value = self.board[self.row][self.col] self.board[self.row][self.col] = self.value def check_solved(self): if self.board[self.row][self.col] != '0': self.value = self.board[self.row][self.col] self.possible_values = [] def is_set_valid(self, generator) -> bool: box = possible_values() for (row, column) in generator(self.row, self.col): number = self.board[row][column] if number == '0': continue if number in box: box.remove(number) else: return False return True def is_value_valid(self): if not self.is_set_valid(column_generator): return False if not self.is_set_valid(row_generator): return False return self.is_set_valid(box_generator) class Solver: def __init__(self, board_string: str): rows = re.findall(r"\d{9}", board_string) self.board_list = [] for row in rows: row_list: List[str] = [] row_list[:0] = row self.board_list.append(row_list) def build_solution_string(self, head_cell: Cell): return_string = '' curr_cell = head_cell return_string += str(curr_cell.value) while (curr_cell.next_cell): curr_cell = curr_cell.next_cell return_string += str(curr_cell.value) return return_string def solve(self): test_board = copy.deepcopy(self.board_list) head_cell = Cell(test_board, None, 0) curr_cell = head_cell while True: curr_cell.write() if curr_cell.is_value_valid(): if curr_cell.index + 1 >= MAX: break curr_cell = curr_cell.advance() else: # backtrack while len(curr_cell.possible_values) == 0: curr_cell = curr_cell.retreat() curr_cell.pop() return self.build_solution_string(head_cell) def strings_to_board_dict(board_strings): return_dict = {} for index, board_string in enumerate(board_strings): return_dict[str(index)] = Solver(board_string) def print_board(solver: Solver): for index1, row in enumerate(solver.board_list): if index1 == 0 or index1 == 3 or index1 == 6: print('-' * 21) for index, char in enumerate(row): print(char, '', end='') if index == 2 or index == 5: print('| ', end='') print('') if index1 == 8: print('-' * 21) def possible_values() -> List[str]: values = [] for index in range(1, DIM + 1): values.append('%d' % index) return values def column_generator(row: int, col: int): for i in range(0, DIM): yield (i, col) def row_generator(row: int, col: int): for i in range(0, DIM): yield (row, i) def box_generator(row: int, col: int): row_mod = row % BASIS start_row = row - row_mod col_mod = col % BASIS start_col = col - col_mod for i in range(0, BASIS): for j in range(0, BASIS): yield (start_row + i, start_col + j) def index_to_row_col(index: int): col = int(index % DIM) row = int((index - col) / DIM) return (row, col) def main(): start = time.time() board_strings = import_csv() boards_dict = strings_to_board_dict(board_strings) solved_board_strings = dict() for key, value in boards_dict.items(): return_string = value.solve() solved_board_strings[key] = return_string for key, solution in solved_board_strings.items(): print(f"Board: {key}") print_board(strings_to_board_dict([solution])['0']) end = time.time() print("start time = %f" % start) print("end time = %f" % end) print("duration = %f" % (end - start)) if __name__ == '__main__': main() |
Edit: I added more type checking and reordered the classes so Cell is before Solver.