A Long Refactoring: Introduction

Congratulations, you got your code to run! You are done! Ship it!. Just don’t expect to be able to read it next month. You need to maintain it. You need to add new features. It is a mess.

Give yourself a well deserved break. Then come back to it.

I have a piece of code written for that very common job interview question: write a Sudoku solver. Jarrett, A Code Platoon student I am mentoring, wrote it for his project, and he got it to work. I think it is an ideal candidate for a refactoring story. Over the next few posts, I am going to work through how I would tackle this.

It works. At 180 lines, it is complex enough that the mind cannot hold on to the entire structure at once. Here is the code in its current form.

In order to run the code, you need some sample data. I’ll post that afterwards

Here is tree_sudoku.py

#turn each csv row into a board
#find what values can go into what spot
#create a tree trying to put in each value
#if value can not be possible, end stem, go back up the tree
#retrun the branch when tree is 81 layers deep, the board is filled
import csv
import re
import copy
import time
class SudokuSolver:
    def __init__(self):
        self.board_strings = self.import_csv()
        self.boards_dict = self.strings_to_board_dict(self.board_strings)
        self.box_index_table = self.fill_box_index_table()
        self.board_index_table = self.fill_board_index_table()
        self.solved_board_strings = []
        for key, value in self.boards_dict.items():
            print(f"Board: {key}")
            self.solved_board_strings.append([self.tree_to_solution_string(value)])
    def tree_to_solution_string(self, original_board):
        index = 0
        head_node = Tree_Node(index, self.board_index_table[index])
        current_node = head_node
        while index < 81:
            current_board_filling_node = head_node
            test_board = copy.deepcopy(original_board)
            test_board[int(current_board_filling_node.board_spot[0])][int(current_board_filling_node.board_spot[1])] = current_board_filling_node.value
            while current_board_filling_node.next_node:
                current_board_filling_node = current_board_filling_node.next_node
                test_board[int(current_board_filling_node.board_spot[0])][int(current_board_filling_node.board_spot[1])] = current_board_filling_node.value
            if self.value_valid(test_board, int(current_node.board_spot[0]), int(current_node.board_spot[1])):
                index += 1
                if index >= 81:
                    continue
                new_node = Tree_Node(index, self.board_index_table[index], current_node)
                if test_board[int(new_node.board_spot[0])][int(new_node.board_spot[1])] != '0':
                    new_node.value = test_board[int(new_node.board_spot[0])][int(new_node.board_spot[1])]
                    new_node.possible_values = []
                current_node.next_node = new_node
                current_node = new_node
            else:
                if len(current_node.possible_values) == 0:
                    while len(current_node.possible_values) == 0:
                        current_node = current_node.last_node
                        current_node.next_node = None
                        index -= 1
                    current_node.next()
                else:
                    current_node.next()
 
        return_string = ''
        current_node = head_node
        return_string += str(current_node.value)
        while(current_node.next_node):
            current_node = current_node.next_node
            return_string += str(current_node.value)
        self.print_board(self.strings_to_board_dict([return_string])['0'])
        return return_string
    def import_csv(self):
        list_of_boards = []
        with open('sample_sudoku_board_inputs.csv', 'r') as file:
            reader = csv.reader(file)
            for row in reader:
                list_of_boards.append(str(*row))
        return list_of_boards
 
    def strings_to_board_dict(self, board_strings):
        return_dict = {}
        for index, board in enumerate(board_strings):
            rows = re.findall(r"\d{9}", board)
            board_list = []
            for row in rows:
                row_list = []
                row_list[:0] = row
                board_list.append(row_list)
            return_dict[str(index)] = board_list
        return return_dict
 
    def print_board(self, board):
        for index1, row in enumerate(board):
            if index1 == 0 or index1 == 3 or index1 == 6:
                print('-' * 21)
            for index, char in enumerate(row):
                print(char, '', end='')
                if index == 2 or index == 5:
                    print('| ', end = '')
            print('')
            if index1 == 8:
                print('-' * 21)
 
    def value_valid(self, board, row_index, column_index):
        row = ['1', '2', '3', '4', '5', '6', '7', '8', '9']
        column = ['1', '2', '3', '4', '5', '6', '7', '8', '9']
        square = ['1', '2', '3', '4', '5', '6', '7', '8', '9']
        for number in board[row_index]:
            if number == '0':
                continue
            if number in row:
                row.remove(number)
            else:
                return False
        for a_row in range(9):
            number = board[a_row][column_index]
            if number == '0':
                continue
            if number in column:
                column.remove(number)
            else:
                return False
        box_indexes = self.box_index_table[self.find_box_of_index(str(row_index) + str(column_index))]
        for index in box_indexes:
            row = int(index[0])
            column = int(index[1])
            number = board[row][column]
            if number == '0':
                continue
            if number in square:
                square.remove(number)
            else:
                return False
 
        return True
 
    def find_box_of_index(self, index):
        box = 'box not found'
        for each_box in self.box_index_table:
            if index in self.box_index_table[each_box]:
                box = each_box
                break
        return box
    def fill_box_index_table(self):
        boxes = {}
        box_center = [1, 1]
        box_number = 0
        for _row_of_boxes in range(3):
            for _each_box in range(3):
                box_list = []
                for i in range(-1, 2):
                    box_list.append(str(box_center[0] + i) + str(box_center[1] - 1))
                    box_list.append(str(box_center[0] + i) + str(box_center[1]))
                    box_list.append(str(box_center[0] + i) + str(box_center[1] + 1))
                boxes[box_number] = box_list
                box_number += 1
                box_center[1] += 3
            box_center[0] += 3
            box_center[1] -= 9
        return boxes
 
    def fill_board_index_table(self):
        return_list = []
        for row in range(9):
            for column in range(9):
                return_list.append(str(row) + str(column))
        return return_list
class Tree_Node:
    def __init__(self, index, board_spot, last_node=None, next_node=None):
        self.possible_values = ['1', '2', '3', '4', '5', '6', '7', '8']
        self.index = index
        self.board_spot = board_spot
        self.next_node = next_node
        self.last_node = last_node
        self.value = '9'
    def next(self):
        self.value = self.possible_values.pop()
 
    def __str__(self):
        return self.value
start = time.time()
x = SudokuSolver()
end = time.time()

Here is the sample data you can use to run it. I cut it down to just 3 puzzles. To convert to a sudoku puzzle, break it up into 9 lines of 9 characters each. A 0 indicates a blank spot on the puzzle.

The full file test file has 50 puzzles in it. It takes the code a while to run against. I am going to cut it down to 3, but I want to hold on to the original data as well. I’ll do that using a symlink.

Here is sample_sudoku_board_inputs.csv.short. I can then symlink sample_sudoku_board_inputs.csv to this file.

003020600900305001001806400008102900700000008006708200002609500800203009005010300
200080300060070084030500209000105408000000000402706000301007040720040060004010003
000000907000420180000705026100904000050000040000507009920108000034059000507000000
ln -s sample_sudoku_board_inputs.csv.short sample_sudoku_board_inputs.csv

Lets make sure we can run it;

$ python3 tree_sudoku.py
Board: 0
---------------------
4 8 3 | 9 2 1 | 6 5 7 
9 6 7 | 3 4 5 | 8 2 1 
2 5 1 | 8 7 6 | 4 9 3 
---------------------
5 4 8 | 1 3 2 | 9 7 6 
7 2 9 | 5 6 4 | 1 3 8 
1 3 6 | 7 9 8 | 2 4 5 
---------------------
3 7 2 | 6 8 9 | 5 1 4 
8 1 4 | 2 5 3 | 7 6 9 
6 9 5 | 4 1 7 | 3 8 2 
---------------------
Board: 1
---------------------
2 4 5 | 9 8 1 | 3 7 6 
1 6 9 | 2 7 3 | 5 8 4 
8 3 7 | 5 6 4 | 2 1 9 
---------------------
9 7 6 | 1 2 5 | 4 3 8 
5 1 3 | 4 9 8 | 6 2 7 
4 8 2 | 7 3 6 | 9 5 1 
---------------------
3 9 1 | 6 5 7 | 8 4 2 
7 2 8 | 3 4 9 | 1 6 5 
6 5 4 | 8 1 2 | 7 9 3 
---------------------
Board: 2
---------------------
4 6 2 | 8 3 1 | 9 5 7 
7 9 5 | 4 2 6 | 1 8 3 
3 8 1 | 7 9 5 | 4 2 6 
---------------------
1 7 3 | 9 8 4 | 2 6 5 
6 5 9 | 3 1 2 | 7 4 8 
2 4 8 | 5 6 7 | 3 1 9 
---------------------
9 2 6 | 1 7 8 | 5 3 4 
8 3 4 | 2 5 9 | 6 7 1 
5 1 7 | 6 4 3 | 8 9 2 
---------------------
Finished in 4.777865409851074 seconds

Last thing before we really get started is to check this all in to git.

$ git status
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
[ayoung@ayoungP40 tree_sudoku]$ git init
Initialized empty Git repository in /home/ayoung/Documents/CodePlatoon/tree_sudoku/.git/
[ayoung@ayoungP40 tree_sudoku]$ git add .
[ayoung@ayoungP40 tree_sudoku]$ git commit -m "Initial checking of functioning code and sample data"
[master (root-commit) 5eb15e4] Initial checking of functioning code and sample data
 5 files changed, 478 insertions(+)
 create mode 120000 sample_sudoku_board_inputs.csv
 create mode 100644 sample_sudoku_board_inputs.csv.full
 create mode 100644 sample_sudoku_board_inputs.csv.short
 create mode 100644 tree_sudoku.py
 create mode 100644 tree_sudoku.py.comments

Next up; we’ll take a look through the code and write a test.

EDIT: I have posted all of the steps of the refactoring to github.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.