Five Python Refactoring Tips to Improve Your Code#

Author

Erik Inqwersen

Date

May 23, 2023

Easy ways to improve and simplify your code, with Python examples

Refactoring is the process of improving the quality of code without changing what it does. Taking a little bit of time to refactor your code before committing it can make it much more readable, which makes it easier to maintain. Doing this consistently turns finding bugs and adding new features from a nightmarish hells-cape into a walk in the park.

This article contains 5 easy things you can do straight away to make your code more readable and indeed beautiful.

1. Remove commented out code#

This is the easiest possible refactoring, and it can yield excellent results for your codebase. Commented out code is confusing. It bloats the codebase and makes it difficult to follow the true execution paths of the code. It also gets in the way of genuine comments, which can become lost in a sea of out of date code lines.

# lots of
#
# commented out lines of code
do_something_important()
#
# even more
# commented out code

In examples like this it’s possible to miss vital information as you’re reading the code, since you’re trying to skip over the comments.

If you’re worried that the commented out code might be useful some day, the correct way of storing it is in source control. If you ever need it then it will be there in the commit history. Just give the commit where you delete the comments a suitable name and they will be easy to find.

Best of all, as long as you’re careful to only delete commented out lines, this refactoring can never break your code.

2. Extract magic numbers and strings#

While you’re developing it’s often easier to write string literals and numbers directly into your code. However leaving them there is a recipe for problems down the road.

If the number or string changes later you will need to find every instance of the literal, check if it needs to change and then alter it. Making repeated changes to duplicate code in this way is one of the leading causes of bugs, since it’s easy to miss out one of the usages. Replacing them with a constant means that the literal is stored in one place and only needs to change in that place.

Another problem is that a bare number like the one in this example doesn’t tell you why it has the value or what it is for.

def update_text(text):
    if len(text) > 80:
        wrap(text)

Replacing it with a constant let’s you give it a descriptive name, which makes the code easier to read and understand.

MAX_LINE_LENGTH = 80

def update_text(text):
    if len(text) > MAX_LINE_LENGTH:
        wrap(text)

When it comes to actually making the changes, many IDEs will help you to extract literals with an ‘Extract Constant’ option.

Danger

Avoid using Jupyter notebooks for tasks other than exploration and prototyping. This type of problem discussed on this section is a chronic error that data scientist often make. Jupyter notebooks are not meant to substitute a proper IDE. When I see a program that heavily relies on data, that contains multiple instances of magic numbers and strings, I know for a fact that the developer used Jupyter notebooks for the entire development process of the source. A simple way to avoid making this type of mistake is to use Jupyter notebooks only for exploration and prototyping, and then to move the code into a proper Python module once it’s working using a real IDE.

3. Remove duplication#

DRY (Don’t Repeat Yourself) is one of the foundational principles of software development. Duplication makes code harder to understand, and often leads to bugs when the duplicated code starts to diverge.

Code Hoisting

One easy way to remove duplication is if you notice an opportunity for code hoisting. This is where code is repeated on both branches of a conditional, so can be taken outside.

if sold > DISCOUNT_AMOUNT:
    total = sold * DISCOUNT_PRICE
    label = f"Total: {total}"
else:
    total = sold * PRICE
    label = f"Total: {total}"

By taking the assignment to label outside of the conditional we have removed a duplicate line of code, and made it clearer what the conditional is actually controlling, which is the total.

if sold > DISCOUNT_AMOUNT:
    total = sold * DISCOUNT_PRICE
else:
    total = sold * PRICE
label = f"Total: {total}"

This then allows us to further refactor if we wish, by replacing the conditional with an if expression.

total = sold * DISCOUNT_PRICE if sold > DISCOUNT_AMOUNT else sold * PRICE
label = f"Total: {total}"

Extracting functions

More commonly, code is duplicated in different parts of a function, or in two different functions.

Here you will need to extract out the code into another function, give it a meaningful name, and then call it instead of the extracted code.

class Game:
    # ...

    def was_correctly_answered(self):
        if self.not_penalised():
            print("Answer was correct!!!!")
            self.purses[self.current_player] += 1
            print(
                self.players[self.current_player]
                + " now has "
                + str(self.purses[self.current_player])
                + " Gold Coins."
            )

            winner = self._did_player_win()
            self.current_player += 1
            if self.current_player == len(self.players):
                self.current_player = 0

            return winner
        else:
            self.current_player += 1
            if self.current_player == len(self.players):
                self.current_player = 0
            return True

    def wrong_answer(self):
        print("Question was incorrectly answered")
        print(self.players[self.current_player] + " was sent to the penalty box")
        self.in_penalty_box[self.current_player] = True

        self.current_player += 1
        if self.current_player == len(self.players):
            self.current_player = 0
        return True

This code is taken (slightly modified) from the trivia kata, published under a GPLv3 license. Looking over it one clear piece of duplicated code is the following, which pops up in three places:

self.current_player += 1
if self.current_player == len(self.players):
    self.current_player = 0

This should be extracted into a function. Many IDEs will let you choose snippets of code and extract them automatically, and some such as PyCharm will also scan your code to see which parts can be replaced by a call to the new function. Let’s extract this method, and call it next_player, since it moves the current_player on to the next valid value.

class Game:
    # ...

    def was_correctly_answered(self):
        if self.not_penalised():
            print("Answer was correct!!!!")
            self.purses[self.current_player] += 1
            print(
                self.players[self.current_player]
                + " now has "
                + str(self.purses[self.current_player])
                + " Gold Coins."
            )

            winner = self._did_player_win()
            self.next_player()

            return winner
        else:
            self.next_player()
            return True

    def wrong_answer(self):
        print("Question was incorrectly answered")
        print(self.players[self.current_player] + " was sent to the penalty box")
        self.in_penalty_box[self.current_player] = True

        self.next_player()
        return True

    def next_player(self):
        self.current_player += 1
        if self.current_player == len(self.players):
            self.current_player = 0

There’s a lot more refactoring to be done on this code, but it has definitely improved. It reads more clearly, and if the next_player() function has to change we only need to alter the code in one place instead of three.

As you do more coding and refactoring you will learn to see more and more opportunities to remove duplication. Often you will need to massage multiple pieces of code to make them more similar, then extract out the common elements into functions.

4. Split up large functions#

The longer a function is, the harder it is to read and understand. An important aspect of writing good functions is that they should do one thing, and have a name that reflects exactly what they do - a maxim known as the Single Responsibility Principle. Longer functions are more prone to doing lots of different things.

Opinions vary on how long they should be, but in general the shorter the better. A cast-iron rule is that it should definitely fit on one page of your code editor. Having to scroll up and down to see what a function does adds a lot to the cognitive load needed to understand it.

In order to split up a function you will need to extract parts of it into other functions, as described in the previous section.

Let’s take a look at some code to get an idea of how to proceed:

def make_tea(kettle, tap, teapot, tea_bag, cup, milk, sugar):
    kettle.fill(tap)
    kettle.switch_on()
    kettle.wait_until_boiling()
    boiled_kettle = kettle.pick_up()
    teapot.add(tea_bag)
    teapot.add(boiled_kettle.pour())
    teapot.wait_until_brewed()
    full_teapot = teapot.pick_up()
    cup.add(full_teapot.pour())
    cup.add(milk)
    cup.add(sugar)
    cup.stir()
    return cup

Here we can see that the calls split into three groups - those that call methods on the kettle, the teapot and then the cup. From our domain knowledge (being British may help here) these also correspond to the three stages of making a cup of tea - boiling the kettle, brewing the tea and then pouring it into a cup and serving it.

Let’s start at the end, and extract out the lines that relate to pouring out a cup of tea and serving it. You can do this manually or by using the 'Extract Method' functionality of your IDE.

def make_tea(kettle, tap, teapot, tea_bag, cup, milk, sugar):
    kettle.fill(tap)
    kettle.switch_on()
    kettle.wait_until_boiling()
    boiled_kettle = kettle.pick_up()
    teapot.add(tea_bag)
    teapot.add(boiled_kettle.pour())
    teapot.wait_until_brewed()
    full_teapot = teapot.pick_up()
    return pour_tea(cup, full_teapot, milk, sugar)


def pour_tea(cup, full_teapot, milk, sugar):
    cup.add(full_teapot.pour())
    cup.add(milk)
    cup.add(sugar)
    cup.stir()
    return cup

At this stage we have a mixture of levels of abstraction in our make_tea function - switching on a kettle is a simpler operation than the multiple steps involved in our pour_tea function. Each line in our functions should ideally be at a similar level, making them easier to parse as a coherent narrative. To achieve this let’s go ahead and extract out the other two functions as well.

def make_tea(kettle, tap, teapot, tea_bag, cup, milk, sugar):
    boiled_kettle = boil_water(kettle, tap)
    full_teapot = brew_tea(boiled_kettle, tea_bag, teapot)
    return pour_tea(cup, full_teapot, milk, sugar)


def boil_water(kettle, tap):
    kettle.fill(tap)
    kettle.switch_on()
    kettle.wait_until_boiling()
    return kettle.pick_up()


def brew_tea(boiled_kettle, tea_bag, teapot):
    teapot.add(tea_bag)
    teapot.add(boiled_kettle.pour())
    teapot.wait_until_brewed()
    return teapot.pick_up()


def pour_tea(cup, full_teapot, milk, sugar):
    cup.add(full_teapot.pour())
    cup.add(milk)
    cup.add(sugar)
    cup.stir()
    return cup

Looking at the top-level make_tea function you can now read the three top-level stages of making tea like a little story. If you are interested in the details of any stage you can drill down to the relevant method.

When splitting up a function it’s best to identify logically consistent bits of code that hang together and do one thing. A rule of thumb is that if you can think of a good domain-relevant name for the new function then extracting it is probably a good idea.

5. Move local variable declarations close to their usages#

When reading and understanding code you have to keep the variables you encounter and their state in your short-term memory. Most adults can store around 7 things at once.

It’s therefore quite important to ensure that you have to deal with as few variables as possible when reading any particular piece of code. Keeping variable declarations close and in scope with their usage helps with this. Basically the closer a variable is declared to its usage, the less scanning up and down you have to do when reading the code later.

def assess_fruit(self, fruit):
    happiness = 0
    hunger = time_since_breakfast() / size_of_breakfast()
    some_other_code()
    # work out some things
    do_some_other_things()
    if is_snack_time() and isinstance(fruit, Apple):
        yumminess = fruit.size * fruit.ripeness**2
        happiness += hunger * yumminess
    return happiness

When reading this code you have to keep the hunger variable in mind from the start to the end of the function, since it could be used or altered anywhere.

def assess_fruit(self, fruit):
    happiness = 0
    some_other_code()
    # work out some things
    do_some_other_things()
    if is_snack_time() and isinstance(fruit, Apple):
        hunger = time_since_breakfast() / size_of_breakfast()
        yumminess = fruit.size * fruit.ripeness**2
        happiness += hunger * yumminess
    return happiness

Refactoring to move it into the scope where it is used resolves this issue. Now we only have to think about hunger in the appropriate scope.