Comments and docstrings and type hints, oh my!
Functionality is only half the battle in software development, especially in data-heavy applications and pipelines. Your software or analytics source code should also be readable to other humans. Without proper code comments, it is difficult for anyone, including future you, to understand what a given piece of code is supposed to do and why. Maintainability and extensibility quickly become an uphill battle, sucking valuable time out of your day.
In Python, there are so many ways to improve the readability of code, to make the implicit explicit. In this article, we will go through how to properly use comments, docstrings, and type hints to make Python code easier to understand.
After finishing this article, you will know:
How to effectively use comments in Python
When and how you can replace comments with string literals or docstrings
What type hints are in Python and how they can help you and others to read code
Ready?
Python Code Comments
Nearly all programming languages have some syntax for comments. Compilers and interpreters ignore comments, so they have no effect on how the code runs.
Some languages, like C++, mark a single-line comment with a leading double slash (//) and a comment block with the delimiters /* and */. These symbols instruct the compiler to ignore that line or block of text.
Python marks comments with a leading hash (#). It has no dedicated block-comment delimiters, but a string enclosed in triple quotes (""") can serve the same purpose.
Comments are exceptionally easy to add, and you could add a comment to explain every single line of code in your repo. But just because you can doesn’t mean you should. Too many comments can be distracting for the reader. Take the example below:
import math  # import the math package
x = 4  # initialize x to 4
print(math.sqrt(x))  # print the square root of x
The comments in the above example merely repeat what the code does. The code isn’t obscured by arcane syntax or confusing equations, so the comments have little value and become clutter in your code. Good comments should be useful: they should tell the reader what the code is doing and why. Let’s check out another example.
# This class provides utility functions to work with strings
# 1. reverse(s): returns the reverse of the input string
class StringUtils:
    @staticmethod
    def reverse(s):
        return ''.join(reversed(s))
The code above sets up a class and a function. The comments tell us what the class is for (string-based utilities) and what the reverse function does. This is a rather simple example, but you get the idea.
Another common practice is to use comments to keep track of incomplete work or “to-do” items. Often, when we have an idea for possible improvements, extensions, or fixes we may put a to-do comment, like this:
# TODO: replace NLTK with spaCy lemmatizer for faster performance and better OOV support
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print("planets :", lemmatizer.lemmatize("planets"))
print("corpora :", lemmatizer.lemmatize("corpora"))
# "a" denotes adjective in "pos"
print("worse :", lemmatizer.lemmatize("worse", pos="a"))
This is common enough that most IDEs highlight a comment differently when it contains the string TODO. TODO comments are intended to be temporary and shouldn’t replace a proper issue tracking system.
To recap, some commenting best practices:
Comments should explain, but not restate, the code
Use comments to explain code that is not self-explanatory: for example, name the algorithm being used, the reason certain parameters were chosen, or the programmer’s intent or assumptions
Comments should be concise, simple, and follow a consistent style
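To make the recap concrete, here is a minimal sketch of comments that explain intent rather than restate the code. The normalize function and its handling of constant input are hypothetical choices made for illustration only.

```python
def normalize(values):
    # Min-max scaling: map the smallest value to 0.0 and the largest to 1.0.
    lowest, highest = min(values), max(values)

    # A constant input has zero range, so return zeros instead of
    # dividing by zero (an assumption chosen for this illustration).
    if highest == lowest:
        return [0.0 for _ in values]

    return [(v - lowest) / (highest - lowest) for v in values]


print(normalize([2, 4, 6]))  # [0.0, 0.5, 1.0]
```

The first comment names the technique and the second records why the special case exists. Neither can be read off the code itself.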
Docstrings
Python has no block-comment delimiters like /* and */. Instead, string literals enclosed in triple quotes (""") are often used for multi-line comments. This works because Python evaluates a string literal that stands alone in the code and then discards it, so it has no effect on the program. It’s not important to understand the inner workings of string literals right now; just know that, used this way, they serve much the same purpose as the single-line comments we covered in the last section.
String literals are useful for creating docstrings (multi-line comments that explain functions, classes, or modules) and for temporarily commenting out blocks of code, like this:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

"""
X, y = make_classification(n_samples=3500, n_features=7, n_informative=7,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           n_clusters_per_class=1,
                           weights=[0.01, 0.05, 0.94],
                           class_sep=0.75, random_state=123)
"""
import pickle

# Open in binary read mode ("rb") because we are loading, not writing
with open("dataset.pickle", "rb") as fp:
    X, y = pickle.load(fp)
clf = LogisticRegression(random_state=123).fit(X, y)
The above is sample code for experimenting with a machine learning problem. We generated a dataset randomly at the top with make_classification(), but we might want to run the same process on a different dataset (this is what the pickle part is for). Instead of removing the block of code, we can comment it out for later use. You shouldn’t keep commented-out code in your repo permanently, but this is very convenient when you’re actively working on a script.
String literals in Python have a special purpose when they appear as the first statement directly under a function definition. In that case, the string is called the function’s “docstring”. Here’s an example:
def square(x):
    """Compute the square of a value
    Args:
        x (int or float): A numerical value
    Returns:
        int or float: The square of x
    """
    return x * x
Here you can see that the first statement under the function signature (def square(x)) is a string literal, and it acts just like a comment. It makes the code more readable. But wait, there’s more. We can retrieve docstrings from code programmatically like this:
print("Function name:", square.__name__)
print("Docstring:", square.__doc__)
Function name: square
Docstring: Compute the square of a value
    Args:
        x (int or float): A numerical value
    Returns:
        int or float: The square of x
Since docstrings have this special purpose in Python, there are some conventions on how to write them properly. Generally it’s expected that a docstring will explain the purpose, key arguments, inputs, and outputs of a function, class, or module. There are a few common styles, including the one in the above code, as well as a style established by NumPy:
def square(x):
    """Compute the square of a value

    Parameters
    ----------
    x : int or float
        A numerical value

    Returns
    -------
    int or float
        The square of `x`
    """
    return x * x
If you’re diligent about writing docstrings for your functions, there are even Python tools (like Sphinx’s autodoc extension) that can parse docstrings to automate the creation of your technical documentation. Even if your goal is not to automate parts of the documentation process, docstrings that clearly describe what each function does, along with the data types of its arguments and outputs, will make your code much easier to read. Docstrings are just one way to make the implicit assumptions in your code explicit. This will help others, as well as future you, understand your code.
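As a rough illustration of how a tool might read docstrings, the sketch below uses the standard library’s inspect.getdoc(), which returns the docstring with its common indentation removed. It is not how any particular documentation generator works internally. The square function is the example from earlier, and pulling out the first line as a summary is just one possible use.

```python
import inspect


def square(x):
    """Compute the square of a value

    Args:
        x (int or float): A numerical value

    Returns:
        int or float: The square of x
    """
    return x * x


# inspect.getdoc() strips the common leading indentation from the docstring
doc = inspect.getdoc(square)

# By convention, the first line is a one-line summary of the function
summary = doc.splitlines()[0]
print(summary)  # Compute the square of a value
```

The built-in help(square) displays the same cleaned-up text in an interactive session.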
Type Hints
Python 3.5 standardized type hints and added the typing module to support them. As you might have guessed from the name, type hints let you note the intended data type of a value so that tools can check it. If you haven’t encountered a type hint in the wild, they look like this:
def square(x: int) -> int:
    return x * x
You can see above that you can add a type hint to each function argument by appending : type to spell out its intended data type. You can also specify the type of the function’s return value by adding -> type before the ending colon.
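Like docstrings, type hints are stored on the function object and can be read back programmatically. The short sketch below reuses the square example and inspects its __annotations__ attribute.

```python
def square(x: int) -> int:
    return x * x


# Hints are kept in a dictionary keyed by argument name;
# the return type is stored under the key "return"
print(square.__annotations__)
# {'x': <class 'int'>, 'return': <class 'int'>}
```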
But wait, there’s more…
You can use type hints inside of your function as well! Using the convention below (available since Python 3.6), you can add type hints to interim values within a function.
def square(x: int) -> int:
    value: int = x * x
    return value
If you’re wondering why you should bother using type hints, keep reading.
Type hints have several benefits, and we’re going to highlight two of them in this article. First, type hints can replace comments when you need to state explicitly the data type being used. Second, type hints help static code analyzers understand our code better so they can identify potential issues in it.
If your data type is more complex, Python offers the typing module in its standard library. The typing module supports more complex type hints, such as Union[int, float] to mean an int or a float, List[str] to mean a list in which every element is a string, and Any to mean any type. Here’s an example of how you might use these:
from typing import Any, Union, List

def square(x: Union[int, float]) -> Union[int, float]:
    return x * x

def append(x: List[Any], y: Any) -> None:
    x.append(y)
Keep in mind that type hints are just that, hints. They are not tests in themselves and it’s possible to write code using type hints that is confusing or nonsensical like the following:
n: int = 3.5
Using type hints can improve the readability of code. However, the second benefit we’ll highlight today is that type hints allow static analyzers, like mypy, to identify potential bugs. For example, if you were to run mypy on the above code, you would get an error:
example.py:1: error: Incompatible types in assignment (expression has type "float", variable has type "int")
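By contrast, the Python interpreter itself does not enforce hints. The following minimal sketch runs without any error, even though both the assignment and the function call contradict their hints. The variable name result is only for illustration.

```python
def square(x: int) -> int:
    return x * x


n: int = 3.5          # the hint says int, but Python stores the float anyway
result = square(2.5)  # a float argument is accepted despite the int hint

print(type(n).__name__)  # float
print(result)            # 6.25
```

This is why hints are most useful when paired with a static analyzer that checks them before the code runs.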
Conclusion
This concludes our intro to comments, docstrings, and type hints in Python. Leave your comments and questions in the comments.
If you have a codebase without docstrings, type hints, or other helpful components and could use some help getting your codebase into shape - we would love to help you! Drop us a message or schedule a consultation.
Lastly, if you’re looking for a longer intro to Python check out O’Reilly’s Introduction to Python.