Last Updated on May 10, 2022

Static analyzers are tools that help you check your code without actually running it. The most basic form of static analyzer is the syntax highlighter in your favorite editor. If you need to compile your code (say, in C++), your compiler, such as Clang from the LLVM project, may also provide some static analysis functions to warn you about potential issues (e.g., mistaking assignment “=” for equality “==” in C++). In Python, we have some tools to identify potential errors or point out violations of coding standards.

After finishing this tutorial, you will know some of these tools. Specifically, you will learn:

  • What can the tools Pylint, Flake8, and mypy do?
  • What are coding style violations?
  • How can we use type hints to help analyzers identify potential bugs?

Let’s get started.

Overview

This tutorial is in three parts; they are:

  • Introduction to Pylint
  • Introduction to Flake8
  • Introduction to mypy

Pylint

Lint was the name of a static analyzer for C created a long time ago. Pylint borrowed its name and is one of the most widely used static analyzers for Python. It is available as a Python package, and we can install it with pip by running pip install pylint.

Then we have the command pylint available in our system.

Pylint can check one script or the entire directory. For example, if we have the following script saved as lenet5-notworking.py:

We can ask Pylint to tell us how good our code is before even running it:

The output is as follows:

If you provide the root directory of a module to Pylint, all components of the module will be checked by Pylint. In that case, you will see the path of different files at the beginning of each line.

There are several things to note here. First, the complaints from Pylint fall into different categories. Most commonly we would see conventions (i.e., a matter of style), warnings (i.e., the code may run in a way not consistent with what you intended), and errors (i.e., the code may fail to run and throw exceptions). Each message is identified by a code such as E0601, whose first letter indicates the category.
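To make the category letters concrete, here is a minimal sketch that maps a message code to its category. The helper name pylint_category is hypothetical and not part of Pylint; besides the three categories discussed above, the table also lists the refactor, fatal, and information categories that Pylint uses.

```python
# Pylint message categories, keyed by the first letter of the message code.
PYLINT_CATEGORIES = {
    "C": "convention",
    "R": "refactor",
    "W": "warning",
    "E": "error",
    "F": "fatal",
    "I": "information",
}


def pylint_category(code: str) -> str:
    """Return the category name for a Pylint message code such as 'E0601'.

    Hypothetical helper for illustration only; it is not a Pylint API.
    """
    return PYLINT_CATEGORIES.get(code[:1].upper(), "unknown")


print(pylint_category("E0601"))  # error
print(pylint_category("C0114"))  # convention
```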

Pylint may give false positives. In the example above, Pylint flagged the import from tensorflow.keras.datasets as an error. This is caused by an optimization in the TensorFlow package: importing TensorFlow does not scan and load everything. Instead, a LazyLoader is created to load only the necessary parts of this large package. This saves significant time in starting the program, but it also confuses Pylint, which concludes that we are importing something that doesn’t exist.

Furthermore, one of the key features of Pylint is to help us align our code with the PEP 8 coding style. When we define a function without a docstring, for instance, Pylint will complain that we didn’t follow the coding convention even if the code is not doing anything wrong.

But the most important use of Pylint is to help us identify potential issues. For example, we misspelled y_train as Y_train with an uppercase Y. Pylint tells us that we are using a variable without assigning any value to it. It does not tell us directly what went wrong, but it points us to the right spot to proofread our code. Similarly, when we define the variable model on line 23, Pylint tells us that there is a variable of the same name in the outer scope. Hence a later reference to model may not be what we were thinking of. Likewise, an unused import may simply mean that we misspelled the name of a module.
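The outer-scope warning can be reproduced with a few lines. The following is a small sketch, not the tutorial’s original script; the names are made up for illustration. Pylint reports the assignment inside the function as redefined-outer-name (W0621), even though the code runs without error.

```python
model = "outer model"  # module-level name


def create_model() -> str:
    """Build and return a model (a plain string here, for illustration)."""
    # Assigning to `model` creates a new local variable that shadows the
    # module-level one. Pylint flags this as redefined-outer-name (W0621).
    model = "inner model"
    return model


result = create_model()
print(model)   # the module-level value is unchanged: outer model
print(result)  # inner model
```

The code is legal Python, which is why this is a warning rather than an error: the risk is that a later reader assumes the function modified the outer variable.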

All these are hints provided by Pylint. We still have to use our judgement to correct our code (or ignore Pylint’s complaints).

But if you know which complaints Pylint should stop making, you can ask it to ignore them. For example, we know the import statements are fine, so we can invoke Pylint with the option --disable=E0611.

Now, all errors of code E0611 will be ignored by Pylint. You can disable multiple codes by passing a comma-separated list.
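If you prefer to drive Pylint from a script, one possible approach is to build the command line in Python and hand it to subprocess. This is only a sketch: build_pylint_command is a hypothetical helper, and the second code, C0301 (line-too-long), is added purely to show the comma-separated form.

```python
import sys


def build_pylint_command(target, disabled=()):
    """Build an argument list that runs Pylint on `target`.

    Hypothetical helper: it uses `python -m pylint` so that the Pylint
    installed for the current interpreter is the one that runs.
    """
    command = [sys.executable, "-m", "pylint"]
    if disabled:
        # Several message codes are joined into one comma-separated option.
        command.append("--disable=" + ",".join(disabled))
    command.append(target)
    return command


cmd = build_pylint_command("lenet5-notworking.py", ["E0611", "C0301"])
print(cmd[1:])

# To actually run it (requires Pylint to be installed):
#   import subprocess
#   subprocess.run(cmd, check=False)
```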

If you want to disable some issues on only a specific line or a specific part of the code, you can put special comments in your code, as follows:

The magic keyword pylint: introduces Pylint-specific instructions. The code E0611 and the name no-name-in-module refer to the same message, so either can be used. In the example above, Pylint will complain about the last two import statements but not the first two because of those special comments.
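As a rough illustration of such comments, the sketch below uses standard-library imports as stand-ins for the TensorFlow imports discussed here, so it runs anywhere. The variable name is made up, and the imports would not actually trigger E0611; only the comment syntax matters.

```python
# A trailing comment applies to that line only. The numeric code and the
# symbolic name are interchangeable.
from os.path import join  # pylint: disable=E0611
from os.path import basename  # pylint: disable=no-name-in-module

# A comment on its own line applies from that point to the end of the
# enclosing block, or until a matching "enable" comment.
# pylint: disable=invalid-name
badName = join("models", "lenet5.h5")
# pylint: enable=invalid-name

print(basename(badName))  # lenet5.h5
```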

Flake8

Flake8 is in fact a wrapper over PyFlakes, McCabe, and pycodestyle. When you install it with pip install flake8, all these dependencies are installed as well.

Similar to Pylint, we have the command flake8 after installing this package, and we can pass in a script or a directory for analysis. But Flake8 focuses more on coding style. Hence we would see the following output for the same code as above:

The error codes beginning with the letter E are from pycodestyle, and those beginning with the letter F are from PyFlakes. We can see it complains about coding style issues, such as (5,5) lacking a space after the comma. We can also see that it identifies the use of variables before assignment. But it does not catch some code smells, such as the function createmodel() reusing the variable model that was already defined in the outer scope.
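The comma complaint is purely stylistic, as this tiny sketch shows (the variable names are made up). pycodestyle reports the first line as E231, missing whitespace after a comma, yet both lines produce the same tuple.

```python
kernel_bad = (5,5)    # pycodestyle E231: missing whitespace after ','
kernel_good = (5, 5)  # same value, written according to the style guide

print(kernel_bad == kernel_good)  # True
```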

Similar to Pylint, we can also ask Flake8 to ignore some complaints. For example,

Those lines will not be printed in the output:

We can also use magic comments to disable some complaints, e.g.,

Flake8 will look for the comment # noqa: to skip some complaints on those particular lines.
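A short sketch of how such comments look, using made-up variable names and a standard-library import as a stand-in: a code after the colon limits the exemption to that complaint, while a bare # noqa silences every complaint on the line.

```python
import os  # noqa: F401  (unused import, kept on purpose in this sketch)

kernel = (5,5)  # noqa: E231
pool = (2,2)  # noqa

print(kernel, pool)  # (5, 5) (2, 2)
```

Listing the specific code is usually the safer choice, since a bare # noqa would also hide any new problem introduced on that line later.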

Mypy

Python is not a statically typed language, so, unlike C or Java, you do not need to declare the types of functions or variables before use. But recent versions of Python support type hint notation, so we can specify what type a function or variable is intended to be, without the interpreter enforcing compliance as a statically typed language would.
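The fact that type hints are not enforced can be seen in a few lines. This is a minimal sketch with a made-up function: the hints are stored as metadata, and the interpreter happily accepts an argument of the wrong type.

```python
def double(x: int) -> int:
    """Declared to take and return an int; Python does not enforce this."""
    return x * 2


print(double(21))    # 42
print(double("ab"))  # abab (runs fine, though a type checker would object)

# The hints are only stored as metadata on the function object.
print(double.__annotations__)
```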

One of the biggest benefits of using type hints in Python is that they provide additional information for static analyzers to check. Mypy is a tool that understands type hints. Even without type hints, Mypy can still provide complaints similar to those from Pylint and Flake8.

We can install Mypy from PyPI by running pip install mypy.

Then the example above can be provided to the mypy command:

We see errors similar to those from Pylint above, although sometimes less precise (e.g., the issue with the variable y_train). However, we also see one characteristic of mypy: it expects all the libraries we use to come with a stub so that type checking can be done. This is because type hints are optional. If a library does not provide type hints, the code still works, but mypy cannot verify it. Some libraries have typing stubs available that enable mypy to check them better.

Let’s consider another example:

This program is supposed to load an HDF5 file (such as a Keras model) and print every attribute and dataset stored in it. We used the h5py module (which does not have a typing stub, and hence mypy cannot identify the types it uses), but we added type hints to the function we defined, dumphdf5(). This function expects the filename of an HDF5 file and prints everything stored inside. At the end, the number of datasets stored will be returned.

When we save this script into dumphdf5.py and pass it into mypy, we will see the following:

We misused our function: an opened file object is passed into dumphdf5() instead of just the filename (as a string). Mypy can identify this error. We also declared that the function should return an integer, but the function has no return statement.
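Both mistakes are silent at runtime, which is why a type checker is useful here. The sketch below is a stand-in and not the tutorial’s dumphdf5() script: describe() is a hypothetical function, and an in-memory io.StringIO object plays the role of the opened file.

```python
import io


def describe(filename: str) -> int:
    """Stand-in for dumphdf5(): annotated to take a str and return an int."""
    print(f"would open {filename!r}")
    # No return statement: at runtime the function returns None, and a type
    # checker such as mypy reports the missing return.


handle = io.StringIO("not a filename")
result = describe(handle)  # wrong argument type, yet accepted at runtime
print(result)  # None
```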

However, there is one more error in this code that mypy didn’t identify. Namely, the variable count used in the inner function recur_dump() should be declared nonlocal because it is defined in the enclosing scope, outside the inner function. This error can be caught by Pylint and Flake8, but mypy missed it.
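The effect of the missing nonlocal declaration can be shown with a minimal sketch. The functions below are made up for illustration and are not the tutorial’s recur_dump(); they assume the inner function updates a counter owned by the outer function.

```python
def count_items(items):
    """Count items with an inner function that updates an outer variable."""
    count = 0

    def visit(_item):
        nonlocal count  # rebind the variable of the enclosing function
        count += 1

    for item in items:
        visit(item)
    return count


def count_items_broken(items):
    """Same logic without nonlocal: count becomes local to visit()."""
    count = 0

    def visit(_item):
        count += 1  # raises UnboundLocalError when called

    for item in items:
        visit(item)
    return count


print(count_items("abc"))  # 3

try:
    count_items_broken("abc")
    broken_failed = False
except UnboundLocalError:
    broken_failed = True
print(broken_failed)  # True
```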

The following is the complete, corrected code with no more errors. Note that we added the magic comment “# type: ignore” at the first line to mute the typing stubs warning from mypy:

In conclusion, the three tools we introduced above complement each other. You may consider running all of them to look for possible bugs in your code or to improve the coding style. Each tool allows some configuration, either from the command line or from a config file, to customize it for your needs (e.g., how long must a line be before it deserves a warning?). Using a static analyzer is also a way to help yourself develop better programming skills.

Summary

In this tutorial, you’ve seen how some common static analyzers can help you write better Python code. Specifically, you learned:

  • The strengths and weaknesses of three tools: Pylint, Flake8, and mypy
  • How to customize the behavior of these tools
  • How to understand the complaints made by these analyzers
