24 Dec 2021
This is a post in the series on revisiting Python. I’ve been using Python for a long time, but most of my learning has been organic and on demand. By actively studying its parts we might learn to do things in a better way, or understand why our code doesn’t behave the way we’d expect in corner cases.
In the previous post we talked about Object Oriented Programming, and in this one we’ll discuss the modules system.
This has been tested with Python 3.9 (this post might receive updates when newer versions make it obsolete).
A module has a 1:1 mapping to a file, as described in the official tutorial:
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended.
There’s an exception to this rule which we’ll see later.
When we import a module with import, it adds an object of the built-in class module to the current scope:
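A minimal sketch of this, using the standard library’s math module:

```python
import math
import types

# `import` binds a module object (an instance of the built-in
# module type) in the current scope.
print(type(math))                          # <class 'module'>
print(isinstance(math, types.ModuleType))  # True
```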
I don’t know if this distinction is used in any official documentation, but I find it helpful to categorize modules by how they’re implemented: pure Python modules (backed by a .py file), extension modules (compiled shared libraries, e.g. .so files on Linux/Mac) and built-in modules (compiled into the interpreter itself). They’re all abstracted behind a module object, but we can see they return different things when we inspect where they live:
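One way to see the difference, sketched here with json (pure Python) and sys (built into the interpreter):

```python
import json
import sys

# A pure-Python module is backed by a .py file on disk:
print(json.__file__)             # path ending in json/__init__.py

# A built-in module has no backing file at all:
print(hasattr(sys, "__file__"))  # False
```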
The module object has the property __name__, which matches the module name. Note that if we alias the module on importing, __name__ contains the original name, for example:
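For instance:

```python
import math as m

# __name__ reflects the original module name, not the alias
print(m.__name__)  # math
```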
There is also a built-in variable, __name__, whose value corresponds to the name of the module it lives in. When we run a file directly using python, however, __name__ is overridden to "__main__", which is leveraged in a common technique to gate code so it only runs when the file is the entry point.
For example, a message printed inside such a guard in app.py will only show up if we execute it as python app.py, but not if app.py is included as a module in some other file.
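A sketch of what such an app.py could look like (the file name and function are hypothetical):

```python
# app.py (hypothetical name matching the example above)
def main():
    print("running as the entry point")

# Only true when the file is executed directly (python app.py),
# not when it is imported as a module elsewhere.
if __name__ == "__main__":
    main()
```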
A module is only imported once, which means that the file corresponding to that module is only executed one time. The modules are cached in the sys.modules dictionary.
This means it’s possible to hijack that dictionary to stub a module. If lib.py defines some functions, then in main.py we can replace the entry for lib in sys.modules with our own object before it is imported. This can be used, for example, to mock an entire module during tests.
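A minimal sketch, where the module name "lib" and the greet function are purely illustrative:

```python
import sys
import types

# A hypothetical stub standing in for a module named "lib".
stub = types.ModuleType("lib")
stub.greet = lambda: "stubbed!"

# Pre-populate the module cache: any later `import lib` gets the stub.
sys.modules["lib"] = stub

import lib
print(lib.greet())  # stubbed!
```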
Combining the knowledge from the two previous sections, it’s possible to access the module object of the current file via sys.modules[__name__]:
The official tutorial states:
Each module has its own private symbol table, which is used as the global symbol table by all functions defined in the module
We can see the list of variables defined in a given module using the dir() function:
Note that this only returns the names. If we’re also interested in the instances corresponding to the names, we can use inspect.getmembers(), which returns a list of (name, instance) tuples:
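Side by side, using math as the example module:

```python
import inspect
import math

# dir() returns only the names defined in the module...
print("sin" in dir(math))  # True

# ...while inspect.getmembers() pairs each name with its instance.
members = dict(inspect.getmembers(math))
print(members["sin"] is math.sin)  # True
```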
It’s worth noting that both dir() and inspect.getmembers() are APIs that work with any class instance, and are not specific to modules.
For completeness, it’s worth noting modules also have the __dict__ attribute, which contains a dictionary of name -> instance. x.__dict__ seems to be equivalent to inspect.getmembers(x), but in general this is not always true [3, 4].
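For example:

```python
import math

# __dict__ maps names to the objects bound in the module namespace
print(math.__dict__["pi"])               # 3.141592653589793
print(math.__dict__["sin"] is math.sin)  # True
```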
There are a few different ways of importing a module and they might have different semantics.
Let’s consider the syntax used so far: import math.
As we said before, this adds the math object to the current scope, which is really a reference to a structure in memory. This means we can modify the module’s contents at runtime; for example, we could override math.sin. Not only will our own calls to math.sin() use the new implementation, code in other modules using math.sin will also be affected, since they all share the same module object. Needless to say this can lead to very confusing behavior, though it could be interesting for mocking in tests.
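A small sketch of this kind of monkey-patching (restoring the original afterwards):

```python
import math

original = math.sin

# Replace the function on the shared module object; every module
# that did `import math` now sees this implementation.
math.sin = lambda x: 42
print(math.sin(0))  # 42

math.sin = original  # restore the real implementation
print(math.sin(0))  # 0.0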
Another syntax for importing is from math import sin. This is more or less equivalent to running import math and then binding sin = math.sin in the current scope.
Note that if you override the implementation of sin imported this way, it simply updates what the variable sin points to, and does not touch the math object in any way. So if we repeated the override above with this form of import, math.sin() would not be affected.
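A sketch of the contrast:

```python
import math
from math import sin

# Rebinding `sin` only changes the local name...
sin = lambda x: 42
print(sin(0))       # 42

# ...the math module object is untouched.
print(math.sin(0))  # 0.0
```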
We can import all definitions from a module with the from math import * syntax:
This is convenient but it can also become problematic for more complex codebases, for example, if the same name is defined in multiple modules.
We can import a module given its name as a string via importlib.import_module(). We’ll see a use case for this in Lazy Loading Modules below.
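For example:

```python
import importlib

# Import a module given its name as a string
m = importlib.import_module("math")
print(m.sqrt(4))  # 2.0
```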
When deciding which directory to search for the file corresponding to a module, the module loader looks through a list of paths in order. This list can be obtained via sys.path:
This list is partly initialized from the environment variable PYTHONPATH. When we run Python via python file.py, the first entry of sys.path is the directory of file.py, which allows for local imports without further setup, for example when main.py imports lib as above.
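We can inspect the list directly:

```python
import sys

# sys.path is an ordinary list of directories searched in order;
# the first entry is the directory of the script being run.
for p in sys.path[:3]:
    print(p)
```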
Note that the module loader will not search these directories recursively, so if foo.py lives in a subdirectory of one of them, a plain import foo will not work. We’ll cover subdirectories in the next section.
A package is a collection of modules, as described in the official tutorial. It’s roughly a 1:1 map to a directory. Packages can be divided into regular and namespace packages.
Continuing from the prior example, the correct way to import the file foo.py under a directory sub_dir is via import sub_dir.foo. In this case sub_dir is considered a package, since it contains the modules corresponding to the files under its directory.
Note that the . replaces the / and that this works at multiple depths. For example, to import sub_dir/another_dir/bar.py we do import sub_dir.another_dir.bar.
This causes the name of the imported module to be quite long, so we can use the from ... import syntax:
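The following self-contained sketch builds the layout from the text (sub_dir/another_dir/bar.py, with a made-up variable x) in a scratch directory and tries both forms:

```python
import os
import sys
import tempfile

# Build the layout sub_dir/another_dir/bar.py in a scratch directory
root = tempfile.mkdtemp()
pkg = os.path.join(root, "sub_dir", "another_dir")
os.makedirs(pkg)
with open(os.path.join(pkg, "bar.py"), "w") as f:
    f.write("x = 1\n")

sys.path.insert(0, root)

# Dots replace the slashes in the file path:
import sub_dir.another_dir.bar
print(sub_dir.another_dir.bar.x)  # 1

# The long name can be shortened with `from ... import`:
from sub_dir.another_dir import bar
print(bar.x)  # 1
```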
We can also import the subdirectory itself instead of one of its modules, for example via import sub_dir. As we see in the output, this is considered a namespace package, which means it doesn’t have an associated file.
Before Python 3.3 it used to be necessary to include a file __init__.py for files in subdirectories to be “visible”. It’s still possible to do so. For example, we could have the following structure:
The code in __init__.py is executed when it or any of the package’s submodules is imported. Suppose it defines some variables. If we import sub_dir now, it’s no longer a namespace package but a regular one, since it’s tied to sub_dir/__init__.py. Further, the variables in the scope of __init__.py are available for importing:
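A self-contained sketch (the variable y is made up for illustration):

```python
import os
import sys
import tempfile

# Build sub_dir/__init__.py defining a variable, in a scratch dir
root = tempfile.mkdtemp()
pkg = os.path.join(root, "sub_dir")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("y = 2\n")

sys.path.insert(0, root)

import sub_dir
print(sub_dir.__file__)  # ends in sub_dir/__init__.py: a regular package
print(sub_dir.y)         # 2

from sub_dir import y    # names from __init__.py are importable
print(y)                 # 2
```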
As in the case of a single module, we can use the from sub_dir import * syntax. The behavior will depend on whether sub_dir has a backing __init__.py or not. If it does, only the variables in the scope of __init__.py will be imported.
The package owner can explicitly control what gets imported when doing the * import by defining the variable __all__ in __init__.py, for example __all__ = ['baz']. Now, when we do from sub_dir import * in main.py, we’ll see that only the module baz has been imported, even if other variables were defined in __init__.py. If __init__.py is missing, nothing is imported.
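A self-contained sketch (the names hidden and z are made up for illustration):

```python
import os
import sys
import tempfile

# Build a package whose __init__.py restricts star imports
root = tempfile.mkdtemp()
pkg = os.path.join(root, "sub_dir")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("__all__ = ['baz']\nhidden = 1\n")
with open(os.path.join(pkg, "baz.py"), "w") as f:
    f.write("z = 3\n")

sys.path.insert(0, root)

from sub_dir import *

print(baz.z)                    # 3: listed in __all__, so imported
print("hidden" in globals())    # False: not exported by the star import
```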
We can use a relative import syntax within a package. Suppose we have a package containing bar.py; in bar.py we can import modules relative to the package. Note that only the syntax from <module> import <def> is allowed in this case. The number of dots represents the number of steps up in the file tree to take, starting from 0 (a single dot refers to the current package).
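A self-contained sketch, building a hypothetical package pkg with sibling modules foo.py and bar.py:

```python
import os
import sys
import tempfile

# Layout: pkg/foo.py and pkg/bar.py, where bar reaches its sibling
# via a relative import.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "pkg")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "foo.py"), "w") as f:
    f.write("x = 1\n")
with open(os.path.join(pkg, "bar.py"), "w") as f:
    f.write("from .foo import x\n")  # one dot: the current package

sys.path.insert(0, root)

from pkg.bar import x
print(x)  # 1
```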
We can’t import modules outside the package this way: for example, if bar.py used a relative import to reach lib.py at the top level, it would fail, because lib.py is not in the package.
When we run python main.py, all its dependencies are recursively imported and their top-level code is executed. For example, consider a file lib.py that calls very_expensive_function() at the top level. When we include it in main.py, very_expensive_function() will be executed even if we rarely or never use anything from lib. In an ideal world the author of lib.py should not execute expensive code at the top level, but this is hard to control.
This can lead to pretty high startup times for scripts and CLI tools, which is a common use case for Python. One option is to lazy load these modules, as we’ll see next.
The most straightforward way to avoid the overhead is to inline the import where it’s needed, so main.py would import lib inside the functions or branches that actually use it, which addresses the issue. Recall that the module loader only imports modules once, so inlining the same import in multiple places is safe: even if several branches end up executing it, the code of lib.py only runs once. One major downside of this approach is having to add the import statement in multiple places, which can be tedious and error-prone.
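A sketch of the pattern, using json as a stand-in for the hypothetical expensive lib:

```python
# Deferring the import until it's needed
def maybe_use_lib(flag):
    if flag:
        import json  # stand-in for an expensive module; loaded on first use
        return json.dumps({"ok": True})
    return None

# The module cache guarantees json's top-level code runs only once,
# no matter how many times this function is called.
print(maybe_use_lib(True))
print(maybe_use_lib(True))
```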
We now consider an option with lighter syntax.
We can use importlib.util to define a function that takes the name of a module as a string and returns a module whose file won’t execute until one of its members is accessed. The importlib documentation provides an implementation (which I’m surprised is not part of the API itself), which we can then use in place of a regular import.
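The recipe from the importlib documentation looks roughly like this (the usage at the bottom, with json, is an illustration):

```python
import importlib.util
import sys

def lazy_import(name):
    # Resolve the module's spec without executing its code
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # execution is deferred by LazyLoader
    return module

# Nothing from json actually runs until an attribute is touched:
lazy_json = lazy_import("json")
print(lazy_json.dumps([1]))  # [1]
```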
The downside of this approach is that we specify the module by a string, which might prevent static checkers from verifying that the module actually exists, or from catching issues during file renames, etc.
The importlib documentation suggests using lazy loaded modules only when strictly necessary:
For projects where startup time is critical, this class allows for potentially minimizing the cost of loading a module if it is never used. For projects where startup time is not essential then use of this class is heavily discouraged due to error messages created during loading being postponed and thus occurring out of context.
How do we know which modules to lazy load? We’ll see next.
We can determine which modules take the most time to import by running our binary with the -X importtime flag. As an experiment, I tried profiling a recent project I worked on, t-digest:
It prints a tree-like structure as text, which is a bit hard to make sense of. Fortunately there are UI tools to better visualize this trace, for example tuna:
We can save the output of the profiler to a file and visualize it in the browser:
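The commands might look like the following sketch; `-c "import json"` stands in for the real entry point (e.g. `python3 -X importtime main.py`), and the tuna invocation assumes it has been installed with pip install tuna:

```shell
# -X importtime writes its per-module timing report to stderr,
# so we redirect stderr to a file.
python3 -X importtime -c "import json" 2> import_times.log

# Then serve an interactive visualization in the browser:
# tuna import_times.log
```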
The result is in Figure 1.
As always, focusing on learning one topic and setting time aside to play around led me to learn many new things. Some of them, like the “self module”, don’t seem immediately useful, but knowing they exist could be handy in the future.