Creating a Python module

What is a python module?

You already know it: a module is a set of python functions and statements, and this is what you import at the beginning of your python scripts.

A module could be a single file

Indeed, a module can simply be a single file, for instance fibo.py, which you import using its name (without the .py extension):

import fibo

This does not enter the names of the functions defined in fibo directly in the current symbol table though; it only enters the module name fibo there. Using the module name you can access the functions:

fibo.fib_print(1000)
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 
fibo.fib_list(100)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
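The content of fibo.py is not shown in this section. As a minimal sketch, assuming bodies adapted from the fibo example of the official Python tutorial (only the function names and the outputs come from the calls above), the file could look like:

```python
# fibo.py: hypothetical content, consistent with the outputs shown above


def fib_print(n):
    """Print the Fibonacci numbers smaller than n on a single line."""
    a, b = 0, 1
    while a < n:
        print(a, end=" ")
        a, b = b, a + b
    print()


def fib_list(n):
    """Return the list of the Fibonacci numbers smaller than n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result
```

Saved as fibo.py in the current directory (or in any directory listed in sys.path), it is imported with import fibo.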

When importing a module, several attributes are automatically defined. Their names are prefixed and suffixed by a double underscore __ (so-called "dunder" names), e.g.,

fibo.__name__
'fibo'
fibo.__file__
'/home/jsalmon/Documents/Mes_cours/Montpellier/HAX712X/Courses/Python-modules/fibo.py'

… or a module could be a directory

You can also import a full directory (a folder containing many python files, possibly organized in sub-folders); such a directory is called a package. python looks for it in the directories listed in sys.path. You have already imported the numpy package, for numerical analysis with python:

import numpy as np
print(np.array([0, 1, 2, 3]).reshape(2, 2))
print(np.array([0, 1, 2, 3]).mean())
[[0 1]
 [2 3]]
1.5

In fact, you have imported the following folder:

np.__path__
['/home/jsalmon/miniconda3/envs/HAX712X/lib/python3.12/site-packages/numpy']

Depending on your installation you might obtain ['/usr/lib/python3.9/site-packages/numpy'] or ['/home/username/anaconda3/lib/python3.7/site-packages/numpy'] if you installed with Anaconda.

More precisely, np.__file__ points to the __init__.py file of this folder, so you will get for instance either

>>> np.__file__
'/usr/lib/python3.9/site-packages/numpy/__init__.py'

or

>>> np.__file__
'/home/username/anaconda3/lib/python3.7/site-packages/numpy/__init__.py'
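The link between __path__ and __file__ can be checked on any package. The sketch below uses json from the standard library, an arbitrary choice made only because it is always available, as a stand-in for numpy:

```python
import json
import os

# json is a package (a folder), so it has a __path__ attribute...
print(json.__path__)
# ...and its __file__ attribute points to the __init__.py of that folder
print(json.__file__)
print(os.path.basename(json.__file__))  # __init__.py

# A single-file module such as os has a __file__ but no __path__
print(hasattr(os, "__path__"))  # False
```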

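Any (sub-)directory of your python module should contain an __init__.py file! (Without it, python 3.3+ still imports the folder, but as a "namespace package", which behaves differently.)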

Useful tips
  • The __init__.py file can import the functions that should be available when the package is imported. This lets you expose functions to users in a concise way.

  • Inside a package, you can also use relative imports, with . for the current package, .. for the parent package, ... for the grandparent, etc.
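A minimal, self-contained sketch of both tips: it builds a throwaway package in a temporary folder (the names mypkg, sub, hello and greet are hypothetical) whose __init__.py re-exports a function through a relative import:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout created in a temporary folder:
#   mypkg/__init__.py      re-exports greet through a relative import
#   mypkg/sub/__init__.py  empty, marks sub as a sub-package
#   mypkg/sub/hello.py     defines greet
root = Path(tempfile.mkdtemp())
(root / "mypkg" / "sub").mkdir(parents=True)
(root / "mypkg" / "__init__.py").write_text("from .sub.hello import greet\n")
(root / "mypkg" / "sub" / "__init__.py").write_text("")
(root / "mypkg" / "sub" / "hello.py").write_text(
    "def greet(name):\n    return f'Hello {name}!'\n"
)

# Make the temporary folder visible to the import system
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import mypkg

# greet is reachable at package level, thanks to the relative import
print(mypkg.greet("world"))  # Hello world!
# The full path still works and gives the same function
print(mypkg.sub.hello.greet is mypkg.greet)  # True
```

The same mechanism lets users write package.function instead of the full path package.sub_module.module.function.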

Reference: Absolute vs Relative Imports in Python by Mbithe Nzomo.

The dir() function

The built-in function dir() is used to find out which names a module defines. It returns a sorted list of strings:

import fibo, numpy
dir(fibo)
['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'fib_list',
 'fib_print']
dir(numpy)
['ALLOW_THREADS',
 'BUFSIZE',
 'CLIP',
 'DataSource',
 'ERR_CALL',
 'ERR_DEFAULT',
 'ERR_IGNORE',
 'ERR_LOG',
 'ERR_PRINT',
 'ERR_RAISE',
 'ERR_WARN',
 'FLOATING_POINT_SUPPORT',
 'FPE_DIVIDEBYZERO',
 'FPE_INVALID',
 'FPE_OVERFLOW',
 'FPE_UNDERFLOW',
 'False_',
 'Inf',
 'Infinity',
 'MAXDIMS',
 'MAY_SHARE_BOUNDS',
 'MAY_SHARE_EXACT',
 'NAN',
 'NINF',
 'NZERO',
 'NaN',
 'PINF',
 'PZERO',
 'RAISE',
 'RankWarning',
 'SHIFT_DIVIDEBYZERO',
 'SHIFT_INVALID',
 'SHIFT_OVERFLOW',
 'SHIFT_UNDERFLOW',
 'ScalarType',
 'True_',
 'UFUNC_BUFSIZE_DEFAULT',
 'UFUNC_PYVALS_NAME',
 'WRAP',
 '_CopyMode',
 '_NoValue',
 '_UFUNC_API',
 '__NUMPY_SETUP__',
 '__all__',
 '__builtins__',
 '__cached__',
 '__config__',
 '__deprecated_attrs__',
 '__dir__',
 '__doc__',
 '__expired_functions__',
 '__file__',
 '__former_attrs__',
 '__future_scalars__',
 '__getattr__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_add_newdoc_ufunc',
 '_builtins',
 '_distributor_init',
 '_financial_names',
 '_get_promotion_state',
 '_globals',
 '_int_extended_msg',
 '_mat',
 '_no_nep50_warning',
 '_pyinstaller_hooks_dir',
 '_pytesttester',
 '_set_promotion_state',
 '_specific_msg',
 '_typing',
 '_using_numpy2_behavior',
 '_utils',
 'abs',
 'absolute',
 'add',
 'add_docstring',
 'add_newdoc',
 'add_newdoc_ufunc',
 'all',
 'allclose',
 'alltrue',
 'amax',
 'amin',
 'angle',
 'any',
 'append',
 'apply_along_axis',
 'apply_over_axes',
 'arange',
 'arccos',
 'arccosh',
 'arcsin',
 'arcsinh',
 'arctan',
 'arctan2',
 'arctanh',
 'argmax',
 'argmin',
 'argpartition',
 'argsort',
 'argwhere',
 'around',
 'array',
 'array2string',
 'array_equal',
 'array_equiv',
 'array_repr',
 'array_split',
 'array_str',
 'asanyarray',
 'asarray',
 'asarray_chkfinite',
 'ascontiguousarray',
 'asfarray',
 'asfortranarray',
 'asmatrix',
 'atleast_1d',
 'atleast_2d',
 'atleast_3d',
 'average',
 'bartlett',
 'base_repr',
 'binary_repr',
 'bincount',
 'bitwise_and',
 'bitwise_not',
 'bitwise_or',
 'bitwise_xor',
 'blackman',
 'block',
 'bmat',
 'bool_',
 'broadcast',
 'broadcast_arrays',
 'broadcast_shapes',
 'broadcast_to',
 'busday_count',
 'busday_offset',
 'busdaycalendar',
 'byte',
 'byte_bounds',
 'bytes_',
 'c_',
 'can_cast',
 'cast',
 'cbrt',
 'cdouble',
 'ceil',
 'cfloat',
 'char',
 'character',
 'chararray',
 'choose',
 'clip',
 'clongdouble',
 'clongfloat',
 'column_stack',
 'common_type',
 'compare_chararrays',
 'compat',
 'complex128',
 'complex256',
 'complex64',
 'complex_',
 'complexfloating',
 'compress',
 'concatenate',
 'conj',
 'conjugate',
 'convolve',
 'copy',
 'copysign',
 'copyto',
 'corrcoef',
 'correlate',
 'cos',
 'cosh',
 'count_nonzero',
 'cov',
 'cross',
 'csingle',
 'ctypeslib',
 'cumprod',
 'cumproduct',
 'cumsum',
 'datetime64',
 'datetime_as_string',
 'datetime_data',
 'deg2rad',
 'degrees',
 'delete',
 'deprecate',
 'deprecate_with_doc',
 'diag',
 'diag_indices',
 'diag_indices_from',
 'diagflat',
 'diagonal',
 'diff',
 'digitize',
 'disp',
 'divide',
 'divmod',
 'dot',
 'double',
 'dsplit',
 'dstack',
 'dtype',
 'dtypes',
 'e',
 'ediff1d',
 'einsum',
 'einsum_path',
 'emath',
 'empty',
 'empty_like',
 'equal',
 'errstate',
 'euler_gamma',
 'exceptions',
 'exp',
 'exp2',
 'expand_dims',
 'expm1',
 'expm1x',
 'extract',
 'eye',
 'fabs',
 'fastCopyAndTranspose',
 'fft',
 'fill_diagonal',
 'find_common_type',
 'finfo',
 'fix',
 'flatiter',
 'flatnonzero',
 'flexible',
 'flip',
 'fliplr',
 'flipud',
 'float128',
 'float16',
 'float32',
 'float64',
 'float_',
 'float_power',
 'floating',
 'floor',
 'floor_divide',
 'fmax',
 'fmin',
 'fmod',
 'format_float_positional',
 'format_float_scientific',
 'format_parser',
 'frexp',
 'from_dlpack',
 'frombuffer',
 'fromfile',
 'fromfunction',
 'fromiter',
 'frompyfunc',
 'fromregex',
 'fromstring',
 'full',
 'full_like',
 'gcd',
 'generic',
 'genfromtxt',
 'geomspace',
 'get_array_wrap',
 'get_include',
 'get_printoptions',
 'getbufsize',
 'geterr',
 'geterrcall',
 'geterrobj',
 'gradient',
 'greater',
 'greater_equal',
 'half',
 'hamming',
 'hanning',
 'heaviside',
 'histogram',
 'histogram2d',
 'histogram_bin_edges',
 'histogramdd',
 'hsplit',
 'hstack',
 'hypot',
 'i0',
 'identity',
 'iinfo',
 'imag',
 'in1d',
 'index_exp',
 'indices',
 'inexact',
 'inf',
 'info',
 'infty',
 'inner',
 'insert',
 'int16',
 'int32',
 'int64',
 'int8',
 'int_',
 'intc',
 'integer',
 'interp',
 'intersect1d',
 'intp',
 'invert',
 'is_busday',
 'isclose',
 'iscomplex',
 'iscomplexobj',
 'isfinite',
 'isfortran',
 'isin',
 'isinf',
 'isnan',
 'isnat',
 'isneginf',
 'isposinf',
 'isreal',
 'isrealobj',
 'isscalar',
 'issctype',
 'issubclass_',
 'issubdtype',
 'issubsctype',
 'iterable',
 'ix_',
 'kaiser',
 'kernel_version',
 'kron',
 'lcm',
 'ldexp',
 'left_shift',
 'less',
 'less_equal',
 'lexsort',
 'lib',
 'linalg',
 'linspace',
 'little_endian',
 'load',
 'loadtxt',
 'log',
 'log10',
 'log1p',
 'log2',
 'logaddexp',
 'logaddexp2',
 'logical_and',
 'logical_not',
 'logical_or',
 'logical_xor',
 'logspace',
 'longcomplex',
 'longdouble',
 'longfloat',
 'longlong',
 'lookfor',
 'ma',
 'mask_indices',
 'mat',
 'matmul',
 'matrix',
 'max',
 'maximum',
 'maximum_sctype',
 'may_share_memory',
 'mean',
 'median',
 'memmap',
 'meshgrid',
 'mgrid',
 'min',
 'min_scalar_type',
 'minimum',
 'mintypecode',
 'mod',
 'modf',
 'moveaxis',
 'msort',
 'multiply',
 'nan',
 'nan_to_num',
 'nanargmax',
 'nanargmin',
 'nancumprod',
 'nancumsum',
 'nanmax',
 'nanmean',
 'nanmedian',
 'nanmin',
 'nanpercentile',
 'nanprod',
 'nanquantile',
 'nanstd',
 'nansum',
 'nanvar',
 'nbytes',
 'ndarray',
 'ndenumerate',
 'ndim',
 'ndindex',
 'nditer',
 'negative',
 'nested_iters',
 'newaxis',
 'nextafter',
 'nonzero',
 'not_equal',
 'numarray',
 'number',
 'obj2sctype',
 'object_',
 'ogrid',
 'oldnumeric',
 'ones',
 'ones_like',
 'outer',
 'packbits',
 'pad',
 'partition',
 'percentile',
 'pi',
 'piecewise',
 'place',
 'poly',
 'poly1d',
 'polyadd',
 'polyder',
 'polydiv',
 'polyfit',
 'polyint',
 'polymul',
 'polynomial',
 'polysub',
 'polyval',
 'positive',
 'power',
 'printoptions',
 'prod',
 'product',
 'promote_types',
 'ptp',
 'put',
 'put_along_axis',
 'putmask',
 'quantile',
 'r_',
 'rad2deg',
 'radians',
 'random',
 'ravel',
 'ravel_multi_index',
 'real',
 'real_if_close',
 'rec',
 'recarray',
 'recfromcsv',
 'recfromtxt',
 'reciprocal',
 'record',
 'remainder',
 'repeat',
 'require',
 'reshape',
 'resize',
 'result_type',
 'right_shift',
 'rint',
 'roll',
 'rollaxis',
 'roots',
 'rot90',
 'round',
 'round_',
 'row_stack',
 's_',
 'safe_eval',
 'save',
 'savetxt',
 'savez',
 'savez_compressed',
 'sctype2char',
 'sctypeDict',
 'sctypes',
 'searchsorted',
 'select',
 'set_numeric_ops',
 'set_printoptions',
 'set_string_function',
 'setbufsize',
 'setdiff1d',
 'seterr',
 'seterrcall',
 'seterrobj',
 'setxor1d',
 'shape',
 'shares_memory',
 'short',
 'show_config',
 'show_runtime',
 'sign',
 'signbit',
 'signedinteger',
 'sin',
 'sinc',
 'single',
 'singlecomplex',
 'sinh',
 'size',
 'sometrue',
 'sort',
 'sort_complex',
 'source',
 'spacing',
 'split',
 'sqrt',
 'square',
 'squeeze',
 'stack',
 'std',
 'str_',
 'string_',
 'subtract',
 'sum',
 'swapaxes',
 'take',
 'take_along_axis',
 'tan',
 'tanh',
 'tensordot',
 'test',
 'testing',
 'tile',
 'timedelta64',
 'trace',
 'tracemalloc_domain',
 'transpose',
 'trapz',
 'tri',
 'tril',
 'tril_indices',
 'tril_indices_from',
 'trim_zeros',
 'triu',
 'triu_indices',
 'triu_indices_from',
 'true_divide',
 'trunc',
 'typecodes',
 'typename',
 'typing',
 'ubyte',
 'ufunc',
 'uint',
 'uint16',
 'uint32',
 'uint64',
 'uint8',
 'uintc',
 'uintp',
 'ulonglong',
 'unicode_',
 'union1d',
 'unique',
 'unpackbits',
 'unravel_index',
 'unsignedinteger',
 'unwrap',
 'ushort',
 'vander',
 'var',
 'vdot',
 'vectorize',
 'version',
 'void',
 'vsplit',
 'vstack',
 'where',
 'who',
 'zeros',
 'zeros_like']


Called without arguments, dir() lists the names defined in the current scope (your current symbol table).
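Since dir() also returns the "dunder" names, it is often convenient to filter them out. A short sketch on the math module (chosen arbitrarily):

```python
import math

# dir(math) also returns dunder names such as __name__ or __doc__;
# keep only the public names, i.e. those not starting with an underscore
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names[:5])

# Called without argument, dir() lists the names of the current scope,
# which now include math and public_names
print("math" in dir(), "public_names" in dir())  # True True
```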

Reference: Python doc of the dir function

Namespaces

A namespace is a mapping from names (functions, variables, etc.) to objects. Different namespaces can coexist at a given time and are isolated from one another: for instance, math.cos and numpy.cos live in different namespaces, so you always control which function you are using.

A namespace containing all the built-in names is created when the python interpreter starts and lasts until it exits. cos is not one of these built-in names, so calling it without any import fails:

cos(3)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[11], line 1
----> 1 cos(3)

NameError: name 'cos' is not defined

You need to import a module providing mathematical functions, such as math or numpy:

import math, numpy as np
print(math.cos(3), np.cos(3))
-0.9899924966004454 -0.9899924966004454
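The sketch below, using only the standard library (builtins and math), contrasts the built-in namespace, the namespace of the math module, and the two ways of importing cos:

```python
import builtins
import math

# The built-in namespace: names usable without any import
print("len" in dir(builtins))  # True
print("cos" in dir(builtins))  # False, hence the NameError above

# 'import math' only binds the name math; cos lives in the math namespace
print(math.cos(0))  # 1.0

# 'from math import cos' binds cos directly in the current namespace
from math import cos

print(cos is math.cos)  # True
```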

Reference: Python Namespace and Scope tutorial

The Module Search Path

When a module named spam is imported, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named spam.py in a list of directories given by the variable sys.path. The variable sys.path is initialized from these locations:

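  • The directory containing the input script (or the current directory when no file is specified).

  • The environment variable PYTHONPATH (a list of directory names, with the same syntax as the shell variable PATH).

  • The installation-dependent default, which by convention includes a site-packages directory where third-party packages are installed.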
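To see what the interpreter actually searches on your machine, a sketch like the following prints the built-in check, the content of sys.path, and PYTHONPATH (the printed paths depend on your installation):

```python
import os
import sys

# Built-in modules (compiled into the interpreter) are found before sys.path
print("sys" in sys.builtin_module_names)  # True

# Directories searched, in this order, for a file such as spam.py
for directory in sys.path:
    print(repr(directory))

# PYTHONPATH, if set, is one of the sources of sys.path
print(os.environ.get("PYTHONPATH", "PYTHONPATH is not set"))
```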

Reference: Python documentation, The Module Search Path

Checking if a module exists

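The function importlib.util.find_spec looks for a module (without importing it, for a name that is not imported yet) and returns its specification, which contains the loader; it returns None when the module cannot be found:

import importlib.util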
spam_spec = importlib.util.find_spec("spam")
found = spam_spec is not None
print(found)
False

Now, for a module that is installed, such as numpy:

import numpy
numpy_spec = importlib.util.find_spec("numpy")
print(numpy_spec)
ModuleSpec(name='numpy', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7739304b8470>, origin='/home/jsalmon/miniconda3/envs/HAX712X/lib/python3.12/site-packages/numpy/__init__.py', submodule_search_locations=['/home/jsalmon/miniconda3/envs/HAX712X/lib/python3.12/site-packages/numpy'])

returns a ModuleSpec object, which gives more information, such as the loader and the origin (the file from which the module is loaded).
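These calls can be wrapped in a small helper; module_exists is a hypothetical name, and the sketch is restricted to top-level names (for a dotted name such as a.b, find_spec imports the parent package a, and raises ModuleNotFoundError if a itself is missing):

```python
import importlib.util


def module_exists(name):
    """Return True if the top-level module `name` can be found.

    For a name that is not imported yet, find_spec locates the module
    without executing it, and returns None when it cannot be found.
    """
    return importlib.util.find_spec(name) is not None


print(module_exists("json"))  # True: part of the standard library
print(module_exists("spam"))  # False, unless a spam module is on sys.path
```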


Lazy import

A module can contain executable statements as well as function definitions. These statements are intended to initialize the module. They are executed only the first time the module name is encountered in an import statement.

To force a module to be reloaded, you can use importlib.reload().
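A sketch of both behaviors, assuming a hypothetical module init_demo written to a temporary folder: its top-level statements run at the first import only, and importlib.reload() runs them again after the source has changed. Writing .pyc files is disabled here (sys.dont_write_bytecode) so that the reloaded module is always recompiled from the modified source:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Do not write .pyc files: the reloaded module is then always
# recompiled from the (modified) source file
sys.dont_write_bytecode = True

# Write a hypothetical module init_demo.py in a temporary folder
tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "init_demo.py"
module_file.write_text("print('initializing init_demo')\nVALUE = 1\n")

# Make the folder visible to the import system
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

import init_demo  # prints: initializing init_demo
import init_demo  # prints nothing: the module cached in sys.modules is reused
print(init_demo.VALUE)  # 1

# Modify the source, then execute the module again
module_file.write_text("print('initializing init_demo again')\nVALUE = 42\n")
init_demo = importlib.reload(init_demo)  # prints: initializing init_demo again
print(init_demo.VALUE)  # 42
```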

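Remark: when using ipython (interactive python, an ancestor of the jupyter notebook), one can load the autoreload extension with %load_ext autoreload and then use the “magic” command %autoreload 2, which automatically reloads all modules before executing code.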

References:
  • Python doc on reload
  • IPython autoreload

“Compiled” python files

To speed up loading modules, python caches the compiled version of each module in the __pycache__ directory under the name module.version.pyc, where the version encodes the format of the compiled file; it generally contains the python version number.

For example, in CPython release 3.3 the compiled version of spam.py would be cached as __pycache__/spam.cpython-33.pyc. This naming convention allows compiled modules from different releases and different versions of python to coexist.
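The standard library exposes this naming convention: importlib.util.cache_from_source() returns the path where the compiled version of a source file would be cached, using the tag stored in sys.implementation.cache_tag. The printed values depend on your python version:

```python
import importlib.util
import sys

# Interpreter tag used in cached file names, e.g. cpython-312 for CPython 3.12
print(sys.implementation.cache_tag)

# Where the compiled version of spam.py would be cached
# (spam.py does not need to exist for this computation)
print(importlib.util.cache_from_source("spam.py"))
# e.g. __pycache__/spam.cpython-312.pyc with CPython 3.12
```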

Useful git tip

You should add a __pycache__/ entry to your .gitignore file to avoid committing compiled python files to your project.

The python Package Index (PyPI) repository

The python Package Index, abbreviated as PyPI, is the official third-party software repository for python. PyPI primarily hosts python packages in the form of archives called sdists (source distributions) or pre-compiled “wheels”.

EXERCISE: pypi
  1. Go to https://test.pypi.org/ and describe the aim of this repository.

Pip

pip is the de facto standard package-management system for python, used to install and manage software packages from PyPI.

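$ pip install some-package-name
$ pip uninstall some-package-name
$ pip search some-package-name

Note that pip search no longer works: PyPI has disabled the search API it relied on, so search for packages directly on https://pypi.org/ instead.
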
EXERCISE:
  1. Install the modules pooch, setuptools, pandas, pygal and pygal_maps_fr. Beware: if you are not working inside a virtual environment (venv), use the option --user to force the installation in your home directory.
  2. List all the packages installed in your venv using pip.

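It is possible to install a local module with pip:

$ pip install /path/to/my/local/module

where /path/to/my/local/module is the path to the folder containing the module (and its setup.py). However, pip copies the files at installation time, so later changes in the /path/to/my/local/module folder are not taken into account by the installed version. This might be annoying during the development stage. To make python use the files of this folder directly, so that your changes are picked up the next time the module is imported, consider an editable install with the -e option: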

$ pip install -e /path/to/my/local/module

Creating a python module

We are going to create a simple package. The structure is always similar, and an example can be found for instance here.

Reference: How To Package Your Python Code

Picking a name

python module/package names should generally follow the following constraints:

  • All lowercase (🇫🇷: en minuscule)
  • Unique on PyPI, even if you do not want to make your package publicly available (you might want to specify it privately as a dependency later)
  • Underscore-separated or no word separators at all, and do not use hyphens (i.e., use _ not -).

We are going to create a module called biketrauma to visualize the bicycle_db (Source: here, adapted from the original version available there) used in some of these lectures.

Module structure

The initial directory structure for biketrauma should look like this:

packaging_tutorial/
    ├── biketrauma/
    │     ├── __init__.py
    │     └── data/
    ├── setup.py
    └── .gitignore

The top-level directory is the root of our Version Control System (e.g. git) repository packaging_tutorial. The sub-directory, biketrauma, is the actual python module.

EXERCISE: packaging

We are going to create a new python module that can be used to visualize the bike dataset.

  1. Create a new folder ~/packaging_tutorial/ and initialize a git repository in it.
  2. Create a .gitignore file to ignore the __pycache__ and .vscode directories, as well as files containing the string egg-info or dist in their name.
  3. Push your work into a new repository on your GitHub account.
  4. Create a sub-folder ~/packaging_tutorial/biketrauma. This is where our python module will be stored.
  5. Create a ~/packaging_tutorial/biketrauma/__init__.py file in which a string __version__ is set to "0.0.1".
  6. Create an empty sub-folder ~/packaging_tutorial/biketrauma/data locally on your computer/session. How to add it to git? (Hint: .gitkeep)
  7. Create an empty ~/packaging_tutorial/setup.py file.
  8. Commit and push into your repository.

Reference: Single-sourcing the package version.

Sub-modules

The final directory structure of our module will look like:

packaging_tutorial/
    ├── biketrauma/
    │     ├── __init__.py
    │     ├── io/
    │     │     ├─ __init__.py
    │     │     └─ Load_db.py
    │     ├── preprocess/
    │     │     ├─ __init__.py
    │     │     └─ get_accident.py
    │     ├── vis/
    │     │     ├─ __init__.py
    │     │     └─ plot_location.py
    │     └── data/
    │           └─ .gitkeep
    ├── setup.py
    ├── script.py
    └── .gitignore

EXERCISE: modules

In the git repository of the course, the subfolder Courses/Python-modules/modules_files contains some files you should include in your package tree:

  1. Copy the script.py into the project root folder.

  2. Add some sub-folders to the biketrauma directory, called io (for input/output), preprocess and vis (for visualization). Add an empty __init__.py file into the module and sub-module folders.

  3. Populate the preprocess sub-module with the get_accident.py file.

  4. Populate the vis sub-module with the plot_location.py file.

  5. Populate the io sub-module with the file Load_db.py (it downloads the bike dataset). When the io sub-module is loaded, it should define the following variables (Hint: this is done in the __init__.py file of io):

    import os

    url_db = "https://github.com/josephsalmon/HAX712X/raw/main/Data/accidents-velos_2022.csv.xz"
    path_target = os.path.join(
        os.path.dirname(os.path.realpath(__file__)), "..", "data", "bicycle_db.csv.xz"
    )
  6. Commit your changes.

Fix the import

In order to load the functions in the io, preprocess and vis sub-modules, you can add the following lines to the ~/packaging_tutorial/biketrauma/__init__.py:

from .io.Load_db import Load_db
from .vis.plot_location import plot_location
from .preprocess.get_accident import get_accident
EXERCISE:
  1. Be sure to have the following packages installed (Hint: this list could be saved in a requirements.txt file in the project root folder):

    tqdm
    pygal_maps_fr
    pooch
    pandas
    numpy
    pygal
    setuptools
    lxml
  2. Check that your module does work by launching the script.

    $ cd ~/packaging_tutorial
    $ python script.py

    You can then open the created file biketrauma_map.svg in a web browser.

  3. Commit and push your changes.

Package the module with setuptools

The main setup configuration file, setup.py, should contain a single call to setuptools.setup(), like so:

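from setuptools import setup

# Caveat: importing biketrauma executes biketrauma/__init__.py, so every
# package imported by its sub-modules must already be installed here
from biketrauma import __version__ as current_version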

setup(
  name='biketrauma',
  version=current_version,
  description='Visualization of a bicycle accident db',
  url='http://github.com/xxxxxxxxxxx.git',
  author='xxxxxxxxxxx',
  author_email='xxxxxxxxxx@xxxxxxxxxxxxx.xxx',
  license='MIT',
  packages=['biketrauma','biketrauma.io', 'biketrauma.preprocess', 'biketrauma.vis'],
  zip_safe=False
)

To create a sdist package (a source distribution):

$ cd ~/packaging_tutorial/
$ python setup.py sdist

This will create dist/biketrauma-0.0.1.tar.gz inside the top-level directory. You can now install it with

$ pip install ~/packaging_tutorial/dist/biketrauma-0.0.1.tar.gz
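Once the archive is installed, the installed version can be checked from python with importlib.metadata (python 3.8+). The helper below is a hypothetical sketch that returns None when the distribution is not installed:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Expected to print 0.0.1 once the archive above is installed, None otherwise
print(installed_version("biketrauma"))
```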


Add requirement file

To get a list of the packages installed in your current virtual environment (venv), you can use the following command:

$ pip freeze > requirements.txt

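Unfortunately, pip freeze lists every installed package, including indirect dependencies and packages unrelated to your project, so the list may be huge. To get a sparser list, you can use pipreqs, which generates a requirements.txt file from the imports found in your project's code.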

EXERCISE: requirements

Create a minimal requirements.txt file with pipreqs. Add it to the biketrauma module.

Upload on PyPI

twine is a utility for publishing python packages on PyPI. We are going to use the test repository https://test.pypi.org/.

EXERCISE:
  1. Create an account on the PyPI test repository.

    It is quite easy to upload a python module to PyPI:

  2. Create some distributions in the normal way:

    $ python setup.py sdist bdist_wheel
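  3. Upload with twine to Test PyPI and verify things look right. Twine will prompt for your credentials; note that (Test) PyPI now requires an API token rather than your account password: enter __token__ as username and the token itself as password: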

    $ twine upload --repository-url https://test.pypi.org/legacy/ dist/*
    username: ...
    password: ...
  4. Upload to PyPI:

    $ twine upload dist/*

About the data folder

We have included the data folder in the sub-module tree, which is not good practice: the user may not have write permission in the directory where the module is installed… A better idea is to store the data in a cache or temporary directory.
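A minimal sketch of this idea, using only the standard library: the function name get_data_dir and the ~/.cache/<name> location (the XDG convention, mostly used on Linux) are assumptions, with a fallback to the system temporary directory when the cache folder cannot be created. Since pooch is already a dependency, its pooch.os_cache("biketrauma") function, which returns an OS-specific cache folder, is another option:

```python
import os
import tempfile
from pathlib import Path


def get_data_dir(app_name="biketrauma"):
    """Return a folder where downloaded data can be stored (sketch).

    Uses $XDG_CACHE_HOME/<app_name> if XDG_CACHE_HOME is set, else
    ~/.cache/<app_name>; falls back to the system temporary folder
    when that location cannot be created.
    """
    base = os.environ.get("XDG_CACHE_HOME") or Path.home() / ".cache"
    data_dir = Path(base) / app_name
    try:
        data_dir.mkdir(parents=True, exist_ok=True)
    except OSError:  # e.g. PermissionError on a read-only home folder
        data_dir = Path(tempfile.gettempdir()) / app_name
        data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


# The downloaded file would then be stored as, e.g.:
path_target = get_data_dir() / "bicycle_db.csv.xz"
print(path_target)
```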
