What is a python module?
You already know it: a module is a set of python functions and statements, and it is what you import at the beginning of your python scripts.
A module could be a single file
Indeed, a module can simply be a single file:
import fibo
This does not enter the names of the functions defined in fibo directly in the current symbol table though; it only enters the module name fibo there. Using the module name you can access the functions:
fibo.fib_print(1000)
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
fibo.fib_list(100)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
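The source of fibo.py is not shown here. A minimal sketch that would reproduce the two outputs above could look as follows; the function bodies are an assumption reconstructed from those outputs, not the actual course file:

```python
# fibo.py: hypothetical contents, reconstructed from the outputs shown above


def fib_print(n):
    """Print the Fibonacci numbers smaller than n on a single line."""
    a, b = 0, 1
    while a < n:
        print(a, end=" ")
        a, b = b, a + b
    print()


def fib_list(n):
    """Return the list of Fibonacci numbers smaller than n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result
```

Saving this as fibo.py in the current directory is enough for import fibo to work.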
When importing a module, several attributes are (automatically) defined. Their names are usually prefixed and suffixed by the symbol __, e.g.,
fibo.__name__
'fibo'
fibo.__file__
'/home/jsalmon/Documents/Mes_cours/Montpellier/HAX712X/Courses/Python-modules/fibo.py'
… or a module could be a directory
You can also import a full directory (containing many python files stored in sub-folders). python looks for that folder in the directories listed in sys.path. You have already imported the numpy module, for numerical analysis with python:
import numpy as np
print(np.array([0, 1, 2, 3]).reshape(2, 2))
print(np.array([0, 1, 2, 3]).mean())
[[0 1]
[2 3]]
1.5
In fact, you have imported the following folder:
np.__path__
['/home/jsalmon/miniconda3/envs/HAX712X/lib/python3.12/site-packages/numpy']
Depending on your installation you might obtain ['/usr/lib/python3.9/site-packages/numpy'] or ['/home/username/anaconda3/lib/python3.7/site-packages/numpy'] if you installed with Anaconda.
More precisely you will get either
>>> np.__file__
'/usr/lib/python3.9/site-packages/numpy/__init__.py'
or
>>> np.__file__
'/home/username/anaconda3/lib/python3.7/site-packages/numpy/__init__.py'
Any (sub-)directory of your python module should contain an __init__.py file!
The __init__.py file can contain a list of functions to be loaded when the module is imported. It allows you to expose functions to users in a concise way.
You can also import modules with relative paths, using ., .., ..., etc.
Reference: Absolute vs Relative Imports in Python by Mbithe Nzomo.
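To make the role of __init__.py and of relative imports concrete, here is a small sketch that builds a throwaway package in a temporary directory and imports it. The package name demo_pkg and its functions are hypothetical and only serve the illustration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (all names below are hypothetical).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
(pkg / "sub").mkdir(parents=True)

(pkg / "utils.py").write_text("def double(x):\n    return 2 * x\n")
# "." refers to the current package: expose double at the package top level.
(pkg / "__init__.py").write_text("from .utils import double\n")
(pkg / "sub" / "__init__.py").write_text("")
# ".." refers to the parent package of demo_pkg.sub, i.e. demo_pkg.
(pkg / "sub" / "calc.py").write_text(
    "from ..utils import double\n\n"
    "def quadruple(x):\n"
    "    return double(double(x))\n"
)

# Make the temporary directory visible to the import system.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg
from demo_pkg.sub.calc import quadruple

print(demo_pkg.double(3))
print(quadruple(3))
```

Thanks to the line in __init__.py, users can write demo_pkg.double instead of demo_pkg.utils.double.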
The dir() function
The built-in function dir() is used to find out which names a module defines. It returns a sorted list of strings:
import fibo, numpy
dir(fibo)
['__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'fib_list',
'fib_print']
dir(numpy)
['ALLOW_THREADS',
'BUFSIZE',
'CLIP',
'DataSource',
'ERR_CALL',
'ERR_DEFAULT',
'ERR_IGNORE',
'ERR_LOG',
'ERR_PRINT',
'ERR_RAISE',
'ERR_WARN',
'FLOATING_POINT_SUPPORT',
'FPE_DIVIDEBYZERO',
'FPE_INVALID',
'FPE_OVERFLOW',
'FPE_UNDERFLOW',
'False_',
'Inf',
'Infinity',
'MAXDIMS',
'MAY_SHARE_BOUNDS',
'MAY_SHARE_EXACT',
'NAN',
'NINF',
'NZERO',
'NaN',
'PINF',
'PZERO',
'RAISE',
'RankWarning',
'SHIFT_DIVIDEBYZERO',
'SHIFT_INVALID',
'SHIFT_OVERFLOW',
'SHIFT_UNDERFLOW',
'ScalarType',
'True_',
'UFUNC_BUFSIZE_DEFAULT',
'UFUNC_PYVALS_NAME',
'WRAP',
'_CopyMode',
'_NoValue',
'_UFUNC_API',
'__NUMPY_SETUP__',
'__all__',
'__builtins__',
'__cached__',
'__config__',
'__deprecated_attrs__',
'__dir__',
'__doc__',
'__expired_functions__',
'__file__',
'__former_attrs__',
'__future_scalars__',
'__getattr__',
'__loader__',
'__name__',
'__package__',
'__path__',
'__spec__',
'__version__',
'_add_newdoc_ufunc',
'_builtins',
'_distributor_init',
'_financial_names',
'_get_promotion_state',
'_globals',
'_int_extended_msg',
'_mat',
'_no_nep50_warning',
'_pyinstaller_hooks_dir',
'_pytesttester',
'_set_promotion_state',
'_specific_msg',
'_typing',
'_using_numpy2_behavior',
'_utils',
'abs',
'absolute',
'add',
'add_docstring',
'add_newdoc',
'add_newdoc_ufunc',
'all',
'allclose',
'alltrue',
'amax',
'amin',
'angle',
'any',
'append',
'apply_along_axis',
'apply_over_axes',
'arange',
'arccos',
'arccosh',
'arcsin',
'arcsinh',
'arctan',
'arctan2',
'arctanh',
'argmax',
'argmin',
'argpartition',
'argsort',
'argwhere',
'around',
'array',
'array2string',
'array_equal',
'array_equiv',
'array_repr',
'array_split',
'array_str',
'asanyarray',
'asarray',
'asarray_chkfinite',
'ascontiguousarray',
'asfarray',
'asfortranarray',
'asmatrix',
'atleast_1d',
'atleast_2d',
'atleast_3d',
'average',
'bartlett',
'base_repr',
'binary_repr',
'bincount',
'bitwise_and',
'bitwise_not',
'bitwise_or',
'bitwise_xor',
'blackman',
'block',
'bmat',
'bool_',
'broadcast',
'broadcast_arrays',
'broadcast_shapes',
'broadcast_to',
'busday_count',
'busday_offset',
'busdaycalendar',
'byte',
'byte_bounds',
'bytes_',
'c_',
'can_cast',
'cast',
'cbrt',
'cdouble',
'ceil',
'cfloat',
'char',
'character',
'chararray',
'choose',
'clip',
'clongdouble',
'clongfloat',
'column_stack',
'common_type',
'compare_chararrays',
'compat',
'complex128',
'complex256',
'complex64',
'complex_',
'complexfloating',
'compress',
'concatenate',
'conj',
'conjugate',
'convolve',
'copy',
'copysign',
'copyto',
'corrcoef',
'correlate',
'cos',
'cosh',
'count_nonzero',
'cov',
'cross',
'csingle',
'ctypeslib',
'cumprod',
'cumproduct',
'cumsum',
'datetime64',
'datetime_as_string',
'datetime_data',
'deg2rad',
'degrees',
'delete',
'deprecate',
'deprecate_with_doc',
'diag',
'diag_indices',
'diag_indices_from',
'diagflat',
'diagonal',
'diff',
'digitize',
'disp',
'divide',
'divmod',
'dot',
'double',
'dsplit',
'dstack',
'dtype',
'dtypes',
'e',
'ediff1d',
'einsum',
'einsum_path',
'emath',
'empty',
'empty_like',
'equal',
'errstate',
'euler_gamma',
'exceptions',
'exp',
'exp2',
'expand_dims',
'expm1',
'expm1x',
'extract',
'eye',
'fabs',
'fastCopyAndTranspose',
'fft',
'fill_diagonal',
'find_common_type',
'finfo',
'fix',
'flatiter',
'flatnonzero',
'flexible',
'flip',
'fliplr',
'flipud',
'float128',
'float16',
'float32',
'float64',
'float_',
'float_power',
'floating',
'floor',
'floor_divide',
'fmax',
'fmin',
'fmod',
'format_float_positional',
'format_float_scientific',
'format_parser',
'frexp',
'from_dlpack',
'frombuffer',
'fromfile',
'fromfunction',
'fromiter',
'frompyfunc',
'fromregex',
'fromstring',
'full',
'full_like',
'gcd',
'generic',
'genfromtxt',
'geomspace',
'get_array_wrap',
'get_include',
'get_printoptions',
'getbufsize',
'geterr',
'geterrcall',
'geterrobj',
'gradient',
'greater',
'greater_equal',
'half',
'hamming',
'hanning',
'heaviside',
'histogram',
'histogram2d',
'histogram_bin_edges',
'histogramdd',
'hsplit',
'hstack',
'hypot',
'i0',
'identity',
'iinfo',
'imag',
'in1d',
'index_exp',
'indices',
'inexact',
'inf',
'info',
'infty',
'inner',
'insert',
'int16',
'int32',
'int64',
'int8',
'int_',
'intc',
'integer',
'interp',
'intersect1d',
'intp',
'invert',
'is_busday',
'isclose',
'iscomplex',
'iscomplexobj',
'isfinite',
'isfortran',
'isin',
'isinf',
'isnan',
'isnat',
'isneginf',
'isposinf',
'isreal',
'isrealobj',
'isscalar',
'issctype',
'issubclass_',
'issubdtype',
'issubsctype',
'iterable',
'ix_',
'kaiser',
'kernel_version',
'kron',
'lcm',
'ldexp',
'left_shift',
'less',
'less_equal',
'lexsort',
'lib',
'linalg',
'linspace',
'little_endian',
'load',
'loadtxt',
'log',
'log10',
'log1p',
'log2',
'logaddexp',
'logaddexp2',
'logical_and',
'logical_not',
'logical_or',
'logical_xor',
'logspace',
'longcomplex',
'longdouble',
'longfloat',
'longlong',
'lookfor',
'ma',
'mask_indices',
'mat',
'matmul',
'matrix',
'max',
'maximum',
'maximum_sctype',
'may_share_memory',
'mean',
'median',
'memmap',
'meshgrid',
'mgrid',
'min',
'min_scalar_type',
'minimum',
'mintypecode',
'mod',
'modf',
'moveaxis',
'msort',
'multiply',
'nan',
'nan_to_num',
'nanargmax',
'nanargmin',
'nancumprod',
'nancumsum',
'nanmax',
'nanmean',
'nanmedian',
'nanmin',
'nanpercentile',
'nanprod',
'nanquantile',
'nanstd',
'nansum',
'nanvar',
'nbytes',
'ndarray',
'ndenumerate',
'ndim',
'ndindex',
'nditer',
'negative',
'nested_iters',
'newaxis',
'nextafter',
'nonzero',
'not_equal',
'numarray',
'number',
'obj2sctype',
'object_',
'ogrid',
'oldnumeric',
'ones',
'ones_like',
'outer',
'packbits',
'pad',
'partition',
'percentile',
'pi',
'piecewise',
'place',
'poly',
'poly1d',
'polyadd',
'polyder',
'polydiv',
'polyfit',
'polyint',
'polymul',
'polynomial',
'polysub',
'polyval',
'positive',
'power',
'printoptions',
'prod',
'product',
'promote_types',
'ptp',
'put',
'put_along_axis',
'putmask',
'quantile',
'r_',
'rad2deg',
'radians',
'random',
'ravel',
'ravel_multi_index',
'real',
'real_if_close',
'rec',
'recarray',
'recfromcsv',
'recfromtxt',
'reciprocal',
'record',
'remainder',
'repeat',
'require',
'reshape',
'resize',
'result_type',
'right_shift',
'rint',
'roll',
'rollaxis',
'roots',
'rot90',
'round',
'round_',
'row_stack',
's_',
'safe_eval',
'save',
'savetxt',
'savez',
'savez_compressed',
'sctype2char',
'sctypeDict',
'sctypes',
'searchsorted',
'select',
'set_numeric_ops',
'set_printoptions',
'set_string_function',
'setbufsize',
'setdiff1d',
'seterr',
'seterrcall',
'seterrobj',
'setxor1d',
'shape',
'shares_memory',
'short',
'show_config',
'show_runtime',
'sign',
'signbit',
'signedinteger',
'sin',
'sinc',
'single',
'singlecomplex',
'sinh',
'size',
'sometrue',
'sort',
'sort_complex',
'source',
'spacing',
'split',
'sqrt',
'square',
'squeeze',
'stack',
'std',
'str_',
'string_',
'subtract',
'sum',
'swapaxes',
'take',
'take_along_axis',
'tan',
'tanh',
'tensordot',
'test',
'testing',
'tile',
'timedelta64',
'trace',
'tracemalloc_domain',
'transpose',
'trapz',
'tri',
'tril',
'tril_indices',
'tril_indices_from',
'trim_zeros',
'triu',
'triu_indices',
'triu_indices_from',
'true_divide',
'trunc',
'typecodes',
'typename',
'typing',
'ubyte',
'ufunc',
'uint',
'uint16',
'uint32',
'uint64',
'uint8',
'uintc',
'uintp',
'ulonglong',
'unicode_',
'union1d',
'unique',
'unpackbits',
'unravel_index',
'unsignedinteger',
'unwrap',
'ushort',
'vander',
'var',
'vdot',
'vectorize',
'version',
'void',
'vsplit',
'vstack',
'where',
'who',
'zeros',
'zeros_like']
To list every element in your symbol table simply call dir().
Reference: Python doc of the dir function
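Since the output of dir() can be very long, it is often convenient to filter it. The sketch below uses the standard math module as an example and keeps only the public names; this filtering convention is a common habit, not something dir() does for you:

```python
import math

# dir() returns every name, including "dunder" attributes such as __name__.
all_names = dir(math)

# Keep only the public names (those not starting with an underscore).
public_names = [name for name in all_names if not name.startswith("_")]

print(len(all_names), len(public_names))
print("cos" in public_names)
```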
Namespaces
A namespace is a set of names (functions, variables, etc.). Different namespaces can co-exist at a given time but are completely isolated. In this way, you can control which function you are using.
A namespace containing all the built-in names is created when we start the python interpreter and exists as long as we don’t exit.
cos(3)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[11], line 1
----> 1 cos(3)

NameError: name 'cos' is not defined
You need to import a package for mathematical functions:
import math, numpy as np
print(math.cos(3), np.cos(3))
-0.9899924966004454 -0.9899924966004454
Reference: Python Namespace and Scope tutorial
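As an illustration of isolated namespaces, the standard modules math and cmath both define a function called sqrt, and the prefix decides which one is used. The sketch below also checks that cos is not part of the built-in namespace, which explains the NameError above:

```python
import builtins
import cmath
import math

# Two modules define a function with the same name; each lives in its own namespace.
real_root = math.sqrt(4)       # real-valued square root
complex_root = cmath.sqrt(-4)  # complex-valued square root

# The built-in namespace contains len but not cos.
len_is_builtin = hasattr(builtins, "len")
cos_is_builtin = hasattr(builtins, "cos")

print(real_root, complex_root, len_is_builtin, cos_is_builtin)
```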
The Module Search Path
When a module named spam is imported, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named spam.py in a list of directories given by the variable sys.path. The variable sys.path is initialized from these locations:
- The directory containing the input script (or the current directory when no file is specified).
- The environment variable PYTHONPATH (a list of directory names, with the same syntax as the shell variable PATH).
- The installation-dependent default (by convention including a site-packages directory).
Reference: Python documentation, The Module Search Path
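You can inspect the search path on your own machine with the short sketch below. The printed directories depend entirely on your installation, so no output is shown:

```python
import os
import sys

# Built-in modules are compiled into the interpreter and are searched first.
print("sys" in sys.builtin_module_names)

# Directories searched, in order, for modules that are not built in.
for directory in sys.path:
    print(repr(directory))

# Entries coming from PYTHONPATH (empty list if the variable is not set).
pythonpath = os.environ.get("PYTHONPATH", "")
extra_dirs = [d for d in pythonpath.split(os.pathsep) if d]
print(extra_dirs)
```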
Checking if a module exists
The function importlib.util.find_spec() looks up the specification of a module without importing it, and returns None when the module cannot be found.
import importlib.util
spam_spec = importlib.util.find_spec("spam")
found = spam_spec is not None
print(found)
False
Now,
import numpy
numpy_spec = importlib.util.find_spec("numpy")
print(numpy_spec)
ModuleSpec(name='numpy', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7739304b8470>, origin='/home/jsalmon/miniconda3/envs/HAX712X/lib/python3.12/site-packages/numpy/__init__.py', submodule_search_locations=['/home/jsalmon/miniconda3/envs/HAX712X/lib/python3.12/site-packages/numpy'])
should return more information, including where the module is located and which loader is used.
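A common use of this check is to handle optional dependencies. The helper module_exists below is a hypothetical name introduced for this sketch; it only wraps importlib.util.find_spec for top-level module names (for dotted names, find_spec imports the parent package first and may raise ModuleNotFoundError):

```python
import importlib.util


def module_exists(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


# Typical use: pick an optional dependency only when it is installed.
backend = "numpy" if module_exists("numpy") else "pure python"
print(backend)
print(module_exists("math"), module_exists("surely_not_a_real_module_xyz"))
```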
Lazy import
A module can contain executable statements as well as function definitions. These statements are intended to initialize the module. They are executed only the first time the module name is encountered in an import statement.
To force a module to be reloaded, you can use importlib.reload().
Remark: when using ipython (interactive python, an ancestor of the jupyter notebook), one can use the “magic” command %autoreload 2 (after loading the extension with %load_ext autoreload).
References:
- Python doc on reload
- IPython autoreload
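The following sketch checks this behaviour on a throwaway module (the name runs_demo is hypothetical). It relies on the documented fact that importlib.reload() re-executes the module code while keeping the module's existing global variables:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny module that counts how many times its body is executed.
root = Path(tempfile.mkdtemp())
(root / "runs_demo.py").write_text(
    "try:\n"
    "    runs += 1\n"
    "except NameError:\n"
    "    runs = 1\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import runs_demo
first = runs_demo.runs   # body executed once

import runs_demo         # already imported: the body is not executed again
second = runs_demo.runs

importlib.reload(runs_demo)  # forces a new execution of the body
third = runs_demo.runs

print(first, second, third)
```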
“Compiled” python files
To speed up loading modules, python caches the compiled version of each module in the __pycache__ directory under the name module.version.pyc, where the version encodes the format of the compiled file; it generally contains the python version number.
For example, in CPython release 3.3 the compiled version of spam.py would be cached as __pycache__/spam.cpython-33.pyc. This naming convention allows compiled modules from different releases and different versions of python to coexist.
You should add a __pycache__ entry to your .gitignore file to avoid adding compiled python files to your project.
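To see which file name your own interpreter would use, you can ask the standard library. The sketch below only computes the path; it does not create any file, and the exact tag depends on your python version:

```python
import importlib.util
import sys

# Tag identifying the implementation and version, e.g. "cpython-312".
cache_tag = sys.implementation.cache_tag

# Path where the compiled version of spam.py would be cached.
cached_path = importlib.util.cache_from_source("spam.py")

print(cache_tag)
print(cached_path)
```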
The python Package Index (PyPI) repository
The python Package Index, abbreviated as PyPI, is the official third-party software repository for python. PyPI primarily hosts python packages in the form of archives called sdists (source distributions) or pre-compiled “wheels”.
Pip
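pip is the de facto standard package-management system used to install and manage software packages from PyPI.
$ pip install some-package-name
$ pip uninstall some-package-name
$ pip search some-package-name
Note that pip search no longer works against PyPI (the server-side search API has been disabled); browse https://pypi.org to search for packages instead.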
It is possible to install a local module with pip
$ pip install /path/to/my/local/module
where /path/to/my/local/module is the path to the module. But if some changes occur in the /path/to/my/local/module folder, the installed copy will not be updated. This might be annoying during the development stage. To make python pick up your changes without reinstalling, consider the -e (editable) option:
$ pip install -e /path/to/my/local/module
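Once a package is installed, you can check from python which version pip recorded. The helper installed_version is a hypothetical name used for this sketch; it relies on the standard importlib.metadata module (python 3.8 or later):

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("numpy"))
print(installed_version("surely-not-an-installed-package-xyz"))
```

The argument is the distribution name used with pip install, which may differ from the name used in import statements.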
Creating a python module
We are going to create a simple package. The structure is always similar, and an example can be found for instance here.
Reference: How To Package Your Python Code
Picking a name
python module/package names should generally follow the following constraints:
- All lowercase (🇫🇷: en minuscule)
- Unique on PyPI, even if you do not want to make your package publicly available (you might want to specify it privately as a dependency later)
- Underscore-separated or no word separators at all, and do not use hyphens (i.e., use _ not -).
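The local constraints can be checked automatically. The function is_reasonable_module_name below is a hypothetical helper written for this sketch; it does not check uniqueness on PyPI, which requires looking at the index itself:

```python
import keyword


def is_reasonable_module_name(name):
    """Check the naming constraints that can be verified locally."""
    return (
        name.isidentifier()              # rules out hyphens, spaces, leading digits
        and name == name.lower()         # all lowercase
        and not keyword.iskeyword(name)  # not a reserved word such as "import"
    )


for candidate in ["biketrauma", "bike_trauma", "bike-trauma", "BikeTrauma"]:
    print(candidate, is_reasonable_module_name(candidate))
```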
We are going to create a module called biketrauma to visualize the bicycle_db (Source: here, adapted for the original version from there) used in some of these lectures.
Module structure
The initial directory structure for biketrauma should look like this:
packaging_tutorial/
├── biketrauma/
│   ├── __init__.py
│   └── data/
├── setup.py
└── .gitignore
The top-level directory, packaging_tutorial, is the root of our Version Control System (e.g. git) repository. The sub-directory, biketrauma, is the actual python module.
Sub-modules
The final directory structure of our module will look like:
packaging_tutorial/
├── biketrauma/
│   ├── __init__.py
│   ├── io/
│   │   ├── __init__.py
│   │   └── Load_db.py
│   ├── preprocess/
│   │   ├── __init__.py
│   │   └── get_accident.py
│   ├── vis/
│   │   ├── __init__.py
│   │   └── plot_location.py
│   └── data/
│       └── .gitkeep
├── setup.py
├── script.py
└── .gitignore
Fix the import
In order to load the functions in the io, preprocess and vis sub-modules, you can add the following lines to the ~/packaging_tutorial/biketrauma/__init__.py file:
from .io.Load_db import Load_db
from .vis.plot_location import plot_location
from .preprocess.get_accident import get_accident
Package the module with setuptools
The main setup configuration file, setup.py, should contain a single call to setuptools.setup(), like so:
from setuptools import setup
# This import assumes that biketrauma/__init__.py defines __version__
from biketrauma import __version__ as current_version

setup(
    name='biketrauma',
    version=current_version,
    description='Visualization of a bicycle accident db',
    url='http://github.com/xxxxxxxxxxx.git',
    author='xxxxxxxxxxx',
    author_email='xxxxxxxxxx@xxxxxxxxxxxxx.xxx',
    license='MIT',
    packages=['biketrauma', 'biketrauma.io', 'biketrauma.preprocess', 'biketrauma.vis'],
    zip_safe=False
)
To create a sdist package (a source distribution):
$ cd ~/packaging_tutorial/
$ python setup.py sdist
This will create dist/biketrauma-0.0.1.tar.gz inside the top-level directory. You can now install it with
$ pip install ~/packaging_tutorial/dist/biketrauma-0.0.1.tar.gz
Add requirement file
To get a list of the installed packages in your current virtual environment, you can use the following command:
$ pip freeze > requirements.txt
Unfortunately, it may generate a huge collection of package dependencies. To get a sparser list, you can use pipreqs.
Upload on PyPI
twine is a utility for publishing python packages on PyPI. We are going to use the test repository https://test.pypi.org/.
We have included the data folder in the sub-module tree, which is not good practice: write permission may not be granted in the installation directory of the module. A better idea could be to create a data folder in a cache or temp directory.
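One possible way to do this is sketched below. The helper get_data_dir and the default location ~/.cache/biketrauma/data are assumptions made for the illustration, not part of the biketrauma module:

```python
import tempfile
from pathlib import Path


def get_data_dir(base=None, package_name="biketrauma"):
    """Return (and create if needed) a user-writable data directory."""
    if base is None:
        # Assumed default location: a hidden cache folder in the home directory.
        base = Path.home() / ".cache"
    data_dir = Path(base) / package_name / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


# Demonstration in a temporary directory, so nothing is written to your home.
demo_dir = get_data_dir(base=tempfile.mkdtemp())
print(demo_dir)
```

Downloaded files can then be stored in this directory instead of inside the installed package.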
References
- Python Packaging User Guide
- Twine documentation, which provides additional details on using twine to upload packages to PyPI.