HAX712X: Software development for data science

(Almost) everything you need to know as an applied mathematician/statistician concerning coding and system administration.


This course material was improved with the help of some students including:


Students are expected to know the basics of probability, optimization, linear algebra and statistics. Some familiarity with coding (if, for, while, functions) is helpful but not mandatory.

Course description

This course covers good coding practices for professional software development. The main language is Python, with some elements of bash and git. Data processing and visualization are at the heart of the course. We will mostly focus on basic programming concepts and on the Python scientific libraries, including numpy, scipy, pandas, matplotlib and seaborn. Beyond advanced pandas skills, we will also introduce modern practices for coders: unit tests, version control, documentation generation, etc.

Date        Teacher  Details
11/09/2023  BC       Command-line tools
22/09/2023  BC       Version control with Git
29/09/2023  BC       IDE / Python virtual environment
06/10/2023  BC+JS    Creating a Python Module, Classes & Exceptions
13/10/2023  JS       Markdown to HTML (Quarto), Unit Tests
20/10/2023  JS       Pandas: Titanic dataset, Pandas: Airparif dataset, Pandas: Bikes dataset
27/10/2023  BC       Continuous Integration (CI), Generating documentation (Sphinx), Deploying on PyPI
10/11/2023  JS       SciPy
17/11/2023  JS       Debugging & Profiling
20/11/2023  JS       Sparse matrices, graphs and maps
15/12/2023  BC+JS    The end: Project presentations


Grading for this course is based on two projects: a group project (group composition available on Moodle) and an individual project.

Please read the project description page carefully.


One bonus point on the final grade can be earned by contributing improvements to the course material (practicals, Readme, etc.). See the Bonus section for details on how to proceed.

Books and other resources

The resources for the course are available in this GitHub repository. Additional introductory material on Python (in French) is available in the course HLMA310 and its lecture notes, IntroPython.pdf.

Moodle webpage

The Moodle web page is available to registered students only.

Additional resources
