HAX712X: Software development for data science

(Almost) everything you need to know as an applied mathematician/statistician concerning coding and system administration.

Teachers

This course material was improved with the help of some students including:

Prerequisite

Students are expected to know the basics of probability, optimization, linear algebra and statistics for this course. Some coding experience (if, for, while, functions) is recommended but not mandatory.
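As a rough indication of the coding level referred to above, here is a minimal sketch that uses only the four constructs listed (if, for, while, functions). The function names `running_mean` and `first_index_above` are hypothetical and chosen for illustration only; they are not part of the course material.

```python
def running_mean(values):
    """Return the list of cumulative means of a sequence of numbers."""
    means = []
    total = 0.0
    # for loop: visit each value once, keeping track of how many were seen
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


def first_index_above(values, threshold):
    """Return the index of the first entry strictly above threshold, or None."""
    index = 0
    # while loop: advance until the end of the list is reached
    while index < len(values):
        # if statement: stop as soon as the condition holds
        if values[index] > threshold:
            return index
        index += 1
    return None


data = [1.0, 3.0, 5.0, 7.0]  # illustrative data, not from the course
print(running_mean(data))
print(first_index_above(data, 4.0))
```

Students who can read and write code of this kind should be comfortable with the first sessions; the rest is introduced during the course.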

Course description

This course focuses on good practices for professional coding. The language used is Python, but some elements of bash and git will also be useful. Data processing and visualization are at the heart of the course. We will mostly focus on basic programming concepts and on the Python scientific libraries, including numpy, scipy, pandas, matplotlib and seaborn. Beyond pandas ninja skills, we will also introduce modern practices for coders: unit tests, version control, documentation generation, etc.

Date Teacher Details
13/09/2024 BC Command-line tools
17/09/2024 BC Version control with Git
27/09/2024 BC IDE / Python virtual environment
04/10/2024 JS Creating a Python Module, Classes & Exceptions
11/10/2024 JS Markdown to HTML (Quarto), Unit Tests
18/10/2024 JS Pandas: Titanic dataset, Pandas: Airparif dataset, Pandas: Bikes dataset
25/10/2024 JS Continuous Integration (CI), Generating documentation (Sphinx), Deploy on PyPI
08/11/2024 JS SciPy
15/11/2024 JS Debugging & Profiling
29/11/2024 JS Sparse matrices, graphs and maps
06/12/2024 JS Sparse matrices, graphs and maps
13/12/2024 JS + BC The end: Project presentations

Grading

Grading for this course is based on two projects: a group project (group composition available on Moodle) and an individual project.

Please read the project description page carefully.

Bonus

One bonus point can be added to the final course grade for contributions that improve the course material (practicals, README, etc.). See the Bonus section for more details on how to proceed.

Books and other resources

The resources for the course are available in this GitHub repository. Additional introductory material on Python (in French) is available in the course HLMA310 and the associated lecture notes IntroPython.pdf.

Moodle webpage

The Moodle web page is available to registered students only.

Additional resources
