HAX712X: Software development for data science
(Almost) everything you need to know as an applied mathematician/statistician concerning coding and system administration.
Teachers
- Joseph Salmon: joseph.salmon@umontpellier.fr,
- Benjamin Charlier: benjamin.charlier@umontpellier.fr
This course material was improved with the help of some students including:
Prerequisite
Students are expected to know the basics of probability, optimization, linear algebra and statistics for this course. Some coding rudiments (if, for, while, functions) are helpful but not mandatory.
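As a rough self-check of the coding rudiments listed above, the short Python sketch below uses a function, a `for` loop, an `if` test and a `while` loop. The function names `count_even` and `halvings_until_below_one` are hypothetical and chosen only for illustration; they are not taken from the course material.

```python
def count_even(values):
    """Count the even integers in an iterable (for loop + if test)."""
    count = 0
    for value in values:
        if value % 2 == 0:
            count += 1
    return count


def halvings_until_below_one(x):
    """Count how many halvings bring x strictly below 1 (while loop)."""
    steps = 0
    while x >= 1:
        x = x / 2
        steps += 1
    return steps


# Hypothetical inputs, for illustration only.
print(count_even([1, 2, 3, 4, 10]))
print(halvings_until_below_one(8))
```

If reading and predicting the output of such a snippet feels comfortable, the coding prerequisite is most likely met.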
Course description
This course focuses on discovering good practices for professional coding (the language used is Python, but some elements of bash and git will also be useful). Data processing and visualization are at the heart of the course. We will mostly focus on basic programming concepts, as well as on discovering the Python scientific libraries, including numpy, scipy, pandas, matplotlib and seaborn. Beyond pandas ninja skills, we will also introduce modern practices for coders: unit tests, version control, documentation generation, etc.
Date | Teacher | Details |
---|---|---|
13/09/2024 | BC | Command-line tools |
17/09/2024 | BC | Version control with Git |
27/09/2024 | BC | IDE / Python virtual environment |
04/10/2024 | JS | Creating a Python Module, Classes & Exceptions |
11/10/2024 | JS | Markdown to HTML (Quarto), Unit Tests |
18/10/2024 | JS | Pandas: Titanic dataset, Pandas: Airparif dataset, Pandas: Bikes dataset |
25/10/2024 | JS | Continuous Integration (CI), Generating documentation (Sphinx), Deploying on PyPI |
08/11/2024 | JS | SciPy |
15/11/2024 | JS | Debugging & Profiling |
29/11/2024 | JS | Sparse matrices, graphs and maps |
06/12/2024 | JS | Sparse matrices, graphs and maps |
13/12/2024 | JS + BC | The end: Project presentations |
Grading
For this course, the grading consists of two projects: one group project (group composition available on Moodle) and a personal one.
Please read the project description page carefully.
Bonus
One bonus point on the final course grade can be earned by contributing improvements to the course material (practicals, Readme, etc.). See the Bonus section for more details on how to proceed.
Books and other resources
The resources for the course are available on the present GitHub repository. Additional introductory material on Python (in French) is available in the course HLMA310 and the associated lecture notes IntroPython.pdf.
Moodle webpage
The Moodle web page is available to registered students only.
Additional resources
- (Python): Cours de Python, Univ. Paris Diderot
- (General): The Missing Semester of Your CS Education
- (General): Coding for Economists
- (Algorithmic basis): Algorithms, by Jeff Erickson
- (Data Science): Python Data Science Handbook: Essential Tools for Working with Data, by J. VanderPlas, 2016; videos: Reproducible Data Analysis in Jupyter
- (General) Skiena, The Algorithm Design Manual, 1998
- (General) Courant et al., Informatique pour tous en classes préparatoires aux grandes écoles: Manuel d’algorithmique et programmation structurée avec Python, 2013 (in French)
- (General/data science) Guttag, Introduction to Computation and Programming Using Python: With Application to Understanding Data, 2016
- (Code and style) Boswell and Foucher, The Art of Readable Code, 2011
- (Scientific computing tools for Python) SciPy lecture notes
- (Datasets) Open Climate Data