Software Engineering for Data Science
The Software Engineering for Data Scientists syllabus is designed for Python data science programmers who need a more efficient and trustworthy development process. Software engineering best practices for testing, code design, documentation, debugging and profiling are distilled into hands-on exercises optimised for learning.
Description of the course
The course offers practical examples of how employing basic software engineering principles and tools can benefit Python Data Science professionals. The course assumes a basic level of fluency with Python (e.g. built-in data types, control flow statements) and the PyData ecosystem (e.g. basics of pandas).
Learning objectives
By attending this course, you will learn about:
- Testing and debugging your code for increased trust
- Software design and documentation to produce more maintainable code
- Producing more robust production code to reduce the friction caused by downtime and other code issues
- Collaborating with other technical team members with more confidence
Syllabus
Fundamentals of software testing
- Overview of Python tools: unittest, mock, pytest, coverage, Hypothesis
- Defensive programming vs unit testing vs test-driven development
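To give a flavour of the difference between defensive programming and unit testing, here is a minimal sketch. The function `mean_absolute_error` and the test names are hypothetical and chosen only for illustration; they are not taken from the course material. The input checks inside the function are the defensive part, and the `test_*` functions are the kind of plain-assert tests that pytest collects and runs.

```python
def mean_absolute_error(y_true, y_pred):
    """Return the mean absolute difference between two equal-length sequences."""
    # Defensive programming: reject bad input early with a clear message.
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Unit tests: pytest collects functions named test_* and runs their asserts.
def test_perfect_prediction_has_zero_error():
    assert mean_absolute_error([1, 2, 3], [1, 2, 3]) == 0


def test_known_value():
    assert mean_absolute_error([1, 2], [2, 4]) == 1.5


def test_length_mismatch_is_rejected():
    try:
        mean_absolute_error([1, 2], [1])
    except ValueError:
        pass  # the defensive check fired as expected
    else:
        raise AssertionError("expected ValueError for mismatched lengths")
```

In a test-driven workflow the three tests would be written first and the function body filled in until they pass.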
Structuring Python code
- Notebooks vs Scripts vs Packages vs Modules
- Designing maintainable and reusable code
Effective documentation
- Docstrings and documentation styles
- Overview of Python tools: Sphinx
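As an illustration of one common documentation style, the sketch below uses a NumPy-style docstring, which Sphinx can render through its `sphinx.ext.napoleon` extension. The function `rescale` is a hypothetical example, not part of the course material, and the choice of NumPy style over Google or reStructuredText style is an assumption made for this sketch.

```python
def rescale(values, lower=0.0, upper=1.0):
    """Linearly rescale values to the interval [lower, upper].

    Parameters
    ----------
    values : list of float
        Input values; they must not all be equal.
    lower, upper : float
        Bounds of the target interval.

    Returns
    -------
    list of float
        The rescaled values, in the same order as the input.

    Raises
    ------
    ValueError
        If all input values are equal, so no scale can be defined.

    Examples
    --------
    >>> rescale([0, 5, 10])
    [0.0, 0.5, 1.0]
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("values must not all be equal")
    span = upper - lower
    return [lower + span * (v - lo) / (hi - lo) for v in values]
```

The `Examples` section doubles as a test: running `python -m doctest` on the module checks that the documented output still matches the code.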
Logging and debugging
- Configuring Python's logging module
- Debugging Python code with pdb
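A minimal sketch of both topics, assuming a hypothetical logger name `pipeline` and a hypothetical helper `drop_negative` (neither comes from the course material). It configures a named logger with the standard `logging` module and marks the spot where the built-in `breakpoint()` would drop into pdb.

```python
import logging


def configure_logging(level=logging.INFO):
    """Return a logger named 'pipeline' that writes formatted records to stderr."""
    logger = logging.getLogger("pipeline")
    logger.setLevel(level)
    # Attach a handler only once, so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def drop_negative(values, logger):
    """Keep the non-negative values and log how many survived."""
    kept = [v for v in values if v >= 0]
    logger.info("kept %d of %d values", len(kept), len(values))
    # To inspect `kept` interactively, uncomment the next line;
    # breakpoint() opens pdb at this point by default.
    # breakpoint()
    return kept


if __name__ == "__main__":
    log = configure_logging()
    drop_negative([3, -1, 4, -1, 5], log)
```

Passing the logger in as an argument keeps the helper easy to test; a module-level `logging.getLogger(__name__)` is an equally common choice.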
Profiling and optimization
- Finding bottlenecks in your Python code
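One way to look for bottlenecks is the standard library's `cProfile` and `pstats` modules. The sketch below is only illustrative: `slow_sum_of_squares` and `profile_call` are hypothetical names introduced here, and the course may use other profiling tools.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop, used here as the code under investigation."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and show only the five most expensive entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_call(slow_sum_of_squares, 1_000_000)
    print(report)
```

The report lists call counts and timings per function, which helps decide where optimisation effort is worth spending before any code is rewritten.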
Refactoring exercises to put everything together