Software Engineering for Data Science

The Software Engineering for Data Science course is designed for Python data science programmers who need a more efficient and trustworthy development process. Software engineering best practices for testing, code design, documentation, debugging and profiling are distilled into hands-on exercises optimised for learning.

Audience

Data scientists, data analysts, business analysts and researchers who use Python for analysis and modelling and would like to learn software engineering best practices. Software engineers who are less familiar with Python and need to work with Python data science colleagues.

Standard Duration

1.5 days

Tailor this course

We can work with you to customise a course for your team, so you can learn exactly what you need at the right pace.

Description of the course

The course offers practical examples of how employing basic software engineering principles and tools can benefit Python Data Science professionals. The course assumes a basic level of fluency with Python (e.g. built-in data types, control flow statements) and the PyData ecosystem (e.g. basics of pandas).

Learning objectives

By attending this course, you will learn about:

Testing and debugging your code so that you can trust its results
Software design and documentation practices that make code easier to maintain
Writing more robust production code to reduce the friction caused by downtime and other code issues
Collaborating more confidently with other technical team members

Syllabus

Fundamentals of software testing

Overview of Python tools: unittest, unittest.mock, pytest, coverage, Hypothesis
Defensive programming vs unit testing vs test-driven development
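To illustrate the difference between defensive programming and unit testing, here is a minimal sketch (not taken from the course materials; the function `min_max_scale` and its tests are hypothetical). The function defends itself by rejecting inputs it cannot handle, while a `unittest.TestCase` checks both a known result and the rejected edge cases. Assuming the file is saved as a test module such as `test_scaling.py` (a hypothetical name), it can be run with either `python -m unittest` or `pytest`, since pytest also collects `unittest`-style test classes.

```python
import unittest


def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1].

    Defensive programming: fail loudly on input the function cannot
    handle instead of silently returning a misleading result.
    """
    if not values:
        raise ValueError("values must be non-empty")
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("values must not all be equal")
    return [(v - lo) / (hi - lo) for v in values]


class TestMinMaxScale(unittest.TestCase):
    # Unit tests: pin down behaviour on known inputs and on edge cases.
    def test_known_values(self):
        self.assertEqual(min_max_scale([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            min_max_scale([])

    def test_constant_input_is_rejected(self):
        with self.assertRaises(ValueError):
            min_max_scale([3, 3, 3])
```

In a test-driven workflow the test class would be written first and the function filled in until the tests pass.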

Structuring Python code

Notebooks vs scripts vs modules vs packages
Designing maintainable and reusable code
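One common way to make notebook code reusable is to move it into a module with small, pure functions and an `if __name__ == "__main__":` guard, so the same file can be imported from a notebook or run as a script. The sketch below assumes a hypothetical module `cleaning.py` and a hypothetical function `drop_incomplete_rows`; it is an illustration, not the course's own example.

```python
"""cleaning.py: a small, reusable data-cleaning module (hypothetical)."""


def drop_incomplete_rows(rows, required):
    """Return only the rows (dicts) in which every required field is non-empty.

    Keeping this a pure function (no I/O, no globals) makes it easy to
    import from a notebook, reuse in a pipeline, and unit test.
    """
    return [row for row in rows if all(row.get(col) for col in required)]


if __name__ == "__main__":
    # Runs only when the file is executed as a script,
    # not when it is imported with `from cleaning import drop_incomplete_rows`.
    sample = [{"id": "1", "price": "9.5"}, {"id": "2", "price": ""}]
    print(drop_incomplete_rows(sample, required=["price"]))
```

Grouping several such modules in a directory with an `__init__.py` turns them into a package.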

Effective documentation

Docstrings and documentation styles
Overview of Python tools: Sphinx
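As an example of one widely used documentation style, the sketch below uses a NumPy-style docstring, which Sphinx can render through its `sphinx.ext.napoleon` extension. The function `moving_average` is hypothetical and chosen only for illustration. The `Examples` section doubles as an executable check with `python -m doctest -v <file>`.

```python
def moving_average(values, window):
    """Compute the simple moving average of a sequence.

    Parameters
    ----------
    values : list of float
        Input observations, in time order.
    window : int
        Number of consecutive observations averaged at each step.

    Returns
    -------
    list of float
        ``len(values) - window + 1`` averages.

    Raises
    ------
    ValueError
        If ``window`` is not between 1 and ``len(values)``.

    Examples
    --------
    >>> moving_average([1, 2, 3, 4], window=2)
    [1.5, 2.5, 3.5]
    """
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Average each contiguous slice of length `window`.
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```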

Logging and debugging

Configuring Python's logging module
Debugging Python code with pdb
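A minimal sketch of the usual pattern, assuming a hypothetical function `fill_missing`: library code creates a module-level logger with `logging.getLogger(__name__)` and only emits messages, while the entry point configures handlers, level and format once with `logging.basicConfig`. At `INFO` level the `debug` message below is suppressed. The built-in `breakpoint()` (Python 3.7+) drops into pdb by default; alternatively a whole script can be run under the debugger with `python -m pdb script.py`.

```python
import logging

# Module-level logger named after the module; configuration is left to the caller.
logger = logging.getLogger(__name__)


def fill_missing(values, default=0.0):
    """Replace None entries with `default`, logging how many were filled."""
    n_missing = sum(v is None for v in values)
    if n_missing:
        logger.warning("filled %d missing value(s) with %s", n_missing, default)
    logger.debug("input length: %d", len(values))  # hidden at INFO level
    # To step through this function interactively, uncomment the next line:
    # breakpoint()  # starts pdb by default
    return [default if v is None else v for v in values]


if __name__ == "__main__":
    # Configure logging once, at the application entry point.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    fill_missing([1.0, None, 3.0])
```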

Profiling and optimisation

Finding bottlenecks in your Python code
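A minimal sketch of finding a bottleneck with the standard library's `cProfile` and `pstats` modules. The two deduplication functions are hypothetical: `slow_unique` checks membership in a list (quadratic overall), `fast_unique` uses a set (linear overall), and the profile report sorted by cumulative time shows which one dominates. The same report can be produced without code changes via `python -m cProfile -s cumulative script.py`.

```python
import cProfile
import pstats


def slow_unique(items):
    """Order-preserving deduplication; `in` on a list makes this O(n^2)."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def fast_unique(items):
    """Same result in O(n), using a set for membership tests."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


if __name__ == "__main__":
    data = [i % 1000 for i in range(10_000)]
    profiler = cProfile.Profile()
    profiler.enable()
    slow_unique(data)
    fast_unique(data)
    profiler.disable()
    # Sort by cumulative time and print the five most expensive entries.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

Profiling first and optimising only the functions that the report flags keeps the rest of the code simple.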

Refactoring exercises that put everything together