Software Engineering for Data Science

The Software Engineering for Data Scientists syllabus is designed for Python data science programmers who need a more efficient and trustworthy development process. Software engineering best practices for testing, code design, documentation, debugging and profiling are distilled into hands-on exercises optimised for learning.

Description of the course

The course offers practical examples of how employing basic software engineering principles and tools can benefit Python Data Science professionals. The course assumes a basic level of fluency with Python (e.g. built-in data types, control flow statements) and the PyData ecosystem (e.g. basics of pandas).

Learning objectives

By attending this course, you will learn about:

  • Testing and debugging your code for increased trust
  • Software design and documentation to produce more maintainable code
  • How to produce more robust production code to reduce the friction caused by downtime and other code issues
  • How to collaborate with other technical team members with more confidence


Fundamentals of software testing

  • Overview of Python tools: unittest, mock, pytest, coverage, Hypothesis
  • Defensive programming vs unit testing vs test-driven development
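
As a rough illustration of the difference between defensive programming and unit testing, the sketch below puts both in one file. The function mean and the two test functions are hypothetical names chosen for this example and are not course material. The run-time check inside mean is the defensive part; the test_ functions are plain assertions that a runner such as pytest would normally collect by name.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # Defensive programming: reject bad input at run time, close to the cause.
    if len(values) == 0:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


def test_mean_of_known_values():
    # Unit test: compare the result with an answer worked out by hand.
    assert mean([1, 2, 3]) == 2


def test_mean_rejects_empty_input():
    # Unit test for the defensive check itself.
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
```

In test-driven development the two test functions would be written first and would fail until mean is implemented.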

Structuring Python code

  • Notebooks vs Scripts vs Packages vs Modules
  • Designing maintainable and reusable code

Effective documentation

  • docstrings and documentation styles
  • Overview of Python tools: Sphinx
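
To give a flavour of what a documentation style looks like in practice, here is a minimal sketch of a docstring written in the NumPy style, one of several common conventions. The function standardise is a hypothetical example, not part of the course material. Sphinx can typically render this style when its napoleon extension is enabled.

```python
import statistics


def standardise(values):
    """Rescale values to zero mean and unit standard deviation.

    Parameters
    ----------
    values : list of float
        At least two numbers that are not all equal.

    Returns
    -------
    list of float
        The standardised values, in the original order.
    """
    centre = statistics.mean(values)
    spread = statistics.stdev(values)  # sample standard deviation
    return [(v - centre) / spread for v in values]
```

Calling help(standardise) in an interactive session prints the same text, so the docstring serves both the reader of the code and the generated documentation.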

Logging and Debugging

  • Configuring Python's logging mechanism
  • Debugging Python code with pdb
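
The sketch below shows one possible way to configure a logger with the standard logging module and where a pdb breakpoint could go. The logger name "pipeline" and the functions configure_logging and drop_missing are assumptions made up for this example.

```python
import logging

# Hypothetical logger name; any dotted name identifying the component works.
logger = logging.getLogger("pipeline")


def configure_logging(level=logging.INFO):
    """Attach a single stream handler with a timestamped format."""
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.handlers.clear()  # avoid duplicate output if called twice
    logger.addHandler(handler)
    logger.setLevel(level)


def drop_missing(rows):
    """Remove None entries and log how many were dropped."""
    kept = [row for row in rows if row is not None]
    logger.info("dropped %d of %d rows", len(rows) - len(kept), len(rows))
    # breakpoint()  # uncomment to pause here and inspect state with pdb
    return kept
```

Logging records what happened during a normal run, while the built-in breakpoint() call drops into pdb for interactive inspection of a single run.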

Profiling and optimization

  • Finding bottlenecks in your Python code
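
As a small sketch of how a bottleneck might be located, the example below uses the standard library's cProfile and pstats modules. The functions slow_sum_of_squares and profile_call are hypothetical helpers written for this illustration.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop that serves as the code under the profiler."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return its result plus a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the five most expensive entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()
```

A call such as profile_call(slow_sum_of_squares, 1_000_000) returns the computed value together with a report listing where the time was spent.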

Refactoring exercises to put everything together

Get in touch

For any enquiry about our services, please contact: