Workshop: Data Science for Software Engineers

This is the brand-new version of the hugely popular “Data Science for Developers” training. After two years of teaching thousands of Engineers around the world, the training has been rebuilt from the ground up to include more material, cover the theory in greater depth and put each topic in better context.

In this beginner-level course, we will start the day by establishing what Data Science is and how it is used by companies large and small. You will learn how to develop a Data Science project and how this differs from “normal” Software Engineering. Note that I use the term Data Science to encompass Machine Learning (ML), Exploratory Data Analysis (EDA), Data Mining, Analytics, Deep Learning, Artificial Intelligence (AI) and related fields.

Next we will cover the “three key phases of Data Science”: data cleaning, modelling and evaluation. With these fundamentals, along with the extensive practical worksheets, you will be able to undertake and succeed in a simple Data Science project.
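As a rough illustration of how the three phases fit together, here is a minimal sketch in plain Python. It is not taken from the course worksheets: the data, the helper names (fit_line, mean_absolute_error) and the choice of a straight-line model are all assumptions made for this example.

```python
from statistics import mean

# Hypothetical raw data: (hours_studied, exam_score).
# None marks a corrupted or missing reading.
raw = [(1, 52), (2, 55), (None, 60), (3, 61), (4, 64),
       (5, None), (6, 70), (7, 74), (8, 77)]

# Phase 1: data cleaning. Drop any row with a missing value.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# Hold back the last two rows so the model is evaluated on unseen data.
train, test = clean[:-2], clean[-2:]


# Phase 2: modelling. Fit a straight line by ordinary least squares.
def fit_line(rows):
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in rows)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Phase 3: evaluation. Mean absolute error of the line's predictions.
def mean_absolute_error(rows, slope, intercept):
    return mean(abs(y - (slope * x + intercept)) for x, y in rows)


slope, intercept = fit_line(train)
mae = mean_absolute_error(test, slope, intercept)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print(f"mean absolute error on held-out rows: {mae:.2f}")
```

In a real project each phase would typically use a dedicated library and far more data; the point here is only the order of the steps and the habit of evaluating on rows the model has not seen.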

In the intermediate course on the second day, we will delve into the most important topics in Data Science. The aim is to provide enough breadth that you can pick and choose the techniques that suit your specific problem.

The content matches the tasks and topics that production Engineers face on a day-to-day basis. Indeed, surveys suggest that more than half of an Engineer’s time is spent finding, collecting, organising and cleaning data. Therefore, we spend a significant amount of time learning how to handle and understand data.

Another goal of the intermediate course is to give a broad understanding of as many models as possible in the time available. If you are aware of the major categories, types and instances of models, then you are better positioned to choose the most suitable model for the problem.

This training is unique because nowhere else do you see Data Science laid bare. The materials emphasise the common themes between algorithms, which helps Data Science “click”. Mathematics is avoided as far as is practical in favour of an intuitive understanding.

Who will benefit:

  • Engineers needing an introduction to Data Science.
  • People who want to understand the tools and technologies behind the hype.
  • Beginner Data Scientists wanting end-to-end practical experience and industry insight.

Prerequisites:

  • Some Python experience would be beneficial
  • Secondary School mathematics
  • A charged laptop with a browser that can connect to the internet


Topics:
* = Time permitting

  • Day 1: Introduction
    • Applications
    • Disciplines
    • Lifecycle
  • Technical Overview
    • Techniques
    • Technologies
    • Decisions
  • Phase 1: Introduction to Working With Data
    • Visualising data
    • Scaling data
    • Dealing with corrupted data
  • Phase 2: Introduction to Modelling
    • Classification
    • Regression
    • Clustering
  • Phase 3: Introduction to Evaluation
    • Numerical evaluation
    • Visual evaluation
  • Many in-depth practical examples demonstrating the day’s concepts


  • Day 2: Introduction
  • Probability
    • Evidence
    • Probabilities
    • Probability distributions
    • Summary statistics
  • Generalisation and Overfitting *
  • In-depth Data Cleaning
    • Visualisation 2
    • Data availability and consistency
    • Types of data
    • Corrupted data
    • Transforming data
    • Scaling data 2
    • Feature engineering (derived data)
    • Feature selection
    • Time series data
    • Related topics
  • In-depth model evaluation *
    • Technical numerical evaluation *
    • Business numerical evaluation *
    • Technical visual evaluation and analysis *
    • Business visual evaluation *
  • Dimensionality reduction
    • PCA/SDA/LDA/QDA
    • Manifold learning
  • Overview of models
    • Classification
    • Regression
    • Clustering
  • Grand challenge *