Data Science Fundamentals: Machine Learning
DATE & TIME:

Monday, 25 September, 2017 - 06:00pm to 09:00pm

FOOD AND BEVERAGES SERVED?

No

VENUE:

Orange Meeting Room

TYPE OF EVENT:

Developer workshops

COURSE DETAILS:

This 3-day workshop combines statistical theory with hands-on coding and programming exercises to help students understand and implement some of the most widely used, fundamental machine learning algorithms.

By building regression and classification algorithms from scratch, students will go beyond applying machine learning models to developing their own, and will learn how to fine-tune model performance and evaluate model fit against unseen data. Upon completion of the workshop, students will be well versed in an array of important, versatile machine learning algorithms and equipped to apply them to future datasets in their daily work.

25 September:
DATA SCIENCE EXPLAINED

  • Description of course materials and the learning environment
  • A comprehensive view of the roles in data science, related professions, career prospects and outlook
  • Description of the workflow, tools, setup and programming languages in the course

R PROGRAMMING BASICS

  • Setting up the Workspace and Environment
  • Working with data types: scalar, vector, list, matrix, data frame
  • R’s built-in functions
  • Inspecting data using built-in functions
  • R’s plotting capabilities
  • R Markdown and reproducible research

STATISTICS FUNDAMENTALS

  • Demonstrate the use of various statistics in exploratory data analysis: 5-number summary, mean, mode, interquartile range, variance, standard deviation and correlation
  • Plots: scatterplots, scatterplot matrices, line graphs, histograms, ab-lines, x- and y-axis styling, plot titles, tips and tricks for plotting in R
  • Quick ways to get a sense of a dataset's distribution
  • Confidence intervals and hypothesis testing

MACHINE LEARNING FUNDAMENTALS

  • Prediction with linear models
  • Precision and Recall
  • Prediction on unseen data

26 September:
DATA WRANGLING

  • Continuous and categorical variables
  • Factors and levels
  • Reading data from different sources and formats: CSV, JSON, web pages, APIs
  • Various data preprocessing and data cleansing techniques

LINEAR REGRESSION

  • Code examples of linear regression
  • Statistical principles behind least squares regression
  • Linearity assumption
  • Dependent and independent variables
  • Inspecting data using built-in functions
  • R-squared
  • Interpreting coefficients

IMPROVING MODEL PERFORMANCE

  • Limitations of common machine learning techniques
  • Preventing overfitting
  • Bias-Variance Tradeoff
  • k-fold Cross Validation

27 September:
MULTIVARIATE REGRESSION

  • Interaction term
  • Confounding variables
  • Measures of fit
  • ANOVA

CLASSIFICATION IN MACHINE LEARNING

  • k-Nearest Neighbors and distance functions
  • Logistic Regression and the sigmoid curve
  • Decision Tree
  • Random Forest
  • Bootstrap Aggregation and Boosting
  • Multiclass classification
  • Evaluating model performance

BUILDING A CLASSIFICATION ALGORITHM

  • Finding datasets
  • Feature engineering
  • Testing on unseen data