COURSE DETAILS:
This 3-day workshop combines statistical theory, hands-on coding and programming exercises to help students understand and implement some of the most widely used and fundamental machine learning algorithms.
By building regression and classification algorithms from scratch, students go beyond applying existing machine learning models to developing their own. They also learn how to fine-tune model performance and how to evaluate model fit against unseen data. Upon completion of the workshop, students will be well versed in an array of important, versatile machine learning algorithms and equipped to apply them to future datasets in their daily work.
25 September:
DATA SCIENCE EXPLAINED
Description of course materials and the learning environment
A comprehensive view of the roles in data science, related professions, career prospects and outlook
Description of the workflow, tools, setup and programming languages in the course
R PROGRAMMING BASICS
Setting up the Workspace and Environment
Working with data types: scalar, vector, list, matrix, data frame
R’s built-in functions
Inspecting data using built-in functions
R’s plotting capabilities
R Markdown and reproducible research
STATISTICS FUNDAMENTALS
Using summary statistics in exploratory data analysis: 5-number summary, mean, mode, interquartile range, variance, standard deviation and correlation
Plots: scatterplots, scatterplot matrices, line graphs, histograms, reference lines with abline, x- and y-axis styling, plot titles, tips and tricks for plotting in R
Quick ways to get a sense of the distribution of a dataset
Confidence intervals and hypothesis testing
MACHINE LEARNING FUNDAMENTALS
Prediction with linear models
Precision and Recall
Prediction on unseen data
26 September:
DATA WRANGLING
Continuous variables and Categorical variables
Factors and levels
Reading data from different sources and formats: CSV, JSON, web pages, APIs
Various data preprocessing and data cleansing techniques
LINEAR REGRESSION
Code examples of linear regression
Statistical principles behind least squares regression
Linearity assumption
Dependent and independent variables
R-squared
Interpreting coefficients
IMPROVING MODEL PERFORMANCE
Limitations of common machine learning techniques
Preventing overfitting
Bias-variance tradeoff
k-fold cross-validation
27 September:
MULTIPLE REGRESSION
Interaction terms
Confounding variables
Measures of fit
ANOVA
CLASSIFICATION IN MACHINE LEARNING
k-Nearest Neighbors and distance functions
Logistic regression and the sigmoid curve
Decision trees
Random forests
Bootstrap aggregation (bagging) and boosting
Multiclass classification
Evaluating model performance
BUILDING A CLASSIFICATION ALGORITHM
Finding datasets
Feature engineering
Evaluating on unseen data