Data Science Fundamentals: Machine Learning

Date & Time
2017-09-25T11:00:00
Food and beverages served?
no
Type of event
Developer workshops
COURSE DETAILS: This 3-day workshop is a careful combination of statistical theory, hands-on coding and programming exercises to help students understand, and implement, some of the most widely used and fundamental machine learning algorithms. By building regressors and classifiers from scratch, the student will go beyond applying machine learning models to actually developing their own models, and will learn the right approach to fine-tuning model performance as well as evaluating model fit against unseen data. Upon completion of the workshop, the student will be well versed in an array of important, versatile machine learning algorithms and equipped with the right knowledge to apply them to future datasets in their daily job.

25 September:

DATA SCIENCE EXPLAINED
  Description of course materials and the learning environment
  A comprehensive view of the roles of data science, the related professions, career prospects and outlook
  Description of the workflow, tools, setup and programming languages in the course

R PROGRAMMING BASICS
  Setting up the Workspace and Environment
  Working with data types: scalar, vector, list, matrix, data frame
  R’s built-in functions
  Inspecting data using built-in functions
  R’s plotting capabilities
  R Markdown and reproducible research

STATISTICS FUNDAMENTAL
  Demonstrate the use of various statistics in exploratory data analysis: 5-number summary, mean, mode, interquartile range, variance, standard deviation and correlation
  Plots: scatterplots, scatterplot matrices, line graphs, histogram, ab-line, x- and y-axis styling, plot title, tips and tricks for plotting in R
  Quick way to get a “sense” of the distribution of our dataset
  Confidence intervals and Hypothesis Testing

MACHINE LEARNING FUNDAMENTAL
  Prediction with linear models
  Precision and Recall
  Prediction on unseen data

26 September:

DATA WRANGLING
  Continuous variables and Categorical variables
  Factors and levels
  Description of the workflow, tools, setup and programming languages in the course
  Reading from different data formats: CSVs, JSON, webpages, API
  Various data preprocessing and data cleansing techniques

LINEAR REGRESSION
  Code examples of linear regression
  Statistical principles behind least squares regression
  Linearity assumption
  Dependent and Independent variables
  Inspecting data using built-in functions
  R-squared
  Interpreting coefficients

IMPROVING MODEL’S PERFORMANCE
  Limitations of common machine learning techniques
  Preventing overfitting
  Bias-Variance Tradeoff
  k-fold Cross Validation

27 September:

MULTIVARIATE REGRESSION
  Interaction term
  Confounding variables
  Measures of fit
  ANOVA

CLASSIFICATION IN MACHINE LEARNING
  k Nearest Neighbors and distance function
  Logistic Regression and the sigmoid curve
  Decision Tree
  Random Forest
  Bootstrap Aggregation and Boosting
  Multiclass classification
  Evaluating model’s performance

BUILDING A CLASSIFICATION ALGORITHM
  Finding datasets
  Feature engineering
  Training on unseen data