COMP 333 Introduction to Data Analytics

Summer 2020 Semester 1: Resources


Cheat Sheets

Probability cheat sheet
Python Basics
Python NumPy
Python pandas
Python Matplotlib
Python Seaborn
Basic prediction algorithms overview
Python scikit-learn ML
When to use algorithms in Python scikit-learn ML

General Data Analytics


stackoverflow - for how-to questions on technology, e.g. Python, LaTeX, Jupyter
kdnuggets - for discussions on topics related to knowledge discovery, so not restricted to data analytics
data carpentry - tutorials on data analytics; see especially the tutorial for social sciences
software carpentry - tutorials on programming for scientists, including Python
R datasets


Top 12 Essential Command Line Tools for Data Scientists: wget, cat, wc, head, tail, find, cut, uniq, awk, grep, sed, history.
The 10 Statistical Techniques Data Scientists Need to Master: linear regression, classification (logistic regression, discriminant analysis), resampling methods (bootstrapping, cross-validation), subset selection (best-subset, forward stepwise, backward stepwise, hybrid), shrinkage (ridge regression, lasso), dimension reduction (principal components, partial least squares), nonlinear models (step function, piecewise function, spline, generalized additive models), tree-based methods (bagging, boosting, random forests), support vector machines, unsupervised learning (principal component analysis, k-means, hierarchical clustering).

Statistics

Learning statistics with jamovi: a tutorial for psychology students and other beginners (Version 0.70), by Navarro DJ and Foxcroft DR (2019). Good online book, but ignore the material on jamovi.

Python

A quick overview to orient you: A Complete Tutorial to Learn Data Science with Python from Scratch.
The Top 15 Python Libraries for Data Science in 2017
OOP in Python: Classes, Methods and Operator Overloading video, Aug 17, 2015.
The Python Tutorial: Classes
Python operator overloading
Intro to NumPy, Bryan Van de Ven, April 2016.
Intro to SciPy, M. Velasco and A. Perera, Feb 2013. Do not read the SymPy part.
Intro to Matplotlib, datacamp, Feb 2013.
Intro to pandas, Slides 70-170, Virginia Tech, Srijith Rajamohan, 2016.

Read Scientific Python Lectures, 2017, chapters 1-5.
pandas tutorial, dataquest, 2016.
Top 8 resources for learning data analysis with pandas, May 2016.

Professor Steven Skiena's Course CSE519 Data Science

This course covers much more than COMP 333 does. The book website has links to video lectures, examples, exercises, and more.
See in particular:
Skiena Lecture 6 - Data Munging video (1:02:20), March 2017.
Skiena Lecture 7 - Data Cleaning video (1:09:50), March 2017.
Skiena Lecture 5 - Correlation video (1:12:34), March 2017.
Skiena Lecture 22 - Clustering video (1:07:54), March 2017.
Skiena Lecture 11 - Visualizing Data video (1:15:07), March 2017.
Skiena Lecture 23 - Machine Learning video (1:16:59)

Professors Trevor Hastie and Rob Tibshirani's Book and Course on Machine Learning

In-depth introduction to machine learning in 15 hours of expert videos, September 2014.

Storytelling and Visualization

Tamara Munzner's book Visualization Analysis and Design, CRC Press, 2014.

Cole Nussbaumer Knaflic, Storytelling with Data, Wiley, 2015.

Visualization with matplotlib
Visualization with pandas
Seaborn visualization - examples
Visualization with Seaborn tutorial


Last modified on 05 May 2020 by gregb@cs.concordia.ca