STAT1013 Data Science Toolbox

📝 Administrative information

  • ⏲️ Lectures: Wed. 11:30 - 14:15, Lady Shaw Building LT6
  • 👨‍🏫 Instructor: Ben Dai
  • ⌨️ Colab: notebook, or click Open in Colab
  • 💻 GitHub: CUHK-STAT1013
  • Office hours: Wed. 15:30 - 16:30 (by appointment only)

Slides Open In Colab

🧾 Course Content

🖥️ Description:

This course introduces the key concepts, techniques, and toolkits of data science. It gives a general overview of the information, issues, and resources that data analysts and data scientists work with. The course consists of two parts: the first is a conceptual overview of how data is transformed into useful knowledge; the second is a hands-on overview of the program’s tools.

👌 What you’ll learn:

  • Understand the principles behind statistical inference and statistical machine learning methods;
  • Become familiar with the Data Science Toolbox: GitHub; Python (numpy, pandas, seaborn, sklearn); Colab; Jupyter notebook; Markdown;
  • Analyze continuous and categorical data using statistics and Python programming on Colab, with other software as appropriate;
  • Use advanced Python tools to describe, summarize, and visualize datasets;
  • Understand and implement good coding practices, including statistical inference for A/B tests and statistical learning/prediction on tabular data.

🏗️ Prerequisites:

  • Calculus & Linear algebra: inner product, matrix-vector product.
  • Basic Statistics: STAT1011 level statistics, basics of distributions, probabilities, conditional probability, mean, standard deviation, etc.
  • Python: basic syntax; NumPy, pandas, and TensorFlow libraries

📋 Reference Textbooks

The following textbooks are useful, but none matches our course exactly.

💯 Grading (tentative)

👨‍💻 Coursework:

  • Homework (15%): There will be three homework assignments. Please submit each one as a well-documented Jupyter notebook.
    • HW 1 (5%): Statistics in Python
    • HW 2 (5%): Markdown Documentation
    • HW 3 (5%): A/B Test Exercise
  • InClass Quiz (coding and exercises) (30%): Implement statistics/ML methods to make predictions.

  • Final project / Essay (55%): A full data analysis, submitted as a PDF report and a Jupyter notebook: (1) an executable notebook containing the analysis performed on the data; (2) a technical report that includes (i) the A/B test you used and (ii) an analysis of the results.

📝 Honesty: Our course places very high importance on honesty in coursework submitted by students, and adopts a policy of zero tolerance on academic dishonesty.

📢 (Late) submission: Homework and projects are submitted via Blackboard; the quiz is submitted via Kaggle. Late submissions are penalized 10% of the credit per 6 hours.
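
As a rough illustration of the late-penalty arithmetic, the sketch below computes a penalized score. It rests on assumptions the syllabus does not state: every started 6-hour block counts in full, the penalty is deducted from the earned score, and it is capped at 100%. The function names are hypothetical; please confirm the exact rule with the instructor.

```python
import math


def late_penalty_fraction(hours_late: float,
                          rate: float = 0.10,
                          block_hours: float = 6.0) -> float:
    """Fraction of credit lost for a late submission.

    Assumption (not stated in the syllabus): each started 6-hour block
    costs the full 10%, and the total penalty never exceeds 100%.
    """
    if hours_late <= 0:
        return 0.0  # on time: no penalty
    blocks = math.ceil(hours_late / block_hours)  # started blocks count in full
    return min(1.0, blocks * rate)


def score_after_penalty(raw_score: float, hours_late: float) -> float:
    """Apply the assumed late penalty to an earned score."""
    return raw_score * (1.0 - late_penalty_fraction(hours_late))


if __name__ == "__main__":
    # A homework graded 80/100 and submitted 7 hours late falls into the
    # second 6-hour block under the assumptions above.
    print(score_after_penalty(80, 7))
```

Under a different reading of the policy (for example, prorating partial blocks), only the `blocks` line would change.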

All students welcome: we are happy to have auditors in our lectures.

🗓️ Schedule (tentative)

Date | Description | Course Materials | Events | Deadlines
Pre | Course information
[pdf]

Python Tutorial
[YouTube]
Suggested Readings:

- learnpython.org
- The Python Tutorial (official Python documentation)

  
Jan 11 | Overview of Data Science
[pdf] [notebook]
Suggested Readings:

- Data Scientist: The Sexiest Job of the 21st Century
- The 4 Biggest Trends In Big Data And Analytics
- Types Of Data

  
Jan 18 | Statistics in Python I
[pdf] [notebook]
Suggested Readings:

- Examples of Normal Distribution and Probability In Every Day Life
- Introduction to probability, statistics, and random processes

  
Feb 1 | Statistics in Python II
[notebook]
Suggested Readings:

- Learning Statistics with Python by Danielle Navarro and Ethan Weed
- Statistics in Python by Gaël Varoquaux
- seaborn - tutorial
- The Ultimate Python Seaborn Tutorial: Gotta Catch ‘Em All

  
Feb 8 | Statistics in Python III
[pdf] [notebook]
Suggested Readings:

- The Monty Hall Problem

HW1 [colab] release

 
Feb 15 | Markdown Documentation
[cheat sheet] [CV template] [MHP report]
Suggested Readings:

- Markdown Cheat Sheet

Proj: PART-A [demo] release
HW1 [sol] due
Feb 22 | GitHub
[notebook]
Suggested Readings:

- GitHub Docs
- Websites for you and your projects
- All the Math you need to conduct an A/B test

HW2 [md] release


Mar 1 | Prob, LLN, CLT
[notebook]
Suggested Readings:

- Central Limit Theorem and the Law of Large Numbers
- Q-Q Plots Explained



Proj: PART-A due

Mar 15 | A/B Test I
[notebook] [pdf]
Suggested Readings:

- A/B Testing: A Complete Guide to Statistical Testing
- A/B testing

InClass-Ex [sol]
HW2 due

Mar 22 | A/B Test II
[pdf]
Suggested Readings:

- A/B Testing: Step by Step & Hypothesis Testing

InClass-Ex [md]

Mar 29 | A/B Test III
[pdf]
HW3 [colab] release

InClass-Ex [md]


April 12 | A/B Test IV
[notebook]
Suggested Readings:

- STAT3009 - Machine learning overview
- Chapters 2-3 in The Elements of Statistical Learning
- Linear regression in sklearn

 
HW3 [sol] due

April 19 | ⏰ InClass Quiz

- [Quiz.info]
- [Study Guide]