# STAT1013 Data Science Toolbox

## 📝 Administrative information

- ⏲️ **Lectures**: **Wed**. 11:30 - 14:15, Lady Shaw Building LT6
- 👨🏫 **Instructor**: Ben Dai
- ⌨️ **Colab**: notebook or click `Open in Colab`
- 💻 **GitHub**: CUHK-STAT1013
- ⌛ **Office hour**: **Wed**. 15:30 - 16:30

## 🧾 Course Content

🖥️ **Description:**

We will learn about the key concepts, techniques, and toolkits of Data Science. The course provides a general overview of the information, issues, and resources used by data analysts and data scientists. It consists of two parts: the first gives a conceptual overview of how data is transformed into useful knowledge; the second is a hands-on overview of the program’s tools.

👌 **What you’ll learn:**

- Understand the principles behind statistical inference and statistical machine learning methods;
- Become familiar with the Data Science Toolbox: GitHub; Python: numpy, pandas, seaborn, sklearn, Colab; Jupyter notebook, Markdown;
- Analyze continuous and categorical data using statistics and Python programming in Colab, plus other software as appropriate;
- Use advanced Python tools to describe, summarize, and visualize datasets;
- Understand and implement good coding practices, including statistical inference for A/B tests and statistical learning/prediction based on tabular data.
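To give a flavour of what "statistical inference for A/B tests" can look like in Python, here is a minimal sketch of a two-sided two-proportion z-test. It is only an illustration, not the course's own material: the function name `two_proportion_ztest` is hypothetical, the counts are made up for the example, and the pooled-standard-error z-test is one common choice among several.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided z-test for H0: p_a == p_b, using the pooled standard error.

    Returns (z, p_value). Hypothetical helper written for illustration.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled conversion rate under the null hypothesis p_a == p_b.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: 120/1000 conversions in group A, 150/1000 in group B.
z, p = two_proportion_ztest(success_a=120, n_a=1000, success_b=150, n_b=1000)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

The sketch uses only the standard library so that it runs anywhere, including a fresh Colab session.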

🏗️ **Prerequisites:**

- **Calculus & Linear algebra**: inner product, matrix-vector product.
- **Basic Statistics**: **STAT1011 level statistics**, basics of distributions, probabilities, conditional probability, mean, standard deviation, etc.
- **Python**: basic grammar; Numpy, pandas, TensorFlow libraries
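As a quick self-check of these prerequisites, the short sketch below computes an inner product, a matrix-vector product, a mean, and a sample standard deviation in plain Python. The helper names `inner_product` and `mat_vec` are hypothetical and the numbers are arbitrary; the comments note the NumPy calls that do the same job.

```python
import statistics


def inner_product(x, y):
    """Inner product <x, y> = sum_i x_i * y_i (NumPy: np.dot(x, y))."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def mat_vec(A, x):
    """Matrix-vector product: entry i is <A[i], x> (NumPy: A @ x)."""
    return [inner_product(row, x) for row in A]


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
A = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]

print(inner_product(x, y))   # 4 + 10 + 18 = 32.0
print(mat_vec(A, x))         # [1 + 0 + 6, 0 + 2 + 3] = [7.0, 5.0]
print(statistics.mean(x))    # 2.0
print(statistics.stdev(x))   # sample standard deviation = 1.0
```

If each printed value matches what you would compute by hand, the linear algebra and basic statistics listed above should feel comfortable.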

## 📋 Reference Textbooks

The following textbooks are useful, but none of them matches our course exactly.

- VanderPlas, J. (2016). Python data science handbook: Essential tools for working with data. *O’Reilly Media, Inc.*
- Bruce, P. & Bruce, A. (2017). Practical Statistics for Data Scientists. *O’Reilly Media, Inc.*
- Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy online controlled experiments: A practical guide to A/B testing. *Cambridge University Press*.

## 💯 Grading (tentative)

👨💻 **Coursework:**

- **Homework** (15%): There will be three homework assignments. Please submit each one as a well-documented *Jupyter Notebook*.
  - HW 1 (5%): Statistics in Python
  - HW 2 (5%): Markdown Documentation
  - HW 3 (5%): A/B Test Exercise
- **In-Class Quiz** (coding and exercise) (30%): Implement statistics/ML methods to make predictions.
- **Final project / Essay** (55%): A full data analysis provided as a PDF report and a Jupyter notebook: (1) an executable notebook containing the analysis performed on the data; (2) a technical report that includes (i) the A/B test you used and (ii) an analysis of the results.

📝 **Honesty**: Our course places very high importance on honesty in coursework submitted by students, and adopts a policy of *zero tolerance* on academic dishonesty.

📢 **(Late) submission**: Homework and projects are submitted via Blackboard; the quiz is submitted via Kaggle. Late submissions are *penalized* **10%** of the credit per 6 hours.
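As a worked example of the late-submission arithmetic, the sketch below turns "10% per 6 hours" into a small function. Two points are assumptions rather than stated policy: every started 6-hour block counts as a full block, and the penalty is capped at 100%. The function names are hypothetical; please confirm the exact rule with the instructor.

```python
import math


def late_penalty(hours_late, rate=0.10, block_hours=6.0):
    """Fraction of credit deducted for a submission `hours_late` hours late.

    Assumptions (not stated in the syllabus): each started 6-hour block
    counts in full, and the total penalty never exceeds 100%.
    """
    if hours_late <= 0:
        return 0.0
    blocks = math.ceil(hours_late / block_hours)
    return min(1.0, blocks * rate)


def score_after_penalty(raw_score, hours_late):
    """Score that remains after applying the late penalty."""
    return raw_score * (1.0 - late_penalty(hours_late))


for hours in (0, 5, 7, 13, 100):
    print(f"{hours:>3} h late -> penalty {late_penalty(hours):.0%}")
```

Under these assumptions, a raw score of 80 submitted 7 hours late falls into the second 6-hour block, loses 20%, and counts as 64.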

**All students welcome**: we are happy to have auditors sit in on our lectures.

## 🗓️ Schedule (tentative)

Date | Description | Course Materials | Events | Deadlines |
---|---|---|---|---|
Pre | Course information [pdf] Python Tutorial [YouTube] | Suggested Readings: - learnpython.org - The Python Tutorial (official Python documentation) | | |
Jan 11 | Overview of Data Science [pdf] [notebook] | Suggested Readings: - Data Scientist: The Sexiest Job of the 21st Century - The 4 Biggest Trends In Big Data And Analytics - Types Of Data | | |
Jan 18 | Statistics in Python I [pdf] [notebook] | Suggested Readings: - Examples of Normal Distribution and Probability In Every Day Life - Introduction to probability, statistics, and random processes | | |
Feb 1 | Statistics in Python II [notebook] | Suggested Readings: - Learning Statistics with Python by Danielle Navarro and Ethan Weed - Statistics in Python by Gaël Varoquaux - seaborn - tutorial - The Ultimate Python Seaborn Tutorial: Gotta Catch ’Em All | | |
Feb 8 | Statistics in Python III [pdf] [notebook] | Suggested Readings: - The Monty Hall Problem | HW1 [colab] release | |
Feb 15 | Markdown Documentation [cheat sheet] | Suggested Readings: - Markdown Cheat Sheet - GitHub Docs - Websites for you and your projects | HW2 [pdf] release | HW1 [sol] due |
Feb 22 | GitHub [notebook] | Suggested Readings: - All the Math you need to conduct an A/B test | HW2 [sol] due | |
Mar 1 | A/B Test I [pdf] | Suggested Readings: - A/B Testing: A Complete Guide to Statistical Testing - A/B testing | | |
Mar 15 | A/B Test II [pdf] | Suggested Readings: - A/B Testing: Step by Step & Hypothesis Testing | HW3 [pdf] release | |
Mar 22 | A/B Test III [pdf] | Suggested Readings: - IBM - Linear Regression | HW3 [sol] due | |
Mar 29 | Machine Learning I [pdf] | Suggested Readings: - STAT3009 - Machine learning overview | | |
Apr 12 | Machine Learning II [pdf] | Suggested Readings: - Chapters 2-3 in The Elements of Statistical Learning - Linear regression in sklearn | | |
Apr 19 | ⏰ In-Class Quiz | | | |