STAT1013 Data Science Toolbox
📝 Administrative information
- ⏲️ Lectures: Wed. 11:30 - 14:15, Lady Shaw Building LT6
- 👨🏫 Instructor: Ben Dai
- ⌨️ Colab: notebook (or click "Open in Colab")
- 💻 GitHub: CUHK-STAT1013
- ⌛ Office hour: Wed. 15:30 - 16:30 (by appointment only)
🧾 Course Content
🖥️ Description:
We will learn the key concepts, techniques, and toolkits of Data Science. The course gives a general overview of the information, issues, and resources used by data analysts and data scientists. It consists of two parts: the first is a conceptual overview of how data is transformed into useful knowledge; the second is a hands-on overview of the program’s tools.
👌 What you’ll learn:
- Understand the principles behind statistical inference and statistical machine learning methods;
- Become familiar with the Data Science Toolbox: GitHub; Python: numpy, pandas, seaborn, sklearn; Colab; Jupyter notebook; Markdown;
- Analyze continuous and categorical data using statistics, Python programming on Colab, and other software as appropriate;
- Use advanced Python tools to describe, summarize, and visualize datasets;
- Understand and implement good coding practices, including statistical inference for A/B tests and statistical learning/prediction on tabular data.
🏗️ Prerequisites:
- Calculus & linear algebra: inner product, matrix-vector product.
- Basic statistics: STAT1011-level statistics, basics of distributions, probabilities, conditional probability, mean, standard deviation, etc.
- Python: basic syntax; NumPy, pandas, and TensorFlow libraries.
📋 Reference Textbooks
The following textbooks are useful, but none of them matches our course exactly.
VanderPlas, J. (2016). Python data science handbook: Essential tools for working with data. O’Reilly Media, Inc.
Bruce, P., & Bruce, A. (2017). Practical statistics for data scientists. O’Reilly Media, Inc.
Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy online controlled experiments: A practical guide to A/B testing. Cambridge University Press.
💯 Grading (tentative)
👨💻 Coursework:
- Homework (15%): There will be three homework assignments. Please submit each one as a well-documented Jupyter Notebook.
- HW 1 (5%): Statistics in Python
- HW 2 (5%): Markdown Documentation
- HW 3 (5%): A/B Test Exercise
- In-class quiz (coding and exercise) (30%): Implement statistics/ML methods to make predictions.
- Final project / Essay (55%): A full data analysis, submitted as a PDF report and a Jupyter notebook: (1) an executable notebook containing the analysis performed on the data; (2) a technical report that includes (i) the A/B test you used and (ii) an analysis of the results.
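As a rough illustration of how the tentative weights above combine, the following minimal sketch computes a weighted total. The weights come from the coursework list; the function name `final_grade`, the 0-100 score scale, and the example scores are assumptions made only for this illustration, not an official grading formula.

```python
# Weights taken from the coursework list (tentative).
WEIGHTS = {"homework": 0.15, "quiz": 0.30, "project": 0.55}


def final_grade(scores):
    """Combine component scores (assumed to be on a 0-100 scale) into a weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores, for illustration only.
example_scores = {"homework": 90.0, "quiz": 80.0, "project": 70.0}
print(round(final_grade(example_scores), 2))
```

Because the final project carries more than half of the weight, its score dominates the total in this sketch.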
📝 Honesty: Our course places very high importance on honesty in coursework submitted by students, and adopts a policy of zero tolerance on academic dishonesty.
📢 (Late) submission: Homework and projects are submitted via Blackboard; the quiz is submitted via Kaggle. Late submissions are penalized 10% of the credit for every 6 hours of delay.
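The late penalty can be read in more than one way. The sketch below shows one possible reading, assuming that every started 6-hour block deducts 10% of the earned score and that the deduction is capped at 100%. These assumptions, and the function names `late_penalty_fraction` and `penalized_score`, are illustrative only; the instructor's actual rule may differ.

```python
import math


def late_penalty_fraction(hours_late, step_hours=6.0, step_penalty=0.10):
    """Fraction of credit deducted, assuming each started 6-hour block costs 10%."""
    if hours_late <= 0:
        return 0.0  # on time: no deduction
    steps = math.ceil(hours_late / step_hours)  # count started blocks (an assumption)
    return min(1.0, steps * step_penalty)  # never deduct more than the full credit


def penalized_score(raw_score, hours_late):
    """Score remaining after the assumed late deduction."""
    return raw_score * (1.0 - late_penalty_fraction(hours_late))


# Hypothetical example: a submission scored 80 and handed in 7 hours late.
print(penalized_score(80.0, 7.0))
```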
All students are welcome: we are happy to have visitors sit in on our lectures.