โAll models are wrong, but some are useful.โ โ George E. P. Box

• Lectures: Thu 12:30 - 15:15, Mong Man Wai Building 710
• Instructor: Ben Dai
• Colab: notebook or click "Open in Colab"
• GitHub: CUHK-STAT3009
• Office hour: Thu, by appointment only

## Course Content

Description:

Commercial sites such as search engines, advertisers and media (e.g., Netflix, Amazon), and financial institutions employ recommender systems for content recommendation, predicting customer behavior, compliance, or risk. This course provides an overview of predictive models for recommender systems, including content-based and collaborative algorithms, matrix factorization, and deep learning models. The course also demonstrates Python implementations of existing recommender systems.

What you'll learn:

• Understand the principles behind recommender-system approaches such as correlation-based collaborative filtering, latent factor models, and neural recommender systems
• Implement and apply recommender systems to real applications using Python, sklearn, and TensorFlow
• Choose and design suitable models for different applications
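As a small taste of the correlation-based collaborative filtering listed above, here is a minimal user-based sketch in NumPy. The function name, the convention that 0 marks an unobserved rating, and the cosine-similarity weighting are illustrative assumptions for this sketch, not course code:

```python
import numpy as np

def predict_user_based(R, u, i, eps=1e-8):
    """Predict rating R[u, i] from users similar to u (cosine similarity).

    R: user-item rating matrix; 0 marks an unobserved entry (toy convention).
    """
    rated = R[:, i] > 0                      # users who rated item i
    rated[u] = False                         # exclude the target user
    if not rated.any():
        return float(R[R > 0].mean())        # fall back to the global mean
    # cosine similarity between user u and each user who rated item i
    sims = R[rated] @ R[u] / (
        np.linalg.norm(R[rated], axis=1) * np.linalg.norm(R[u]) + eps)
    w = np.clip(sims, 0, None)               # keep positively similar users
    if w.sum() < eps:
        return float(R[R > 0].mean())
    return float(w @ R[rated, i] / w.sum())  # similarity-weighted average
```

For example, with a 4-user, 3-item toy matrix, the prediction for a missing entry is a weighted average of the ratings given by the most similar users.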

Prerequisites:

• Calculus & Linear algebra: inner product, matrix-vector product, linear regression (OLS).
• Basic Statistics: basics of distributions, probabilities, mean, standard deviation, etc.
• Python: basic syntax; the NumPy, pandas, and TensorFlow libraries
• (Recommended) Completion of the Machine Learning Crash Course (in person, online, or via self-study), or equivalent knowledge.

## Reference Textbooks

The following textbooks are useful, but none matches the course content exactly.

## Coursework

• Homework (15%): There will be three homework assignments. Submit each as a well-documented Jupyter Notebook.
• HW 1 (5%): Implementation of k-fold cross-validation
• HW 2 (5%): Practice with ALS-related algorithms
• HW 3 (5%): Prototyping neural networks for recommender systems via TensorFlow
• In-class quizzes (coding and exercises) (30%): Open-book; problems will be similar to the homework and lecture examples.
• Quiz 1 (5%): Implement baseline methods and correlation-based collaborative filtering
• Quiz 2 (25%): Statistics & Python exercises
• Real application project (55%): A full analysis delivered as a report and a Jupyter notebook: (1) an executable notebook containing the analysis performed on the data; (2) a technical report that includes (i) the mathematical form and intuitive interpretation of your predictive models, and (ii) details of the data processing and hyperparameter tuning.
• Proj 1 (27%): Real-time Kaggle competition based on matrix factorization
• Proj 2 (28%): Real-time Kaggle competition based on a real dataset
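Since HW 1 asks for an implementation of k-fold cross-validation, a minimal NumPy sketch of the splitting logic may help set expectations. The function names and the global-average baseline model are illustrative assumptions, not the assignment's required interface:

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k shuffled, roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def kfold_cv_rmse(ratings, k=5, seed=0):
    """Estimate test RMSE of a global-average baseline via k-fold CV.

    ratings: 1-D array of observed ratings (a toy stand-in for a real
    recommender-system dataset).
    """
    folds = kfold_indices(len(ratings), k, seed)
    rmses = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = ratings[train_idx].mean()        # baseline: global average
        err = ratings[test_idx] - pred
        rmses.append(np.sqrt(np.mean(err ** 2)))
    return float(np.mean(rmses))                # average RMSE over folds
```

The same fold structure carries over to real models: only the `pred = ...` line changes when the baseline is replaced by a fitted recommender.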

Collaboration policy: You may form a group of at most two students to complete the real application projects. The contribution of each member must be clearly stated in the final report. You will receive a 5% bonus (of the project grade) if you work on the projects solo.

Honesty: This course places great importance on honesty in coursework submitted by students and adopts a zero-tolerance policy toward academic dishonesty.

(Late) submission: Homework and projects are submitted via Blackboard; the competitions are submitted via Kaggle. Late submissions are penalized 10% per 6 hours.

All students welcome: we are happy to have auditors in our lectures.

## Schedule (tentative)

The slides will be released just before each lecture, and the code will be published on Colab just after the lecture. Well-structured code is also publicly available on GitHub: CUHK-STAT3009.

| Date | Topic & materials | References | Assignments |
| --- | --- | --- | --- |
| Prepare | Course information [slides]; Python tutorial: NumPy, Pandas, Matplotlib | 1. learnpython.org; 2. The Python Tutorial (official Python documentation) | |
| Sep 07 | Background and baseline methods [slides] [colab] [github] | | |
| Sep 14 | Correlation-based RS [slides] [colab] [github] | | |
| Sep 21 | Quiz 1: implement baseline methods and correlation-based RS [instruct] [report] | | In-class quiz |
| Sep 28 | ML overview [slides] [colab] [github] | 1. Chapters 2-3 in The Elements of Statistical Learning; 2. Linear regression in sklearn | HW 1 release [colab] |
| Oct 05 | Matrix factorization I: ALS/BCD [slides] [colab] [github] [colab] | | HW 1 due [sol] |
| Oct 12 | Matrix factorization II: SGD [slides] [colab] [github] | 1. Stochastic Gradient Descent (sklearn documentation); 2. Stochastic Gradient Descent Algorithm With Python and NumPy | Proj 1 release [instruct]; HW 2 due [sol] |
| Oct 19 | Factorization Meets the Neighborhood [slides] [github] | | |
| Oct 26 | Case Study: MovieLens [slides] [github] | 1. Home Depot Product Search Relevance (Kaggle competition) | Proj 1 due |
| Nov 02 | Neural Networks [slides] [github] | 1. Chapter 11 in The Elements of Statistical Learning; 2. Neural Networks and Deep Learning (free online book) | Proj 2 release [instruct] |
| Nov 09 | Neural collaborative filtering [slides] [github] | | |
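Ahead of the matrix factorization lectures (Oct 05 and Oct 12), here is a minimal NumPy sketch of SGD-based matrix factorization on observed (user, item, rating) triples. All names and hyperparameters are illustrative assumptions, not the course's reference implementation:

```python
import numpy as np

def mf_sgd(triples, n_users, n_items, rank=5, lr=0.05, reg=0.01,
           epochs=100, seed=0):
    """Fit R ~ P @ Q.T by SGD over observed (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, rank))   # user latent factors
    Q = 0.1 * rng.standard_normal((n_items, rank))   # item latent factors
    for _ in range(epochs):
        rng.shuffle(triples)                         # visit ratings in random order
        for u, i, r in triples:
            e = r - P[u] @ Q[i]                      # prediction error
            pu = P[u].copy()                         # snapshot before updating
            P[u] += lr * (e * Q[i] - reg * P[u])     # regularized SGD steps
            Q[i] += lr * (e * pu - reg * Q[i])
    return P, Q
```

After fitting, the predicted rating for any (user, item) pair is the inner product of the corresponding rows of `P` and `Q`; ALS/BCD (the Oct 05 lecture) minimizes the same regularized squared loss but solves for `P` and `Q` alternately in closed form instead of by gradient steps.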