STAT3009 Recommender Systems

“All models are wrong, but some are useful.” - George E. P. Box

๐Ÿ“ Administrative information

  • โฒ๏ธ Lectures: Thur. 12:30 - 15:15, Science Center LG23
  • ๐Ÿ‘จโ€๐Ÿซ Instructor: Ben Dai
  • โŒจ๏ธ colab: notebook or click Open in Colab
  • ๐Ÿ’ป GitHub: CUHK-STAT3009
  • โŒ› Office hour: Thur. 15:30 - 16:30

Course Content

Description:

Commercial sites such as search engines, advertisers and media (e.g., Netflix, Amazon), and financial institutions employ recommender systems for content recommendation, predicting customer behavior, compliance, or risk. This course provides an overview of predictive models for recommender systems, including content-based and collaborative filtering algorithms, matrix factorization, and deep learning models. The course also demonstrates Python implementations of existing recommender systems.

What you'll learn:

  • Understand the principles behind recommender system approaches such as correlation-based collaborative filtering, latent factor models, and neural recommender systems
  • Implement and analyze recommender systems for real applications using Python, sklearn, and TensorFlow
  • Choose and design suitable models for different applications

๐Ÿ—๏ธ Prerequisites:

  • Calculus & Linear algebra: inner product, matrix-vector product, linear regression (OLS).
  • Basic Statistics: basics of distributions, probabilities, mean, standard deviation, etc.
  • Python: basic grammar; Numpy, pandas, TensorFlow libraries
  • (Recommended) Completed Machine Learning Crash Course either in-person, online, or self-study, or you have equivalent knowledge.

Reference Textbooks

The following textbooks are useful, but none of them matches our course exactly.

Grading (tentative)

Coursework:

  • Homework (15%): There will be three homework assignments. Please submit each one as a well-documented Jupyter notebook.
    • HW 1 (5%): Implementation of k-fold cross-validation
    • HW 2 (5%): Practice with ALS-related algorithms
    • HW 3 (5%): Prototyping neural networks in recommender systems via TensorFlow
  • In-class quizzes (coding and exercises) (30%): Open-book; problems will be similar to the homework and the lecture examples.
    • Quiz 1 (5%): Implement baseline methods and correlation-based collaborative filtering
    • Quiz 2 (25%): STAT & Python exercises
  • Real application project (55%): A full analysis provided in the form of a report and a Jupyter notebook: (1) an executable notebook containing the analysis performed on the data; (2) a technical report that includes (i) the mathematical form and an intuitive interpretation of your predictive models and (ii) details about the data processing and hyperparameter tuning.
    • Proj 1 (27%): Real-time Kaggle competition based on Matrix Factorization
    • Proj 2 (28%): Real-time Kaggle competition based on Real Dataset

๐Ÿ‘จ๐Ÿปโ€๐Ÿคโ€๐Ÿ‘จ๐Ÿพ Collaboration policy: we admit you to form a group to finish your real application projects. The number of group members should be smaller or equal than 2. The contribution of each member should be clearly stated in the final report. You will receive 5% points (of the project) if you work solo to projects.

๐Ÿ“ Honesty: Our course places very high importance on honesty in coursework submitted by students, and adopts a policy of zero tolerance on academic dishonesty.

(Late) submission: Homework and projects are submitted via BlackBoard; competition entries are submitted via Kaggle. Late submissions are penalized 10% of the credit per 6 hours.
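
As a rough illustration of the late-submission arithmetic, here is a minimal sketch. It assumes that every started 6-hour block costs 10% of the full credit and that the score cannot drop below zero; the policy above does not state the rounding rule or a floor, so both are assumptions, and `late_penalty_factor` is a hypothetical helper, not part of any course tooling.

```python
import math


def late_penalty_factor(hours_late: float,
                        step_hours: float = 6.0,
                        rate: float = 0.10) -> float:
    """Return the fraction of credit kept after the late penalty.

    Assumptions (not stated in the policy): each *started* block of
    `step_hours` counts in full, and the factor is floored at 0.
    """
    if hours_late <= 0:
        return 1.0  # on time: no penalty
    blocks = math.ceil(hours_late / step_hours)  # started 6-hour blocks
    return max(0.0, 1.0 - rate * blocks)


# A submission 7 hours late falls into the second 6-hour block,
# so under these assumptions it keeps 80% of its credit.
print(late_penalty_factor(7))
```

Under a different reading (for example, only completed 6-hour blocks count), `math.ceil` would become `math.floor`; please check with the instructor for the rule actually applied.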

All students welcome: we are happy to have students sit in on our lectures.

๐Ÿ—“๏ธ Schedule (tentative)

The slides will be released just before the lecture, and the code will be published in colab just after the lecture. The well-structured code is also public available in Github:CUHK-STAT3009.

Date | Description | Course Materials | Events | Deadlines

Prepare
  Description: Course information; Python Tutorial; Numpy, Pandas, Matplotlib
  Course Materials: [notes] [YouTube]
  Suggested Readings:
    2. The Python Tutorial (official Python documentation)

Sep 08
  Description: Background and baseline methods
  Course Materials: [slides] [colab] [github]
  Suggested Readings:
    1. Wiki: Netflix Prize
    2. Recommender Systems Datasets - UCSD CSE

Sep 15
  Description: Correlation-based RS
  Course Materials: [slides] [colab] [github]
  Suggested Readings:
    1. K-Nearest Neighbors Algorithm
    2. Cosine Similarity in NLP
    3. Curse of high dimensionality

Sep 22
  Description: Quiz 1: implement baseline methods and correlation-based RS
  Course Materials: [instruct] [report]
  Events: In-class quiz via Kaggle (link on BlackBoard)

Sep 29
  Description: ML overview
  Course Materials: [slides] [colab] [github]
  Suggested Readings:
    1. Chapters 2-3 in The Elements of Statistical Learning
    2. Linear regression in sklearn
  Events: HW 1 release

Oct 06
  Description: Matrix factorization I: ALS/BCD
  Course Materials: [slides] [colab] [github]
  Suggested Readings:
    1. Netflix Update: Try This at Home (first one applied MF in RS)
    2. Matrix factorization techniques for recommender systems
    3. Finding Similar Music using Matrix Factorization
    4. Matrix factorization techniques for recommender systems
    5. Matrix completion and low-Rank SVD via fast alternating least squares
    6. Coordinate Descent (Slides by Ryan Tibshirani)
  Events: HW 2 release
  Deadlines: HW 1 due

Oct 13
  Description: Matrix factorization II: SGD
  Course Materials: [slides] [colab] [github]
  Suggested Readings:
    1. Stochastic Gradient Descent (sklearn documentation)
    2. Stochastic Gradient Descent Algorithm With Python and NumPy
  Events: Proj 1 release
  Deadlines: HW 2 due

Oct 20
  Description: Factorization Meets the Neighborhood
  Course Materials: [slides] [github]
  Suggested Readings:
    1. Improving regularized singular value decomposition for collaborative filtering
    2. Factorization meets the neighborhood: a multifaceted collaborative filtering model
    3. Smooth neighborhood recommender systems

Oct 27
  Description: Case Study: MovieLens
  Course Materials: [slides] [github]
  Suggested Readings:
    1. Home Depot Product Search Relevance (Kaggle competition)
  Deadlines: Proj 1 due

Nov 03
  Description: Neural Networks
  Course Materials: [slides] [github]
  Suggested Readings:
    1. Chapter 11 in The Elements of Statistical Learning
    2. Neural Networks and Deep Learning (free online book)
  Events: Proj 2 release

Nov 10
  Description: Neural collaborative filtering
  Course Materials: [slides] [github]
  Suggested Readings:
    1. Neural Collaborative Filtering (original paper of NCF)
    2. Deep Learning based Recommender System: A Survey and New Perspectives
    3. TensorFlow Recommenders
    4. Understanding Embedding Layer in Keras (NLP)
  Events: HW 3 release

Nov 17
  Description: Side information
  Course Materials: [slides] [github]
  Suggested Readings:
    1. Home Depot Product Search Relevance (Kaggle competition)
    2. Introducing TensorFlow Recommenders
    3. Deep learning for recommender systems: A Netflix case study
  Deadlines: HW 3 due

Nov 24
  Description: Model Averaging
  Suggested Readings:
    1. Chapters 8 and 16 in The Elements of Statistical Learning
    2. Ensemble Models: What Are They and When Should You Use Them?
    3. combo: A Python Toolbox for Machine Learning Model Combination

Dec 01
  Description: Quiz 2: Math & Python
  Events: In-class quiz
  Deadlines: Proj 2 due