# STAT6050 Statistical Learning Theory

"There is Nothing More Practical Than A Good Theory." - Kurt Lewin

- Administrative information
- Schedule (tentative)
- Course Content
- Grading (tentative)
- Textbooks
- Reference course

## Administrative information

- **Lectures**: **Fri.** 10:30AM - 12:15PM
- **Room**: Lady Shaw Bldg C5, CUHK
- **Instructor**: Ben Dai
- **TA**: Hao Shi
- **Office hours**: **Fri.** 2:00PM - 3:00PM

## Schedule (tentative)

Week | Content |
---|---|
Week01 | Introduction |
Week02 | Approximation error and estimation error |
Week03 | Uniform concentration inequality |
Week04 | Rademacher complexity I |
Week05 | Rademacher complexity II |
Week06 | Method of regularization |
Week07 | Nonparametric regression on RKHS |
Week08 | Classification: Fisher consistency and calibrated surrogate losses |
Week09 | [Revisiting Excess Risk Bounds: chain argument] |
Week10 | [Revisiting Excess Risk Bounds: local complexity and random entropy] |
Week11 | [Case study: recommender systems] |
Week12 | [Case study: ranking] |
Week13 | [Case study: neural networks] |

## Course Content

**Description:**

This course provides tools for the theoretical analysis of statistical machine learning methods. It covers approaches such as parametric models, neural networks, kernel methods, and SVMs, applied to tasks such as regression, classification, recommender systems, and ranking. The focus is on developing a theoretical understanding of, and insight into, the statistical properties of learning methods.

**Key words:**

- Empirical risk minimization
- Estimation error, approximation error
- Regret or excess risk bounds, convergence rate, consistency
- Fisher consistency, calibrated surrogate losses
- Uniform concentration inequality
- Rademacher complexity, covering number, entropy
- Penalization, method of sieve
- Local/random complexity
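
As a rough orientation for the first few key words, the excess risk of an empirical risk minimizer is commonly split into an estimation part and an approximation part, and the estimation part is controlled by a uniform deviation. The notation below ($R$, $R_n$, $\mathcal{F}$, $R^*$, $\hat f_n$) is assumed for illustration and is not taken from the course materials; the lectures may use different symbols or sharper bounds.

```latex
% Assumed notation (illustrative, not from the course page):
%   R(f)        population risk of f
%   R_n(f)      empirical risk of f on n samples
%   \mathcal{F} hypothesis class
%   R^*         Bayes risk (infimum of R over all measurable f)
%   \hat f_n    an empirical risk minimizer over \mathcal{F}
\begin{align*}
\underbrace{R(\hat f_n) - R^*}_{\text{excess risk}}
  &= \underbrace{R(\hat f_n) - \inf_{f \in \mathcal{F}} R(f)}_{\text{estimation error}}
   + \underbrace{\inf_{f \in \mathcal{F}} R(f) - R^*}_{\text{approximation error}}, \\
R(\hat f_n) - \inf_{f \in \mathcal{F}} R(f)
  &\le 2 \sup_{f \in \mathcal{F}} \bigl| R_n(f) - R(f) \bigr|.
\end{align*}
```

The second line is where uniform concentration inequalities and complexity measures such as Rademacher complexity enter: they bound the supremum on the right-hand side.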

**Prerequisites:**

- Probability at the level of STAT5005 or equivalent (plus mathematical maturity). This is an advanced theory course, so a strong mathematical, statistical, and probabilistic background is necessary.

## Grading (tentative)

**Coursework:**

- Homeworks (50%)

There will be three homework assignments. You are welcome to discuss problems with other students, but the final solutions must be entirely your own. You will receive one bonus point for an assignment typeset in LaTeX or Markdown. Scanned handwritten submissions are accepted, but without the bonus point. **Late submissions will not be accepted.**

- Paper review / project (50%)

You will write a review of 2-3 papers on the same topic, which can be in any area related to the course. (1) You should summarize and critique the *assumptions* and *theoretical results* in the papers, and discuss their overall *contributions*. (2) You might extend a theoretical result, develop a new method and investigate its performance, or run experiments to assess the applicability of the methods. It is OK to work on projects in groups of up to three; see **Collaboration policy**.

**Collaboration policy**: You may form a group to complete your final project. A group may have at most three members. The contribution of each member should be clearly stated in the final report. You will receive one (1) bonus point if you work on the project solo.

## Textbooks

Koltchinskii, V. (2011). *Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: Ecole d'Eté de Probabilités de Saint-Flour XXXVIII-2008* (Vol. 2033). Springer Science & Business Media.

Van Der Vaart, A. W., & Wellner, J. (1996). *Weak Convergence and Empirical Processes: with Applications to Statistics*. Springer Science & Business Media.

Hastie, T., Tibshirani, R., & Friedman, J. (2009). *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*. Springer.

Anthony, M., & Bartlett, P. L. (1999). *Neural Network Learning: Theoretical Foundations* (Vol. 9). Cambridge: Cambridge University Press.

## Reference course

Peter Bartlett, CS 281B / Stat 241B: Statistical Learning Theory

Tengyu Ma, STATS214 / CS229M: Machine Learning Theory

Larry Wasserman, 36-708: Statistical Methods for Machine Learning

Clayton Scott, EECS 598: Statistical Learning Theory

Yoonkyung Lee, STAT 881: Advanced Statistical Learning