The course is intended for graduate students interested in the mathematical aspects of some modern machine learning techniques. Knowledge of probability theory and mathematical maturity are prerequisites. Familiarity with functional analysis is helpful.

**Lectures:** TuTh 2:55-4:10pm, 202 Upson Hall

**Office Hours:** We 9-11, 322 Rhodes Hall

**Instructor:** Ziv Goldfeld, 322 Rhodes Hall

- Syllabus
- Paper reading and presentation assignments: List & instructions
- Paper reading and presentation assignments: Doodle poll link
- Paper reading and presentation assignments: Doodle poll results
- **[Updated on Sep. 18th]** Paper reading and presentation assignments: Instructions and guidelines

- __Sep. 13th, 2019:__ 1st homework sheet is now posted (see section below)
- __Oct. 7th, 2019:__ 2nd homework sheet is now posted (see section below)
- __Nov. 4th, 2019:__ Slides on Gaussian-smoothed optimal transport
- __Nov. 15th, 2019:__ 3rd homework sheet is now posted (see section below)
- __Nov. 16th, 2019:__ Final project instructions
- __Dec. 13th, 2019:__ Final project reports and code (if applicable) submission is due by the end of **Thursday, Dec. 19th, 2019**.

- Homework sheet 1 (submission due Sep. 26th at 2:55pm in class)
- **[2nd version - Errata fixed thanks to Kia Khezeli]** Homework sheet 2 (submission due Oct. 20th at 2:55pm in class)
- Homework sheet 3 (submission due Nov. 26th at 2:55pm in class)

Statistical distances, such as optimal transport (particularly the Wasserstein metric), total variation, Kullback-Leibler (KL) divergence, Chi-squared divergence, and others, are used to design and analyze a variety of machine learning systems. Applications include anomaly and outlier detection, ordinal regression, generative adversarial networks (GANs), and many more. This course will establish the mathematical foundations of these measures, explore their statistical properties (e.g., convergence rates of empirical measures), and focus on GANs and, more generally, deep neural networks (DNNs) as applications, covering both design and analysis.
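For orientation, here is a minimal sketch of how a few of these distances can be computed in the simplest settings. It is not taken from the course materials: it assumes discrete distributions given as probability vectors on a common finite alphabet, and, for the Wasserstein-1 distance, two equal-size samples on the real line. The function names (`total_variation`, `kl_divergence`, `chi_squared`, `wasserstein_1d`) are illustrative choices, and NumPy is assumed to be available.

```python
import numpy as np


def total_variation(p, q):
    """TV(p, q) = 0.5 * sum_i |p_i - q_i| for probability vectors p and q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(0.5 * np.abs(p - q).sum())


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), in nats.

    Assumes q_i > 0 wherever p_i > 0; terms with p_i = 0 contribute zero.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def chi_squared(p, q):
    """chi^2(p || q) = sum_i (p_i - q_i)^2 / q_i; assumes q_i > 0 for all i."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum((p - q) ** 2 / q))


def wasserstein_1d(x, y):
    """W_1 between the empirical measures of two equal-size samples on the real line.

    In one dimension the optimal coupling matches sorted samples, so the
    distance is the mean absolute difference of the order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch only handles samples of equal size")
    return float(np.mean(np.abs(x - y)))


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    q = [0.25, 0.25, 0.5]
    print("TV   :", total_variation(p, q))
    print("KL   :", kl_divergence(p, q))
    print("chi^2:", chi_squared(p, q))
    # Shifting a sample by a constant moves it by exactly that amount in W_1.
    print("W_1  :", wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```

On such examples one can also check standard relations between these quantities numerically, for instance Pinsker's inequality TV ≤ sqrt(KL / 2) and the bound KL ≤ log(1 + chi^2).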

The format is based on paper reading and presentation assignments performed by the students. Each student will present a work of her or his choice from a prescribed list. The course instructor will deliver the first 3-4 lectures, as well as some additional lectures throughout the semester. The final project will include a scientific assignment based on another chosen article. Choices for project assignments include an extension of existing results, implementation tasks, a critical summary of a paper, etc. The last 4 lectures will be dedicated to presentations of final project synopses.

- Introduction to optimal transport and Wasserstein distances
- Applications of optimal transport to GANs
- Convergence rate of empirical Wasserstein distance and curse of dimensionality
- Entropic optimal transport
- Gaussian-smoothed optimal transport
- Other statistical distances (total variation, KL and Chi-squared divergences) – convergence and smoothing
- Relations between statistical distances
- From statistical distances to information measures
- Information bottleneck principle for deep learning
- Estimating information flows in DNNs
- MINE: Mutual information neural estimator