ECE 6970 - Statistical Distances for Modern Machine Learning

For whom?

The course is intended for graduate students interested in the mathematical aspects of some modern machine learning techniques. Knowledge of probability theory and mathematical maturity are prerequisites. Familiarity with functional analysis is helpful.

Time and Location:

Lectures: Tu/Th 2:55-4:10pm, 202 Upson Hall

Office Hours: We 9-11, 322 Rhodes Hall

Instructor: Ziv Goldfeld, 322 Rhodes Hall

News:

Syllabus
Paper reading and presentation assignments: List & instructions
Paper reading and presentation assignments: Doodle poll link
Paper reading and presentation assignments: Doodle poll results [Updated on Sep. 18th]
Paper reading and presentation assignments: Instructions and guidelines
Sep. 13th, 2019: 1st homework sheet is now posted (see section below)
Oct. 7th, 2019: 2nd homework sheet is now posted (see section below)
Nov. 4th, 2019: Slides on Gaussian-smoothed optimal transport
Nov. 15th, 2019: 3rd homework sheet is now posted (see section below)
Nov. 16th, 2019: Final project instructions
Dec. 13th, 2019: Final project reports and code (if applicable) are due by the end of Thursday, Dec. 19th, 2019.

Homework Sheets:

Homework sheet 1 (submission due on Sep. 26th at 2:55pm in class) [2nd version - Errata fixed thanks to Kia Khezeli]
Homework sheet 2 (submission due on Oct. 20th at 2:55pm in class)
Homework sheet 3 (submission due on Nov. 26th at 2:55pm in class)

Overview:

Statistical distances such as optimal transport (particularly, the Wasserstein metric), total variation, Kullback-Leibler (KL) divergence, Chi-squared divergence, and others, are used to design and analyze a variety of machine learning systems. Applications include anomaly and outlier detection, ordinal regression, generative adversarial networks (GANs), and many more. This course will establish the mathematical foundations of these important measures, explore their statistical properties (e.g., convergence rates of empirical measures), and focus on GANs and, more generally, on deep neural networks (DNNs) as applications (design and analysis).
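For a concrete feel of the distances named above, here is a minimal sketch in Python that evaluates them for two small discrete distributions. It is an illustration only, not course material: the function names, the example vectors p and q, and the choice of the points 0, 1, ..., n-1 with cost |x - y| for the Wasserstein distance are all assumptions made for this example.

```python
import math

def total_variation(p, q):
    """TV distance: half the L1 distance between two probability vectors."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """KL(p || q) in nats; infinite if p puts mass where q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def chi_squared(p, q):
    """Chi-squared divergence: sum of (p - q)^2 / q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if qi == 0.0:
            if pi > 0.0:
                return math.inf
            continue
        total += (pi - qi) ** 2 / qi
    return total

def wasserstein_1(p, q):
    """1-Wasserstein distance on the points 0, 1, ..., n-1 with cost |x - y|.
    In one dimension this equals the L1 distance between the two CDFs."""
    cdf_gap, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap)
    return total

# Hypothetical example distributions on three points.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

tv = total_variation(p, q)
kl = kl_divergence(p, q)
chi2 = chi_squared(p, q)
w1 = wasserstein_1(p, q)

print(f"TV = {tv:.4f}, KL = {kl:.4f}, chi^2 = {chi2:.4f}, W1 = {w1:.4f}")
```

On this example the values also satisfy two standard relations between these distances, Pinsker's inequality TV <= sqrt(KL / 2) and the bound KL <= log(1 + chi^2). Note that only the Wasserstein distance depends on the geometry of the underlying points; the other three depend only on the probability masses.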

Format:

The format is based on paper reading and presentation assignments performed by the students. Each student will present a work of her/his choice from a prescribed list. The course instructor will deliver the first 3-4 lectures, as well as occasional lectures throughout the semester. The final project will include a scientific assignment based on another chosen article. Choices for project assignments include extensions of existing results, implementation tasks, a critical summary of a paper, etc. The last 4 lectures will be dedicated to presentations of final project synopses.

List of Tentative Topics:

Introduction to optimal transport and Wasserstein distances
Applications of optimal transport to GANs
Convergence rate of empirical Wasserstein distance and curse of dimensionality
Entropic optimal transport
Gaussian-smoothed optimal transport
Other statistical distances (total variation, KL and Chi-squared divergences) – convergence and smoothing
Relations between statistical distances
From statistical distances to information measures
Information bottleneck principle for deep learning
Estimating information flows in DNNs
MINE: Mutual information neural estimator