Algorithmic & Information Theoretic Methods in Data Science (ECE6980)

Lectures

There is no textbook that we will follow for the course. We will provide pointers to various research articles and books as the course progresses.

Please use these files as a template for scribing. Scribe notes are due within 144 hours of the class.

The goal is to produce better explanations than those given in class, so please add the details that were missing from the lectures. This is particularly important for components that are not assigned as homework. For example, we did not show that under Poisson sampling the multiplicities are Poisson distributed, and having that in the scribe notes really helps!

23rd August

Introduction and overview, total variation distance, learning in total variation distance. [scribe]

Also look at Sections 1.1 and 1.4 of last year's scribe here.
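
To make the definition concrete, here is a minimal Python sketch (my own illustration, not course code; the function name and example distributions are made up) of the total variation distance between two discrete distributions on the same finite support:

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two discrete distributions
    given as probability vectors over the same support:
    TV(p, q) = (1/2) * sum_i |p_i - q_i|."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

# Example: a fair vs. a biased three-sided die.
print(tv_distance([1/3, 1/3, 1/3], [0.5, 0.3, 0.2]))  # 0.1666...
```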

28th August

Empirical distribution for TV learning, Poisson sampling, introduced the uniformity testing problem. [scribe]

Please look at Section 2 of last year's scribe here for the TV-learning derivation (Section 1 is worth reading too). See Sections 5.1, 5.3, and 5.4 of Probability and Computing for the birthday paradox and Poisson sampling.
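
As a sanity check on the Poissonization fact mentioned above, here is a small simulation sketch (my own code, with made-up parameters): draw N ~ Poisson(n) samples instead of exactly n, and verify that each symbol's multiplicity has matching mean and variance, as a Poisson(n * p_x) variable should.

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson sampling: instead of exactly n samples, draw N ~ Poisson(n).
# Then the multiplicity of each symbol x is Poisson(n * p_x),
# independently across symbols, so mean and variance both equal n * p_x.
n, p = 1000, np.array([0.5, 0.3, 0.2])
trials = 5000
counts = np.empty((trials, len(p)))
for t in range(trials):
    N = rng.poisson(n)
    draws = rng.choice(len(p), size=N, p=p)
    counts[t] = np.bincount(draws, minlength=len(p))

print(counts.mean(axis=0))  # ~ [500, 300, 200]
print(counts.var(axis=0))   # ~ [500, 300, 200] as well: the Poisson signature
```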

30th August

Hypothesis testing, test statistics, Chebyshev's inequality, testing Bernoulli distributions, testing uniformity. [scribe]

Extra Reading: [Paninski's Original Paper]. Please look at the upper bound here: the test statistic is remarkably simple, just the number of elements appearing exactly once.
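
For concreteness, a minimal sketch of that statistic (the helper name `num_singletons` is mine):

```python
from collections import Counter

def num_singletons(sample):
    """Number of distinct elements appearing exactly once: the test
    statistic used in Paninski's uniformity test."""
    return sum(1 for c in Counter(sample).values() if c == 1)

print(num_singletons([1, 2, 2, 3, 4, 4, 5]))  # 3 (elements 1, 3, and 5)
```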

6th September

Basic Information Theory: entropy, KL divergence, concavity of entropy, Fano's inequality, and multiway classification. Please read Chapter 2 of Cover and Thomas. [scribe]
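
A small Python sketch of the two basic quantities, matching the standard definitions in Cover and Thomas (the function names are mine; entropies here are in nats, i.e., natural log):

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i (in nats)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # convention: 0 * log 0 = 0
    return -(p * np.log(p)).sum()

def kl_divergence(p, q):
    """KL divergence D(p || q) = sum_i p_i log(p_i / q_i); requires
    q_i > 0 wherever p_i > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return (p[mask] * np.log(p[mask] / q[mask])).sum()

print(entropy([0.5, 0.5]))                    # log 2 ~ 0.693
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # ~ 0.511, nonnegative
```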

11th September

Proof of Fano's inequality (KL divergence version), multiway classification. Construction of exponentially many distributions with large pairwise TV distance and small KL divergence. Finished the lower bound for learning discrete distributions. [scribe]

Extra Reading: [Assouad, Fano, and Le Cam], a very short paper describing three lower-bound methods.
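
For reference (my notation, following Chapter 2 of Cover and Thomas): with X uniform over M hypotheses and \hat{X} any estimate of X computed from the observation Y, Fano's inequality gives

```latex
P(\hat{X} \neq X) \;\ge\; 1 - \frac{I(X;Y) + \log 2}{\log M}.
```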

13th September

Le Cam's two-point method, designing hard priors for uniformity testing, started the uniformity lower bound computations. [scribe]
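
For reference, one standard form of the two-point bound (my notation): any test \psi that observes n i.i.d. samples and tries to distinguish P_0 from P_1 satisfies

```latex
\inf_{\psi}\; \max_{i \in \{0,1\}} P_i^{\otimes n}\!\left(\psi(X^n) \neq i\right) \;\ge\; \frac{1 - \mathrm{TV}\!\left(P_0^{\otimes n}, P_1^{\otimes n}\right)}{2}.
```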

18th September

Finished the uniformity lower bound. Defined mixture models and Gaussian mixture models (GMMs), and proper vs. improper learning of GMMs. Computational vs. sample complexity gaps in learning GMMs. [scribe]
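
As a quick illustration of the model (a toy sketch with made-up parameters, not course code), sampling from and evaluating a two-component one-dimensional GMM:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-component 1-d GMM: with probability w draw from N(mu1, s1^2),
# otherwise from N(mu2, s2^2).
w, mu1, s1, mu2, s2 = 0.4, -2.0, 1.0, 3.0, 0.5
z = rng.random(10000) < w
sample = np.where(z, rng.normal(mu1, s1, 10000), rng.normal(mu2, s2, 10000))

def normal_pdf(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def gmm_density(x):
    """Mixture density w * N(mu1, s1^2) + (1 - w) * N(mu2, s2^2)."""
    return w * normal_pdf(x, mu1, s1) + (1 - w) * normal_pdf(x, mu2, s2)
```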

20th September

Scheffe's estimate for choosing a distribution from a collection, and the tournament method. A simple algorithm for learning GMMs using this approach. [scribe]

Further Readings: [one-dimensional GMM Learning] (COLT 2014), [high-dimensional spherical GMM Learning] (NIPS 2015).
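
A minimal sketch of the Scheffe test for discrete distributions (my own code; names and the toy example are made up): on the Scheffe set A = {x : p(x) > q(x)}, whichever candidate assigns mass to A closer to the empirical mass wins. The tournament method runs this pairwise over all candidates and outputs the candidate with the most wins.

```python
import numpy as np

def scheffe_winner(p, q, sample):
    """Return 0 if candidate pmf p wins the Scheffe test against q on
    the sample (integer symbols indexing the support), else 1."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    A = p > q                      # the Scheffe set {x : p(x) > q(x)}
    emp_mass = A[sample].mean()    # empirical mass of A
    return 0 if abs(p[A].sum() - emp_mass) <= abs(q[A].sum() - emp_mass) else 1

rng = np.random.default_rng(2)
truth = [0.5, 0.3, 0.2]
sample = rng.choice(3, size=2000, p=truth)
print(scheffe_winner([0.5, 0.3, 0.2], [0.2, 0.3, 0.5], sample))  # 0: p wins
```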

25th September

Metric entropy and covering numbers. McDiarmid's inequality for sharp concentration of the empirical distribution estimate. Introduction to property estimation; performance of empirical entropy estimation. [scribe]

Further Readings: [Paninski's paper]. This paper shows a number of nice results, and it is a fun read too!
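
For reference, the plug-in (empirical) entropy estimator discussed in class, as a short Python sketch (my own code; entropy in nats). Its leading-order downward bias is (k - 1)/(2n) for support size k and sample size n, which is why it needs many samples for large supports.

```python
import numpy as np

def empirical_entropy(sample):
    """Plug-in entropy estimate: the entropy of the empirical
    distribution of the sample (in nats).  Biased downward."""
    counts = np.bincount(sample)
    p_hat = counts[counts > 0] / counts.sum()
    return -(p_hat * np.log(p_hat)).sum()

rng = np.random.default_rng(3)
sample = rng.integers(0, 100, size=500)  # uniform over k = 100 symbols
print(empirical_entropy(sample))         # < log(100) ~ 4.605: biased low
```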

27th September

Estimating entropies via polynomial approximation, and related techniques. [scribe]

Further Readings:

[Paninski's paper], again.
[Valiant-Valiant paper showing the k/log k bounds].
[Wu-Yang], [Jiao-Venkat-Han-Weissman]. These papers pin down the sample complexity in all parameter regimes.
[Renyi entropy estimation].
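
A toy look at the idea behind these estimators (my own sketch; numpy's `chebfit` gives a least-squares fit, which I use here only as a stand-in for the best minimax polynomial approximation): low-degree polynomials approximate -x log x well on [0, 1], and the optimal estimators replace the plug-in value by an unbiased estimate of the approximating polynomial in the small-probability regime.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# How well does a low-degree polynomial approximate -x log x on [0, 1]?
x = np.linspace(1e-9, 1.0, 10001)
f = -x * np.log(x)
t = 2.0 * x - 1.0                      # map [0, 1] to [-1, 1]
for deg in (3, 6, 12):
    coeffs = C.chebfit(t, f, deg)      # least-squares Chebyshev fit
    max_err = np.max(np.abs(C.chebval(t, coeffs) - f))
    print(deg, max_err)                # error shrinks as degree grows
```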

2nd October

Entropy estimation: detailed derivations.

11th October

Maximum likelihood estimation; competitiveness of the profile MLE for symmetric properties. [scribe]

Further Readings:

[Optimality of (P)MLE]

16th, 18th October

Communication complexity vs. sample complexity, the hide-and-seek problem. [scribe]

Further Readings:

[Shamir's paper]

23rd, 25th October

Computational vs. statistical trade-offs, planted clique, sparse PCA.

Further Readings:

[Sparse PCA paper], [Submatrix Detection paper]
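
To make the planted clique setup concrete, a small sketch generating an instance (my own illustration): an Erdos-Renyi graph G(n, 1/2) with a clique planted on k random vertices. Detection is information-theoretically possible once k exceeds roughly 2 log2(n), but no polynomial-time algorithm is known for k = o(sqrt(n)); that gap is what these lectures are about.

```python
import numpy as np

rng = np.random.default_rng(4)

n, k = 200, 20
upper = np.triu(rng.random((n, n)) < 0.5, 1)
A = upper | upper.T                      # symmetric adjacency of G(n, 1/2)
clique = rng.choice(n, size=k, replace=False)
A[np.ix_(clique, clique)] = True         # plant a k-clique
np.fill_diagonal(A, False)               # no self-loops
```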

30th October

Memory vs. sample complexity.

Further Readings:

[Parity paper], [Moments paper], [AMS Paper]

1st November

Robustness in statistical problems.

Further Readings:

[Robust mean estimation paper]
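
A toy illustration of the phenomenon (my own one-dimensional sketch): under 10% adversarial contamination the empirical mean is ruined, while the median, a classic robust estimator in one dimension, barely moves.

```python
import numpy as np

rng = np.random.default_rng(5)

clean = rng.normal(0.0, 1.0, 900)        # true mean is 0
outliers = np.full(100, 50.0)            # 10% adversarial contamination
sample = np.concatenate([clean, outliers])
print(sample.mean())                     # ~ 5.0: dragged by the outliers
print(np.median(sample))                 # ~ 0.14: barely affected
```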