Research Interests

My research interests include information theory, optimal transport theory, statistical learning theory, high-dimensional statistics, and applied probability. I develop mathematical tools and leverage them for an in-depth theoretical analysis of modern inference and learning systems.

Smooth statistical distances for principled machine learning

This project aims to create a global, high-dimensional inference framework that enables a scalable (in dimension) generalization and sample complexity theory for ML. To that end, a new class of discrepancy measures between probability distributions, adapted to high-dimensional spaces, is developed. Termed smooth statistical distances (SSDs), these distances level out irregularities in the considered distributions (via convolution with a chosen smoothing kernel) in a way that preserves inference capability but alleviates the curse of dimensionality when estimating them from data. Measuring or optimizing distances between distributions is central to basic inference setups and advanced ML tasks. Research questions therefore include: (i) SSD fundamentals, encompassing geometric, topological, and functional properties; (ii) high-dimensional statistical questions, such as empirical approximation, limit distributions, hypothesis testing, and goodness-of-fit; and (iii) learning-theoretic applications, including generalization theory for generative models, efficient barycenter computation/estimation, and anomaly detection.
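
To fix ideas, here is a minimal numerical sketch of the smoothing idea in one dimension: the empirical measures of two samples are convolved with a Gaussian kernel N(0, sigma^2), and the 1-Wasserstein distance between the smoothed measures is computed via their CDFs. The Gaussian kernel, the bandwidth sigma, the one-dimensional setting, and the choice of W1 as the base distance are illustrative assumptions only, not fixed elements of the project.

```python
# Toy plug-in estimate of a Gaussian-smoothed 1-Wasserstein distance in 1D.
# Convolving an empirical measure with N(0, sigma^2) yields a Gaussian mixture;
# in 1D, W1 equals the integrated absolute difference of the two CDFs.
import numpy as np
from scipy.stats import norm

def smooth_w1(x, y, sigma=0.5, grid_size=2000):
    """W1 between the sigma-smoothed empirical measures of samples x and y."""
    lo = min(x.min(), y.min()) - 5 * sigma
    hi = max(x.max(), y.max()) + 5 * sigma
    t = np.linspace(lo, hi, grid_size)
    # CDFs of equal-weight Gaussian mixtures centered at the sample points.
    F_x = norm.cdf(t[:, None], loc=x[None, :], scale=sigma).mean(axis=1)
    F_y = norm.cdf(t[:, None], loc=y[None, :], scale=sigma).mean(axis=1)
    return np.trapz(np.abs(F_x - F_y), t)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)  # samples from P
y = rng.normal(0.5, 1.0, size=200)  # samples from Q
print(smooth_w1(x, y))              # distance between the smoothed measures
```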

Information-theoretic analysis of deep neural networks

This project develops tools, rooted in information theory, for measuring the flow of information through deep neural networks (DNNs). The goal is to explain the process by which DNNs progressively build representations of data, from crude and over-redundant representations in shallow layers to highly clustered and interpretable ones in deeper layers, and to give the designer more control over that process. To that end, the project develops efficient estimators of information measures over the network (possibly via built-in dimensionality reduction techniques). Such estimators also lead to new visualization, optimization, and pruning methods for DNNs. New instance-dependent generalization bounds based on information measures are also of interest.
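
For concreteness, one crude instance of such an estimator is a histogram (plug-in) estimate of the mutual information I(T; Y) between a discretized layer representation T and the labels Y. The sketch below assumes discrete labels and a fixed uniform binning of the activations; it illustrates the quantity of interest, but it is not the dimensionality-reduction-based estimator the project targets, and binning is known to behave poorly for wide layers.

```python
# Toy histogram (plug-in) estimate of I(T; Y), in nats, for a binned layer
# representation T and discrete labels Y. Illustrative only: binning scales
# poorly with layer width and is not the estimator developed in this project.
import numpy as np

def mutual_information(activations, labels, n_bins=30):
    """I(T; Y) with T the binned activation pattern of one layer."""
    edges = np.linspace(activations.min(), activations.max(), n_bins + 1)
    binned = np.digitize(activations, edges[1:-1])
    # Treat each example's pattern of bin indices as a single discrete symbol.
    t = np.unique(binned, axis=0, return_inverse=True)[1].ravel()
    y = np.unique(labels, return_inverse=True)[1].ravel()
    joint = np.zeros((t.max() + 1, y.max() + 1))
    np.add.at(joint, (t, y), 1.0)
    joint /= joint.sum()
    p_t = joint.sum(axis=1, keepdims=True)
    p_y = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_t @ p_y)[mask])))

# Example with random activations of a 2-unit layer and 10 classes.
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 2))
labels = rng.integers(0, 10, size=1000)
print(mutual_information(acts, labels, n_bins=4))  # small: T, Y independent
```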

Causal inference and directed information

Directed information (DI) is the causal counterpart of Shannon's mutual information and is widely applied in information theory, e.g., to establish the capacity of communication channels with feedback. Despite the popularity of mutual information in modern machine learning, DI has received comparatively little attention there. This project develops scalable, differentiable, and provably consistent DI (rate) estimators that can be readily integrated as a loss function or a regularizer for downstream tasks. Applications to various causal inference tasks are explored.
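
For intuition, the DI rate from X to Y can be written as the entropy rate of Y minus its causally conditional entropy rate given X. The toy count-based sketch below approximates it for binary sequences via H(Y_i | Y_{i-1}) - H(Y_i | Y_{i-1}, X_i), i.e., under an assumed one-step memory; this is purely illustrative and is neither differentiable nor the kind of neural estimator the project develops.

```python
# Toy count-based approximation of the directed information (DI) rate
# I(X -> Y) for binary sequences, via H(Y_i | Y_{i-1}) - H(Y_i | Y_{i-1}, X_i).
# Illustrative only: it assumes one-step memory and discrete alphabets.
import numpy as np
from collections import Counter

def cond_entropy(pairs):
    """Empirical H(target | context) in nats from (context, target) pairs."""
    joint = Counter(pairs)
    context = Counter(ctx for ctx, _ in pairs)
    n = len(pairs)
    return -sum((c / n) * np.log((c / n) / (context[ctx] / n))
                for (ctx, _), c in joint.items())

def di_rate(x, y):
    h_y = cond_entropy([((y[i - 1],), y[i]) for i in range(1, len(y))])
    h_y_x = cond_entropy([((y[i - 1], x[i]), y[i]) for i in range(1, len(y))])
    return h_y - h_y_x

# Example: Y is a noisy copy of the current X, so information flows X -> Y.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=10000)
y = np.where(rng.random(x.size) < 0.1, 1 - x, x)
print(di_rate(x.tolist(), y.tolist()))  # about ln 2 minus the binary entropy
                                        # of 0.1 (in nats), i.e. roughly 0.37
```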

Additional research trajectories

These include connections between statistical distances and differential privacy, information-theoretic security, high-dimensional nonparametric estimation, and interacting particle systems.