Research Interests

My research interests include optimal transport theory, information theory, mathematical statistics, and applied probability. I develop mathematical tools for the design and analysis of modern inference and learning systems.

Gromov-Wasserstein alignment theory

Effective alignment of heterogeneous datasets is a fundamental challenge in data science, with profound implications for applications ranging from language models and computer vision to single-cell genomics. The Gromov-Wasserstein (GW) distance, rooted in optimal transport (OT) theory, offers a powerful mathematical framework for alignment problems. It quantifies the least amount of distortion needed to transform one dataset into another through an optimal matching. Although alignment schemes inspired by the GW framework have seen many applications, existing methods are largely heuristic, scale poorly both computationally and statistically, are sensitive to outliers, and lack performance guarantees. These limitations hinder their deployment in real-world scenarios that demand rapid and reliable responses at scale. This project seeks to develop a comprehensive structural, statistical, and computational GW theory that facilitates next-generation alignment and interpolation methods for heterogeneous datasets. We target methods that enjoy efficient computation, scalability to high dimensions, and robustness to outliers, all subject to strong formal guarantees. The project integrates ideas from OT theory, optimization, mathematical statistics, and information theory.
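For reference, one common formulation of the (order-$p$) GW distance between two metric measure spaces $(\mathcal{X}, d_{\mathcal{X}}, \mu)$ and $(\mathcal{Y}, d_{\mathcal{Y}}, \nu)$ is, up to a normalization that varies across the literature,
\[
\mathsf{GW}_p(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} \left( \iint \big| d_{\mathcal{X}}(x,x') - d_{\mathcal{Y}}(y,y') \big|^p \, d\pi(x,y)\, d\pi(x',y') \right)^{1/p},
\]
where $\Pi(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$. The optimizing coupling $\pi$ plays the role of a (soft) matching between the two datasets, and the objective measures how much the pairwise geometry is distorted under that matching.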

Regularized statistical divergences: a structural, statistical, and computational theory

Many learning tasks, from generative modeling to style transfer, can be distilled into a question of comparing or deriving transformations between complex, high-dimensional probability distributions. The key mathematical objects that quantify this comparison are statistical divergences, which are discrepancy measures between probability distributions. Popular classes of divergences include Wasserstein distances (rooted in optimal transport theory), f-divergences, integral probability metrics, and more. Despite their potency for modeling, analyzing, and designing learning algorithms, such divergences typically suffer from computational and statistical hardness issues, especially in high-dimensional settings. To overcome this impasse, this project explores new regularization paradigms for statistical divergences that preserve the structure that makes them meaningful for inference while enabling statistical and computational scalability. Research questions include: (i) structural, topological, and geometric properties of regularized divergences (obtained, e.g., via smoothing, slicing, or entropic penalties); (ii) high-dimensional statistical questions, such as empirical convergence rates, neural estimation techniques, robust estimation, and limit distribution theory; and (iii) learning-theoretic applications to generative modeling, barycenter computation/estimation, testing, and anomaly detection.
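As a concrete illustration of such a regularization paradigm (others mentioned above include smoothing and slicing), entropic regularization adds a relative-entropy penalty to the classical OT problem:
\[
\mathsf{OT}_{\varepsilon}(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\, d\pi(x,y) + \varepsilon\, D_{\mathrm{KL}}\big(\pi \,\big\|\, \mu \otimes \nu\big), \qquad \varepsilon > 0,
\]
where $c$ is the transport cost and $\Pi(\mu,\nu)$ is the set of couplings. The penalty makes the optimization strongly convex, enables Sinkhorn-type algorithms, and yields faster empirical convergence rates than the unregularized Wasserstein distance; analogous structural and statistical questions for smoothed and sliced variants are part of items (i) and (ii) above.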

Information-theoretic analysis of deep learning models

This project develops information-theoretic tools for measuring the flow of information through deep neural networks. The goal is to explain the process by which deep nets progressively build representations of data, from crude and over-redundant representations in shallow layers to highly clustered and interpretable ones in deeper layers, and to give the designer more control over that process. To that end, the project develops efficient estimators of information measures over the network (possibly via built-in dimensionality reduction techniques). Such estimators also lead to new visualization, optimization, and pruning methods for deep neural networks. New instance-dependent generalization bounds based on information measures are also of interest.
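To fix notation, the basic quantity tracked across layers is the mutual information between the input $X$ (or label $Y$) and the representation $T_\ell$ produced by layer $\ell$; for this quantity to be finite and estimable, one typically works with a stochastic (e.g., noise-injected) network, an assumption adopted here for illustration:
\[
I(X;T_\ell) = h(T_\ell) - h(T_\ell \mid X),
\]
where $h$ denotes differential entropy. Tracking $I(X;T_\ell)$ and $I(Y;T_\ell)$ across layers and training epochs quantifies the compression and clustering of internal representations described above; the estimation challenge is that $T_\ell$ is high-dimensional, which motivates the dimensionality-reduction-based estimators mentioned in this paragraph.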

Causal inference and directed information

Directed information is the causal counterpart of Shannon's mutual information and is widely used in information theory, e.g., for characterizing the capacity of communication channels with feedback. Despite the popularity of mutual information in modern machine learning, directed information has received little attention. This project develops scalable, differentiable, and provably consistent directed information (rate) estimators that can be readily integrated as a loss function or a regularizer for downstream tasks. Applications to various causal inference tasks are explored.
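For concreteness, Massey's directed information from a sequence $X^n = (X_1,\dots,X_n)$ to a sequence $Y^n$ is
\[
I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i ; Y_i \mid Y^{i-1}),
\]
which conditions on the past of $Y$ and lets only the $X$ symbols up to time $i$ influence $Y_i$, in contrast to the symmetric mutual information $I(X^n;Y^n)$. The directed information rate is the limit of $n^{-1} I(X^n \to Y^n)$, when it exists, and this asymmetry is what makes the directed quantity suitable for detecting causal influence.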

Additional research trajectories

These include connections between statistical distances and differential privacy, information-theoretic security, high-dimensional nonparametric estimation, and interacting particle systems.