Research Interests

My research aims to develop theoretical foundations for learning algorithms that are provably accurate, efficient, scalable, robust, and private. To model a variety of learning tasks holistically, I employ a flexible mathematical framework based on statistical divergences and information measures. Through this lens, many learning tasks can be abstractly viewed as operations in a suitable geometry on probability distributions over high-dimensional manifolds. I study fundamental aspects of learning and inference from this perspective, integrating ideas from optimal transport theory, information theory, mathematical statistics, optimization, and applied probability. This interdisciplinary approach often reveals new connections that foster cross-fertilization between fields. Some specific research interests are outlined below.

Optimal Transport Theory

Optimal transport (OT) theory provides a framework for comparing and transforming high-dimensional probability distributions. It translates problems of learning, transforming, or ensembling complex datasets over high-dimensional manifolds into geometric questions about distances, geodesics, and barycenters in metric spaces, unlocking invaluable tools from Riemannian and metric geometry. This allows a breadth of learning tasks to be treated in a unified way, yielding novel methodologies, formal performance guarantees, and insight into fundamental limitations. My work encompasses foundational OT theory and statistical and computational OT, with a focus on scalability to high dimensions (via regularization techniques such as smoothing, low-dimensional projections, or entropic penalties), as well as robustness to outliers and privacy considerations. These advancements are then employed for a principled, theory-driven treatment of tasks spanning homogeneity/independence testing, generative modeling, ensembling, and more.
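
As a minimal illustration of entropic regularization, the sketch below runs Sinkhorn iterations between two small discrete distributions on the real line. The support points, weights, squared-distance cost, regularization strength eps, and iteration count are all assumptions chosen for illustration; this is a toy example, not the methodology developed in the work described above.

```python
import math

def sinkhorn(a, b, cost, eps=1.0, n_iter=2000):
    """Entropic OT between discrete distributions a and b (lists of weights).

    Returns the regularized transport plan and its transport cost.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K[i][j] = exp(-cost[i][j] / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    transport_cost = sum(
        plan[i][j] * cost[i][j] for i in range(n) for j in range(m)
    )
    return plan, transport_cost

# Hypothetical toy data: two distributions supported on the same three points
x = [0.0, 1.0, 2.0]
y = [0.0, 1.0, 2.0]
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
cost = [[(xi - yj) ** 2 for yj in y] for xi in x]

plan, transport_cost = sinkhorn(a, b, cost)
```

Smaller values of eps bring the plan closer to the unregularized optimal coupling, at the price of slower convergence and, in this naive implementation, possible numerical underflow in the kernel.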

Gromov-Wasserstein Alignment Theory

Alignment of heterogeneous datasets with varying modalities or semantics is ubiquitous in applications like language models, computer vision, and genomics. The Gromov-Wasserstein (GW) distance offers a mathematical framework for alignment by representing datasets as metric measure spaces and optimally matching them. While rooted in OT, the GW problem has resisted existing analysis techniques (due to its quadratic nature), leaving open foundational questions regarding geometry, estimation, inference, and computation. My research aims to develop a comprehensive theory that addresses these facets and facilitates principled alignment methods. Our group has recently achieved several breakthroughs in GW theory, including the first results on duality, estimation rates, limit laws, algorithms with convergence guarantees, and flows in the GW geometry. This research trajectory integrates ideas from OT theory, optimization, mathematical statistics, analysis, and information theory.
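
For concreteness, one standard formulation of the distance is written out below. The notation is an assumption introduced here: $(\mathcal{X}, d_{\mathcal{X}}, \mu)$ and $(\mathcal{Y}, d_{\mathcal{Y}}, \nu)$ are the two metric measure spaces, $\Pi(\mu,\nu)$ is the set of couplings of $\mu$ and $\nu$, and $p, q \geq 1$.

```latex
\mathsf{GW}_{p,q}(\mu,\nu)
  = \inf_{\pi \in \Pi(\mu,\nu)}
    \left(
      \int_{\mathcal{X}\times\mathcal{Y}} \int_{\mathcal{X}\times\mathcal{Y}}
      \left| d_{\mathcal{X}}(x,x')^{q} - d_{\mathcal{Y}}(y,y')^{q} \right|^{p}
      \, d\pi(x,y) \, d\pi(x',y')
    \right)^{1/p}
```

The coupling $\pi$ enters the objective twice, which makes the problem quadratic in $\pi$, whereas the classical OT objective is linear in the coupling.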

Neural Estimation of Classical and Quantum Divergences

Estimation of statistical divergences and information measures is a core building block in data science, but traditional methods often lack scalability for large-scale problems. Neural estimation (NE) has emerged as a method of choice, parametrizing a variational form of the divergence using a neural network, approximating expectations through sample means, and optimizing the objective using backpropagation and minibatches. While NE is scalable and easily integrated into large architectures, it is challenging to analyze due to the interplay among approximation, estimation, and optimization errors. I work to develop the theoretical foundations of NE in various settings, from statistical divergences (e.g., f-divergences or OT/GW distances) to information measures (mutual information, directed information, and sliced information). Recently, I extended NE to quantum divergences by combining variational quantum algorithms with classical neural nets, an approach now being applied in quantum computing and machine learning.
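
The recipe can be sketched on a toy problem using one common variational form, the Donsker-Varadhan representation of the Kullback-Leibler divergence: D(P||Q) = sup_f E_P[f] - log E_Q[exp(f)]. Everything specific below is an assumption made for illustration: the Gaussian pair P = N(1, 1) and Q = N(0, 1), the one-parameter linear critic f(z) = a * z standing in for a neural network, and full-batch gradient ascent with a hand-coded derivative in place of minibatches and backpropagation.

```python
import math
import random

def dv_objective(a, xs, ys):
    """Empirical Donsker-Varadhan objective for the critic f(z) = a * z."""
    mean_p = sum(a * x for x in xs) / len(xs)
    log_mean_exp_q = math.log(sum(math.exp(a * y) for y in ys) / len(ys))
    return mean_p - log_mean_exp_q

def dv_gradient(a, xs, ys):
    """Derivative of the empirical objective with respect to a."""
    weights = [math.exp(a * y) for y in ys]
    # Mean of the Q-samples under exponential tilting by the critic
    tilted_mean = sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    return sum(xs) / len(xs) - tilted_mean

random.seed(0)
n = 5000
xs = [random.gauss(1.0, 1.0) for _ in range(n)]  # samples from P = N(1, 1)
ys = [random.gauss(0.0, 1.0) for _ in range(n)]  # samples from Q = N(0, 1)

a = 0.0
for _ in range(200):
    a += 0.5 * dv_gradient(a, xs, ys)  # full-batch gradient ascent step

kl_estimate = dv_objective(a, xs, ys)
```

For this Gaussian pair the true divergence is 1/2 and the optimal critic is linear, so the linear critic incurs no approximation error; the remaining gap to 1/2 comes from estimation and optimization error. With a neural critic all three error sources are present at once.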

Robust Estimation and Decision-Making

As the scope of data-driven decision-making grows, so does the risk posed by data poisoning attacks, where adversarial processes manipulate training data prior to observation. To address this, my work adopts a three-pronged approach encompassing modeling, statistics, and computation. We have proposed a unified and flexible framework based on OT to capture both local (in the Wasserstein metric) and global (in total variation) modes of data corruption, and studied minimax optimal robust distribution estimation under this corruption model. We then developed an efficient spectral algorithm for computing the estimator and used it for robust decision-making via Wasserstein distributionally robust optimization (W-DRO). Remarkably, the theoretical bounds on the estimation risk adapt to the complexity of the optimal decision rather than that of the worst-case hypothesis, relating to phenomena like implicit regularization and generalization. Robustness to outliers and adversarial attacks is key for any data-driven task, and it is therefore a recurring theme across most of my work.

Information Theory for Learning and Privacy

Information theory provides critical insights into how machine learning algorithms and privacy mechanisms process information. I use information-theoretic tools to analyze how representations, abstractions, and attention patterns develop in deep models, aiming to give developers greater control over these processes. In collaboration with the MIT-IBM Watson AI Lab, I explored information flows in neural networks and explained the information bottleneck principle, shedding light on the clustering of internal representations. To scale this exploration, I proposed sliced mutual information—an efficient alternative to Shannon’s mutual information—now applied in feature extraction, fairness, testing, generative modeling, and studies of generalization and explainability. These insights inspire novel visualization, pruning, and regularization methods, as well as instance-dependent generalization bounds. I am also interested in data privacy and security, private OT and GW computation, physical layer security, and connections between statistical privacy frameworks and classical/quantum information measures.
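
For reference, sliced mutual information between random vectors $X \in \mathbb{R}^{d_x}$ and $Y \in \mathbb{R}^{d_y}$ can be written as an average of Shannon's mutual information over one-dimensional projections. The notation here is an assumption introduced for this display: $\sigma_{d}$ denotes the uniform probability measure on the unit sphere $\mathbb{S}^{d-1}$.

```latex
\mathsf{SI}(X;Y)
  = \int_{\mathbb{S}^{d_x-1}} \int_{\mathbb{S}^{d_y-1}}
    \mathsf{I}\!\left(\theta^{\top} X;\, \phi^{\top} Y\right)
    \, d\sigma_{d_x}(\theta) \, d\sigma_{d_y}(\phi)
```

Each term in the average involves only a pair of scalar random variables, which is what makes estimation tractable in high dimensions.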

Theoretical Foundations of Large Language Models

While large language models (LLMs) are revolutionizing AI, an accompanying theory to explain, guarantee, and ultimately guide their principled development is largely lacking. I have a keen interest in developing such a theory. More to come.