Research Interests

Overview

My research develops mathematical and algorithmic foundations for machine learning and AI, with an emphasis on reliability, scalability, robustness, interpretability, and safety. I draw on tools from optimal transport, information theory, statistics, optimization, and applied probability to study foundational, statistical, and computational questions, as well as practical methods. Some specific research interests are outlined below, including recent directions in mechanistic interpretability, robustness, and AI safety.

Optimal Transport: Foundations, Geometry, Statistics, and Algorithms

Optimal transport (OT) provides a geometric framework for comparing and transforming complex probability distributions. It translates problems of learning from, interpolating between, or ensembling high-dimensional probability distributions into geometric questions about distances, geodesics, and barycenters in metric spaces. My work spans foundational OT theory, Wasserstein geometry, statistical and computational OT, and regularization methods for scalability in high dimensions. A central direction is the Gromov-Wasserstein (GW) problem for aligning heterogeneous datasets by modeling them as metric measure spaces and matching their internal geometry. Its quadratic structure raises fundamental questions across geometry, estimation, inference, and computation. Recent progress includes results on GW duality, empirical convergence rates, limit laws, provably convergent algorithms, and the Riemannian structure induced by the GW distance, with applications to testing, generative modeling, ensembling, sampling, and heterogeneous data alignment.
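The entropic regularization mentioned above as a route to scalability admits a particularly simple fixed-point algorithm. Below is a minimal sketch, assuming a NumPy environment; the function name `sinkhorn_cost` and the toy example are illustrative, not taken from any specific project or library:

```python
import numpy as np

def sinkhorn_cost(mu, nu, C, eps=0.05, iters=500):
    """Entropy-regularized OT cost via Sinkhorn iterations (minimal sketch)."""
    K = np.exp(-C / eps)             # Gibbs kernel of the cost matrix
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)           # scale columns to match marginal nu
        u = mu / (K @ v)             # scale rows to match marginal mu
    P = u[:, None] * K * v[None, :]  # approximate optimal coupling
    return float(np.sum(P * C)), P

# Toy example: two identical two-point distributions on the line.
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0])
C = (x[:, None] - y[None, :]) ** 2   # squared-distance cost matrix
mu = np.array([0.5, 0.5])
nu = np.array([0.5, 0.5])
cost, P = sinkhorn_cost(mu, nu, C)
```

Since the two distributions coincide, the regularized cost is close to zero and the coupling is close to the identity matching; larger `eps` trades accuracy for faster, smoother convergence.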

Information Theory for Learning and Privacy

Information theory complements geometric approaches through entropy, divergence, and information measures that capture uncertainty and statistical dependence. I use information-theoretic tools to study representations, abstractions, and attention patterns in deep models, including information flow and the information bottleneck perspective. I also develop scalable information measures, such as sliced mutual information (SMI), that retain key properties of Shannon mutual information while remaining efficiently estimable in high dimensions. These tools support learning and inference tasks across testing, feature extraction, generalization, fairness, and explainability. Related interests include privacy, security, and links between classical and quantum information measures.
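To make the SMI idea concrete, here is a minimal Monte Carlo sketch: average a one-dimensional MI estimate over random projections of each variable onto the unit sphere. All function names are illustrative, and the crude histogram plug-in stands in for the more careful 1D MI estimators used in practice:

```python
import numpy as np

def mi_1d(a, b, bins=16):
    # Crude plug-in MI estimate (nats) from a 2D histogram of two 1D samples.
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])))

def smi(X, Y, n_proj=64, rng=None):
    # Monte Carlo sliced MI: average 1D MI over random sphere projections.
    rng = np.random.default_rng(rng)
    vals = []
    for _ in range(n_proj):
        theta = rng.standard_normal(X.shape[1]); theta /= np.linalg.norm(theta)
        phi = rng.standard_normal(Y.shape[1]); phi /= np.linalg.norm(phi)
        vals.append(mi_1d(X @ theta, Y @ phi))
    return float(np.mean(vals))
```

Because each term only requires a one-dimensional MI estimate, the sample complexity does not degrade with the ambient dimension, which is the key scalability property motivating SMI.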

Robust Estimation and Decision-Making

Robustness to distribution shift, outliers, and adversarial contamination is central to reliable AI. I study robust estimation and decision-making under hybrid corruption models that combine local perturbations with global outlier effects. A key direction in my work is an OT-based framework that unifies these corruption modes and supports minimax analyses, efficient estimators, and downstream decision procedures. I am particularly interested in the interface with Wasserstein DRO, where one can obtain guarantees that adapt to decision complexity and expose practical tradeoffs among robustness, statistical efficiency, and computation.
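One standard way to write the Wasserstein DRO problem at this interface is as a min-max over an OT ambiguity ball (a sketch, with $\ell$ a loss, $\hat{P}_n$ the empirical distribution, and $\rho$ an ambiguity radius):

```latex
\min_{\theta} \; \sup_{Q \,:\, \mathsf{W}_p(Q, \hat{P}_n) \le \rho} \; \mathbb{E}_{Q}\big[\ell(\theta, Z)\big]
```

The inner supremum hedges against all distributions within Wasserstein distance $\rho$ of the data, which is what yields guarantees under local perturbations; handling global outliers on top of this requires enlarging the ambiguity set, as in the hybrid corruption models described above.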

Mechanistic Interpretability and AI Safety

As AI systems become more capable, their mechanistic interpretability and safety pose central technical questions about how learned representations form and how model behavior can be audited or steered. This research direction develops rigorous foundations and methods using tools from optimal transport, information theory, and robust statistics. In a recent project, we propose PLOT (Progressive Localization via Optimal Transport), a mechanistic interpretability method for causal abstraction in neural networks that achieves state-of-the-art accuracy while running one to two orders of magnitude faster than the previous best approach. Related interests include concept localization, causal abstraction, information-flow analysis in transformer architectures, and the mathematical foundations of alignment and safe model steering.

Neural Estimation of Classical and Quantum Divergences

Neural estimation provides a scalable framework for estimating divergences and information measures from samples, but its empirical success raises subtle questions about approximation, optimization, and statistical error. I develop neural methods and theory for estimating statistical divergences, including OT/GW distances and f-divergences, as well as information measures such as mutual and directed information. A central goal is to retain scalability and compatibility with modern learning pipelines while providing rigorous performance guarantees and a clearer understanding of statistical-computational tradeoffs. I am also interested in quantum extensions, including variational quantum-classical methods for estimating quantum divergences and related information measures.
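A canonical starting point is the Donsker-Varadhan variational form of the KL divergence, which underlies many neural estimators:

```latex
\mathsf{D}_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sup_{f} \; \mathbb{E}_{P}[f(X)] \;-\; \log \mathbb{E}_{Q}\big[e^{f(X)}\big]
```

Restricting the supremum to a neural network class $\{f_\theta\}$ and replacing expectations with sample averages turns the bound into a trainable lower-bound estimator; taking $P$ as a joint distribution and $Q$ as the product of its marginals yields a mutual information estimator, and analogous variational forms drive neural estimation of other f-divergences.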