|Richard J. Samworth (University of Cambridge)
|Isotonic subgroup selection
|Abstract: Given a sample of covariate-response pairs, we consider the subgroup selection problem of identifying a subset of the covariate domain where the regression function exceeds a pre-determined threshold. We introduce a computationally feasible approach for subgroup selection in the context of multivariate isotonic regression based on martingale tests and multiple testing procedures for logically-structured hypotheses. Our proposed procedure satisfies a non-asymptotic, uniform Type I error rate guarantee with power that attains the minimax optimal rate up to poly-logarithmic factors. Extensions cover classification, isotonic quantile regression and heterogeneous treatment effect settings. Numerical studies on both simulated and real data confirm the practical effectiveness of our proposal, which is implemented in the R package ISS.
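To fix ideas, a minimal one-dimensional sketch of the underlying ingredients: fit an isotonic (monotone non-decreasing) regression by the pool-adjacent-violators algorithm and read off where the fit exceeds the threshold. The function names are hypothetical, and this toy version carries none of the Type I error control of the ISS procedure.

```python
def pava(y):
    """Pool-adjacent-violators: least-squares monotone non-decreasing fit."""
    merged = []  # blocks of [mean, weight]
    for v in y:
        merged.append([v, 1])
        # merge adjacent blocks while monotonicity is violated
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            m2, m1 = merged.pop(), merged.pop()
            w = m1[1] + m2[1]
            merged.append([(m1[0] * m1[1] + m2[0] * m2[1]) / w, w])
    fit = []
    for mean, w in merged:
        fit.extend([mean] * w)
    return fit

def selected_subgroup(x, y, tau):
    """Covariate values (sorted order) where the isotonic fit reaches tau."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    fit = pava([y[i] for i in order])
    return [x[order[i]] for i in range(len(x)) if fit[i] >= tau]
```

Because the fit is monotone, the selected set is automatically an upper set of the covariate order, which is the shape constraint the subgroup selection problem exploits.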
|Varun Gupta (Northwestern University)
|Control of non-stationary LQR - Stochastic and Robust perspectives
|Abstract: In this talk, we will present two vignettes on the problem of online control of the Linear Quadratic Regulator (LQR) when the dynamics are non-stationary and unknown. LQR is arguably the simplest Markov Decision Process, and serves as a fertile ground for developing new frameworks for studying online and robust control policies. In the first part of the talk, we will present a minimax dynamic-regret optimal policy under two somewhat strong assumptions: (i) the noise process is independent across time steps, and (ii) the total variation of the dynamics over T time steps is sublinear in T (we do not assume this variation is known). In the second part, we will relax both these assumptions. Since minimax dynamic regret is too strong a goal in this setting, we propose a policy that guarantees bounded-input-bounded-output stability in the closed loop. The talk is based on papers with Yuwei Luo, Mladen Kolar, Jing Yu and Adam Wierman.
|Lan Wang (University of Miami)
|Distributional Off-Policy Evaluation in Reinforcement Learning
|Abstract: In the existing literature on reinforcement learning (RL), off-policy evaluation mainly focuses on estimating a value (e.g., an expected discounted cumulative reward) of a target policy given pre-collected data generated by some behavior policy. Motivated by the recent success of distributional RL in many practical applications, we study the distributional off-policy evaluation problem in the batch setting when the reward is multivariate. We propose an offline Wasserstein-based approach to simultaneously estimate the joint distribution of a multivariate discounted cumulative reward given any initial state-action pair in the setting of an infinite-horizon Markov decision process. A finite-sample error bound for the proposed estimator with respect to a modified Wasserstein metric is established in terms of both the number of trajectories and the number of decision points on each trajectory in the batch data. Extensive numerical studies are conducted to demonstrate the superior performance of our proposed method. (Joint work with Zhengling Qi, Chenjia Bai, and Zhaoran Wang)
|Christina Lee Yu (Cornell University)
|Exploiting Low Order Interactions for Causal Inference in the Presence of Network Interference
|Abstract: In many domains, we are interested in estimating the total treatment effect (TTE) in the presence of network interference, where the outcome of one individual or unit is affected by the treatment assignment of those in its local network. Additional challenges arise when complex cluster randomized designs are not feasible to implement, or the network is unknown and costly to estimate. We propose a new measure of model complexity that characterizes the difficulty of estimating the total treatment effect under the standard A/B testing setup. By leveraging a staggered rollout design, in which treatment is incrementally given to random subsets of individuals, we derive unbiased estimators for TTE that do not rely on any prior structural knowledge of the network, as long as the network interference effects are constrained to low-degree interactions among neighbors of an individual.
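To illustrate the low-order idea in its simplest form: when interference effects are degree-1 (linear in the treated fraction of a unit's neighborhood), the population mean outcome is linear in the rollout probability, so a two-stage staggered rollout identifies the TTE by extrapolating from the two observed stages to full treatment versus no treatment. This hypothetical sketch is only the degree-1 special case, not the paper's general estimator.

```python
def tte_linear_rollout(y_stage1, y_stage2, p1, p2):
    """Estimate the total treatment effect under a degree-1 exposure model
    from a two-stage staggered rollout with treatment probabilities p1 < p2.
    Under linearity, E[mean outcome at rollout probability p] is linear in p,
    so TTE = E[Y | all treated] - E[Y | none treated] equals the slope of the
    line through the two stage-wise mean outcomes."""
    n = len(y_stage1)
    m1 = sum(y_stage1) / n
    m2 = sum(y_stage2) / n
    return (m2 - m1) / (p2 - p1)  # slope * (1 - 0)
```

Higher-degree interaction models require more rollout stages (one more stage per degree), which is the general polynomial-extrapolation structure the talk's estimators build on.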
|Yao Xie (Georgia Institute of Technology)
|Conformal prediction for time series
|Abstract: We develop a general framework for constructing distribution-free prediction intervals for time series for a given black-box algorithm. Theoretically, we establish asymptotic marginal and conditional coverage guarantees for the prediction intervals while allowing for general temporal dependence, and show that the intervals are asymptotically optimal compared with an oracle. Methodologically, we introduce computationally efficient algorithms, EnbPI, SPCI, and an optimal kernel-based approach, that wrap around ensemble predictors; these are closely related to standard conformal prediction (CP) but do not require data exchangeability. We perform extensive simulation and real-data analyses to demonstrate their effectiveness compared with existing methods.
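A bare-bones sketch of the shared idea: form an interval around the black-box forecast from the empirical quantile of recent out-of-sample absolute residuals, with no exchangeability assumption. This is not EnbPI or SPCI themselves (which use ensembles and sequential residual modeling); it is the simplest residual-quantile variant, with hypothetical names.

```python
import math

def conformal_interval(point_forecast, past_abs_residuals, alpha=0.1):
    """Prediction interval: point_forecast +/- the (1 - alpha) empirical
    quantile of the absolute out-of-sample residuals observed so far,
    using the conservative finite-sample index from split conformal."""
    s = sorted(past_abs_residuals)
    k = min(len(s) - 1, math.ceil((1 - alpha) * (len(s) + 1)) - 1)
    q = s[k]
    return point_forecast - q, point_forecast + q
```

In a streaming setting one appends each new absolute residual and recomputes the quantile over a sliding window, which is how coverage can adapt to slow distribution shift.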
|Edgar Dobriban (University of Pennsylvania)
|PAC Prediction Sets under Dataset Shift
|Abstract: We will discuss PAC prediction sets, which predict sets of labels given any pre-trained machine learning method, and guarantee coverage of a large fraction of true labels with high probability over a calibration set. In the standard i.i.d. setting, they are equivalent to the training set conditional coverage property of inductive conformal prediction (Vovk 2012), and their coverage property is also equivalent to that of Wilks' tolerance regions. We will discuss recent progress (Park et al. 2022a,b, 2023) on using them in dataset shift settings where the test distribution is different from the calibration distribution.
|Leying Guan (Yale University)
|A conformal test of linear models via permutation-augmented regressions
|Abstract: Permutation tests are widely recognized as robust alternatives to tests based on normal theory. Random permutation tests have been frequently employed to assess the significance of variables in linear models. Despite their widespread use, existing random permutation tests lack finite-sample and assumption-free guarantees for controlling type I error in partial correlation tests. To address this ongoing challenge, we have developed a conformal test through permutation-augmented regressions, which we refer to as PALMRT. PALMRT not only achieves power competitive with conventional methods but also provides reliable control of type I errors at no more than 2α, given any targeted level α, for arbitrary fixed designs and error distributions. We have confirmed this through extensive simulations. Compared to the cyclic permutation test (CPT) and residual permutation test (RPT), which also offer theoretical guarantees, PALMRT does not compromise as much on power or set stringent requirements on the sample size, making it suitable for diverse biomedical applications. We further illustrate the differences in a long-Covid study where PALMRT validated key findings previously identified using the t-test after multiple corrections, while both CPT and RPT suffered from a drastic loss of power and failed to identify any discoveries. We endorse PALMRT as a robust and practical hypothesis test in scientific research for its superior error control, power preservation, and simplicity.
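For readers unfamiliar with the baseline, here is a generic random permutation test of association, with hypothetical names. Unlike PALMRT, this simple version does not augment the regression with the permuted design and therefore carries no finite-sample guarantee for arbitrary fixed designs; it only illustrates the permutation p-value mechanics.

```python
import random
from statistics import mean

def perm_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided random permutation p-value for association between x and y,
    using the absolute (unnormalized) sample covariance as test statistic.
    The +1 in numerator and denominator makes the p-value valid under
    exchangeability of y."""
    rng = random.Random(seed)

    def stat(xs, ys):
        mx, my = mean(xs), mean(ys)
        return abs(sum((a - mx) * (b - my) for a, b in zip(xs, ys)))

    obs = stat(x, y)
    hits = sum(stat(x, rng.sample(y, len(y))) >= obs for _ in range(n_perm))
    return (1 + hits) / (1 + n_perm)
```

The motivation for PALMRT in the abstract is precisely that naive variants like this lose validity for partial correlation tests with nuisance covariates and fixed designs.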
|Ryan Tibshirani (University of California, Berkeley)
|Conformal PID Control for Time Series Prediction
|Abstract: We study the problem of uncertainty quantification for time series prediction, with the goal of providing easy-to-use algorithms with formal guarantees. The algorithms we present build upon ideas from conformal prediction and control theory, are able to prospectively model conformal scores in an online setting, and adapt to the presence of systematic errors due to seasonality, trends, and general distribution shifts. Our theory both simplifies and strengthens existing analyses in online conformal prediction. Experiments on 4-week-ahead forecasting of statewide COVID-19 death counts in the U.S. show an improvement in coverage over the ensemble forecaster used in official CDC communications. We also run experiments on predicting electricity demand, market returns, and temperature using autoregressive, Theta, Prophet, and Transformer models. We provide an extendable codebase for testing our methods and for the integration of new algorithms, data sets, and forecasting rules.
|Abbass Sharif (AXS)
|Samuel Fleischer (LA Dodgers)
|Leo Pekelis (Gradient)
|Justin Dyer (LinkedIn)
|Xiaowu Dai (University of California, Los Angeles)
|Kernel ordinary differential equations
|Abstract: The ordinary differential equation (ODE) is widely used in modelling biological and physical processes in science. A new reproducing kernel-based approach is proposed for the estimation and inference of ODE given noisy observations. The functional forms in ODE are assumed to be known or restricted to be linear or additive, and pairwise interactions are allowed. Sparse estimation is performed to select individual functionals and construct confidence intervals for the estimated signal trajectories. The estimation optimality and selection consistency of kernel ODE are established under both the low-dimensional and high-dimensional settings, where the number of unknown functionals can be smaller or larger than the sample size. The proposal builds upon the smoothing spline analysis of variance (SS-ANOVA) framework, but tackles several important problems that are not yet fully addressed, and thus extends the scope of existing SS-ANOVA as well.
|Hengrui Cai (University of California, Irvine)
|Towards Trustworthy Explanation: On Causal Rationalization
|Abstract: With recent advances in natural language processing, rationalization has become an essential self-explaining scheme to disentangle the black box by selecting a subset of input texts to account for the major variation in prediction. Yet, existing association-based approaches to rationalization cannot identify true rationales when two or more snippets are highly inter-correlated and thus provide a similar contribution to prediction accuracy, so-called spuriousness. To address this limitation, we leverage two causal desiderata, non-spuriousness and efficiency, into rationalization from the causal inference perspective. We formally define a series of probabilities of causation based on a newly proposed structural causal model of rationalization, with its theoretical identification established as the main component of learning necessary and sufficient rationales. The superior performance of the proposed causal rationalization is demonstrated on real-world review and medical datasets with extensive experiments compared to state-of-the-art methods.
|Evan Rosenman (Claremont McKenna College)
|Shrinkage Estimation for Causal Inference and Experimental Design
|Abstract: How can observational data be used to improve the design and analysis of randomized controlled trials (RCTs)? We consider how to develop estimators to merge causal effect estimates obtained from observational and experimental datasets, when the two data sources measure the same treatment. We discuss two methods for deriving shrinkage estimators for this task: positing a shrinkage structure and minimizing an unbiased risk estimate; or using a hierarchical model. We develop several estimators under these frameworks. Next, we consider how these estimators might contribute to more efficient designs for prospective randomized trials. We show that the risk of a shrinkage estimator can be computed efficiently via numerical integration. We then propose algorithms for determining the experimental design -- that is, the best allocation of units to strata -- by optimizing over this computable shrinker risk.
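As a toy version of the merging step, one can shrink the unbiased RCT estimate toward the (possibly biased) observational estimate using a plug-in weight that estimates the observational bias by the observed discrepancy. This is a textbook sketch with hypothetical names, not the SURE-based or hierarchical estimators developed in the talk.

```python
def shrink_to_obs(theta_rct, var_rct, theta_obs):
    """Combine an unbiased RCT estimate (variance var_rct) with an
    observational estimate whose squared bias is estimated by the squared
    gap between the two. The MSE-minimizing weight on the observational
    estimate is then var_rct / (var_rct + bias^2), plugged in here."""
    gap2 = (theta_rct - theta_obs) ** 2
    lam = var_rct / (var_rct + gap2)  # weight on the observational estimate
    return (1 - lam) * theta_rct + lam * theta_obs
```

When the two sources agree, the combined estimate leans on the low-variance observational data; when they disagree sharply, it reverts to the RCT, which is the qualitative behavior any sensible shrinker for this task should exhibit.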
|Jelena Bradic (University of California, San Diego)
|Dynamic treatment effects: high-dimensional inference under model misspecification
|Abstract: Estimating dynamic treatment effects is essential across various disciplines, offering nuanced insights into the time-dependent causal impact of interventions. However, this estimation presents challenges due to the "curse of dimensionality" and time-varying confounding, which can lead to biased estimates. Additionally, correctly specifying the growing number of treatment assignments and outcome models with multiple exposures seems overly complex. Given these challenges, the concept of double robustness, where model misspecification is permitted, is extremely valuable, yet unachieved in practical applications. This paper introduces a new approach by proposing novel, robust estimators for both treatment assignments and outcome models. We present a "sequential model double robust" solution, demonstrating that double robustness over multiple time points can be achieved when each time exposure is doubly robust. This approach improves the robustness and reliability of dynamic treatment effects estimation, addressing a significant gap in this field.
|Chinenye Ifebirinachi (North Carolina A&T State University)
|Estimating Causal Effects with Interaction Screening and Selection in High-Dimensional Data
|Boxin Zhao (University of Chicago)
|An Adaptive Sampling Approach for Data Market and Data Valuation
|Yating Liu (University of Chicago)
|Debiased Differential Network Estimator via Kendall's Tau for Transelliptical Graphical Models
|Joseph Graves (North Carolina A&T State University)
|Penalized Estimation of Covariance and Precision Matrices using Blockwise Missing Multimodal Data
|Michael Pokojovy (Old Dominion University)
|A robust deterministic affine-equivariant algorithm for multivariate location and scatter
|Daoji Li (California State University, Fullerton)
|CoxKnockoff: Controlled Feature Selection for the Cox Model Using Knockoffs
|Ziyi Liang (University of Southern California)
|Integrative conformal p-values for powerful out-of-distribution testing with labeled outliers
|Yanfei Zhou (University of Southern California)
|Uncertainty-Aware Learning with Conformalized Training
|Chiara Magnani (University of Milano-Bicocca)
|Rank Tests for Outlier Detection
|Jeff Cai (University of Notre Dame)
|Personalized reinforcement learning with applications to recommender system
|Abstract: Reinforcement learning (RL) has achieved remarkable success across various domains; however, its applicability is often hampered by challenges in practicality and interpretability. Many real-world applications, such as in healthcare and business settings, have large and/or continuous state and action spaces and demand personalized solutions. In addition, the interpretability of the model is crucial to decision-makers so as to guide their decision-making process while incorporating their domain knowledge. To bridge this gap, we propose a personalized reinforcement learning framework that integrates personalized information into the state-transition and reward-generating mechanisms. We develop an online RL algorithm for our framework. Specifically, our algorithm learns the embeddings of the personalized state-transition distribution in a Reproducing Kernel Hilbert Space (RKHS) by balancing the exploitation-exploration tradeoff. We further provide the regret bound of the algorithm and demonstrate its effectiveness in recommender systems.
|Siddhartha Chib (Washington University in St. Louis)
|Regression Under Endogeneity: Bernstein-von Mises Theory and Bayes Factors Testing
|Abstract: A standard assumption in the Bayesian estimation of linear regression models is that the regressors are exogenous (uncorrelated with the error). In practice, however, this assumption can be invalid. In this paper, under the rubric of the exponentially tilted empirical likelihood, we derive the consequences of neglected endogeneity. We derive a Bernstein-von Mises theorem for the posterior distribution of a (default) base model that assumes that the regressors are exogenous when that assumption, in fact, is false. We also develop a Bayes factor test for endogeneity that compares the base model with an extended model that is immune from the problem of neglected endogeneity. We prove that this test is a consistent selection procedure: as the sample becomes large, it almost surely selects the base model if the regressors are exogenous, and the extended model otherwise. The methods are illustrated with simulated data, and problems concerning the causal effect of automobile prices on automobile demand, and the causal effect of potentially endogenous airplane ticket prices on passenger volume.
|Yan Yu (University of Cincinnati)
|Machine Learning for Monetary Policy and Equity Market Risk Premia
|Abstract: We investigate excess stock market return predictability via a comprehensive set of machine learning tools. We propose a novel determinant, the federal funds rate. We find that the federal funds rate negatively predicts excess equity market returns because of its dependence on inflation and unemployment. The novel findings lend compelling support to recent monetary asset pricing models and unequivocally validate Fed Chair Jerome Powell's view: "Our policy actions work through financial conditions. And those, in turn, affect economic activity, the labor market, and inflation." Formal variable selection analyses identify monetary policy as a crucial equity premium determinant together with the two most prominent asset pricing state variables, market price multiples and variance. We find that the selected three-factor model has stable predictive power and even outperforms popular complex machine learning models in out-of-sample prediction.
|Anqi Zhao (Duke University)
|Rerandomization based on p-values from covariate balance tests
|Abstract: Randomized experiments balance all covariates on average and are considered the gold standard for estimating treatment effects. Chance imbalances are nonetheless common in realized treatment allocations, complicating the interpretation of experimental results. To inform readers of the comparability of treatment groups at baseline, contemporary scientific publications often report covariate balance tables with not only covariate means by treatment group but also the associated p-values from significance tests of their differences. The practical need to avoid small p-values as indicators of poor balance motivates balance checks and rerandomization based on these p-values from covariate balance tests (ReP) as an attractive tool for improving covariate balance in randomized experiments. Despite the intuitiveness of such a strategy and its possibly already widespread use in practice, the literature lacks results about its implications on subsequent inference, subjecting many effectively rerandomized experiments to possibly inefficient analyses. To fill this gap, we examine a variety of potentially useful schemes for ReP and quantify their impact on subsequent inference. Specifically, we focus on three estimators of the average treatment effect from the unadjusted, additive, and interacted linear regressions of the outcome on treatment, respectively, and derive their asymptotic sampling properties under ReP. The main findings are threefold. First, the estimator from the interacted regression is asymptotically the most efficient under all ReP schemes examined, and permits convenient regression-assisted inference identical to that under complete randomization. Second, ReP, in contrast to complete randomization, improves the asymptotic efficiency of the estimators from the unadjusted and additive regressions. Standard regression analyses are accordingly still valid but in general overconservative. Third, ReP reduces the asymptotic conditional biases of the three estimators and improves their coherence in terms of mean squared difference. Based on these results, we recommend using ReP for design and the interacted regression for analysis to ensure both covariate balance and efficient inference. Importantly, our theory is design-based and holds regardless of how well the models involved in both the rerandomization and analysis stages represent the true data-generating processes.
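As a toy illustration of the design stage only (not any of the paper's specific schemes or their inferential analysis), one can redraw a completely randomized allocation until a crude two-sample z-test p-value for every covariate clears a threshold. Names here are hypothetical.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def balance_p(xs, treat):
    """Welch-style z-test p-value for the treated-vs-control mean difference
    of one covariate."""
    t = [x for x, w in zip(xs, treat) if w]
    c = [x for x, w in zip(xs, treat) if not w]
    mt, mc = sum(t) / len(t), sum(c) / len(c)
    vt = sum((a - mt) ** 2 for a in t) / (len(t) - 1)
    vc = sum((a - mc) ** 2 for a in c) / (len(c) - 1)
    se = math.sqrt(vt / len(t) + vc / len(c))
    return two_sided_p((mt - mc) / se) if se > 0 else 1.0

def rep_assignment(covariates, n_treat, p_min=0.3, seed=1):
    """Rerandomize a completely randomized allocation until every covariate's
    balance-test p-value is at least p_min."""
    rng = random.Random(seed)
    n = len(covariates[0])
    while True:
        treat = [False] * n
        for i in rng.sample(range(n), n_treat):
            treat[i] = True
        if all(balance_p(xs, treat) >= p_min for xs in covariates):
            return treat
```

The abstract's point is that accepted allocations are no longer uniform over all allocations, so downstream analyses must account for the acceptance rule; this sketch only produces the design.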
|Denny Zhou (Google DeepMind)
|Teach language models to reason
|Abstract: Over the past decades, the machine learning community has developed tons of data-driven techniques aimed at enhancing learning efficiency, such as semi-supervised learning, meta learning, active learning, and transfer learning. However, none of these techniques have proven to be highly effective for real-world natural language processing tasks. This shortcoming uncovers a fundamental flaw in machine learning - the absence of reasoning. Humans often learn from just a few examples because of their capacity to reason, as opposed to only relying on data statistics. In this talk, I will present the large language model (LLM) reasoning work that we pioneered, and show that our innovative approaches significantly bridge the gap between human intelligence and machine learning: they crush state-of-the-art results in the literature while demanding only a few (usually just one) annotated examples and no training. This monumental shift from a learning-centric to a reasoning-centric paradigm is having a profound impact across a myriad of real-world applications.
|Baharan Mirzasoleiman (U.C. Los Angeles)
|Data-efficient (Pre-)Training of Deep Networks
|Abstract: Large datasets have enabled over-parameterized neural networks to achieve unprecedented success. However, training such models, with millions or billions of parameters, on large data requires expensive computational resources, which consume substantial energy, leave a massive carbon footprint, and often soon become obsolete and turn into e-waste. While there has been a persistent effort to improve the performance and reliability of machine learning models, their sustainability is often neglected. To improve the efficiency and sustainability of learning deep models, I discuss the first scalable framework with rigorous theoretical guarantees to identify the most valuable examples for training and pre-training deep networks. I also demonstrate the effectiveness of such a framework for (pre-)training various over-parameterized models on different vision and NLP benchmark datasets.
|Samet Oymak (U. of Michigan)
|Understanding Optimization Geometry of Transformer Models
|Abstract: Recent advances in language modeling, such as ChatGPT, have had a revolutionary impact within a short timeframe. These language models are based on the transformer architecture which uses the self-attention mechanism as its central component to process language sequences. However, the theoretical principles underlying the attention mechanism are poorly understood, especially the nonconvex optimization dynamics. In this talk, we show that the optimization dynamics of the attention layer is biased towards implementing "hard retrieval" and "soft composition" operations. "Hard retrieval" refers to the model's ability to precisely select the set of "top tokens" within the input sequence (e.g. selecting the most relevant words within a sentence). The model accomplishes this by implicitly solving a support vector machine problem that separates the top tokens from the rest of the sequence. As its output, the model creates a convex combination of the top tokens which we refer to as "soft composition". Finally, we will discuss the next token prediction in language modeling where the input and output tokens belong to a discrete vocabulary. Here, we show that the generative process implemented by the transformer can be understood through a Markov chain-like process extracted from the training data. We will conclude by discussing new directions and open problems.
|Quanquan Gu (U.C. Los Angeles)
|Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
|Abstract: Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data. We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN), which starts from a supervised fine-tuned model. At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. More specifically, the LLM generates its own training data from its previous iterations, refining its policy by discerning these self-generated responses from those obtained from human-annotated data. Our method progressively elevates the LLM from a nascent model to a formidable one, unlocking the full potential of human-annotated demonstration data for SFT. Theoretically, we prove that the global optimum to the training objective function of our method is achieved only when the LLM policy aligns with the target data distribution. Empirically, we evaluate our method on several benchmark datasets including the HuggingFace Open LLM Leaderboard, MT-Bench, and datasets from Big-Bench. Our results show that SPIN can significantly improve the LLM's performance across a variety of benchmarks and even outperform models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
|Wen Zhou (Colorado State University)
|Optimal nonparametric inference on network effects with dependent edges
|Abstract: Testing network effects in weighted directed networks is a foundational problem in econometrics, sociology, and psychology. Yet, the prevalent edge dependency poses a significant methodological challenge. Most existing methods are model-based and come with stringent assumptions, limiting their applicability. In response, we introduce a novel, fully nonparametric framework that requires only minimal regularity assumptions. While inspired by recent developments in the U-statistic literature (Chen and Kato, 2019; Zhang and Xia, 2022), our approach notably broadens their scope. Specifically, we identify and carefully address the challenge of indeterminate degeneracy in the test statistics -- a problem that the aforementioned tools do not handle. We establish a Berry-Esseen-type bound for the accuracy of type-I error rate control. Using original analysis, we also prove the minimax optimality of our test's power. Simulations underscore the superiority of our method in computation speed, accuracy, and numerical robustness compared to competing methods. We also apply our method to the U.S. faculty hiring network data and uncover intriguing findings.
|Emma Zhang (Emory University)
|Modeling networks with textual edges
|Abstract: Edges in many real-world networks are associated with rich text information, such as email communications between accounts and interactions between social media users. To better account for the rich text information, we propose a new latent space network model that treats texts as embedded vectors. We establish a set of identifiability conditions for the proposed model and formulate a projected gradient descent algorithm for model estimation. We further investigate theoretical properties of the iterates from the proposed algorithm. The efficacy of our method is demonstrated through simulations and an analysis of the Enron email dataset.
|Annie Qu (University of California, Irvine)
|A Model-Agnostic Graph Neural Network for Integrating Local and Global Information
|Abstract: Graph neural networks (GNNs) have achieved promising performance in a variety of graph focused tasks. Despite their success, the two major limitations of existing GNNs are the capability of learning various-order representations and providing interpretability of such deep learning-based black-box models. To tackle these issues, we propose a novel Model-agnostic Graph Neural Network (MaGNet) framework. The proposed framework is able to extract knowledge from high-order neighbors, sequentially integrates information of various orders, and offers explanations for the learned model by identifying influential compact graph structures. In particular, MaGNet consists of two components: an estimation model for the latent representation of complex relationships under graph topology, and an interpretation model that identifies influential nodes, edges, and important node features. Theoretically, we establish the generalization error bound for MaGNet via empirical Rademacher complexity and showcase its power to represent the layer-wise neighborhood mixing. We conduct comprehensive numerical studies using both simulated data and a real-world case study on investigating the neural mechanisms of the rat hippocampus, demonstrating that the performance of MaGNet is competitive with state-of-the-art methods.
|Joshua Cape (University of Wisconsin-Madison)
|On spectral estimators for stochastic blockmodels
|Abstract: Stochastic blockmodel random graphs have received considerable attention in the statistics community over the past decade. This talk highlights two recent spectral-oriented works, addressing (i) asymptotic efficiency when estimating connectivity probabilities and (ii) the use of rotations when estimating sparse latent factor membership matrices.