Publications
2025
- PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold. Preprint 2025.
  We present PokeeResearch-7B, a 7-billion-parameter research agent trained using Reinforcement Learning from AI Feedback (RLAIF) with LLM-based reward signals focused on factual accuracy, citation faithfulness, and instruction adherence. Our approach incorporates a chain-of-thought reasoning framework to improve robustness and handle tool failures. The model achieves state-of-the-art performance among 7B-scale deep research agents across ten benchmarks.
2023
- Cross-Modal Fine-Tuning: Align then Refine. ICML 2023.
  We propose ORCA, a general cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities. ORCA adapts to a target task via an align-then-refine workflow: given the target input, ORCA first learns an embedding network that aligns the embedded feature distribution with the pretraining modality. The pretrained model is then fine-tuned on the embedded data to exploit the knowledge shared across modalities. Through extensive experiments, ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities.
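The align-then-refine workflow above can be illustrated with a short sketch. This is not the released ORCA code: the networks are toy stand-ins, and a simple RBF-kernel maximum mean discrepancy is assumed as the alignment objective in place of the distance the paper actually optimizes.

```python
# Hypothetical align-then-refine sketch (not the official ORCA implementation).
# Stage 1 fits an embedder so target-modality features match features from the
# pretraining modality (MMD is an assumed stand-in for the paper's alignment metric);
# stage 2 fine-tunes the whole stack on the target task.
import torch
import torch.nn as nn

def rbf_mmd(x, y, sigma=1.0):
    """Maximum mean discrepancy between two feature batches with an RBF kernel."""
    def k(a, b):
        d = torch.cdist(a, b) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

embedder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 128))  # target -> shared space
backbone = nn.Sequential(nn.Linear(128, 128), nn.ReLU())                      # stands in for a pretrained model
head = nn.Linear(128, 10)                                                     # new task head

# Stage 1: align the embedded target distribution with pretraining-modality features.
align_opt = torch.optim.Adam(embedder.parameters(), lr=1e-3)
for _ in range(100):
    target_x = torch.randn(64, 32)         # toy target-modality batch
    pretrain_feats = torch.randn(64, 128)  # toy features from the pretraining modality
    loss = rbf_mmd(embedder(target_x), pretrain_feats)
    align_opt.zero_grad()
    loss.backward()
    align_opt.step()

# Stage 2: refine -- fine-tune embedder, backbone, and head on the target task.
params = list(embedder.parameters()) + list(backbone.parameters()) + list(head.parameters())
tune_opt = torch.optim.Adam(params, lr=1e-4)
for _ in range(100):
    target_x, target_y = torch.randn(64, 32), torch.randint(0, 10, (64,))
    loss = nn.functional.cross_entropy(head(backbone(embedder(target_x))), target_y)
    tune_opt.zero_grad()
    loss.backward()
    tune_opt.step()
```

The two-stage structure is the point of the sketch: the embedder is fit first so that the pretrained backbone sees roughly in-distribution features before any of its weights are updated.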
2021
- Geometry-Aware Gradient Algorithms for Neural Architecture Search. ICLR 2021 (Spotlight).
  We argue for the study of single-level empirical risk minimization to understand NAS with weight-sharing, reducing the design of NAS methods to devising optimizers and regularizers. Invoking the theory of mirror descent, we present a geometry-aware framework that exploits the underlying structure of this optimization to return sparse architectural parameters, leading to simple yet novel algorithms that enjoy fast convergence guarantees and achieve state-of-the-art accuracy on the latest NAS benchmarks. We achieve near-oracle-optimal performance on CIFAR-10 and CIFAR-100. A minimal sketch of the resulting multiplicative update appears after this list.
- Rethinking Neural Operations for Diverse Tasks. NeurIPS 2021.
  We revisit the problem of designing effective neural operations for diverse tasks. We find that standard convolutions are not the best choice for many tasks, and propose XD-operations, a family of operations that can be efficiently searched over to find the best operation for a given task. We demonstrate the effectiveness of XD-operations on a diverse set of tasks spanning 1D, 2D, and 3D data.
- Federated Hyperparameter Tuning: Challenges, Baselines, and Connections to Weight-Sharing. NeurIPS 2021.
  We investigate hyperparameter tuning in the federated learning setting, where the goal is to find hyperparameters that perform well across heterogeneous clients. We identify key challenges, propose FedEx as a practical baseline, and establish connections to weight-sharing methods from neural architecture search.
- On Data Efficiency of Meta-learning. AISTATS 2021.
  We study the data efficiency of modern meta-learning algorithms. Using techniques from algorithmic stability, we derive bounds on the transfer risk that indicate how much supervision is needed for each method. We propose active meta-learning, which incorporates active data selection into learning-to-learn, leading to better performance in the limited supervision regime.
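For the geometry-aware NAS entry above (ICLR 2021), the key computational idea can be sketched in a few lines: mirror descent with an entropic mirror map over the simplex of operation weights yields a multiplicative (exponentiated-gradient) update that concentrates mass on a few operations. The operation set and the "gradient" below are hypothetical placeholders, not the paper's weight-sharing supernetwork.

```python
# Minimal sketch of an exponentiated-gradient (entropic mirror descent) update on
# architecture mixture weights, the kind of geometry-aware step advocated above.
# The gradient is a toy stand-in for what a shared-weights supernetwork would return.
import numpy as np

ops = ["conv3x3", "conv5x5", "skip", "zero"]          # hypothetical operation choices
theta = np.full(len(ops), 1.0 / len(ops))             # mixture weights on the simplex
lr = 0.5

def toy_grad(theta):
    # Assumption for illustration: op 0 lowers the loss, op 3 raises it.
    return np.array([-1.0, 0.1, 0.2, 1.0]) + 0.05 * theta

for step in range(50):
    g = toy_grad(theta)
    theta = theta * np.exp(-lr * g)   # multiplicative (mirror descent) step
    theta /= theta.sum()              # re-normalize onto the simplex

print({op: round(w, 3) for op, w in zip(ops, theta)})
# Unlike additive SGD on softmax logits, the multiplicative update concentrates mass
# quickly, which is one way to read the sparse architectural parameters described above.
```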
2020
- A System for Massively Parallel Hyperparameter Tuning. MLSys 2020.
  We introduce ASHA, a simple and robust hyperparameter optimization algorithm which exploits parallelism and aggressive early-stopping to tackle large-scale hyperparameter optimization problems. Our extensive empirical results show that ASHA outperforms existing state-of-the-art methods; scales linearly with the number of workers; and is suitable for massive parallelism, converging to a high-quality configuration in half the time taken by Vizier (Google's internal hyperparameter optimization service) in an experiment with 500 workers.
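A minimal, single-process simulation of the promotion rule behind ASHA is sketched below. It is an illustrative reading of asynchronous successive halving, not the distributed implementation described in the paper, and the objective is a toy function.

```python
# Single-process sketch of ASHA-style promotion: when a worker frees up, promote the
# best not-yet-promoted configuration from the highest rung whose top 1/eta is ready;
# otherwise start a fresh random configuration at the lowest rung.
import random

random.seed(0)
ETA, MIN_R, MAX_RUNG = 3, 1, 4          # halving rate, base resource, number of rungs
rungs = [[] for _ in range(MAX_RUNG)]   # rungs[k] holds (loss, config) results
promoted = [set() for _ in range(MAX_RUNG)]

def loss(config, resource):
    # Toy objective: the config's intrinsic quality plus noise that shrinks with resource.
    return config + random.gauss(0, 1.0 / resource)

def get_job():
    for k in reversed(range(MAX_RUNG - 1)):            # check the highest rungs first
        ranked = sorted(rungs[k])
        top = ranked[: len(ranked) // ETA]
        for _, cfg in top:
            if cfg not in promoted[k]:
                promoted[k].add(cfg)
                return cfg, k + 1                      # promote to the next rung
    return random.random(), 0                          # otherwise start a new config

for _ in range(200):                                   # each iteration = one free worker
    cfg, rung = get_job()
    rungs[rung].append((loss(cfg, MIN_R * ETA ** rung), cfg))

best = min(rungs[-1]) if rungs[-1] else min(rungs[0])
print("best (loss, config) at the top rung:", best)
```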
2019
- Random Search and Reproducibility for Neural Architecture Search. UAI 2019.
  We propose new NAS baselines, building on two observations: NAS is a specialized hyperparameter optimization problem, and random search is a competitive baseline for hyperparameter optimization. Our results show that random search with early-stopping performs at least as well as ENAS on PTB and CIFAR-10. We also explore reproducibility issues of published NAS results and provide all information needed to exactly reproduce our results.
2018
- Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. JMLR 2018.
  Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. We formulate hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations. We introduce Hyperband, a novel algorithm for this framework that provides over an order-of-magnitude speedup over Bayesian optimization methods on a variety of deep-learning and kernel-based learning problems.
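The resource-allocation scheme described above can be written down compactly. The sketch below follows the standard Hyperband recipe of looping over brackets that trade off the number of configurations against the resource each receives, running successive halving inside every bracket; the evaluation function is a toy stand-in for training a configuration with a given budget.

```python
# Compact Hyperband sketch with a toy objective (in practice the resource would be
# training epochs, data samples, or features, as described above).
import math
import random

random.seed(0)
R, ETA = 81, 3                                           # max resource per config, halving rate
s_max = int(math.log(R) / math.log(ETA) + 1e-9)          # floor(log_eta(R))

def evaluate(config, resource):
    # Toy loss: the config's intrinsic quality plus noise that shrinks with resource.
    return config + random.gauss(0, 1.0 / resource)

best = (float("inf"), None)
for s in reversed(range(s_max + 1)):                     # one bracket per value of s
    n = int(math.ceil((s_max + 1) * ETA ** s / (s + 1)))  # initial number of configs
    r = R * ETA ** (-s)                                   # initial resource per config
    configs = [random.random() for _ in range(n)]
    for i in range(s + 1):                               # successive halving in the bracket
        n_i = int(n * ETA ** (-i))
        r_i = r * ETA ** i
        losses = sorted((evaluate(c, r_i), c) for c in configs)
        best = min(best, losses[0])
        configs = [c for _, c in losses[: max(1, n_i // ETA)]]  # keep the top 1/eta

print("best (loss, config):", best)
```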
2017
- Hyperband: Bandit-Based Configuration Evaluation for Hyperparameter Optimization. ICLR 2017.
  We present Hyperband, a novel algorithm for hyperparameter optimization that is simple, flexible, and theoretically sound. Hyperband is a principled early-stopping method that adaptively allocates a predefined resource to randomly sampled configurations, providing over an order of magnitude speedups on neural network and kernel-based learning problems.