About me
I work on pre-training at Anthropic. Previously, I co-founded Voyage AI, where I led research to build the industry's best embedding models and rerankers for semantic search and information retrieval. I completed my PhD at Stanford University, where I was affiliated with the Stanford AI Lab and the Stanford NLP Group. My research interests broadly lie in synthetic data, multimodal learning, and reasoning.
News
- 2025: New paper on Synthetic Bootstrapped Pretraining, the first synthetic pre-training method that does not rely on teacher distillation.
- 2025: MoCa scales multimodal embedding models with unlabeled interleaved multimodal data.
Language Models
- Synthetic Bootstrapped Pretraining
  ICLR 2026 · Twitter
- MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings
  ACL 2026 · Oral · Code · Twitter
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  ICLR 2024 · Twitter
- Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
  ICLR 2024 · Code · Twitter
- Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
  ICML 2023 · Oral · Code · Twitter
- Self-supervised Learning is More Robust to Dataset Imbalance
  ICLR 2022 · Spotlight · Code · Twitter
Domain Adaptation and Transfer Learning
- Cycle Self-Training for Domain Adaptation
  NeurIPS 2021 · Code
- Learning to Adapt to Evolving Domains
  NeurIPS 2020
- Meta-learning Transferable Representations with a Single Target Domain
  arXiv 2011.01418
- Towards Understanding the Transferability of Deep Representations
  arXiv 1909.12031
- Transferable Adversarial Training: A General Approach to Adapting Deep Classifiers
  ICML 2019 · Long Talk · Code
- Separate to Adapt: Open Set Domain Adaptation via Progressive Separation
  CVPR 2019 · Code
