About me

I work on pre-training at Anthropic. Previously, I co-founded Voyage AI and led its research, developing industry-leading embedding models and rerankers for semantic search and information retrieval. I did my PhD at Stanford University, affiliated with the Stanford AI Lab and the Stanford NLP Group. My research interests broadly lie in synthetic data, multimodal learning, and reasoning.

News

  • 2025: New paper — Synthetic Bootstrapped Pretraining: the first synthetic pre-training method that doesn't rely on teacher distillation.
  • 2025: MoCa scales multimodal embedding models with unlabeled interleaved multimodal data.

Language Models

Domain Adaptation and Transfer Learning