Artificial intelligence relies on machine learning algorithms trained on massive datasets to make predictions. Think of how ChatGPT learned language by gorging on the internet. In biology, however, scientists face a frustrating challenge: the high-quality datasets needed to train powerful models are rare. Without these datasets, we can’t harness machine learning to tackle our most pressing health challenges.