Standard Model

Introducing Standard Model Biomedicine

We're building a multimodal foundation model that integrates any type of data for biopharma, academic medical centers, and biobanks.

Kevin Brown
Sep 30, 2025

We are building the Standard Model for Biomedicine.

The Standard Model is a multimodal foundation model that integrates any biomedical measurement at any scale, from the molecular to the whole-patient level, into a common, shared representation of a human patient. This is a fundamentally human endeavor that respects the inherent complexity of patients.

We expect biomedical AI researchers will tailor the Standard Model for downstream use cases as diverse as medicine itself. In biopharma, we anticipate the Standard Model powering tasks ranging from digital twins for prognosis to site enrollment, trial planning, and trial execution. In academic medicine, we foresee our model advancing AI-driven radiation toxicity assessment in early-stage lung cancer patients using electrocardiograms and radiology scans, or leveraging patient genomic profiles for targeted therapy recommendations.

Broadly speaking, the Standard Model will accelerate AI development timelines and improve the performance of downstream applications in biomedical AI, just as foundation models have done for language and vision in other industries.

We are backed in this mission by VC firms Arkitekt Ventures and Virtue, and we’ve grown a team of AI and biopharma leaders with deep domain expertise to build our model.

Biomedicine Needs a Multimodal Foundation Model

We are driven by four beliefs about what it takes to accelerate performant AI for human biology:

  1. Performance Is Everything in Biomedicine. Small differences in clinical efficacy relative to standard of care drive billions of dollars in revenue and change millions of lives. Every marginal improvement in performance can define clinical, regulatory, or commercial success.

  2. Data Enables Performance. The primacy of scale is well accepted in traditional machine learning and AI; the most performant large language models are trained on nearly exhaustive amounts of human text, and likewise for vision models. The largest biomedical models have not yet been trained at this scale.

  3. Data Is Naturally Siloed in Biomedicine. Because biopharma and academic researchers typically collect data in narrow bands for specific questions, the data required for training at scale never amasses within one organization. Any organization with ultimately limited data access will likely fail, because it will never reach the scale required.

  4. The Most Performant Foundation Models Will Not Be Siloed. The best foundation models will be trained on data spanning modalities as well as disease areas: data that is naturally siloed in the medical world today. Those silos are detrimental to foundation model training. Patient data is inherently multimodal because patients are inherently multimodal; even for very specific indications, relevant measurements span multiple ways of measuring patients.

Where We Are Headed: From a Universal Patient Representation to a Biological Reasoning Engine

We are building a universal foundation model for biomedicine. We do this by ingesting data from any modality and mapping it to a shared representation space. Each patient’s data is translated into a set of embeddings across time, representing patient state. Much like physics represents particle systems in appropriate coordinates, we develop a new coordinate system for patients.
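As a rough illustration of this idea (a toy sketch, not our actual architecture; the encoder names and dimensions below are hypothetical), per-modality encoders can project measurements of different shapes into one shared space, so that a patient becomes a time-ordered sequence of embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
SHARED_DIM = 8  # dimensionality of the shared representation space (hypothetical)

# Hypothetical per-modality encoders: each maps raw features of a different
# size into the same shared space, so heterogeneous measurements become comparable.
PROJECTIONS = {
    "genomics": rng.normal(size=(128, SHARED_DIM)),
    "imaging":  rng.normal(size=(64, SHARED_DIM)),
    "ehr":      rng.normal(size=(32, SHARED_DIM)),
}

def encode(modality: str, features: np.ndarray) -> np.ndarray:
    """Project one measurement into the shared space and L2-normalize it."""
    z = features @ PROJECTIONS[modality]
    return z / np.linalg.norm(z)

def patient_state(measurements):
    """A patient is a time-ordered sequence of (timestamp, shared embedding)."""
    encoded = [(t, encode(modality, x)) for t, modality, x in measurements]
    return sorted(encoded, key=lambda pair: pair[0])

# Three measurements from different modalities, all landing in one space.
state = patient_state([
    (2, "imaging",  rng.normal(size=64)),
    (1, "genomics", rng.normal(size=128)),
    (3, "ehr",      rng.normal(size=32)),
])
print([t for t, _ in state])   # timestamps in chronological order
```

Whatever the real encoders look like, the key property is the one the toy preserves: every modality ends up in the same coordinate system, ordered in time.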

Once this fundamental representation – the Standard Model – is established, it can be fed to other reasoning engines. These engines can then drive decisions in diverse use cases across biopharma and academic medicine, such as:

  • analyzing natural histories of patient disease

  • optimizing care pathways

  • forecasting clinical trials in silico

  • optimizing inclusion/exclusion criteria

  • modeling the probability of technical and regulatory success

  • and beyond

Critically, we intend to serve as the quiet backbone of these systems. Biopharma companies and academic medical centers should spend their time bringing domain expertise to downstream applications, context-specific benchmarking (e.g., clinical trials), and curating specialized datasets. Standard Model Biomedicine delivers the AI performance that comes with scale and lets experts optimize for the last 10 percent specific to their applications.

Where We Are Today

Human biology operates meaningfully across several scales, from the incredibly small, such as genomic mutations, to the incredibly large, such as lifestyle choices or surgical interventions. We firmly believe the best foundation model will integrate data from every scale.

To that end, we’re releasing three new papers focused on molecular, cellular, and whole-patient scales. These publications build on the paper we quietly published last year to showcase the strength of our model: Advancing High Resolution Vision-Language Models in Biomedicine.

Taken together, these publications represent state-of-the-art methods at every level of human biology. Ultimately, we will link models at every scale into the Standard Model of Biomedicine.

1. Oncology & Whole Genome Sequencing

The ability to encode genomic sequences is the most fundamental aspect of the Standard Model. Genomic mutations are central to oncology and many other diseases; any model of biomedicine must represent them in some way.

In this paper, we trained GenVarFormer (GVF), a whole-genome-sequencing foundation model, to predict the functional consequences of variants on gene expression, achieving state-of-the-art performance on downstream tasks.

Read the full paper here: GenVarFormer: Predicting Gene Expression From Long-Range Mutations in Cancer
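To illustrate the kind of input such a model consumes (a simplified sketch only; GVF's actual encoding and architecture are described in the paper), here is a toy example of applying a single-nucleotide variant to a reference sequence and one-hot encoding both, the usual input format for sequence-to-expression models:

```python
import numpy as np

BASES = "ACGT"

def apply_variant(ref: str, pos: int, alt: str) -> str:
    """Substitute a single-nucleotide variant into the reference sequence."""
    return ref[:pos] + alt + ref[pos + 1:]

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA sequence as a (len, 4) one-hot matrix."""
    idx = [BASES.index(b) for b in seq]
    return np.eye(4)[idx]

ref = "ACGTACGTAC"
mut = apply_variant(ref, 3, "A")   # T -> A at position 3
x_ref, x_mut = one_hot(ref), one_hot(mut)

# A model like GVF maps such encodings to predicted expression; here we
# just verify that the two inputs differ at exactly the mutated position.
diff = np.where((x_ref != x_mut).any(axis=1))[0]
print(diff)
```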

2. A Molecular Language Model

Linking molecular and cellular foundation models to a shared representation space, much like vision-language models, is a fundamental challenge for building the Standard Model. To that end, we developed a test case linking proteomic graph neural networks to language. This approach is applicable beyond proteomics to any graph-based representations at a cellular level.

Read the full paper here: Patient-Specific Biomolecular Instruction Tuning of Graph-LLMs
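As a sketch of the general pattern (not the paper's actual method; the encoder, projector, and dimensions below are hypothetical), a graph can be encoded with one message-passing step and mapped by a learned linear projector into an LLM's input-embedding space, where it acts as a soft prompt token prepended to the text:

```python
import numpy as np

rng = np.random.default_rng(1)
NODE_DIM, LLM_DIM = 16, 32  # hypothetical embedding sizes

def graph_encode(node_feats: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """One message-passing step followed by mean pooling: a toy GNN encoder."""
    h = np.maximum(adj @ node_feats, 0.0)   # aggregate neighbors, then ReLU
    return h.mean(axis=0)                   # graph-level embedding

# A learned linear projector is the "bridge": it maps graph embeddings into
# the LLM's input-embedding space, so the graph appears as a soft prompt token.
projector = rng.normal(size=(NODE_DIM, LLM_DIM))

def to_soft_prompt(node_feats, adj, text_embeds):
    graph_token = graph_encode(node_feats, adj) @ projector   # (LLM_DIM,)
    return np.vstack([graph_token[None, :], text_embeds])     # prepend to text

# Toy molecular graph: 5 nodes on a path, plus 4 "text" token embeddings.
n = 5
adj = np.eye(n, k=1) + np.eye(n, k=-1)
feats = rng.normal(size=(n, NODE_DIM))
text = rng.normal(size=(4, LLM_DIM))
seq = to_soft_prompt(feats, adj, text)
print(seq.shape)   # one graph token followed by four text tokens
```

The same bridge works for any graph-based representation, which is why the approach generalizes beyond proteomics.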

3. Using EHRs to Predict Next Medical Codes

Any multimodal foundation model must be able to ingest text, in particular longitudinal EHRs, the broadest and highest-level input modality. This work reframes EHRs as timestamped chains of clinical events and fine-tunes large language models to predict the next event, improving temporal reasoning over disease trajectories.

Read the full paper here: Building the EHR Foundation Model via Next Event Prediction
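To illustrate the framing (a toy stand-in only: the paper fine-tunes large language models, whereas this sketch uses a bigram counter, and the event codes are hypothetical), an EHR can be flattened into a timestamped chain of codes, with the next event predicted from observed successions:

```python
from collections import Counter, defaultdict

# A patient's EHR as a timestamped chain of clinical events (hypothetical codes).
history = [
    (0,   "DX:E11.9"),       # type 2 diabetes diagnosis
    (30,  "RX:metformin"),
    (400, "DX:I10"),         # hypertension
    (410, "RX:lisinopril"),
]

def to_sequence(events):
    """Sort by timestamp and keep only the codes: the 'next event' framing."""
    return [code for _, code in sorted(events)]

def fit_bigrams(sequences):
    """Count successor frequencies; a toy stand-in for a fine-tuned LLM."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, code):
    """Most frequent successor of `code` seen in training."""
    return counts[code].most_common(1)[0][0] if counts[code] else None

corpus = [to_sequence(history),
          to_sequence([(0, "DX:E11.9"), (15, "RX:metformin")])]
model = fit_bigrams(corpus)
print(predict_next(model, "DX:E11.9"))
```

The value of the reframing is that the entire longitudinal record becomes a sequence-modeling problem, which is exactly the setting where large language models excel.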

Our Vision for the Future of Biomedicine

Going forward, Standard Model Biomedicine will continue to drive cutting-edge research, deploy our model to partners across the ecosystem, and collaborate with the highest-quality data sources in biomedicine. We will combine all modalities and scales of data into a single overarching Standard Model of human biology.

We’re always looking for people to join us in building the future of biomedical research.

  • If you’re a biopharma company that wants to partner or a data source that wants to drive value from your data, reach out here

  • If you’re a medical AI researcher who wants to partner, reach out here

  • If you’re both technically and biologically inclined and would like to join our team, reach out here

  • If you’d just like to chat about foundation models in biology, definitely reach out here

We’re looking forward to sharing more about Standard Model Biomedicine and our work. To stay updated, follow us on LinkedIn and X, subscribe to our Substack, or reach out to us.

The foundation for the future of biomedical research and drug development is being laid now, one datapoint at a time.


© 2025 Standard Model Biomedicine, Inc.