AI for drug discovery has seen a serious influx of capital over the last few years. Across target identification, molecular design, and lab automation, companies have collectively raised more than $20 billion.[1] The result is an increasingly dense landscape of AI-native companies focused on improving how drugs are discovered and designed.
But stepping back and looking across the full biopharma value chain, I think there is an asymmetry forming.
Most of this activity is concentrated in the preclinical stage. That’s where the tooling, the talent, and the capital are going. By contrast, late-stage clinical development – where drugs are tested, documented, and ultimately submitted for approval – remains relatively underexplored from an AI-first perspective.
This is somewhat surprising given where the bulk of operational effort actually sits. Late-stage clinical development is dominated by document-heavy workflows: clinical trial protocols, regulatory submissions, safety reports, site communications, and compliance documentation. These processes are not only labor-intensive but also highly structured, repetitive, and governed by clear rules – characteristics that have made them prime candidates for automation in other industries.
Excluding clinical trial recruitment, which is a very different beast, I could only find a small number of companies that are focused on automating the operational backbone of clinical development and regulatory work, and collectively, they appear to have raised somewhere in the low hundreds of millions.
To put that into perspective, just four operational workflows that I think are well-suited for LLM automation account for more than $30B in annual spend, or roughly 10% of the entire global pharma R&D budget.[2]
Example biopharma workflows that are ready for automation
Many adjacent late-stage clinical functions could follow, but these four categories seem the most logical starting points for LLM automation to me: they are large, repetitive, heavily document-driven, and still rely on a surprising amount of manual human labor.
Regulatory affairs is best thought of as the communication layer. It translates scientific results into something regulators can review and approve. What makes it interesting for LLMs: inputs and outputs are highly structured, the source data already exists, and the deliverables follow standardized formats. Annual spend is around $17B,[3] much of it still driven by very manual, people-heavy work.
Pharmacovigilance is the drug-safety surveillance system: it monitors, evaluates, and communicates whether a medicine causes safety problems after approval. While case intake is already largely automated, aggregating those cases into the periodic safety reports regulators require is still almost entirely manual. A top-10 pharma processes more than 700,000 adverse event reports per year,[4] and total annual industry spend is roughly $8B.[5]
Clinical trial programming creates SAS or R code that transforms patient records into formats regulators can read. A single Phase 3 trial may require 500 to 2,000 individual scripts, and annual service spend across the industry is around $3B.[6] LLMs are already revolutionizing general software engineering, and this vertical might be next.
Medical writing turns trial results into the core documents needed to move a program forward: protocols, investigator brochures, clinical study reports, and submission summaries. Formats are again very standardized, but the work is still incredibly manual and time-consuming, with an annual spend of around $5B.[7]
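As a quick sanity check on the headline number, the four spend estimates above can be added up and broken into shares. This is only a back-of-the-envelope sketch: it uses the rounded figures quoted in this post, and the dictionary keys and variable names are my own labels rather than anything from the underlying market reports.

```python
# Rounded annual spend estimates quoted above, in billions of USD.
annual_spend_busd = {
    "regulatory_affairs": 17,
    "pharmacovigilance": 8,
    "clinical_trial_programming": 3,
    "medical_writing": 5,
}

# Headline total across the four workflows.
total_busd = sum(annual_spend_busd.values())

# Share of the total that each workflow represents.
shares = {name: spend / total_busd for name, spend in annual_spend_busd.items()}

print(f"Total across four workflows: ${total_busd}B")
for name, share in shares.items():
    print(f"{name}: {share:.0%} of the total")
```

The total comes out at $33B, with regulatory affairs alone making up roughly half of it, so the headline number is far from evenly spread across the four workflows.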
So, if this is such a big opportunity, why do we see so few companies in that space?
It is boring
It just is. The human brain is wired to discover and to understand how the world works. Humans are empathetic — they want to help others. Drug discovery is the combination of both, which is why this work feels so rewarding.
Biopharma’s back office is the opposite. It is painstaking, dry, almost accounting-adjacent work. Nobody would put “co-pilot for periodic safety reports” on a Bay Area 101 billboard. But the same profile is exactly what made AI dominant in legal, accounting, and tax. These are all industries where the work follows a very similar pattern: document in, document out; externally defined standards that determine correctness; highly templated outputs; and a payoff measured in money and time saved.
Mistakes carry a lot more weight
In law or accounting, errors usually get caught in review or, worst case, can be fixed later. In late-stage clinical development, the consequences are more serious and often compound.
If the pharmacovigilance process misses safety signals, patients are at risk of serious harm. For example, Vioxx was pulled from the market in 2004 because it was linked to an increased risk of heart attacks and strokes. The signal only emerged slowly across a broad population and was recognized too late, resulting, by some estimates, in tens of thousands of excess deaths before withdrawal.[8]
Mistakes or inconsistencies in the submission process can also severely delay approval. FDA submissions are extremely long, highly interconnected document sets, and if models introduce cascading inconsistencies or hallucinations, the FDA can reject the submission. That forces the sponsor to fix the issues, resubmit, and wait for the agency to review the updated filing again. That cycle can take 6-18 months, much of it outside the company’s control, and for a high-revenue drug, that can mean hundreds of thousands of dollars in lost sales per day.
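To get a feel for the order of magnitude, the delay range and the per-day figure can be multiplied out. This is a deliberately naive sketch: the $500,000 per day is a hypothetical value picked from within the "hundreds of thousands" range, the function name is my own, and the calculation ignores launch ramp, patent life, and the fact that some sales are deferred rather than lost.

```python
# Average month length in days, used to convert a delay in months to days.
DAYS_PER_MONTH = 365 / 12


def lost_sales_usd(delay_months: float, daily_sales_usd: float) -> float:
    """Naive estimate: every day of delay forfeits one day of sales."""
    return delay_months * DAYS_PER_MONTH * daily_sales_usd


# Hypothetical daily sales figure, for illustration only.
hypothetical_daily_sales_usd = 500_000

# The low and high ends of the 6-18 month resubmission range.
for months in (6, 18):
    cost = lost_sales_usd(months, hypothetical_daily_sales_usd)
    print(f"{months}-month delay: ~${cost / 1e6:.0f}M in lost sales")
```

Under these assumptions, a single rejected filing costs somewhere between roughly $90M and $275M, which goes a long way toward explaining why buyers are so cautious.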
Really hard to sell into, and revenue is more fragmented than it looks
Because the stakes are so high, buyers tend to gravitate toward large, established vendors with long track records in the industry. Even when this work is outsourced, sponsors still retain responsibility for the outcome, which makes it much harder for a small startup to win critical regulatory or safety work.
On top of that, the sale is just operationally messy. Companies are not just buying software seats — they have to fit the tool into deeply rooted internal processes that often vary from company to company. For example, using AI to help prepare an FDA submission would not just touch regulatory, but also medical writing, clinical, safety, QA, and publishing, each of which may have its own review order, handoffs, systems, and SOPs.
And the $30B headline number is split across regulatory, pharmacovigilance, clinical programming, and medical writing, each with its own buyers and vendor relationships. A startup picking one vertical is looking at a much smaller TAM, and going horizontal means selling into all of them at once.
Big AI companies might dominate — and this might be the biggest risk
Of all the reasons above, this one feels most important. Because these are deeply embedded internal processes, there is a strong incentive for biopharma companies to build these systems in-house on top of frontier models, and I think we are seeing some early data points here already. Moderna has rolled out ChatGPT Enterprise broadly and built internal GPTs like Dose ID.[9] Pfizer integrated Claude into its internal Vox platform. Anthropic recently launched Claude for Life Sciences, explicitly aimed at “scientists, clinical trial coordinators, and regulatory managers,”[10] and AbbVie, Sanofi, Novo Nordisk, AstraZeneca, and Genmab are already using it across regulatory compliance, clinical documentation, and drug discovery. Anthropic has reported that Novo Nordisk cut clinical study documentation from more than 10 weeks to 10 minutes using Claude, which is exactly the kind of medical writing work a startup might otherwise sell into.
If that pattern holds, the big AI firms may end up taking a more Palantir-like position here — not just providing the model, but helping large biopharmas build internal systems that integrate well into existing workflows. In that world, the value accrues to the model layer and to internal teams, and there is not much oxygen left for a standalone vertical SaaS company.
Still curious
The arguments that small mistakes are heavily punished, sales cycles are slow, and big AI firms might dominate are all very real. But given the size of the white space, I’m a bit surprised we are not seeing more companies being built in the space.
If there are, and I just haven’t found them, I would love to hear about them!
1. PitchBook, “AI biotechs fetch big premiums as investors pile into drug discovery startups” (December 2025). https://pitchbook.com/news/articles/ai-biotechs-fetch-big-premiums-as-investors-pile-into-drug-discovery-startups
2. BioSpace, “Undeterred by Political, Economic Headwinds, Pharma Ups R&D Investment in 2024 and Beyond” (May 2025). https://www.biospace.com/business/undeterred-by-political-economic-headwinds-pharma-ups-r-d-investment-in-2024-and-beyond
3. Fact.MR, “Regulatory Affairs Outsourcing Market” (2024). https://www.factmr.com/report/regulatory-affairs-outsourcing-market
4. TransCelerate BioPharma / Drug Safety, “Individual Case Safety Report Replication: An Analysis of Case Reporting Transmission Networks” (2023). https://pmc.ncbi.nlm.nih.gov/articles/PMC9870831/
5. Grand View Research, “Pharmacovigilance Market Size, Share & Trends Analysis Report” (2024). https://www.grandviewresearch.com/industry-analysis/pharmacovigilance-industry
6. Growth Market Reports, “Biostatistics and Programming Services Market” (2024). https://growthmarketreports.com/report/biostatistics-and-programming-services-market
7. Market Data Forecast, “Global Medical Writing Market” (2024). https://www.marketdataforecast.com/report/global-medical-writing-market
8. U.S. Food and Drug Administration, “Vioxx (rofecoxib) Questions and Answers.” https://www.fda.gov/drugs/postmarket-drug-safety-information-patients-and-providers/vioxx-rofecoxib-questions-and-answers
9. OpenAI, “Moderna” case study. https://openai.com/index/moderna/
10. Anthropic, “Claude for Life Sciences” (October 2025). https://www.anthropic.com/news/claude-for-life-sciences

