The AI Evaluation Health Systems Are Not Doing

Health systems have serious processes for evaluating clinical AI before deployment. Vendor selection. Pilot data review. Governance committee sign-off. Legal review. IT security. Clinical leadership approval. Then the tool goes live, and in most health systems, the evaluation essentially stops.

This is not negligence. It is the predictable result of a governance infrastructure designed for deployment decisions, not for ongoing performance management. The committee that approved the tool exists to approve tools. Once the tool is approved, the committee moves on to the next decision. Nobody is assigned to own what happens to the tool in Year 2.

The problem is that the tool does not stay the same.

The Algorithm You Approved Is Not Necessarily the Algorithm Running Today

The FDA proposed Predetermined Change Control Plans (PCCPs) in its 2021 AI/ML action plan and finalized guidance on them in December 2024. A PCCP allows manufacturers of AI and machine learning medical devices to update their algorithms after FDA clearance without submitting a new 510(k), as long as those updates fall within a pre-specified modification plan that the FDA reviewed with the original submission.

What this means practically: your vendor can update the algorithm after your governance committee approves it. The update may improve performance. It may affect performance in your specific patient population in ways that are not immediately visible in aggregate statistics. The FDA process permits it. Your contract may or may not require notification.

I have reviewed contracts from major clinical AI vendors over the past two years. Update notification requirements ranged from 30 days' advance notice to none: two contracts had no update notification provision at all.

Your governance committee approved a specific tool with specific performance characteristics on your patient population. That approval was based on data. If the tool has been updated since deployment, the data the committee reviewed may no longer describe what is making clinical recommendations in your system today.
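
One way to make that comparison concrete is to record what the committee approved and check it against what is running. The sketch below is a minimal illustration, not a vendor API: the ToolSnapshot fields, the tool name, and the version strings are all hypothetical, and it assumes your vendor exposes a model version you can read at all.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ToolSnapshot:
    """What was approved, or what is running now (hypothetical fields)."""
    tool_name: str
    model_version: str   # version string as reported by the vendor
    recorded_on: date


def is_still_approved_version(approved: ToolSnapshot, running: ToolSnapshot) -> bool:
    """True only when the running model version matches the approved one."""
    return (approved.tool_name == running.tool_name
            and approved.model_version == running.model_version)


# Illustrative values only; not taken from any real deployment.
approved = ToolSnapshot("sepsis-alert", "2.1.0", date(2023, 3, 1))
running = ToolSnapshot("sepsis-alert", "2.3.1", date(2025, 1, 15))

if not is_still_approved_version(approved, running):
    print(f"{running.tool_name}: approved {approved.model_version}, "
          f"running {running.model_version}. Re-review needed.")
```

A version match does not prove performance is unchanged, since drift can occur with identical weights. A mismatch, though, is enough to say the committee's data describes a different model.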

"The approval process answers the question: should we deploy this? Nobody has built the process that answers: is this still the tool we approved?"

Three Things That Change After Deployment, Without Anyone Telling You

Change 01

The Algorithm Itself

Vendor updates are the most visible version of this problem, but not the only one. Even without a formal algorithm update, models drift. A sepsis prediction model trained on pre-pandemic population data performs differently on post-pandemic populations. The model weights have not changed. The underlying distribution has. Performance can degrade gradually and invisibly across a full year before anyone notices the alert pattern has shifted.

The governance committee reviewed performance data at a point in time. That performance is not guaranteed to persist as the patient population, documentation practices, and clinical workflows evolve around the model.
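
Population drift of this kind can be checked without touching the model. The sketch below uses the population stability index (PSI), a common drift statistic that I am swapping in for illustration; it is not something the vendor or the FDA prescribes. The bin edges, the sample values, and the 0.25 cutoff (a widely used rule of thumb) are assumptions.

```python
import math


def population_stability_index(baseline, current, edges):
    """PSI between two samples of one numeric input, binned by shared edges."""
    def bin_fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges the value meets or exceeds.
            idx = sum(1 for e in edges if v >= e)
            counts[idx] += 1
        total = len(values)
        # Small floor avoids log(0) when a bin is empty.
        return [max(c / total, 1e-6) for c in counts]

    base = bin_fractions(baseline)
    curr = bin_fractions(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, curr))


# Hypothetical patient ages at approval time versus this quarter.
ages_at_approval = [30, 40, 50, 60, 70, 80]
ages_this_quarter = [50, 60, 70, 80, 85, 90]
age_edges = [45, 65]

psi = population_stability_index(ages_at_approval, ages_this_quarter, age_edges)
if psi > 0.25:  # rule-of-thumb cutoff, an assumption rather than a standard
    print(f"Input distribution has shifted (PSI = {psi:.2f}); review the tool.")
```

Run per input variable on a schedule, a check like this surfaces a shifting population well before anyone notices the alert pattern has changed.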

Change 02

The EHR Integration

Clinical AI tools do not operate independently. They pull data from your EHR, and the quality and consistency of that data determines what the model sees. When your health system upgrades its EHR, modifies documentation templates, changes how a clinical variable is coded, or restructures a data feed, the input to the AI tool changes.

The tool has no mechanism to flag this. The clinical team has no reason to notice until something looks wrong. An EHR migration that went smoothly from an operations standpoint can silently degrade the performance of every AI tool downstream of it. Integration drift is one of the most common and least-tracked failure modes in deployed clinical AI.
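
Integration drift often shows up in two simple signals: a field that is suddenly missing more often, and coded values the model never saw before. The sketch below compares one field before and after an EHR change. The field name, the sample rows, and the 5-point missing-rate tolerance are hypothetical; a real check would read from your own data feed.

```python
def feed_check(baseline_rows, current_rows, field, max_missing_increase=0.05):
    """Compare one input field before and after an EHR change."""
    def missing_rate(rows):
        return sum(1 for r in rows if r.get(field) in (None, "")) / len(rows)

    def observed_values(rows):
        return {r[field] for r in rows if r.get(field) not in (None, "")}

    missing_delta = missing_rate(current_rows) - missing_rate(baseline_rows)
    new_codes = observed_values(current_rows) - observed_values(baseline_rows)
    return {
        "missing_rate_increase": missing_delta,
        "unseen_codes": sorted(new_codes),
        # Flag if missingness rose past tolerance or any new code appeared.
        "flag": missing_delta > max_missing_increase or bool(new_codes),
    }


# Illustrative rows only: a unit field before and after a template change.
before = [{"lactate_unit": "mmol/L"}, {"lactate_unit": "mmol/L"},
          {"lactate_unit": "mmol/L"}, {"lactate_unit": None}]
after = [{"lactate_unit": "mmol/L"}, {"lactate_unit": "mg/dL"},
         {"lactate_unit": None}, {"lactate_unit": None}]

result = feed_check(before, after, "lactate_unit")
if result["flag"]:
    print("Input feed changed:", result)
```

Running this once as part of every EHR upgrade or template change gives the clinical owner something to look at before the model's output looks wrong.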

Change 03

The Workflow Context

Tools are approved in the context of a specific workflow. Workflows change. Staff turn over. New clinicians who were not part of the implementation are trained on a version of the workflow that has already drifted from the original. The AI tool that was integrated thoughtfully in Year 1 becomes, by Year 3, a background process that generates alerts nobody has been explicitly trained to interpret.

Alert fatigue compounds this. If a tool's alert volume increases after an algorithm update or population drift, the clinical team's response changes: more alerts get dismissed, more get overridden without documentation. The tool is still "running." Its clinical impact has fundamentally changed.
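
A shift in override behavior can be checked with counts most EHR reporting tools already produce. The sketch below applies a standard two-proportion z-test to override rates from two periods. The counts are made up, and the z > 3 cutoff is an assumption chosen to be conservative, not a clinical standard.

```python
import math


def override_rate_shift(base_overrides, base_alerts, cur_overrides, cur_alerts):
    """Return (baseline rate, current rate, z-score) for the override rate."""
    p1 = base_overrides / base_alerts
    p2 = cur_overrides / cur_alerts
    # Pooled rate under the assumption that nothing changed.
    pooled = (base_overrides + cur_overrides) / (base_alerts + cur_alerts)
    se = math.sqrt(pooled * (1 - pooled) * (1 / base_alerts + 1 / cur_alerts))
    z = (p2 - p1) / se if se > 0 else 0.0
    return p1, p2, z


# Hypothetical quarter-over-quarter counts for one tool.
p_before, p_after, z = override_rate_shift(300, 1000, 560, 1400)

if z > 3:  # assumed cutoff; tune to your own tolerance for false alarms
    print(f"Override rate rose from {p_before:.0%} to {p_after:.0%}; "
          f"investigate before the next review.")
```

Alert volume matters as much as the rate. In this example the alert count also rose from 1,000 to 1,400, which is its own signal that something upstream changed.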

What a Post-Deployment Monitoring Program Actually Requires

This is not a complicated process. The basic infrastructure is a calendar event and a named owner.

Four-Element Monitoring Framework

Defined review schedule: Quarterly for high-stakes tools (clinical decision support, diagnostic AI, sepsis/deterioration prediction). Annual minimum for lower-stakes tools. The review schedule goes into the governance charter, not a memo.
Vendor notification mechanism: A process to receive, log, and review vendor notifications of algorithm updates. If the contract does not require notifications, the monitoring program cannot fill that gap — this is a contract negotiation issue, addressed below.
Alert pattern monitoring: A mechanism to flag unexpected changes in alert volume, alert type distribution, or clinician override rate. These are early warning indicators of model drift or integration changes. Most EHRs have reporting infrastructure to surface this; it requires someone to look at it.
Named clinical owner: One person's name on a document that says: I am responsible for knowing if this tool is still performing as approved. Not a committee. Not a department. One person. The committee can review their report. The accountability needs to be individual.
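
The schedule and ownership elements above are simple enough to track in a small registry. The sketch below is one possible shape, with hypothetical field names and tool entries. It treats quarterly as 91 days and annual as 365, and reports tools with no named owner or an overdue review.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Quarterly for high-stakes tools, annual minimum for lower-stakes tools.
REVIEW_INTERVAL_DAYS = {"high": 91, "lower": 365}


@dataclass
class MonitoredTool:
    name: str
    stakes: str               # "high" or "lower"
    owner: Optional[str]      # one named person, not a committee
    last_review: date


def review_problems(tool: MonitoredTool, today: date) -> list:
    """List the governance gaps for one tool as of a given date."""
    problems = []
    if not tool.owner:
        problems.append("no named clinical owner")
    due = tool.last_review + timedelta(days=REVIEW_INTERVAL_DAYS[tool.stakes])
    if today > due:
        problems.append(f"review overdue since {due.isoformat()}")
    return problems


# Illustrative entry: a high-stakes tool with no owner and a stale review.
tool = MonitoredTool("sepsis-alert", "high", None, date(2024, 1, 10))
problems = review_problems(tool, today=date(2024, 6, 1))
print(tool.name, problems)
```

The point of the registry is the empty owner field. A tool with no name next to it is the one nobody is watching.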

None of this requires new technology. It requires organizational will and explicit assignment. In most health systems, the governance committee that approved the deployment does not own post-deployment performance, and the vendor contract rarely requires the vendor to proactively flag performance changes in your specific environment. The monitoring program fills the gap that both of those structures leave open.

The Contract Is Where This Starts

Post-deployment monitoring provisions are hard to retrofit into a signed agreement. Once the contract is executed, you are working with whatever audit rights and notification requirements you negotiated before you signed, at least until an amendment or renewal reopens the terms. If you did not negotiate them specifically, you may have none.

The provisions to negotiate before deployment:

Contract Provisions Checklist

Performance warranty with defined thresholds: A contractual commitment that the tool will maintain specified accuracy, sensitivity, or specificity metrics over the contract term. Specify the metrics that matter for your clinical use case, not generic performance averages.
Algorithm update notification: Advance notice (30 days minimum) before any material model change. Define "material" in the contract: any change to model architecture, training data, or performance characteristics exceeding a defined threshold.
Audit rights: The right to request and review algorithm update logs, performance data in your specific deployment environment, and documentation of changes made since the original deployment. Without this, you have no contractual mechanism to verify what changed.
Integration change protocol: A defined process for vendor testing and communication when your EHR or underlying data infrastructure changes. Who is responsible for re-validating the integration? What is the timeline? What happens if post-change performance is degraded?
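
A performance warranty is only useful if someone checks observed performance against it. The sketch below computes sensitivity, specificity, and positive predictive value from confusion-matrix counts in your own environment and reports any metric below its contractual floor. The counts and thresholds are invented for illustration; the real numbers come from your contract and your chart review.

```python
def warranty_breaches(tp, fp, tn, fn, thresholds):
    """Return {metric: (observed, floor)} for each metric below its floor."""
    observed = {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }
    return {metric: (observed[metric], floor)
            for metric, floor in thresholds.items()
            if observed[metric] < floor}


# Hypothetical contract floors and one quarter of locally validated outcomes.
contract_floors = {"sensitivity": 0.85, "specificity": 0.90}
breaches = warranty_breaches(tp=80, fp=50, tn=850, fn=20,
                             thresholds=contract_floors)

for metric, (seen, floor) in breaches.items():
    print(f"{metric}: observed {seen:.2f}, contract floor {floor:.2f}")
```

This only works if the audit rights provision gives you the outcome data to fill in the counts, which is why the two provisions belong in the same negotiation.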

These provisions are negotiable. Most vendors have not been asked for them explicitly. The AI Due Diligence Contract Guide covers the audit rights and performance warranty language in detail, with specific provisions to look for and red flags in standard vendor contract language.

AI Due Diligence Contract Guide

Covers audit rights, performance warranty provisions, algorithm update clauses, and the specific language to negotiate before you sign. Free download.

Get the guide at drsarahmatt.com/contract-guide

If your health system is approaching a contract renewal and you have never reviewed the monitoring provisions, that is the right moment for a conversation.

Book a call: calendly.com/sarahmattmd
