Foundation-model strategy: frontier-hosted, open-weights, or bespoke fine-tunes
Single-vendor dependency on a foundation model is a strategic risk
Most enterprises picked one frontier-model vendor in 2023 or 2024 and built increasingly deep dependence on that vendor's models. The dependency is invisible in the good times — capability improves, prices come down, the integration just works. The dependency surfaces sharply when prices change unexpectedly, the model deprecates, the data residency story shifts, or a competitor leapfrogs and you discover your team only knows how to call one API.
The right framing is portfolio thinking. The same enterprise that diversifies cloud providers, payment processors, or telecom carriers should diversify foundation models. Not because frontier vendors are unreliable, but because strategic capability that lives entirely outside your walls is structurally fragile regardless of vendor quality.
Frontier-hosted models earn their cost on capability-bound tasks
Hosted frontier models — Anthropic, OpenAI, Google — are at the capability frontier and continue to widen the gap on the hardest tasks: long-context reasoning, complex tool use, instruction-following under ambiguity, and the agentic loops that depend on all of these. For use cases where the marginal capability matters and volumes are moderate, frontier-hosted is the right default.
The cost premium is real but often justified by what gets shipped. A single frontier model call producing a correct answer on the first try is usually cheaper than three open-weights calls plus a human review queue. The unit economics are not always what the per-token price suggests.
- Inference spend reduction: 30–45% with a mixed portfolio vs. all-frontier
- Sovereignty-sensitive workloads: served by self-hosted open weights
- Provider count in production: 3–5 (frontier + open-weights + fine-tuned)
- Model swap through the gateway: a configuration change, no re-architecture
Open-weights models earn their place on volume and sovereignty
Open-weights models — Llama, Mistral, Qwen, and the rapidly improving family of mid-size models — are within capability range of frontier models on a growing set of tasks, deployable wherever the enterprise chooses, with full data control. For high-volume, lower-complexity workloads, open-weights deployed on the enterprise's own infrastructure or a sovereign cloud often produces 4–8x lower per-token cost.
The total-cost picture has to include the operational burden: GPU capacity planning, model serving infrastructure, monitoring, version management. For sufficient volume, the operational cost is amortized away. For low-volume workloads, hosted frontier remains cheaper end-to-end. The crossover point depends on volume and on the workload's tolerance for latency.
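The crossover logic above is simple arithmetic once the workload's numbers are in hand. A minimal sketch, with entirely illustrative prices (the per-token rates and fixed monthly ops burden below are hypothetical assumptions, not vendor quotes):

```python
# Illustrative crossover math: hosted per-token pricing vs. self-hosted
# fixed cost plus marginal cost. All dollar figures are assumptions.

HOSTED_COST_PER_M_TOKENS = 15.00       # $ per million tokens, hosted frontier
SELF_HOSTED_FIXED_MONTHLY = 20_000.00  # $ GPU capacity, serving, monitoring, ops
SELF_HOSTED_COST_PER_M_TOKENS = 2.50   # $ marginal per million tokens, open weights

def monthly_cost_hosted(m_tokens: float) -> float:
    """Hosted frontier: pure per-token spend, no fixed burden."""
    return m_tokens * HOSTED_COST_PER_M_TOKENS

def monthly_cost_self_hosted(m_tokens: float) -> float:
    """Self-hosted open weights: fixed ops burden plus cheap marginal tokens."""
    return SELF_HOSTED_FIXED_MONTHLY + m_tokens * SELF_HOSTED_COST_PER_M_TOKENS

def crossover_m_tokens() -> float:
    """Monthly volume (in millions of tokens) where the fixed ops
    burden is fully amortized and self-hosting becomes cheaper."""
    return SELF_HOSTED_FIXED_MONTHLY / (
        HOSTED_COST_PER_M_TOKENS - SELF_HOSTED_COST_PER_M_TOKENS
    )
```

With these assumed numbers the crossover lands at 1,600M tokens per month; below it, hosted frontier is cheaper end-to-end, above it, self-hosting wins. The point of the sketch is the shape of the curve, not the specific figures.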
Bespoke fine-tunes earn their place on domain depth
Fine-tuning a smaller model on a domain corpus produces a model that beats general frontier capability on that specific domain at a fraction of the inference cost. This is the right move for narrow, high-volume use cases — claims processing, support routing, document classification, structured extraction — where the workload is repetitive and the domain knowledge is concentrated.
The investment to produce a bespoke fine-tune is real: training data curation, eval set development, training infrastructure, model serving. But the return on a high-volume use case is durable: a model the enterprise owns, runs cheaply, and can refine continuously as the domain evolves. Vendor dependency drops to zero.
A model gateway makes the mix operationally swappable
The integration question of which model to call has to be a runtime configuration, not a hardcoded library import. A model gateway sits in front of every model call, routes to the appropriate model based on the use case's declared requirements, handles auth, rate limiting, and cost attribution, and produces the audit trail across providers.
With a gateway in place, switching providers becomes a configuration change. Without it, every model swap is a re-architecture. We deploy the gateway as part of the platform stand-up on the Enterprise AI In-House program because the foundation-model strategy is operationally meaningless without it.
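The core of the gateway idea can be sketched in a few lines: call sites declare requirements, and a routing table (the runtime configuration) maps requirements to models. The model names, requirement fields, and route entries below are illustrative placeholders, not a real gateway's API:

```python
# Minimal sketch of gateway routing: the route table is data, so
# swapping a provider is an edit to configuration, not to call sites.
from dataclasses import dataclass

@dataclass(frozen=True)
class Requirements:
    sovereignty: str  # "hosted-ok" | "on-prem-only"
    capability: str   # "frontier"  | "standard"

# Runtime configuration. Entries here are hypothetical model identifiers.
ROUTES = {
    ("hosted-ok", "frontier"):    "frontier-hosted/large",
    ("hosted-ok", "standard"):    "open-weights/mid",
    ("on-prem-only", "frontier"): "open-weights/large-onprem",
    ("on-prem-only", "standard"): "fine-tuned/domain-small",
}

def route(req: Requirements) -> str:
    """Pick the serving model from the use case's declared requirements."""
    return ROUTES[(req.sovereignty, req.capability)]

# A call site never names a vendor; it declares what it needs.
model = route(Requirements(sovereignty="on-prem-only", capability="standard"))
```

A production gateway adds auth, rate limiting, cost attribution, fallbacks, and audit logging around this routing core, but the structural point is the same: moving a workload to a different model means changing one entry in the table.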
Capability evals are how you make the mix decision repeatedly
Model selection is a recurring decision, not a one-time choice. New models ship every few months; capabilities shift; pricing changes. The way to make the decision repeatedly is a capability eval suite that runs each candidate model through the use cases that matter, with the metrics the business cares about: accuracy on representative tasks, refusal accuracy, format adherence, latency, cost.
We deploy the eval suite as a permanent fixture, not a one-shot evaluation. Every quarter, the suite runs against current and candidate models. The procurement and AI council reviews the results and adjusts the mix. The decision is data-driven and repeatable; vendor pitches are inputs, not outputs.
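The eval suite's shape is straightforward: a fixed task set, a fixed metric, and every candidate model scored the same way each quarter. A minimal sketch, assuming a toy task set and treating each model as a callable (both are placeholders for a real harness's datasets and model clients):

```python
# Sketch of a recurring capability eval: every candidate model runs
# the same task set under the same metric, producing one comparable
# row per model. Tasks and models here are illustrative.
from typing import Callable

TASKS = [
    {"prompt": "classify: invoice dispute", "expected": "billing"},
    {"prompt": "classify: reset my password", "expected": "support"},
]

def accuracy(model: Callable[[str], str]) -> float:
    """Fraction of tasks where the model's answer matches the label."""
    hits = sum(model(t["prompt"]) == t["expected"] for t in TASKS)
    return hits / len(TASKS)

def run_suite(candidates: dict[str, Callable[[str], str]]) -> dict[str, float]:
    """One score per model; this is the table the council reviews."""
    return {name: accuracy(fn) for name, fn in candidates.items()}
```

A real suite carries more metrics (refusal accuracy, format adherence, latency, cost per task) and far larger task sets, but the discipline is the one sketched here: the comparison is pinned to the enterprise's own use cases, so a new candidate model is a new row, not a new debate.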
Sovereignty matters more for some workloads than others
Some workloads are fine on hosted frontier — public documentation Q&A, marketing content generation, generic agentic tasks against external data. Other workloads cannot leave the enterprise's perimeter — regulated financial data, healthcare PHI, classified or controlled content, M&A-relevant deal data. The sovereignty requirement determines where the workload runs, which determines the model that can serve it.
Open-weights models on enterprise infrastructure or a sovereign cloud serve the second category. The capability gap to frontier is real on some tasks and negligible on others. The model-selection rule is to match capability requirement against the available sovereign options and only escalate to hosted frontier when the workload's sovereignty allows it.
We were paying a frontier vendor at full freight for a workload that an open-weights model on our own GPUs handled at 88% of the quality and one-sixth the cost. Once we built the gateway, we moved that workload over in a week. The capability hit was within tolerance and the savings funded the rest of the platform.
— Head of AI Engineering, Fortune 500 logistics
Frequently asked
Why diversify foundation-model providers?
Single-vendor dependency on a strategic capability is structurally fragile regardless of vendor quality. Pricing changes, model deprecations, data residency shifts, and competitive leapfrogs all become operational shocks when the enterprise has only one model provider. Diversification — frontier-hosted plus open-weights plus bespoke fine-tunes, swappable through a gateway — is the same portfolio thinking that already governs cloud providers, payment processors, and telecom.
When does a frontier-hosted model earn its cost?
On capability-bound tasks where the marginal frontier capability matters: long-context reasoning, complex tool use, instruction-following under ambiguity, agentic loops. The cost premium is justified when a single frontier call producing the correct answer first try beats three open-weights calls plus a human review queue. Unit economics depend on workload, not on per-token sticker price.
When does an open-weights model beat hosted frontier?
On high-volume, lower-complexity workloads where capability is sufficient and the operational cost of self-hosting is amortized across enough volume. Open-weights on enterprise infrastructure or sovereign cloud commonly produces 4–8x lower per-token cost. They also serve sovereignty-sensitive workloads that cannot leave the enterprise perimeter regardless of cost. The crossover depends on volume, latency tolerance, and sovereignty requirements.
When does fine-tuning a model make sense?
When a narrow, high-volume use case needs domain depth — claims processing, support routing, document classification, structured extraction. Fine-tuning a smaller model on a domain corpus typically beats general frontier capability on that domain at a fraction of inference cost. The investment is real (data curation, eval, training, serving), but the return on a high-volume use case is durable: a model the enterprise owns and refines continuously.
What does a model gateway do?
It sits in front of every model call, routes to the appropriate model based on the use case's declared requirements, handles auth, rate limiting, cost attribution, and fallback paths, and produces a unified audit trail across providers. With a gateway, swapping providers is a configuration change. Without one, every model swap is a re-architecture, which is why teams without gateways tend to stay locked into their initial vendor choice.
How often should the model mix be re-evaluated?
Quarterly, against a permanent capability eval suite that runs current and candidate models through the use cases that matter. Model selection is a recurring decision because new models ship constantly, capabilities shift, and pricing changes. The eval suite produces the data; the AI council reviews and adjusts. Vendor pitches are inputs, not outputs. Locking in a model choice without periodic re-evaluation is how enterprises end up overpaying for stale capability.