On-Premise Legal AI: When the Deployment Model Decides the Adoption

A senior analysis of when on-premise legal AI is the right answer, what a complete on-premise deployment actually includes, and how to structure the evaluation.

For most law firms outside the top global tier, the decision that determines whether legal AI gets adopted at all is not the choice of vendor or the configuration of agents — it is the deployment model. Cloud SaaS is fast to set up but creates barriers that, in regulated practice areas, are absolute rather than negotiable. This analysis examines when on-premise deployment is the right answer, what it actually involves, and how to structure the evaluation.

The reason on-premise persists in legal practice

There has been a long-running narrative that on-premise software is a legacy concern, gradually disappearing as cloud infrastructure matures. In most software categories this is correct. In legal AI, it is not. The reason is that law firms do not own most of the data they handle. The data belongs to clients, and a meaningful proportion of those clients — banks, telecommunications operators, government agencies, defense contractors, healthcare providers — operate under regulatory regimes that impose specific constraints on where their data can reside and who can access it.

When a firm chooses a cloud-only legal AI vendor, it commits in advance to losing every client that imposes such constraints. For a global firm whose practice mix can absorb that loss, this may be acceptable. For a firm whose growth strategy depends on serving large institutional clients, it is not. The question is not whether on-premise adds value in the abstract — it is whether the firm can structurally afford to take cloud-only as a final answer.

A second factor sits behind the regulatory one. Even when client contracts permit cloud SaaS in principle, the firm's own chief information security officer typically has veto rights on vendor selection. In our experience scoping deployments, CISO objections account for more failed adoptions than feature gaps. On-premise removes the most common objections at once: data stays on infrastructure the firm controls, the firm holds the encryption keys, and the audit log is in the firm's own log infrastructure.

When on-premise is the right answer

The clearest signal that on-premise is the right deployment model is that the firm has already lost work because of cloud constraints — an RFP that demanded on-premise deployment, a client meeting where the cloud vendor was vetoed, a procurement process that stalled at security review. If these events have happened, future deals will continue to require on-premise capability regardless of how much progress is made on the cloud side.

A second signal is client mix. Firms serving banks and telecommunications operators under Vietnamese regulation, or operating in jurisdictions with strict data localization rules — Vietnam, China, the Kingdom of Saudi Arabia, Russia, and several others — will encounter on-premise requirements consistently rather than occasionally. A third signal is size and infrastructure capability: firms above fifty lawyers typically have the IT staff to manage on-premise deployment, and the cost amortization improves as the seat count grows.

The corresponding negative signals are equally clear. A boutique firm under twenty lawyers with no clients in regulated sectors will struggle to justify the operational complexity. A firm without dedicated IT staff should not attempt on-premise without engaging external implementation support. A firm whose primary need is rapid time-to-value will find that the eight-to-twelve-week deployment timeline of on-premise is not worth the trade-off if cloud is acceptable to clients.

What on-premise legal AI actually includes

A common source of confusion in vendor evaluations is the difference between true on-premise deployment and various hybrid configurations that vendors describe as on-premise. A complete on-premise legal AI deployment includes seven components, all running on infrastructure that the firm controls.

The application layer — the agent workspace, the matter vault, the reviewer interface — must run locally. The inference engine, meaning the large language models that produce the actual responses, must also run locally on GPU hardware in the firm's environment. Many cloud vendors will offer to host the application layer locally but still route inference through their cloud; this is hybrid, not on-premise, and it does not solve the data residency problem because matter content still leaves the firm's network at the inference step.

The OCR pipeline that processes incoming documents must run locally, as must the vector database that stores embeddings used for retrieval. Authentication must integrate with the firm's identity provider over secure protocols, with no external dependency at runtime. Audit logging must write to an immutable log store under the firm's control, exportable to the firm's central log infrastructure. Finally, encryption keys must be controllable by the firm — what the industry calls bring-your-own-key — so that the vendor at no point has access to plaintext matter content.

A deployment missing any of these seven components is a hybrid deployment and should be evaluated accordingly. There are legitimate hybrid models — particularly customer-controlled tenants in major cloud providers — but they should not be sold or evaluated as if they were equivalent to full on-premise.
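The seven-component test above can be expressed as a simple checklist. The sketch below is illustrative only: the class and field names are hypothetical, and each flag records whether that component runs on infrastructure the firm controls.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DeploymentProfile:
    """Hypothetical checklist: True means the component runs on
    infrastructure the firm controls."""
    application_layer: bool   # agent workspace, matter vault, reviewer interface
    inference_engine: bool    # LLM inference on local GPU hardware
    ocr_pipeline: bool
    vector_database: bool
    authentication: bool      # firm identity provider, no external runtime dependency
    audit_logging: bool       # immutable log store under the firm's control
    encryption_keys: bool     # bring-your-own-key


def missing_components(profile: DeploymentProfile) -> list[str]:
    """Return the names of components that do not run locally."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


def classify(profile: DeploymentProfile) -> str:
    """A deployment missing any component is hybrid, not on-premise."""
    return "hybrid" if missing_components(profile) else "on-premise"


# Example: application layer hosted locally, inference routed to the vendor's cloud.
vendor_offer = DeploymentProfile(
    application_layer=True,
    inference_engine=False,
    ocr_pipeline=True,
    vector_database=True,
    authentication=True,
    audit_logging=True,
    encryption_keys=True,
)
print(classify(vendor_offer), missing_components(vendor_offer))
```

Asking a vendor to fill in each flag explicitly during evaluation makes it harder for a hybrid configuration to be presented as full on-premise.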

Hardware and operational considerations

For a typical mid-size firm with fifty to one hundred lawyers and processing volume around twenty-five thousand pages per month, the hardware profile is well understood. Inference requires two GPU servers using NVIDIA A100 80GB cards or equivalent, supplemented by an application server with thirty-two cores and one hundred twenty-eight gigabytes of memory, and a vector database server with sixteen cores and NVMe storage. Matter storage requires approximately ten terabytes of encrypted capacity for an active firm, with the usual backup and disaster recovery requirements adding roughly the same amount in secondary infrastructure.

Capital expenditure on this profile ranges from two hundred to four hundred thousand US dollars depending on redundancy choices and procurement terms. Operating costs add power, cooling, rack space, and the IT staff time required to maintain the system. For firms that prefer to avoid capital expenditure entirely, a customer-controlled cloud tenant — meaning the same software stack deployed in the firm's own AWS, Azure, or Google Cloud account — preserves most of the data residency benefits while shifting infrastructure to operating expenditure.
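To put the capital figure in per-seat terms, a rough amortization can help. The sketch below uses the capital range and firm size quoted above. The four-year amortization period and the one-seat-per-lawyer mapping are assumptions made for illustration, not figures from this analysis. Operating costs are excluded.

```python
def capex_per_seat_per_year(capex_usd: float, seats: int, amortization_years: int) -> float:
    """Spread hardware capital expenditure evenly across seats and years."""
    if seats <= 0 or amortization_years <= 0:
        raise ValueError("seats and amortization_years must be positive")
    return capex_usd / seats / amortization_years


CAPEX_LOW_USD, CAPEX_HIGH_USD = 200_000, 400_000  # range quoted in the text
SEATS_LOW, SEATS_HIGH = 50, 100                   # assumption: one seat per lawyer
AMORTIZATION_YEARS = 4                            # assumption, not from the text

# Best case: lowest capex spread over the most seats.
best_case = capex_per_seat_per_year(CAPEX_LOW_USD, SEATS_HIGH, AMORTIZATION_YEARS)
# Worst case: highest capex spread over the fewest seats.
worst_case = capex_per_seat_per_year(CAPEX_HIGH_USD, SEATS_LOW, AMORTIZATION_YEARS)

print(f"Hardware capex per seat per year: ${best_case:,.0f} to ${worst_case:,.0f}")
```

Under these assumptions the hardware alone works out to between 500 and 2,000 US dollars per seat per year, which shows how strongly the result depends on seat count.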

Air-gapped deployment

For a small number of matter types — defense, intelligence-related work, government investigations, certain sovereign engagements — even on-premise with internet connectivity is insufficient. These matters require air-gapped deployment, meaning the legal AI environment has no network path to the public internet at any point.

Air-gapped deployment is technically feasible but carries trade-offs that should be explicitly discussed during scoping. Model updates must be applied via offline media, which means improvements roll out on a slower cadence than cloud versions. Real-time external lookups — for instance to verify a regulatory citation against a public source — are not possible. Operational support during incidents requires the vendor to work through the firm's own access procedures rather than connecting directly.

For most firms, the right structure is to maintain a separate air-gapped deployment for the small subset of matters requiring it, rather than air-gapping the firm's entire AI environment. This isolates the operational complexity to where it is genuinely needed.

Deployment timeline

Realistic timelines for the three deployment models differ substantially. Managed cloud deployment can reach production use within two weeks: vendor security review takes about a week, tenant provisioning and SSO configuration take a few days each, and pilot user onboarding completes within the second week. Customer-controlled cloud tenant deployment runs four to six weeks, with the additional time absorbed by joint security review with the firm's IT team and the more substantial integration work. Full on-premise deployment requires eight to twelve weeks, with procurement and physical installation of GPU hardware being the dominant constraint when hardware is not already on hand.

Pricing implications

On-premise deployment changes the pricing structure in a way that buyers should anticipate. Cloud SaaS is typically priced purely per seat because the vendor's cost-to-serve scales linearly with usage. On-premise involves significant fixed cost on the vendor side (engineering hours for deployment, custom security documentation, infrastructure validation, dedicated support) that does not scale with seat count. As a result, on-premise legal AI is consistently priced as a base platform fee plus a per-seat fee. The base fee captures the fixed component and cannot be negotiated away.

For buyers, this means on-premise is rarely competitive with cloud SaaS on cost-per-seat for small deployments. It becomes economically attractive at scale, at fifty or more seats, where the base fee is spread across enough users. Where on-premise is required, the comparison to cloud pricing is irrelevant because cloud is not an option in the first place.
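The effect of a fixed base fee on the effective per-seat price is easy to see with a small calculation. The fee amounts below are hypothetical placeholders chosen only to show the shape of the curve. They are not vendor pricing.

```python
def effective_per_seat_cost(base_fee: float, per_seat_fee: float, seats: int) -> float:
    """Annual cost per seat under a base-platform-fee-plus-per-seat model."""
    if seats <= 0:
        raise ValueError("seats must be positive")
    return (base_fee + per_seat_fee * seats) / seats


# Hypothetical annual figures in US dollars, for illustration only.
BASE_FEE = 60_000
PER_SEAT_FEE = 1_200

for seats in (10, 50, 100):
    cost = effective_per_seat_cost(BASE_FEE, PER_SEAT_FEE, seats)
    print(f"{seats:>3} seats: ${cost:,.0f} per seat")
```

With these placeholder numbers the effective price falls from 7,200 dollars per seat at ten seats to 2,400 at fifty and 1,800 at one hundred. The base fee dominates small deployments and fades as the seat count grows.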

A decision framework

A simple screen distinguishes firms that should evaluate on-premise from those that should not. Five questions are sufficient. First, do any current clients explicitly prohibit cloud SaaS for matter files? Second, is the firm planning to pursue clients in the next year that have data residency requirements? Third, has the firm's CISO blocked any AI vendor in the past twelve months? Fourth, does the firm handle Vietnamese personal data subject to Decree 13/2023/NĐ-CP (NĐ-13/2023) on personal data protection? Fifth, would a single-tenant deployment open RFPs the firm currently cannot bid on?

Two or more affirmative answers indicate that on-premise warrants a serious evaluation. With fewer than two, the operational complexity of on-premise is unlikely to be justified by the deal flow it enables.
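The screen can be written down as a short function. This is a minimal sketch: the question keys are hypothetical labels for the five questions above, and the threshold of two comes directly from the text.

```python
SCREEN_QUESTIONS = (
    "client_prohibits_cloud_saas",         # any current client prohibits cloud SaaS for matter files
    "pursuing_data_residency_clients",     # targeting clients with residency requirements within a year
    "ciso_blocked_ai_vendor",              # CISO blocked an AI vendor in the past twelve months
    "handles_vietnamese_personal_data",    # personal data subject to NĐ-13/2023
    "single_tenant_opens_rfps",            # single-tenant deployment would open new RFPs
)
THRESHOLD = 2  # two or more affirmative answers warrant a serious evaluation


def affirmative_count(answers: dict[str, bool]) -> int:
    """Count yes answers; unanswered questions are treated as no."""
    return sum(1 for question in SCREEN_QUESTIONS if answers.get(question, False))


def should_evaluate_on_premise(answers: dict[str, bool]) -> bool:
    return affirmative_count(answers) >= THRESHOLD


example_firm = {
    "client_prohibits_cloud_saas": True,
    "ciso_blocked_ai_vendor": True,
}
print(affirmative_count(example_firm), should_evaluate_on_premise(example_firm))
```

Treating an unanswered question as a no keeps the screen conservative. A firm that cannot yet answer a question may prefer to resolve it before relying on the result.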
