On-prem LLM
Models that run on your hardware.
The agent's brain runs on a machine you own, in a place you control. Customer data does not leave the building to be processed.
The model is on the same machine as the data.
An on-prem language model is a model that runs on hardware in the customer's location, processing data that never leaves the local network. The model file lives on disk. The inference runs on local compute. The data flows in and out of the model without traversing the public internet.
CORTX deployments use on-prem models for the agent's reasoning by default. Cloud-hosted models can be used for specific tasks where the latency or capability of a frontier model justifies it — but the architectural posture is local-first.
Where the work happens.
The data does not leave the box.
What happens during inference.
When the agent needs to reason about a task, it constructs the request — including the relevant MCP context — and sends it to the local model. The model runs inference on local compute. The response returns to the agent. The agent acts.
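The request/response loop described above can be sketched in a few lines. This is a minimal illustration, not CORTX's actual implementation: `InferenceRequest`, `run_local_inference`, and `echo_model` are hypothetical names, and the local model is represented as a plain callable so the sketch stays independent of any particular inference runtime.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: any function that takes a prompt and returns text.
# In a real deployment this would wrap whatever local runtime serves the model.
LocalModel = Callable[[str], str]


@dataclass
class InferenceRequest:
    """A task plus the MCP context the agent judged relevant (assumed shape)."""
    task: str
    mcp_context: List[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        # Fold the context items and the task into one prompt string.
        context = "\n".join(f"- {item}" for item in self.mcp_context)
        return f"Context:\n{context}\n\nTask: {self.task}"


def run_local_inference(request: InferenceRequest, model: LocalModel) -> str:
    """Build the prompt and hand it to the local model; nothing leaves the process."""
    return model(request.to_prompt())


def echo_model(prompt: str) -> str:
    # Stand-in for a real local model, used only to make the sketch runnable.
    return f"ACK ({len(prompt.splitlines())} prompt lines)"


if __name__ == "__main__":
    request = InferenceRequest(
        task="Validate the incoming invoice",
        mcp_context=["vendor list loaded", "approval limits loaded"],
    )
    print(run_local_inference(request, echo_model))
```

Passing the model in as a callable keeps the agent logic testable without a model file on disk; swapping in a real local runtime only changes what is bound to `model`.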
The customer's data — patient records, financial data, supplier information, partner details — is never transmitted to a remote service. It is read from the local disk, passed to the local model, and the result is written back to the local disk.
When a specific task benefits from a frontier model — typically a one-off generation task with no PII, like drafting a piece of marketing copy — the agent can route that specific call to a cloud model. These cases are rare, configurable, and logged.
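One way to picture "rare, configurable, and logged" is a small routing check that defaults to the local model. The sketch below is an assumption about how such a policy could look, not a description of CORTX internals: `Task`, `RoutingPolicy`, and `choose_target` are hypothetical names, and the PII flag is assumed to be set upstream.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("routing")


@dataclass(frozen=True)
class Task:
    name: str
    contains_pii: bool            # assumed to be determined before routing
    wants_frontier: bool = False  # task would benefit from a frontier model


@dataclass(frozen=True)
class RoutingPolicy:
    cloud_enabled: bool = False   # local-first: cloud stays off unless configured


def choose_target(task: Task, policy: RoutingPolicy) -> str:
    """Return "cloud" only when every condition holds; otherwise "local"."""
    use_cloud = (
        policy.cloud_enabled
        and task.wants_frontier
        and not task.contains_pii
    )
    if use_cloud:
        # Every cloud-routed call leaves an audit trail.
        logger.info("routing task %r to cloud model", task.name)
        return "cloud"
    return "local"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    policy = RoutingPolicy(cloud_enabled=True)
    print(choose_target(Task("marketing copy", contains_pii=False, wants_frontier=True), policy))
    print(choose_target(Task("patient intake", contains_pii=True, wants_frontier=True), policy))
```

The conditions are combined with `and`, so any single failing check (cloud disabled, no frontier need, or PII present) keeps the call on the local model.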
Data sovereignty as default.
Cloud AI is convenient. It is also a category of dependency that small businesses do not always understand they're entering. The data leaves the building. The vendor logs requests. The model can change without notice. The pricing changes without notice. The terms of service change without notice.
On-prem inverts each of those defaults. The data does not leave. The vendor does not log. The model file is fixed unless the customer chooses to update it. The cost is predictable. The dependency is bounded.
The trade-off is real. On-prem models are smaller than the largest cloud models. For most operational reasoning — workflow execution, validation, exception handling — they are sufficient. For frontier capability, the agent can route specific tasks to cloud models, with the customer's awareness and consent.
What it actually takes.
A current-generation Mac Mini, configured once, sealed, placed in the customer's office. That is the hardware specification for a typical small-business deployment.
The Mini runs the agent, the local model, the workflow engine, and the encrypted database. It is connected to the customer's network. Remote access for authorized staff goes through Tailscale. There is no public endpoint.
The customer owns the machine. When the deployment ends, the machine stays.
The substrate.
On-prem LLM is the substrate everything else runs on. The agent runs on it. The MCPs are read against it. The tool calls invoke it. It is the local computational foundation of the entire deployment.
Where on-prem posture matters most.
Every CORTX deployment uses an on-prem LLM by default. The posture matters most in:
Patient data cannot leave the building.
Professional Services: Accounting, legal, consulting. Client confidentiality is the firm's product.
Distribution: Importers & wholesalers. Supplier and partner data is the firm's competitive position.
Logistics: Freight, warehousing, fulfillment. Operational data informs strategic decisions.