Model Registry
Not every model qualifies for sovereign AI.
Sovereignty is not a model feature — it is a deployment decision. The same model can be fully sovereign when self-hosted or conditionally sovereign when managed by a cloud provider. This registry maps the distinction clearly.
Sovereignty Tiers
Three tiers. One question: who controls the inference?
Sovereignty is determined by where inference happens, who owns the runtime, and whether your data crosses a boundary you do not control.
Fully Sovereign
Model runs entirely on infrastructure you control. No data leaves your network boundary. Open weights, self-hosted, no licensing dependency on a third-party API.
- Open-weight model you can download and self-host
- Runs on your own hardware, VMs, or private cloud
- Zero external API calls during inference
- You own the runtime, the weights, and the output
Conditionally Sovereign
Model is hosted by a cloud provider within a dedicated tenant. Data does not leave your cloud account, but the model weights and runtime are controlled by the provider.
- Hosted within your Azure, AWS, or GCP tenant
- Data stays in your cloud region and account
- Provider controls model weights and updates
- Dependency on provider availability and pricing
Not Sovereign
Inference happens on the provider's shared infrastructure. Your prompts, documents, and responses leave your boundary and are processed externally.
- API calls route to provider's shared servers
- No control over where data is processed
- Subject to provider's data retention policies
- Not suitable for regulated or sensitive workloads
Fully Sovereign Models
Open-weight models you can self-host and fully control
These models can be downloaded, self-hosted on your own infrastructure, and run with zero external API calls. Your data never leaves your network boundary.
| Model | Provider | Deployment | Context | Enterprise fit | Use cases |
|---|---|---|---|---|---|
Llama 3.1 8B / 70B / 405B Most mature open-weight family for enterprise private deployment. All sizes available for self-hosting. | Meta (open weights) | Ollama · vLLM · OpenShift AI | 128K | High | RAGAgentsSummarization |
Mistral 7B / Mixtral 8×7B Highly efficient. Mixtral MoE architecture gives near-70B quality at lower compute cost. | Mistral AI (open weights) | Ollama · vLLM · Docker | 32K | High | RAGClassificationCode |
Mistral Nemo 12B Best-in-class at 12B scale. Long context makes it ideal for document intelligence pipelines. | Mistral AI (open weights) | Ollama · vLLM | 128K | High | DocumentsRAGLong context |
Phi-3 / Phi-3.5 Mini Exceptional quality-to-size ratio. Suitable for edge, laptop, or resource-constrained private deployments. | Microsoft (open weights) | Ollama · Edge · Docker | 128K | Medium | EdgeLightweightClassification |
Gemma 2 9B / 27B Strong reasoning capability. 27B model approaches GPT-3.5 quality in private deployment benchmarks. | Google (open weights) | Ollama · vLLM | 8K | Medium | RAGReasoningSummarization |
Qwen 2.5 7B / 72B Outstanding multilingual and code capability. 72B is competitive with GPT-4o on many enterprise tasks. | Alibaba (open weights) | Ollama · vLLM | 128K | High | MultilingualCodeRAG |
DeepSeek R1 / V3 Exceptional reasoning model. R1 rivals o1-class performance when self-hosted. Requires GPU infrastructure. | DeepSeek (open weights) | vLLM · OpenShift AI | 64K | High | ReasoningAnalysisCode |
Code Llama / StarCoder2 Purpose-built for code generation, completion, and review in private developer tooling. | Meta / BigCode (open weights) | Ollama · vLLM | 16K | Medium | CodeDeveloperCompletion |
Conditionally Sovereign Models
Cloud-managed models within your tenant boundary
These models run inside your Azure, AWS, or GCP account. Your data stays within your cloud tenant, but model weights and runtime are controlled by the provider.
Conditional sovereignty depends on your cloud agreement, region selection, and data processing terms. Always verify provider data residency commitments before using these models with sensitive data.
| Model | Provider / Platform | Deployment | Context | Enterprise fit | Use cases |
|---|---|---|---|---|---|
GPT-4o / GPT-4 Turbo Data stays within your Azure tenant and region. No training on your data by default. HIPAA and SOC2 eligible. | Microsoft via Azure OpenAI | Azure OpenAI Service | 128K | High | RAGAgentsEnterprise |
GPT-3.5 Turbo Lower cost option for high-volume RAG pipelines within Azure boundary. Good for summarization at scale. | Microsoft via Azure OpenAI | Azure OpenAI Service | 16K | High | RAGSummarizationHigh volume |
Claude 3.x (Haiku / Sonnet / Opus) Longest context window available in a managed tier. Data stays in your AWS account. Strong for document Q&A. | Anthropic via AWS Bedrock | AWS Bedrock | 200K | High | DocumentsLong contextRAG |
Llama 3 (Managed) Open weights, managed runtime. A middle path when self-hosting GPU infrastructure is not yet possible. | Meta via AWS Bedrock / Azure | AWS Bedrock · Azure AI | 128K | High | RAGManagedTransition |
Mistral Large / Small Managed Mistral inside Azure boundary. Useful when GPU ops team is not available but data must stay in Azure. | Mistral via Azure AI | Azure AI Foundry | 32K | Medium | RAGManagedAzure |
Not Sovereign — Public APIs
Models that cross your data boundary
These models process your data on shared provider infrastructure. They are not suitable for sovereign AI workloads involving sensitive, regulated, or confidential data.
Public API models are listed here for comparison, not recommendation. For many use cases without sensitive data, public APIs are practical. For sovereign AI, they are excluded by definition.
| Model | Provider | Deployment | Context | Sovereign fit | Sovereign alternative |
|---|---|---|---|---|---|
GPT-4o / GPT-4 Prompts and documents are processed on OpenAI shared infrastructure. Not suitable for private or regulated data. | OpenAI (direct API) | OpenAI API | 128K | Not Sovereign | Azure OpenAI Service |
Claude 3.x Processed on Anthropic's infrastructure. Data leaves your boundary. Use AWS Bedrock for a conditional alternative. | Anthropic (direct API) | Anthropic API | 200K | Not Sovereign | AWS Bedrock |
Gemini 1.5 Pro / Flash Inference on Google's shared servers. Use Vertex AI within your GCP project for a conditionally sovereign path. | Google (direct API) | Google AI Studio / API | 1M | Not Sovereign | Vertex AI (GCP) |
Deployment Runtimes
Where and how sovereign models run
The runtime determines the sovereignty level, performance envelope, and operational complexity. Choose based on your infrastructure maturity and compliance requirements.
Ollama
Local and single-server model runtime. Ideal for development, proof-of-concept, and low-volume private deployments.
- Best for
- Development · Laptops · Small teams
- Models
- Llama 3, Mistral, Phi-3, Gemma 2, Qwen 2.5
- Note
- One-command model pull and serve. No GPU required for smaller models. Not designed for production scale.
vLLM
High-throughput inference server with continuous batching. Purpose-built for GPU-accelerated production workloads.
- Best for
- Production · GPU servers · High concurrency
- Models
- Llama 3, Mistral, Qwen 2.5, DeepSeek, Gemma 2
- Note
- OpenAI-compatible API surface. Supports tensor parallelism for large models. Preferred for production private RAG.
OpenShift AI
Red Hat's ML platform for enterprise Kubernetes deployments. Integrates model serving with observability and governance.
- Best for
- Enterprise · Kubernetes · Air-gapped
- Models
- Llama 3, Mistral, DeepSeek R1
- Note
- Preferred for regulated industries, air-gapped environments, and enterprises already using OpenShift.
Azure OpenAI Service
Microsoft-managed OpenAI models within your Azure subscription and region. HIPAA, SOC2, and EU data boundary eligible.
- Best for
- Azure-aligned orgs · Compliance · Fast start
- Models
- GPT-4o, GPT-3.5 Turbo
- Note
- No self-managed GPU infrastructure required. Data stays in your Azure tenant. Provider controls model updates.
AWS Bedrock
Managed foundation model API within your AWS account. Supports Claude, Llama, Mistral, and others with VPC integration.
- Best for
- AWS-aligned orgs · Multi-model · Compliance
- Models
- Claude 3, Llama 3, Mistral Large
- Note
- Data stays in your AWS account and region. No cross-account data sharing. Supports PrivateLink for VPC isolation.
Decision Guide
Which model path fits your situation?
Sovereignty requirements, infrastructure maturity, and compliance obligations determine the right deployment path — not model quality rankings.
Situation
You handle regulated data (HIPAA, GDPR, financial)
Recommendation
Fully Sovereign or Conditionally Sovereign only
Self-host on vLLM / OpenShift AI, or use Azure OpenAI / AWS Bedrock within your tenant
Situation
You have a GPU-equipped private server or Kubernetes cluster
Recommendation
Fully Sovereign — self-hosted
Llama 3.1 70B or Mistral Nemo on vLLM. OpenShift AI if enterprise Kubernetes is already in use.
Situation
You want private AI but don't have GPU infrastructure yet
Recommendation
Conditionally Sovereign — managed cloud
Azure OpenAI (if Azure-aligned) or AWS Bedrock (if AWS-aligned). Plan migration to self-hosted as GPU capacity grows.
Situation
You need to run AI on laptops or edge devices
Recommendation
Fully Sovereign — local runtime
Phi-3 Mini or Llama 3.1 8B on Ollama. Works without internet, ideal for field teams or disconnected environments.
Situation
You need maximum model quality right now
Recommendation
Conditionally Sovereign
GPT-4o via Azure OpenAI or Claude Sonnet via AWS Bedrock. Not fully sovereign but data stays in your cloud tenant.
Situation
You are building a prototype or internal demo
Recommendation
Start with Ollama locally, plan for vLLM in production
Llama 3.1 8B on Ollama for speed. Design the application layer to swap the model endpoint without rewriting the app.
Registry Principles
How SovAIHub evaluates models
This registry does not rank models on benchmark performance. It evaluates them on sovereignty, deployment control, and enterprise operational fit.
Boundary control
Where does inference happen? Who owns the compute? Can you prevent data from leaving your network? These questions determine sovereignty, not model size or capability.
Deployment operability
A sovereign model you cannot realistically operate is not a useful recommendation. Models are rated on whether enterprise teams can deploy, monitor, and maintain them.
Weight availability
Fully sovereign models require open or licensed weights you can download. A model that requires an external API for every inference is not self-hosted, regardless of marketing language.
Enterprise context fit
Context window, throughput, and accuracy on enterprise tasks (document Q&A, summarization, classification) matter more than general benchmark scores for private RAG workloads.
Regulatory alignment
Models deployed in regulated industries must support data residency, audit logging, and access control. The registry notes which deployment methods support these requirements.
Runtime independence
SovAIHub favors models that can be served across multiple runtimes (Ollama, vLLM, OpenShift AI) without vendor lock-in. Application code should route to a model endpoint, not a provider.
Next Step
Need help selecting and deploying the right model stack?
Model selection depends on your data classification, infrastructure, compliance requirements, and team capability. SovAIHub can help you map the right path.
Select the right model. Deploy it on infrastructure you control.
Share your use case, data sensitivity, infrastructure environment, and compliance requirements. We will help you identify a practical sovereign model deployment path.