Sovereign AI and Data Residency: Building for Borders

Contributor
Sep 4, 2025

The internet was designed to be borderless. Data flows wherever the network takes it, routed through whichever path is fastest, stored wherever capacity is cheapest. For decades, this was a feature. Now it is a compliance problem.

Governments around the world are asserting sovereignty over data, requiring that certain types of information be stored, processed, and sometimes used for model training only within their jurisdictions. For AI systems, which depend on large datasets and centralized compute, these requirements create architectural challenges that a configuration change alone cannot solve.

What Data Sovereignty Means in Practice

Data sovereignty is the principle that data is subject to the laws of the country where it is collected or where it resides. In practice, this translates into three types of requirements:

Data residency: Data must be stored within a specific geographic boundary. GDPR does not strictly require EU data to stay in the EU, but it imposes conditions on transfers that often make residency the simplest compliance path.

Data localization: Stricter than residency. Localization rules require that a copy of the data, and in the strictest cases the only copy, be kept inside the jurisdiction. Russia's data localization law requires that personal data of Russian citizens be recorded and stored in databases physically located in Russia. China's Cybersecurity Law and Data Security Law impose comparable requirements on certain operators and categories of data.

Processing restrictions: Some regulations restrict where data can be processed, not just where it is stored. If your data is stored in the EU but processed by a US-based service, that processing may constitute a data transfer that requires a legal justification.
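
One way to make these three requirement types concrete is to write them down as a small policy object and check it against where a dataset is actually stored and processed. The sketch below is illustrative only: `ResidencyPolicy`, its fields, the region names, and the `violations` helper are all assumptions invented for this example, not a legal model of any specific regulation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ResidencyPolicy:
    """Hypothetical policy describing where one class of data may live."""
    name: str
    storage_regions: FrozenSet[str]     # residency: where the data may be stored
    processing_regions: FrozenSet[str]  # processing restriction
    allow_copies_abroad: bool           # False models strict localization


def violations(policy: ResidencyPolicy,
               stored_in: List[str],
               processed_in: List[str]) -> List[str]:
    """Return readable problems; an empty list means none were found.

    The first entry of `stored_in` is treated as the primary copy.
    """
    problems = []
    for index, region in enumerate(stored_in):
        if region in policy.storage_regions:
            continue
        if index == 0:
            problems.append(f"primary copy in {region} is outside the allowed regions")
        elif not policy.allow_copies_abroad:
            problems.append(f"copy in {region} breaks localization")
    for region in processed_in:
        if region not in policy.processing_regions:
            problems.append(f"processing in {region} needs a transfer justification")
    return problems


# Example: EU-resident data with a replica and a processor in the US.
eu_policy = ResidencyPolicy("eu-personal", frozenset({"eu-west"}),
                            frozenset({"eu-west"}), allow_copies_abroad=True)
for problem in violations(eu_policy, ["eu-west", "us-east"], ["us-east"]):
    print(problem)
```

The point of the design is that storage and processing are checked separately, which mirrors the distinction drawn above: a dataset can satisfy a residency rule and still trip a processing restriction.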

The Regulatory Landscape

GDPR and European Data Protection

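GDPR is the most influential data protection framework globally. It does not mandate that EU personal data stay in the EU, but it requires that any transfer to a country outside the European Economic Area rest on a valid transfer mechanism: an adequacy decision, appropriate safeguards such as standard contractual clauses, or a narrow derogation.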

The invalidation of Privacy Shield (Schrems II, 2020) and the subsequent adoption of the EU-US Data Privacy Framework (2023) have left organizations with a shifting set of options. Standard Contractual Clauses (SCCs) remain the most common mechanism for data transfers, but they require Transfer Impact Assessments that evaluate whether the destination country's laws and practices provide protection essentially equivalent to the EU's.

For AI systems, GDPR creates several specific challenges. Training data that includes EU personal data may need to stay in the EU or have a valid transfer mechanism. Inference requests that contain personal data are subject to the same transfer rules. And the right to erasure creates technical challenges when personal data is encoded in model weights.

Emerging Regulations

India's Digital Personal Data Protection Act (2023) permits cross-border transfers by default but lets the government restrict them by notification, which leaves room for localization requirements on specific destinations or categories of data. The details are still being defined through rules and notifications.

Brazil's LGPD mirrors GDPR in many respects but has its own enforcement body and interpretation. Cross-border transfers require adequacy decisions or contractual safeguards.

Middle East — Saudi Arabia, UAE, and Qatar have all enacted data protection laws with varying data localization requirements. The Saudi PDPL requires certain categories of data to remain in the Kingdom.

Southeast Asia — Vietnam, Indonesia, and Thailand have data localization provisions at various stages of implementation. The regulatory landscape is evolving rapidly.

The trend is clear: more countries are asserting data sovereignty, and the requirements are becoming more specific and more strictly enforced.

Architectural Patterns for Data Residency

Regional Deployments

The most straightforward approach is deploying your application stack in each required region. Data generated in the EU stays in the EU deployment. Data generated in Asia stays in the Asian deployment. Each deployment is a complete, independent instance.

This works, but it is expensive. You are running parallel infrastructure, managing multiple deployments, and absorbing the operational cost of keeping code, configuration, and models consistent across them while the data itself stays separate. For large organizations with significant revenue in each region, this is manageable. For startups, it can be prohibitive.

Data Partitioning

A more nuanced approach is partitioning data by residency requirement while sharing infrastructure where possible. Application logic runs in a central location, but data is stored in region-specific databases. Queries are routed to the appropriate data store based on the data's residency requirements.

This requires careful design. Your data access layer needs to be residency-aware. Your caching strategy needs to ensure that cached data respects residency boundaries. Your backup and disaster recovery processes need to maintain residency guarantees.
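
A minimal sketch of a residency-aware data access layer might look like the following. Everything here is hypothetical: `RegionalStore` is an in-memory stand-in for a real regional database, and `ResidencyAwareDAL` and the region keys are names chosen for illustration. The two ideas it tries to show are failing closed when a region has no store, and keeping one cache per region so cached rows never cross a boundary. In production each cache would also need to be hosted inside its region, which this sketch does not model.

```python
from typing import Dict, Optional


class RegionalStore:
    """In-memory stand-in for a database deployed in a single region."""

    def __init__(self, region: str) -> None:
        self.region = region
        self._rows: Dict[str, dict] = {}

    def put(self, key: str, row: dict) -> None:
        self._rows[key] = row

    def get(self, key: str) -> Optional[dict]:
        return self._rows.get(key)


class ResidencyAwareDAL:
    """Routes reads and writes to the store for a record's home region."""

    def __init__(self, stores: Dict[str, RegionalStore]) -> None:
        self._stores = stores
        # One cache per region, so a cached row is never served across regions.
        self._caches: Dict[str, Dict[str, dict]] = {r: {} for r in stores}

    def _store_for(self, home_region: str) -> RegionalStore:
        if home_region not in self._stores:
            # Fail closed: never fall back to a default or "nearest" region.
            raise LookupError(f"no store configured for region {home_region!r}")
        return self._stores[home_region]

    def write(self, home_region: str, key: str, row: dict) -> None:
        self._store_for(home_region).put(key, row)
        self._caches[home_region].pop(key, None)  # drop any stale cached row

    def read(self, home_region: str, key: str) -> Optional[dict]:
        store = self._store_for(home_region)
        cache = self._caches[home_region]
        if key not in cache:
            row = store.get(key)
            if row is None:
                return None
            cache[key] = row
        return cache[key]


dal = ResidencyAwareDAL({"eu": RegionalStore("eu"), "us": RegionalStore("us")})
dal.write("eu", "user:1", {"name": "Ana"})
print(dal.read("eu", "user:1"))  # served from the EU store
print(dal.read("us", "user:1"))  # the US store never saw this row
```

Backups and disaster recovery would need the same treatment: a replica target is just another store, and it should go through the same region check.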

Edge Processing with Central Orchestration

For AI inference, edge processing can help. Run inference at the edge — in the region where the data lives — while managing models and orchestration centrally. The data never leaves the region. Only the model (which typically does not contain personal data) is distributed.

This pattern works well for inference but is challenging for training. Training typically requires aggregating large datasets, which conflicts with data residency requirements. Federated learning — where models are trained on distributed data without centralizing it — is a potential solution but remains immature for most production use cases.
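
To make the federated idea less abstract, here is a toy sketch of federated averaging on a one-variable linear model. It is a teaching example under stated assumptions, not a production recipe: the model, the learning rate, the round count, and the synthetic regional datasets are all made up for illustration. Each region runs a few gradient steps on its own data, and only the resulting weights are sent to the coordinator and averaged.

```python
from typing import List, Sequence, Tuple

Weights = List[float]            # [w, b] for the model y = w * x + b
Dataset = Sequence[Tuple[float, float]]


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Run plain gradient descent on one region's data; raw data stays local."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Average regional weights, weighted by each region's sample count."""
    total = sum(sizes)
    return [sum(u[i] * s for u, s in zip(updates, sizes)) / total
            for i in range(len(updates[0]))]


def train(regions: List[Dataset], rounds: int = 100) -> Weights:
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        # Only weights cross the regional boundary, never the rows themselves.
        updates = [local_update(global_weights, data) for data in regions]
        global_weights = federated_average(updates, [len(d) for d in regions])
    return global_weights


# Synthetic data drawn from y = 2x + 1, split across two hypothetical regions.
eu_data = [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0)]
apac_data = [(x, 2 * x + 1) for x in (1.5, 2.0)]
w, b = train([eu_data, apac_data])
print(f"w={w:.3f}, b={b:.3f}")  # should land close to w=2, b=1
```

Keeping raw rows local does not by itself settle the compliance question. Model updates can still leak information about the underlying data, which is one reason production deployments usually layer additional protections on top.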

Sovereign Cloud Options

Major cloud providers have responded to data sovereignty demands with sovereign cloud offerings.

AWS offers dedicated regions and Local Zones in many countries. AWS GovCloud provides isolated infrastructure for US government workloads. European sovereign cloud partnerships (like the one with Deutsche Telekom) provide EU-operated cloud infrastructure.

Microsoft Azure offers sovereign clouds for government (Azure Government) and specific countries. Azure confidential computing provides encryption of data in use, addressing some sovereignty concerns even when infrastructure is shared.

Google Cloud offers region-specific infrastructure and has partnered with local providers (like T-Systems in Germany) for sovereign cloud offerings.

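European alternatives like OVHcloud, Scaleway, and Hetzner provide EU-headquartered cloud infrastructure with less exposure to US jurisdiction. Schrems II raised this concern with respect to US surveillance law, notably FISA Section 702, and the CLOUD Act raises a related one by allowing US authorities to demand data held by US-based providers regardless of where it is stored.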

The sovereign cloud market is growing rapidly, and the options are improving. But sovereign clouds are generally more expensive, offer fewer services, and are less mature than the hyperscalers' standard regions.

AI-Specific Considerations

Where Does Training Happen?

If you fine-tune a model on data that is subject to residency requirements, the training compute must be in the required region (or you need a valid legal basis for the transfer). This limits your hardware options — not every region has the GPU capacity you need.

Cloud providers are expanding GPU availability across regions, but capacity is unevenly distributed. If your residency requirements restrict you to a region with limited GPU availability, you may face queuing delays or need to adjust your training approach.
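
The scheduling decision described here can be sketched as a filter followed by a choice. The function below is a hypothetical helper: the region names and GPU counts are invented, and a real scheduler would query the provider's capacity or quota APIs rather than a hard-coded dictionary.

```python
from typing import Dict, Optional, Set


def pick_training_region(allowed: Set[str],
                         free_gpus: Dict[str, int],
                         gpus_needed: int) -> Optional[str]:
    """Return the allowed region with the most free GPUs, or None if none fits."""
    candidates = {region: count for region, count in free_gpus.items()
                  if region in allowed and count >= gpus_needed}
    if not candidates:
        # Caller has to queue, shrink the job, or establish a legal transfer basis.
        return None
    return max(candidates, key=lambda region: candidates[region])


# Made-up capacity snapshot: plenty of GPUs, but mostly in the wrong place.
capacity = {"us-east": 512, "eu-west": 16, "eu-central": 64}
eu_only = {"eu-west", "eu-central"}

print(pick_training_region(eu_only, capacity, gpus_needed=32))
print(pick_training_region(eu_only, capacity, gpus_needed=128))
```

The residency filter runs before the capacity comparison, so a region with abundant GPUs is never considered if the data is not allowed to go there.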

Where Does Inference Happen?

For real-time inference on data subject to residency requirements, the inference endpoint must be in the required region. This affects latency, cost, and model availability. Running inference in multiple regions means deploying and managing models in each region.

Model Weights and Data Sovereignty

An interesting edge case: do model weights constitute personal data if the model was trained on personal data? The answer is unclear and jurisdiction-dependent. If a model memorizes personal information (which large language models can do), the weights could arguably contain personal data. This is an unsettled area of law that could have significant implications for model distribution.

Using Third-Party AI APIs

When you send data to a third-party AI API (OpenAI, Anthropic, Google), you are transferring data to wherever that API processes requests. For data subject to residency requirements, you need to verify where the API provider processes and stores data. Some providers offer regional endpoints. Others process everything centrally.

Read the data processing agreements carefully. "We don't use your data for training" is different from "your data stays in your region."
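
That distinction can be enforced in code before any request leaves your system. The sketch below assumes you have recorded, by hand, what each provider's data processing agreement says about each endpoint. `ProviderEndpoint`, the example URLs, and the region labels are placeholders for illustration and do not describe any real provider's offering.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class ProviderEndpoint:
    """Hypothetical record of what a DPA states about one API endpoint."""
    url: str
    processing_region: str
    storage_region: str
    trains_on_customer_data: bool


class ResidencyError(Exception):
    """Raised when no endpoint keeps data inside the allowed regions."""


def select_endpoint(endpoints: Dict[str, ProviderEndpoint],
                    allowed_regions: Set[str]) -> ProviderEndpoint:
    """Return an endpoint whose processing AND storage stay in allowed regions."""
    for endpoint in endpoints.values():
        if (endpoint.processing_region in allowed_regions
                and endpoint.storage_region in allowed_regions):
            return endpoint
    raise ResidencyError("no endpoint keeps data inside the allowed regions")


# Placeholder catalogue; fill this in from the actual agreements you have signed.
catalogue = {
    "global": ProviderEndpoint("https://api.example.com", "us", "us",
                               trains_on_customer_data=False),
    "eu": ProviderEndpoint("https://eu.api.example.com", "eu", "eu",
                           trains_on_customer_data=False),
}

print(select_endpoint(catalogue, {"eu"}).url)
```

Note that `trains_on_customer_data` plays no part in the residency check. The global endpoint in this example does not train on your data and still fails the check, because the data would be processed and stored outside the region.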

Cost and Complexity Tradeoffs

Data sovereignty compliance is not free. Regional deployments multiply infrastructure costs. Sovereign cloud options are more expensive than standard offerings. Engineering complexity increases with each additional region.

For most organizations, the approach should be pragmatic. Identify which data is subject to residency requirements. Implement residency controls for that data specifically. Use standard global infrastructure for data that is not regulated.
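
As a rough illustration of that pragmatic split, the sketch below classifies fields and divides a record into a regulated part, destined for regional storage, and an unregulated part that can use global infrastructure. The field names and the `REGULATED_FIELDS` table are assumptions for this example. A real classification would come from your own data inventory and legal review.

```python
from typing import Dict, Tuple

# Hypothetical classification: field name -> True if residency-regulated.
REGULATED_FIELDS: Dict[str, bool] = {
    "email": True,
    "full_name": True,
    "plan": False,
    "locale": False,
}


def split_record(record: Dict[str, object]) -> Tuple[Dict[str, object], Dict[str, object]]:
    """Split a record into (regulated, unregulated) parts.

    Fields missing from the classification are treated as regulated,
    so newly added data fails safe until someone classifies it.
    """
    regulated: Dict[str, object] = {}
    unregulated: Dict[str, object] = {}
    for field, value in record.items():
        if REGULATED_FIELDS.get(field, True):
            regulated[field] = value
        else:
            unregulated[field] = value
    return regulated, unregulated


regional_part, global_part = split_record(
    {"email": "ana@example.com", "plan": "pro", "notes": "unclassified field"}
)
print(regional_part)
print(global_part)
```

Defaulting unknown fields to regulated is a deliberate choice: it costs some efficiency, but it means a new column cannot quietly end up on global infrastructure.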

The organizations that handle this well are the ones that classify their data early — before it is spread across systems — and build residency into their data architecture from the start rather than retrofitting it later.

Data sovereignty is not going away. The regulatory trend is toward more localization, not less. Building for borders now is an investment in future compliance. Building without borders is a debt that compounds with every new regulation.

ShiftQuality