Introduction

Modern geospatial platforms demand flexible infrastructure to ingest, analyse and serve large amounts of data. When designing the platform architecture and the services involved, an important factor we have to take into account is digital sovereignty: the ability to run software on our own terms and avoid cloud vendor lock‑in. We do this by relying on open‑source technologies and self‑hosted deployments.

This article describes a reference architecture for an Amazon Web Services (AWS) hosted platform built entirely from open‑source components. Instead of running a full‑blown Kubernetes control plane, we opted for Nomad to simplify orchestration and to keep the option of running some services directly on virtual machines rather than in containers.

The resulting platform provides multi‑region resilience, predictable costs and interoperability for geospatial data processing, analysis, hosting and management.

Open Source and Digital Sovereignty

All major components of the platform are free and open source. This provides autonomy and freedom to run software on infrastructure under your control.

As a company based in Norway, we support the European Commission's ambitions to make open‑source software a central pillar of Europe's strategy for digital autonomy. The recent EuroStack initiative and its open letter argue that the continent must build strategic autonomy across all layers of the stack, from chips and AI frameworks to cloud platforms.

Our platform uses tools and services like Plone, PostgreSQL/PostGIS, GeoServer, Prefect, FastAPI and Grafana, which are released under open‑source licences. The orchestrator Nomad works together with Consul and Vault to provide service discovery, configuration and secrets management. Storage is handled by Amazon S3 (or self‑hosted alternatives like MinIO) and EBS, while compute nodes run on EC2 instances.

The platform also stays interoperable with open standards (SQL, OGC WMS/WFS, REST) and allows you to migrate workloads between clouds or to on‑premises machines.

Core architecture design principles:

  • Open source and digital sovereignty – open-source components enable independence and control over data and services. The platform can be deployed in EU regions to comply with data sovereignty requirements.
  • Multi‑region and high availability – critical services are replicated across at least three availability zones (AZs). Secondary stacks can be deployed to another AWS region for disaster recovery. Within a region, auto‑scaling groups distribute workloads across AZs.
  • Separation of concerns – the architecture uses distinct layers for compute, data and observability. Each layer can scale independently and be secured with appropriate policies.
  • Simplified ops – Nomad is used as the orchestrator because it can schedule both containerised workloads and traditional services. Unlike Kubernetes, Nomad is a single binary with minimal dependencies: it is simpler to run and can execute non‑containerised jobs.

Infrastructure as Code and Continuous Deployment

Provisioning and updating the platform is automated using Terraform and GitHub Actions. A GitHub Actions workflow runs terraform plan and terraform apply whenever changes are merged. State is stored in an S3 bucket with locking to prevent concurrent writes.

The diagram below illustrates how Terraform reads the infrastructure code, updates the remote state and applies changes to create a VPC and EC2 instances. The .tfvars file supplies region‑specific variables. The pipeline can be extended to deploy application code into Nomad via nomad job run.
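The remote state configuration is the backbone of this workflow. Below is a minimal sketch, assuming an S3 bucket and DynamoDB table created up front for state management; the bucket, table and region names are placeholders, not the project's actual values.

```hcl
# Hedged sketch: remote Terraform state in S3 with DynamoDB locking.
# Bucket, table and region names are placeholders.
terraform {
  backend "s3" {
    bucket         = "example-platform-terraform-state"
    key            = "platform/terraform.tfstate"
    region         = "eu-north-1"
    dynamodb_table = "terraform-state-lock"  # prevents concurrent writes to the state
    encrypt        = true
  }
}

# Region-specific variables are supplied via a .tfvars file, for example:
#   region        = "eu-north-1"
#   vpc_cidr      = "10.20.0.0/16"
#   instance_type = "t3.large"
```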

Figure 1 – Diagram of Terraform deployment flow

Infrastructure Layers

The platform is organised into a three‑layer architecture. The layers are summarised below, together with their deployment mode and orchestration.

| Layer | Key Services | Deployment Mode | Orchestration |
|---|---|---|---|
| Compute Layer | API Gateway, Web UI, APIs, Plone CMS | Docker Containers | Nomad (Container Orchestration) |
| Data Layer | PostgreSQL, PostGIS, ERDDAP, GeoServer | Bare Metal (EC2 VMs) or Docker Containers | Nomad (raw_exec & system jobs, or containers) |
| Observability Layer | Prometheus, Grafana, Loki, Alertmanager | Docker Containers | Nomad (Container Orchestration) |

Compute Layer (Application & Data Processing)

The compute layer runs both long‑lived services and batch jobs. EC2 instances are grouped into auto‑scaling groups across multiple AZs. Edge proxies (Traefik) listen on public subnets and route requests to services running on Nomad clients in private subnets.

Service discovery is handled by Consul, while secrets are managed by Vault. Nomad schedules applications such as Plone (UI/CMS), FastAPI (backend services), Prefect (ETL flows) and the Traefik ingress gateway. Non‑containerised workloads can be run as raw executables.
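To make this concrete, here is a hedged sketch of what a Nomad job for one of the backend services could look like, with Consul registration, Traefik routing tags and allocations spread across availability zones. The job name, image, host rule and health‑check path are assumptions, not the platform's actual values.

```hcl
# Hedged sketch of a backend service job; real jobs would add Vault policies,
# constraints and affinities on top of this.
job "fastapi-backend" {
  datacenters = ["dc1"]
  type        = "service"

  # Spread allocations across AZs using the AWS placement attribute Nomad fingerprints.
  spread {
    attribute = "${attr.platform.aws.placement.availability-zone}"
  }

  group "api" {
    count = 3

    network {
      port "http" {
        to = 8000
      }
    }

    # Registered in Consul; Traefik discovers it via these tags.
    service {
      name = "fastapi-backend"
      port = "http"
      tags = [
        "traefik.enable=true",
        "traefik.http.routers.api.rule=Host(`api.example.org`)",
      ]

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "app" {
      driver = "docker"

      config {
        image = "registry.example.org/fastapi-backend:latest"
        ports = ["http"]
      }

      resources {
        cpu    = 500  # MHz
        memory = 512  # MB
      }
    }
  }
}
```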

The diagram below shows the Nomad cluster architecture with servers and clients spread across three AZs. A failure in one AZ does not disrupt operations because Nomad, Consul and Vault elect new leaders and reschedule workloads on healthy nodes.

Figure 2 – Diagram of AWS multi availability zone scaling

GeoServer Deployment

The platform is oriented towards geospatial data services and applications, and therefore requires some specialised components.

GeoServer serves as the primary tool for rendering and delivering geospatial data layers. It can interact directly with the PostgreSQL database (with PostGIS extensions) to pull geospatial data, which can then be visualised on the frontend using Leaflet or OpenLayers.

It provides INSPIRE-compliant geospatial data visualisation via OGC Web Services (WMS, WFS, WMTS). It forms the foundation for the map viewer implementation and is fully compatible with platforms such as EMODnet (the European Marine Observation and Data Network).
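As an illustration, a map client such as Leaflet or OpenLayers requests rendered images through a standard WMS GetMap call along these lines; the hostname and layer name are hypothetical.

```
https://maps.example.org/geoserver/wms?service=WMS&version=1.3.0&request=GetMap
  &layers=campaigns:sampling_areas&crs=EPSG:4326&bbox=57.0,2.0,62.0,8.0
  &width=768&height=512&format=image/png
```

Note that WMS 1.3.0 with EPSG:4326 expects the bounding box in latitude/longitude axis order.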

Data Stores
GeoServer will be configured to connect to the PostgreSQL/PostGIS database as a data store. This connection enables GeoServer to access geospatial data stored in the database, such as campaign locations, sampling areas, and other relevant spatial features.

Layer Configuration
Each dataset within the database will be exposed as a layer in GeoServer. These layers will be configured with appropriate styles (SLD - Styled Layer Descriptor) to represent different types of data (e.g., points for campaign locations, polygons for sampling areas). Layers can be cached using GeoWebCache for faster rendering.
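As an example of how the database connection could be wired up without hard‑coding credentials, here is a hedged sketch of the GeoServer task fragment in a Nomad job, with the PostGIS username and password injected from Vault. The secret path, image tag, Consul DNS name and variable names are assumptions, and the surrounding job, group and network stanzas are omitted.

```hcl
# Hedged sketch of a GeoServer task with Vault-injected database credentials.
task "geoserver" {
  driver = "docker"

  config {
    image = "docker.osgeo.org/geoserver:2.25.x"  # tag is an assumption
    ports = ["web"]                              # port "web" defined in the group's network stanza
  }

  # Render database credentials from Vault into environment variables.
  template {
    destination = "secrets/db.env"
    env         = true
    data        = <<EOH
{{ with secret "secret/data/platform/postgis" }}
POSTGIS_HOST=postgis.service.consul
POSTGIS_USER={{ .Data.data.username }}
POSTGIS_PASSWORD={{ .Data.data.password }}
{{ end }}
EOH
  }

  vault {
    policies = ["geoserver-read-db"]
  }

  resources {
    cpu    = 1000
    memory = 2048
  }
}
```

The GeoServer data store would then reference these environment variables (or a rendered properties file) instead of embedding credentials in its configuration.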

Deployment

| Component | Implementation |
|---|---|
| Hosting Environment | Containerised service, orchestrated by Nomad |
| Storage Backend | Uses PostGIS-enabled PostgreSQL for geospatial data |
| Security | OAuth2-based authentication via Keycloak |
| Load Balancing | Handled by Traefik |
| Cache Optimisation | Tile caching (GeoWebCache) optimises WMS responses |
| Availability | Multiple GeoServer instances support horizontal scaling |
| Observability | Logs and metrics are centralised in the Observability service (Loki + Grafana + Prometheus) |

Authentication & Authorisation

Identity and access control are managed by Keycloak, an open‑source identity and access management system. Keycloak provides single sign‑on and single sign‑out so that users authenticate once and can access multiple applications without re‑login, and logging out from one application logs them out everywhere.

💡 This approach centralises authentication, simplifies audit logging and aligns with our digital sovereignty mission.

Responsibilities:

  • Enables SSO across all platform components (Plone CMS, API services, ETL pipelines, dashboards)
  • Supports OAuth 2, OpenID Connect (OIDC), and SAML 2.0 authentication flows
  • Provides Role-Based Access Control (RBAC) for API endpoints
  • Complies with EU data sovereignty requirements and Zero Trust security models.

The zero‑trust model is enforced via Traefik and AWS Secrets Manager, ensuring credentials and secrets remain out of application code. Keycloak also supports identity brokering and can federate with external LDAP or social identity providers when required.

Deployment Model:

| Component | Implementation |
|---|---|
| Hosting Environment | Containerised service, orchestrated by Nomad |
| Storage Backend | Uses PostgreSQL for persistent identity storage |
| Networking | Runs behind Traefik for TLS termination and API routing |
| Load Balancing | Managed by Traefik |
| Security | Multi-Factor Authentication (MFA) and Role-Based Access Control (RBAC) |

Scalability Considerations:

  • Horizontal scaling: additional Keycloak instances auto-provisioned via Nomad
  • PostgreSQL replication: high-availability and read performance optimisation
  • Session clustering: Keycloak HA mode ensures session persistence across multiple instances
  • Token Caching: optimised caching reduces API authentication latency.

Example Authentication Workflow with EU Login

Keycloak integrates with EU Login using OAuth 2.0, a widely adopted authorisation framework.

When a user attempts to log in, they are redirected to the EU Login portal, where they authenticate using their credentials. Upon successful authentication, EU Login issues an OAuth 2.0 token, which is passed back to Keycloak. Keycloak then uses this token to authenticate the user and establish a session.

Figure 3 – Diagram of the login flow with EU Login

The authentication process follows the OAuth 2.0 authorisation flow, with Keycloak managing the access token issued by the EU Login service and issuing a session token to the web application.
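In practice, this brokering can be declared as configuration rather than set up by hand. Below is a hedged sketch using the community Keycloak Terraform provider; the resource and attribute names should be verified against that provider's documentation, and the endpoint URLs and client credentials are supplied as variables rather than real EU Login values.

```hcl
# Hedged sketch: registering EU Login as an external OIDC identity provider in Keycloak.
# Provider, resource and attribute names follow the community terraform-provider-keycloak
# and are assumptions to be checked; no real endpoints or credentials appear here.
resource "keycloak_oidc_identity_provider" "eu_login" {
  realm             = "platform"
  alias             = "eu-login"
  display_name      = "EU Login"
  authorization_url = var.eu_login_authorization_url
  token_url         = var.eu_login_token_url
  client_id         = var.eu_login_client_id
  client_secret     = var.eu_login_client_secret
}
```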

Data Layer (Storage & Processing)

The Data Layer contains databases, geospatial services, and external dataset integration. The platform needs to serve structured, unstructured and semi‑structured datasets.


| Service | Instance Type | Autoscaling (Min–Max) | VPC Setup |
|---|---|---|---|
| PostgreSQL + PostGIS | r5.large | 2–4 (HA cluster) | Private VPC |
| GeoServer | t3.medium | 2–4 | Private VPC |
| ERDDAP (environmental data server) | m5.medium | 1–2 | Private VPC |
| Amazon S3 | Managed | Auto-scaling | Private, IAM-restricted |

  • Relational data. A primary PostgreSQL/PostGIS instance stores user accounts, metadata and application data. One or more read replicas serve API queries. Streaming replication ensures minimal data loss during failover.
  • Object storage. Amazon S3 (or MinIO on self‑hosted servers) holds large files such as raw sonar data, Cloud Optimised GeoTIFFs, Zarr archives and Parquet tables. We organise data into versioned buckets and use lifecycle policies to move older data to lower‑cost storage classes.
  • File systems. EBS volumes store the Postgres database, logs and caches. Because EBS volumes are bound to an AZ, we schedule backups to S3 and replicate snapshots to other regions for durability.
  • Data services. Geospatial datasets are published via GeoServer (OGC WMS/WFS/WCS) and ERDDAP. These services run in Nomad tasks on their own EC2 instances. APIs implemented in FastAPI provide programmatic access.
  • Data processing and pipelines. Data ingestion, conversion and analytics are orchestrated with Prefect 3. Python libraries such as echopype, Zarr, xarray and Dask process sonar data into analysis‑ready formats. Prefect tasks run in Nomad with concurrency control and retries; a sketch of such a scheduled job is shown after this list.
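As indicated above, recurring pipelines can be expressed as Nomad batch jobs that launch a containerised Prefect flow. A minimal sketch follows; the image name, schedule and flow module are illustrative assumptions.

```hcl
# Hedged sketch: a nightly batch job running a containerised Prefect flow.
job "sonar-ingest" {
  datacenters = ["dc1"]
  type        = "batch"

  periodic {
    cron             = "0 2 * * *"  # run nightly at 02:00
    prohibit_overlap = true         # never start a run while the previous one is active
  }

  group "etl" {
    task "prefect-flow" {
      driver = "docker"

      config {
        image   = "registry.example.org/etl-flows:latest"
        command = "python"
        args    = ["-m", "flows.sonar_ingest"]  # hypothetical flow module
      }

      resources {
        cpu    = 2000
        memory = 4096
      }
    }
  }
}
```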

Observability Layer (Monitoring & Logging)

The observability stack records metrics, logs and alerts. It is deployed in a separate security group and is accessible only via VPN or to IAM‑authenticated users.

The core components are:

| Service | Instance Type | Autoscaling (Min–Max) | VPC Setup |
|---|---|---|---|
| Prometheus (metrics collection) | t3.medium | 1–3 | Private VPC |
| Grafana (visualisation dashboards) | t3.medium | 1–2 | Private VPC |
| Loki (log aggregation) | t3.medium | 1–3 | Private VPC |
| Alertmanager (failure alerts) | t3.small | 1–2 | Private VPC |

Observability Components

  • Prometheus collects system and application metrics. Node exporters on each EC2 instance publish CPU, memory and disk usage. Application metrics are exposed via HTTP endpoints. Prometheus stores time‑series data locally and replicates it to S3 for long‑term retention; a minimal scrape configuration is sketched after this list.
  • Grafana provides real‑time dashboards. It queries Prometheus and Loki to display health metrics and logs. Grafana dashboards include CPU and memory utilisation per service, request latency, database throughput and queue depths. An alerting framework notifies operators via email, Slack or PagerDuty.
  • Loki aggregates logs from Nomad tasks and the systemd journal via promtail. Logs are indexed by labels such as job, namespace and instance, but not by full text, keeping storage efficient. Queries allow you to filter logs by service, time and severity.
  • Alertmanager deduplicates and groups alerts from Prometheus and pushes them to notification channels. Alerts are configured to trigger when thresholds are exceeded (e.g., CPU > 80 %, database replication lag > 30 seconds).
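As referenced above, here is a hedged sketch of how Prometheus could be configured inside its Nomad job, using a template stanza and Consul service discovery to find node exporters and application endpoints. It assumes a Consul agent reachable on localhost:8500 (for example via host networking) and the service names used earlier in this article.

```hcl
# Fragment of the Prometheus task: render prometheus.yml with Consul-based discovery.
template {
  destination = "local/prometheus.yml"
  data        = <<EOH
global:
  scrape_interval: 15s

scrape_configs:
  # Node exporters registered in Consul on every EC2 instance
  - job_name: "node-exporter"
    consul_sd_configs:
      - server: "localhost:8500"
        services: ["node-exporter"]

  # Application metrics exposed over HTTP, e.g. the backend API
  - job_name: "fastapi-backend"
    metrics_path: /metrics
    consul_sd_configs:
      - server: "localhost:8500"
        services: ["fastapi-backend"]
EOH
}
```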

Beyond metrics and logs, the stack can be extended with OpenTelemetry to collect traces from microservices and measure end‑to‑end latency. This facilitates root‑cause analysis when multiple services are involved in a request.

Security and Certificate Management

The platform follows zero‑trust principles. All services run within private subnets and communicate via encrypted channels. Network traffic is restricted by security groups and NACLs.

TLS certificates are issued by the relevant certificate authority and stored securely in AWS Secrets Manager. Traefik terminates TLS connections and routes requests to backend services.

The following table summarises SSL/TLS certificate management:

| Function | Implementation |
|---|---|
| Certificate Issuance | Certificates issued by the relevant authority |
| Certificate Storage | Stored securely in AWS Secrets Manager via Terraform |
| SSL Termination | Traefik manages certificate-based HTTPS termination |
| Renewal Strategy | Manual or automated renewal; updated in Secrets Manager and reloaded by Traefik |
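The storage step can be captured directly in Terraform. A hedged sketch follows, with placeholder secret names and file paths; Traefik (or a small helper process) can then fetch the secret at deploy time and reload its TLS configuration.

```hcl
# Hedged sketch: storing an issued certificate and key in AWS Secrets Manager.
resource "aws_secretsmanager_secret" "platform_tls" {
  name        = "platform/tls/example-org"
  description = "TLS certificate and private key for the platform ingress"
}

resource "aws_secretsmanager_secret_version" "platform_tls" {
  secret_id     = aws_secretsmanager_secret.platform_tls.id
  secret_string = jsonencode({
    certificate = file("certs/example-org.crt")  # placeholder paths
    private_key = file("certs/example-org.key")
  })
}
```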

High Availability & Failover Strategy

Nomad provides a lightweight, easy-to-manage alternative to Kubernetes while supporting both containerised and non-containerised workloads.

In case of an AZ outage, Traefik automatically redirects traffic to services in the remaining AZs. PostgreSQL uses streaming replication to maintain a warm standby in another AZ.

A rolling update strategy ensures zero downtime when deploying new versions: Nomad starts new instances, Traefik routes traffic to the healthy allocations, and old instances are gracefully terminated once the new ones pass their health checks.

The following diagram illustrates the rolling update sequence.

Figure 4 – Flow diagram of rolling update strategy
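In Nomad, this behaviour is driven by the job's update stanza. A hedged sketch of typical settings is shown below; the exact thresholds are illustrative and would be tuned per service.

```hcl
# Hedged sketch of an update stanza for rolling deployments.
update {
  max_parallel      = 1      # replace one allocation at a time
  canary            = 1      # start one canary first; promote manually or via auto_promote
  min_healthy_time  = "30s"  # a new allocation must stay healthy this long
  healthy_deadline  = "5m"   # otherwise it is marked unhealthy
  progress_deadline = "10m"  # deadline for the deployment to make progress
  auto_revert       = true   # roll back automatically to the last good version
}
```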

Multi‑region Resilience and Disaster Recovery

For mission‑critical applications, a single region may not be sufficient. AWS regions occasionally experience power failures or network outages. To protect against region‑level events, the platform can be duplicated in a secondary region. Data replication is achieved via asynchronous database replication (streaming WAL logs to a replica), cross‑region object replication (S3 replicates objects to a bucket in the failover region) and reusable Terraform modules to provision identical stacks. Vault can be unsealed in the new region and secrets injected into Nomad tasks.
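Cross‑region object replication, for example, can be declared in the same Terraform code base. A minimal sketch, assuming versioned primary and secondary buckets (the secondary created in the failover region via a provider alias) and a dedicated replication IAM role; all resource names are placeholders.

```hcl
# Hedged sketch: replicate objects from the primary bucket to the failover region.
resource "aws_s3_bucket_replication_configuration" "failover" {
  bucket = aws_s3_bucket.primary.id
  role   = aws_iam_role.replication.arn

  rule {
    id     = "replicate-all"
    status = "Enabled"
    filter {}

    delete_marker_replication {
      status = "Disabled"
    }

    destination {
      bucket        = aws_s3_bucket.secondary.arn
      storage_class = "STANDARD_IA"
    }
  }

  # Replication requires versioning to be enabled on the source bucket first.
  depends_on = [aws_s3_bucket_versioning.primary]
}
```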

Disaster Recovery Principles

The architecture is designed as a fault-tolerant, auto-recoverable infrastructure, using Terraform for provisioning, Nomad for orchestration, and highly available databases and services.

| Requirement | Implementation |
|---|---|
| System downtime < 2 hours | Auto-recovery workflows, multi-AZ deployment |
| Data loss prevention | Multi-level backups, read replicas, continuous WAL archiving |
| Regular maintenance | Security patches, OS updates, application versioning |
| Periodic usage reporting | Automated AWS resource allocation & monitoring logs |

Recovery Time and Recovery Point Objectives

| Metric | Target |
|---|---|
| Recovery Time Objective (RTO) | ≤ 2 hours |
| Recovery Point Objective (RPO) | ≤ 5 minutes (databases), ≤ 24 hours (archived datasets) |

When designing for multi‑region failover, it is important to consider data transfer fees. AWS charges for inter‑region data replication; typical rates range from $0.02 to $0.17 per GB depending on the source and destination[8]. In our cost model we assume 1 TB per month replicated at $0.02/GB.

Cost Estimation

One advantage of running your own platform is the ability to predict and control costs. The table below summarises monthly costs for the example architecture using on‑demand pricing.

Prices are approximate and vary by region and time; we use publicly available per‑hour costs for EC2 instances as of 2025. Storage costs are based on list prices for EBS gp3 and S3 Standard‑Infrequent Access.
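As a worked example of how each figure is derived: a compute row is hourly price × instance count × 730 hours per month, so the two Traefik proxies cost 0.0416 × 2 × 730 ≈ $60.74, while replicating 1 TB (1,024 GB) between regions at $0.02/GB comes to 1,024 × 0.02 ≈ $20.48.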

| Service | Instance type | Count | Price (USD) | Monthly cost (USD) |
|---|---|---|---|---|
| Traefik proxy | t3.medium | 2 | 0.0416/h | ≈60.74 |
| Nomad servers | t3.large | 3 | 0.083/h | ≈181.77 |
| Consul servers | t3.small | 3 | 0.0208/h | ≈45.55 |
| Vault servers | t3.medium | 3 | 0.0416/h | ≈91.10 |
| Backend services | t3.large | 4 | 0.083/h | ≈242.36 |
| PostgreSQL primary | r5.large | 1 | 0.126/h | ≈91.98 |
| PostgreSQL replica | r5.large | 1 | 0.126/h | ≈91.98 |
| GeoServer | t3.medium | 2 | 0.0416/h | ≈60.74 |
| ERDDAP | m5.medium | 1 | 0.048/h | ≈35.04 |
| Prometheus | t3.medium | 1 | 0.0416/h | ≈30.37 |
| Grafana | t3.medium | 1 | 0.0416/h | ≈30.37 |
| Loki | t3.medium | 1 | 0.0416/h | ≈30.37 |
| Alertmanager | t3.small | 2 | 0.0208/h | ≈30.37 |
| Compute total | | | | ≈1,022.73 |
| EBS gp3 storage (2 × 1 TB + 200 GB logs) | | | 0.08/GB-month | ≈179.84 |
| S3 Standard-IA (10 TB) | | | 0.0125/GB-month | ≈128.00 |
| Inter-region data transfer (1 TB) | | | 0.02/GB | ≈20.48 |
| Estimated monthly total | | | | ≈1,351.05 |

The compute total assumes on‑demand pricing. Reserved or spot instances can reduce costs by 30–70 %. Storage costs scale linearly with capacity; storing 5 TB instead of 10 TB would halve the S3 cost. Data transfer fees depend on replication volume and the number of read requests served from the secondary region.

Conclusion

Building your own cloud platform may seem like a large effort, but with careful design and open‑source tools it becomes manageable and cost‑effective.

Using Nomad instead of Kubernetes simplifies operations and lets us run both containerised and bare‑metal workloads. A multi‑region AWS deployment with EC2, EBS and S3 provides high availability and disaster recovery.

Data sovereignty is preserved by choosing EU regions and self‑hosted services. The estimated monthly cost of around $1,350 is competitive with managed solutions and could be reduced further by using reserved or spot instances.

The architecture described here is only a starting point. Real‑world deployments will be tailored to specific workloads, compliance requirements and organisational constraints. Nonetheless, it demonstrates that an open‑source stack can power a modern geospatial data platform at scale.

CONTACT US

We can help your organisation deploy cloud-native data platforms using open-source services, on both AWS and Azure. Contact us to discuss your use case and set up a collaboration plan.
