MariaDB Galera: The HA database that works

Some years ago, I was looking for a good free (and if possible open source) alternative to enterprise software that barely worked. Specifically - a reliable High Availability database technology, because there surely must be at least one, right?

Turns out there is. I stumbled into MariaDB Galera clusters and it actually works - not in a "technically qualifies as HA if you squint" way (like DB2 😂) - but actually holds up in production at 3 AM when something in your stack decides to have an episode but you're on vacation in Italy, so now you have to get eaten alive by mosquitos in the hotel lobby because the WiFi in your room sucks.

What I found was genuinely impressive: stable multi-master replication with HA that holds up in practice. Since then I've set up multiple geographically distributed clusters for clients and have easily resolved every problem I've encountered with them - which, for a technology running mission-critical systems, is the bar.

Some providers with extremely expensive products like IBM don't seem to understand that there IS a bar, but let's not judge a multi-billion dollar company for failing to deliver a reliable database product for over 30 years. We have better things to do...

Back to Galera 👉

Eventually, in 2025, when Bitnami (by then part of Broadcom) pivoted toward a more corporate model 🫰🤑, I started maintaining my own container repository to continue the mariadb-galera work - both for my own use and for the community. As of writing it's sitting at 1.3k Docker Hub pulls, which tells me I'm not the only one who wanted a reliable, community-maintained build to keep depending on.

This article is about why I reach for Galera, what the honest trade-offs are, and how to actually deploy it.


The Postgres Situation

Let me be clear upfront: PostgreSQL is excellent. It's my go-to for high-performance database work, especially when I need extended datatype support, PostGIS, or plugins like pgvector for LLM vector search. Postgres does things MariaDB simply can't, although MariaDB has recently added vector search too.

But Postgres is great until you need HA.

Here's what a production-grade Postgres HA stack looks like:

                        ┌─────────────┐
                        │   Clients   │
                        └──────┬──────┘
                               │
                        ┌──────v──────┐
                        │   HAProxy   │  << Routes writes to primary,
                        │             │    reads to replicas
                        └──────┬──────┘
                               │
                        ┌──────v──────┐
                        │  PgBouncer  │  << Connection pooling
                        │   (pool)    │
                        └──────┬──────┘
                               │
          ┌────────────────────┼────────────────────┐
          │                    │                    │
   ┌──────v──────┐      ┌──────v──────┐      ┌──────v──────┐
   │  PostgreSQL │      │  PostgreSQL │      │  PostgreSQL │
   │   Primary   │─────>│  Replica 1  │      │  Replica 2  │
   │  (R/W)      │──────┼────────────────────>  (R/O)      │
   └──────┬──────┘      └──────┬──────┘      └──────┬──────┘
          │                    │                    │
   ┌──────v──────┐      ┌──────v──────┐      ┌──────v──────┐
   │   Patroni   │      │   Patroni   │      │   Patroni   │
   └──────┬──────┘      └──────┬──────┘      └──────┬──────┘
          │                    │                    │
          └────────────────────┼────────────────────┘
                               │
                        ┌──────v──────┐
                        │  etcd/      │  << Distributed consensus
                        │  Consul/ZK  │    (leader election, config)
                        └─────────────┘

   ─────────────────────────────────────────────────────────
   WAL Archiving >>  Object Storage / NFS  (for PITR)

That's a lot of moving parts before you've written a single line of application code.

Let's count: Patroni (or Stolon, or repmgr) for failover orchestration, etcd/Consul/ZooKeeper for distributed consensus, HAProxy to route writes vs reads to the right backends, PgBouncer for connection pooling, pg_rewind or pg_basebackup for rebuilding failed primaries, and WAL archiving for point-in-time recovery.

Each of those is a separate project, maintained by separate people, on separate release schedules. I've tried Patroni, pgAutoFailover, and EDB's commercial solutions - and I've hit version compatibility or reliability problems slightly more often than I'd like.

And even when the whole thing is running and healthy, there's a fundamental architectural constraint you don't escape: replicas are read-only. 🤷 Just, why?

You have one primary accepting writes, and everything else is just catching up. True multi-master isn't built into Postgres. You can do logical replication and handle conflicts yourself, or pay for EDB's commercial product (asynchronous multi-master with conflict resolution - not free, and not synchronous). After a failover, the old primary needs careful re-joining or a full rebuild. HAProxy has to distinguish the primary from the replicas with separate backend pools. The fundamental issue is that Postgres was designed primary-replica and multi-master is bolted on top. Logical replication doesn't give you the certification-based conflict detection Galera has, so you're always working around the edges of the model rather than with it.

Then there's Citus (another layer on top 🤦). It horizontally shards Postgres across nodes. Each shard has one primary, so it's not true multi-master - but it distributes writes across the cluster by routing each write to the node that owns that shard. More of a scaling solution than an HA one.

I still use Postgres when it fits. I don't trust it for HA-critical systems without accepting that you're signing up to maintain a small distributed systems project on top of your database.


Enter Galera

MariaDB Galera Cluster is a synchronous multi-master clustering solution for MariaDB. The short version of what it gives you:

  • All nodes accept reads and writes - true multi-master, not "primary + some read-only replicas"
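  • Synchronous replication - a transaction's write-set is delivered to and certified on every node before the commit returns (strictly speaking "virtually synchronous": the other nodes apply it moments later)
  • Automatic conflict resolution - built-in: if two nodes write the same row at once, certification lets the first commit win and the loser gets a deadlock error it can retry
  • Automatic node rejoin - a node goes down, comes back, and re-syncs via Incremental State Transfer (or a full State Snapshot Transfer after a longer absence) without you doing anything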
  • Simple failover - HAProxy just routes to any healthy node, no special primary/replica distinction needed
  • One dependency - HAProxy. That's it.

The Galera overview looks like this:

                        ┌─────────────┐
                        │   Clients   │
                        └──────┬──────┘
                               │
                        ┌──────v──────┐
                        │   HAProxy   │  << Routes to any healthy node
                        └──────┬──────┘    (all nodes are equal)
                               │
          ┌────────────────────┼────────────────────┐
          │                    │                    │
   ┌──────v──────┐      ┌──────v──────┐      ┌──────v──────┐
   │   MariaDB   │      │   MariaDB   │      │   MariaDB   │
   │   Node 1    │      │   Node 2    │      │   Node 3    │
   │   (R/W)     │      │   (R/W)     │      │   (R/W)     │
   └──────┬──────┘      └──────┬──────┘      └──────┬──────┘
          │                    │                    │
          │         Galera Replication              │
          │      (wsrep / group communication)      │
          │                    │                    │
          └────────────────────┴────────────────────┘
                     ^
                     │
               Synchronous
               certification-based
               replication (all nodes
               commit together)

Let's compare the stacks side by side:

Component                     PostgreSQL HA                         Galera
-----------------------------------------------------------------------------------------------------------
HAProxy                       ✅ (must route writes to primary)     ✅ (route to any healthy node)
PgBouncer                     ✅ (PG connections are heavy)         ❌ (MariaDB handles connections better)
Patroni                       ✅ (failover orchestration)           ❌ (built-in)
etcd/Consul/ZK                ✅ (for Patroni's DCS)                ❌ (Galera handles its own membership)
Separate replication config   ✅ (WAL streaming setup)              ❌ (just enable wsrep)

The version compatibility nightmares I described earlier? Galera's HA is part of the database. There's no external orchestrator to fall out of sync with.


Topology Choices

Galera gives you two main ways to configure node discovery, and it's worth understanding both.

Star Topology (Primary-Join)

One designated primary bootstraps the cluster. Other nodes join through it and then discover each other dynamically.

# Primary
MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://
MARIADB_GALERA_CLUSTER_BOOTSTRAP=yes

# Secondary nodes
MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://primary

Simple mental model, easier to add nodes (just point them at the primary), and the recovery procedure is always the same: "start the primary first." The trade-off is that after a full cluster crash, you must start the primary first - you can't arbitrarily pick the node with the most recent data.

Full-Mesh Topology

Every node knows about every other node upfront.

# All nodes have identical config
MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://node1,node2,node3

More symmetric, more flexible recovery - after a full crash you check each node's grastate.dat for the highest seqno and bootstrap whichever one has the most recent state. Adding or removing nodes requires updating all configs, which is more operational overhead, but the recovery story is cleaner.

Scenario                 Full-Mesh                            Star
-----------------------------------------------------------------------------------------------------------
Adding a node            Update all nodes' config + restart   Just add new node pointing to primary
Full cluster crash       Choose any node with best data       Must restart primary first
Primary node failure     No impact, symmetric                 No impact during runtime, but affects recovery
Split-brain resolution   Explicit quorum (2/3)                Relies on dynamic discovery
Config complexity        Higher (all nodes need updates)      Lower (star topology)

For most setups I start with the star topology because the operational simplicity is worth it. If the primary is on weaker hardware than the other nodes, or you're in a geographically distributed setup where the primary's datacenter could be entirely unreachable, full-mesh is the better call.


Deployment: Docker Compose

Here's a working two-node cluster with HAProxy on a single host. Good for development, testing, or small deployments where you don't need cross-host HA.

Prep

mkdir -p /opt/services/database/galera-test/nulldb0 \
         /opt/services/database/galera-test/nulldb1 \
         /opt/services/database/galera-test/backup/nulldb \
         /opt/services/haproxy-galera-test

chown -R 1001:1001 /opt/services/database/galera-test
chmod -R 750 /opt/services/database/galera-test /opt/services/haproxy-galera-test

Compose file

---
networks:
    database-test:
        name: database-test
        driver: bridge
        ipam:
            driver: default
            config:
                # private (RFC 1918) range instead of public address space;
                # pick one that is free on your host
                - subnet: 172.28.0.0/24

services:
    nulldb0:
        image: nullata/mariadb-galera:latest
        volumes:
            - "/opt/services/database/galera-test/nulldb0:/nullata/mariadb"
            - "/opt/services/database/galera-test/backup/nulldb:/backup"
        environment:
            - MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://
            - MARIADB_GALERA_CLUSTER_BOOTSTRAP=yes
            - MARIADB_GALERA_CLUSTER_NAME=nullGDC
            - MARIADB_USER=mahlul
            - MARIADB_PASSWORD=mahlulpass
            - MARIADB_GALERA_MARIABACKUP_USER=mahbackup
            - MARIADB_GALERA_MARIABACKUP_PASSWORD=mahbackuppass
            - MARIADB_ROOT_PASSWORD=mahrootpass
            - MARIADB_DATABASE=nulldb
            - TZ=Europe/Copenhagen
            # force a (re)bootstrap after a total outage:
            # - MARIADB_GALERA_FORCE_SAFETOBOOTSTRAP=yes
        healthcheck:
            test: ["CMD", "/opt/nullata/scripts/mariadb-galera/healthcheck.sh"]
        restart: unless-stopped
        networks:
            - database-test

    nulldb1:
        image: nullata/mariadb-galera:latest
        volumes:
            - /opt/services/database/galera-test/nulldb1/:/nullata/mariadb/
            - /opt/services/database/galera-test/backup/nulldb/:/backup/
        environment:
            - MARIADB_GALERA_CLUSTER_NAME=nullGDC
            - MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://nulldb0
            - MARIADB_GALERA_MARIABACKUP_USER=mahbackup
            - MARIADB_GALERA_MARIABACKUP_PASSWORD=mahbackuppass
            - MARIADB_ROOT_PASSWORD=mahrootpass
            - TZ=Europe/Copenhagen
        depends_on:
            nulldb0:
                condition: service_healthy
        healthcheck:
            test: ["CMD", "/opt/nullata/scripts/mariadb-galera/healthcheck.sh"]
        restart: unless-stopped
        networks:
            - database-test

    nulldc-haproxy:
        image: haproxy:latest
        restart: unless-stopped
        links:
            - nulldb0:nulldb0
            - nulldb1:nulldb1
        depends_on:
            nulldb0:
                condition: service_healthy
            nulldb1:
                condition: service_healthy
        ports:
            - "3306:3306"
        volumes:
            # same directory that the prep step created
            - /opt/services/haproxy-galera-test/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
        networks:
            - database-test

HAProxy config

global
    log 127.0.0.1 local0 notice

defaults
    retries 3
    option  redispatch
    timeout client 600s
    timeout connect 10s
    timeout server 600s
    timeout http-request 10s
    timeout http-keep-alive 2s
    timeout queue 5s
    timeout tunnel 2m
    timeout client-fin 1s
    timeout server-fin 1s

frontend galera_cluster_frontend
    bind *:3306
    mode tcp
    option tcplog
    default_backend galera_cluster_backend

backend galera_cluster_backend
    mode tcp
    option tcpka
    balance leastconn
    server nulldb0 nulldb0:3306 check weight 1
    server nulldb1 nulldb1:3306 check weight 1

balance leastconn distributes connections to whichever node has the fewest active ones. Since all nodes are equal, this is the right call - there's no reason to pin writes to a specific node.
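
One caveat: a bare check in TCP mode only proves that the port accepts a connection, not that the node is actually synced. If you want to gate traffic on real Galera state, here's a minimal sketch of the decision logic. It reads tab-separated status lines (the shape you get from the client in batch mode for SHOW GLOBAL STATUS LIKE 'wsrep_%') and succeeds only when the node is ready, in the Primary component, and Synced. The function name galera_node_ready is made up for illustration, and this is not a description of what the image's bundled healthcheck.sh does; the three wsrep status variables are standard Galera ones.

```shell
#!/bin/sh
# Hypothetical helper: reads "name<TAB>value" status lines on stdin and
# exits 0 only if this node is safe to receive traffic.
galera_node_ready() {
    awk -F'\t' '
        $1 == "wsrep_ready"               { ready  = $2 }
        $1 == "wsrep_cluster_status"      { status = $2 }
        $1 == "wsrep_local_state_comment" { state  = $2 }
        END {
            # awk exits 0 (success) only when all three conditions hold
            exit !(ready == "ON" && status == "Primary" && state == "Synced")
        }
    '
}
```

Usage would look something like: docker exec <container> mariadb -uroot -p<password> -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | galera_node_ready && echo ready. That assumes the client binary inside the container is called mariadb, so adjust to whatever your image ships.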


Deployment: Docker Swarm (Multi-Host)

For actual cross-host HA, Docker Swarm with an overlay network is a clean option. Two physical servers, one Swarm cluster, one overlay network - both Galera nodes discover each other through it.

ServerA (manager)          ServerB (worker)
----------------------------------------------
nulldb0 (primary)          nulldb1 (joiner)
haproxy                    (optional haproxy)

Initialize the Swarm

On ServerA:

docker swarm init --advertise-addr <ServerA_IP>

It prints a join token. On ServerB:

docker swarm join --token <token> <ServerA_IP>:2377

Create the overlay network

docker network create \
  --driver=overlay \
  --attachable \
  galera-net

Prep directories

On ServerA:

mkdir -p \
  /opt/services/database/galera-test/nulldb0 \
  /opt/services/database/galera-test/backup/nulldb \
  /opt/services/haproxy-galera-test

chown -R 1001:1001 /opt/services/database/galera-test
chmod -R 750 /opt/services/database/galera-test /opt/services/haproxy-galera-test

On ServerB:

mkdir -p \
  /opt/services/database/galera-test/nulldb1 \
  /opt/services/database/galera-test/backup/nulldb

chown -R 1001:1001 /opt/services/database/galera-test
chmod -R 750 /opt/services/database/galera-test

Stack file

# galera-stack.yaml
networks:
  galera-net:
    driver: overlay
    attachable: true
    ipam:
      config:
        - subnet: 180.10.0.0/24

services:

  galera_primary:
    image: nullata/mariadb-galera:latest
    networks:
      galera-net:
        ipv4_address: 180.10.0.10
    volumes:
      - /opt/services/database/galera-test/nulldb0:/nullata/mariadb
      - /opt/services/database/galera-test/backup/nulldb:/backup
    environment:
      - MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://
      - MARIADB_GALERA_CLUSTER_BOOTSTRAP=yes
      - MARIADB_GALERA_CLUSTER_NAME=nullGDC
      - MARIADB_ROOT_PASSWORD=mahrootpass
      - MARIADB_USER=mahlul
      - MARIADB_PASSWORD=mahlulpass
      - MARIADB_GALERA_MARIABACKUP_USER=mahbackup
      - MARIADB_GALERA_MARIABACKUP_PASSWORD=mahbackuppass
      - MARIADB_DATABASE=nulldb
      - TZ=Europe/Copenhagen
    deploy:
      placement:
        constraints:
          - node.hostname == serverA
      restart_policy:
        condition: on-failure

  galera_secondary:
    image: nullata/mariadb-galera:latest
    networks:
      galera-net:
        ipv4_address: 180.10.0.11
    volumes:
      - /opt/services/database/galera-test/nulldb1:/nullata/mariadb
      - /opt/services/database/galera-test/backup/nulldb:/backup
    environment:
      - MARIADB_GALERA_CLUSTER_NAME=nullGDC
      - MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://tasks.galera_primary
      - MARIADB_ROOT_PASSWORD=mahrootpass
      - MARIADB_GALERA_MARIABACKUP_USER=mahbackup
      - MARIADB_GALERA_MARIABACKUP_PASSWORD=mahbackuppass
      - TZ=Europe/Copenhagen
    deploy:
      placement:
        constraints:
          - node.hostname == serverB
      restart_policy:
        condition: on-failure

  haproxy:
    image: haproxy:latest
    networks:
      - galera-net
    ports:
      - "3306:3306"
    volumes:
      - /opt/services/haproxy-galera-test/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    deploy:
      placement:
        constraints:
          - node.hostname == serverA
      restart_policy:
        condition: on-failure

The HAProxy config for Swarm is identical to the Compose version - just update the server names to match the service names (galera_primary and galera_secondary).

Deploy the stack on ServerA:

docker stack deploy -c galera-stack.yaml galera

Voilà.
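
To confirm the nodes actually found each other, the wsrep_cluster_size status variable should match the number of nodes you deployed (2 here). A small sketch of that check, again reading tab-separated status output on stdin; cluster_size_is is a hypothetical name, not something the image provides.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if wsrep_cluster_size on stdin
# equals the expected node count passed as the first argument.
cluster_size_is() {
    awk -F'\t' -v want="$1" '
        $1 == "wsrep_cluster_size" { got = $2; found = 1 }
        END { exit !(found && got == want) }
    '
}
```

Feed it the output of SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size' from any node (client run with -N -B), for example ... | cluster_size_is 2 || echo "cluster is not complete yet".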


Recovery: What Actually Happens When Things Go Wrong

Here's where Galera earns its reputation. During normal operations, node failures are essentially a non-event:

  • A node goes down >> cluster keeps running (remaining nodes hold quorum)
  • The node comes back >> it rejoins automatically via SST/IST, syncs the missed data, and starts accepting traffic again
  • Multiple nodes go down but a majority stays up >> same story, they rejoin without ceremony

The only scenario that requires manual intervention is a full cluster crash - all nodes go down simultaneously. This can happen after a datacenter power event, a botched maintenance window, or that one intern who docker stop's the wrong thing. In that case:

Star topology: ALWAYS start the primary first. The other nodes will join after.

Full-mesh topology: Check grastate.dat on each node, find the highest seqno, and bootstrap that one:

# Check which node has the most recent data
cat /opt/services/database/galera-test/nulldb0/grastate.dat

# If safe_to_bootstrap: 1 is set, that node is the candidate
# Otherwise, force it with:
# MARIADB_GALERA_FORCE_SAFETOBOOTSTRAP=yes

Then start the other nodes and let them sync.
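
With more than a couple of nodes, eyeballing seqno values gets old fast. Here's a minimal sketch of the comparison, assuming every node's grastate.dat is readable from one place (on a multi-host setup you'd copy the files over first). The function name pick_bootstrap_candidate is hypothetical. One thing to keep in mind: after an unclean shutdown seqno is often -1, in which case the file alone can't tell you who is ahead and the position has to be recovered first (the --wsrep-recover server option).

```shell
#!/bin/sh
# Hypothetical helper: given grastate.dat paths, print "<seqno> <file>"
# for the one with the highest seqno. Returns 1 if none could be parsed.
pick_bootstrap_candidate() {
    best_file=""
    best_seqno=""
    for f in "$@"; do
        [ -r "$f" ] || continue
        # "seqno:   1234" -> "1234"
        s=$(awk '$1 == "seqno:" { print $2 }' "$f")
        case "$s" in
            -1) ;;                       # unclean shutdown marker, sorts lowest
            ''|*[!0-9]*) continue ;;     # missing or not a number: skip file
        esac
        if [ -z "$best_seqno" ] || [ "$s" -gt "$best_seqno" ]; then
            best_seqno=$s
            best_file=$f
        fi
    done
    [ -n "$best_file" ] || return 1
    printf '%s %s\n' "$best_seqno" "$best_file"
}
```

For the Compose layout above that would be pick_bootstrap_candidate /opt/services/database/galera-test/nulldb0/grastate.dat /opt/services/database/galera-test/nulldb1/grastate.dat. If the winner reports -1, don't bootstrap from that answer alone.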


The Honest Trade-offs

Galera isn't magic. A few things worth knowing before you do a 3 AM deployment:

Galera requires InnoDB. MyISAM tables don't replicate. This is almost never a problem in 2026, but worth stating.

Write performance scales differently. In a Galera cluster, every write is certified across all nodes before committing. More nodes don't mean more write throughput - they mean more write latency, because every commit has to wait for the slowest (or farthest) node to receive the write-set. For read-heavy workloads this is a non-issue; for extremely write-heavy systems, you need to think about it.

Two nodes is a bad idea. A two-node Galera cluster can't keep quorum through a failure - if one node loses contact with the other (network partition or crash), the survivor holds exactly half the votes, can't determine if the other is down or just unreachable, and stops accepting writes rather than risk split-brain. Three nodes is the minimum for real HA. Two nodes is fine for development or as a stepping stone, but don't run it in production and call it HA. (IBM take notes 📝)
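
The arithmetic behind that, as a quick sketch. It assumes every node carries the same vote weight; the function names are illustrative and not part of Galera.

```shell
#!/bin/sh
# Nodes needed for a strict majority of N equally weighted nodes.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Nodes that can fail at once while the rest still hold a majority.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 2 3 5; do
    echo "$n nodes: majority=$(quorum_size "$n"), can lose $(tolerated_failures "$n")"
done
```

Two nodes need both present for a majority and can lose none; three nodes can lose one; five can lose two. Going from two to three is the step that actually buys you HA.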

Schema changes need care. DDL statements in Galera are replicated but can lock the cluster in some configurations. Total Order Isolation (TOI) is the default and is safe; just be aware that a heavy migration on a large table will block writes cluster-wide.


The Bottom Line

If I'm building something where HA matters and I don't need Postgres-specific features, Galera is my default. Postgres is great, but the operational simplicity of Galera's built-in HA vs assembling a Patroni + etcd + PgBouncer stack is not even close. Less to configure, less to monitor, less to break, less to keep compatible across versions.

For peace of mind on distributed systems, "it just rejoins when it comes back" is a sentence I want to be able to say about my database.


More recently I've also started publishing hardened images on Docker Hub for production use cases where image security matters - check the repository for the latest available builds.

If you relied on a Bitnami-maintained image that no longer gets the attention it deserves and want to see it picked up, open an issue on GitHub and I'll have a look when possible.


Live long and prosper. 🖖👽
