Skip to main content

Merkle CRDTs in DefraDB

Overview

Merkle Conflict-free Replicated Data Types (CRDTs) are a class of data structures that simplify data synchronization across distributed systems. They allow multiple entities to update the same data structure simultaneously—even while offline—without risk of data conflicts.

Merkle CRDTs enable software to update data locally without human intervention and later synchronize it with other systems in a conflict-free manner. This is achieved by encoding deterministic merge rules and causal history directly in the data.

First introduced in 2011, CRDTs have since become essential to local-first and offline-capable software. Merkle CRDTs enhance traditional CRDTs by incorporating Merkle clocks, making them especially useful for high-churn distributed infrastructure.


Background on Regular CRDTs

Conflict-free Replicated Data Types (CRDTs) are data structures that support eventual consistency across distributed systems without requiring coordination or locking.

They can represent common structures such as:

  • Registers (single value storage)
  • Counters
  • Sets (e.g. G-Set, OR-Set)
  • Lists (e.g. RGA - Replicated Growable Array)
  • Maps

The defining characteristic of CRDTs is that concurrent, independent changes can be merged deterministically. This enables:

  • Local updates without coordination
  • Automatic conflict resolution
  • Convergence across all replicas

To achieve this, CRDTs must encode causal ordering—the order in which operations happened. In most implementations, this is done using logical clocks, like vector clocks, which are used to track the versions of data at each replica.

However, vector clocks can become inefficient when:

  • The number of peers increases significantly
  • Peers frequently join and leave (high churn)
  • System resources (memory/bandwidth) are limited

Why CRDTs Are Essential in Distributed Systems

Distributed systems require synchronization mechanisms that are robust against network partitions, latency, and peer failures. CRDTs are ideal for:

  • Offline-first applications like note-taking apps, messaging, or CRMs
  • Multi-device sync (phones, tablets, desktops)
  • Peer-to-peer networks (e.g., IPFS, Secure Scuttlebutt)
  • Edge computing and IoT devices with intermittent connectivity

They work by ensuring that:

  1. Local changes are safe — They don't require global consensus.
  2. Synchronization is eventual — Systems can merge states later.
  3. Conflicts are mathematically impossible — Merge rules are built-in.

Traditional CRDTs, however, struggle with scalability due to metadata overhead, particularly in systems with large or dynamic peer populations.


Merkle CRDTs: A Scalable Solution

Merkle CRDTs address the scalability limitations of traditional CRDTs by using Merkle clocks to encode causality instead of peer-specific metadata.

What Is a Merkle Clock?

A Merkle clock is a Directed Acyclic Graph (DAG) where:

  • Each node represents an update or operation.
  • Nodes are linked via parent hashes, forming a causally-ordered chain of updates.
  • Each node has a unique Content Identifier (CID) derived from its contents and parent(s).

Merkle clocks:

  • Are content-addressed (immutable)
  • Encode causal history directly in structure
  • Support branching and merging
  • Avoid the need for tracking individual peers

This makes them ideal for decentralized or high-churn environments.


Anatomy of a Merkle CRDT

A Merkle CRDT is a combination of:

  1. Data type: The base structure (e.g., Set, Map, List)
  2. CRDT type:
    • Operation-based: Sends operations (e.g., add/remove)
    • State-based: Sends the entire state
    • Delta-based: Sends only the minimal state change
  3. Semantic type: The application logic (e.g., counter, register)

Each update is:

  • Encapsulated in a node
  • Assigned a CID
  • Linked to its predecessors (parent nodes)

This model allows for decentralized state evolution and deterministic conflict-free merging.


Delta State Semantics in CRDTs

There are three main semantics in CRDTs:

TypeDescriptionEfficiency
Operation-basedSends intent of operation (e.g., “+1”)Moderate
State-basedSends full current stateLeast efficient
Delta-basedSends minimal change (delta)Most efficient

Delta CRDTs are best suited for resource-constrained systems:

  • Require less bandwidth
  • Lower memory and compute overhead
  • Enable fast and lightweight updates

Example:

Instead of transmitting a whole list, a delta CRDT for a shopping list update may just send: ["+banana"].


Branching and Merging State

Branching in Merkle CRDTs

Branching occurs when multiple updates happen concurrently from the same base node. Each update becomes a new branch in the DAG.

  • Useful in offline-first scenarios
  • Each device maintains its own update stream
  • No coordination needed during branch creation

Merging in Merkle CRDTs

When peers resynchronize, branches are merged using CRDT merge logic:

  1. Find lowest common ancestor (LCA) in the DAG
  2. Walk paths from ancestor to each branch tip
  3. Apply CRDT merge function to combine updates
  4. Create new node with combined state and both parents

This ensures deterministic, conflict-free convergence across peers.


Real-World Applications of Merkle CRDTs

Merkle CRDTs are increasingly being adopted in:

  • Decentralized databases like DefraDB
  • P2P collaboration platforms (e.g., peer-to-peer document editors)
  • Blockchain off-chain storage systems
  • Federated machine learning where models sync asynchronously
  • Edge computing networks needing fault-tolerant sync

They are especially powerful in IPFS-based systems, where Merkle DAGs are native.


Challenges and Considerations

Despite their advantages, Merkle CRDTs come with trade-offs:

  • Storage complexity: DAGs can grow large over time
  • Garbage collection: Old nodes may need pruning
  • Complexity: Requires understanding Merkle structures and CID management
  • Latency: Merge complexity can introduce processing delay in large DAGs

Tools like DefraDB address these issues through DAG pruning, snapshotting, and efficient storage engines.


Summary

Merkle CRDTs are an evolution of traditional CRDTs designed for scalability, efficiency, and decentralized synchronization. By combining CRDT semantics with Merkle clocks, they:

  • Encode causality in a scalable, metadata-free manner
  • Support seamless branching and merging
  • Optimize for bandwidth and compute constraints
  • Provide deterministic, conflict-free sync

They are ideal for local-first applications, high-churn networks, and edge or embedded systems where robustness and autonomy are essential.


Further Reading