Connecting Kubernetes Clusters Intelligently with Cilium Cluster Mesh

05 Mar 2026

Modern Kubernetes architectures rarely stop at a single cluster. Teams often run multiple clusters to address different needs, such as:

geographic distribution
high availability
environment separation (dev, staging, production)
team isolation
management of very large platforms

However, as soon as we move to a multi-cluster architecture, a new challenge immediately appears.

How can we allow applications distributed across different clusters to communicate as if they were part of the same system?

Pods need to be able to communicate with each other.
Services need to be globally discoverable.
Traffic should be able to be balanced across multiple clusters.
And security policies must continue to work consistently.

This is exactly the problem that Cilium Cluster Mesh aims to solve.

The Kubernetes Multi-Cluster Problem

A single Kubernetes cluster provides many powerful capabilities:

service discovery
pod-to-pod networking
load balancing
network policies

The problem is that all these features are limited to a single cluster.

If we deploy the same application across three clusters, each cluster still operates independently.

A frontend running in cluster A cannot automatically discover or reach a backend running in cluster B or C.

Services, policies, and the network itself are all local to each cluster.

In other words, each cluster becomes its own networking island.

This makes it significantly more complex to build:

multi-region services
systems with cross-cluster failover
large-scale distributed applications

And this is where Cluster Mesh comes into play.

A Quick Introduction: What Is Cilium?

Before diving into Cluster Mesh, it’s worth taking a step back and briefly introducing Cilium.

In Kubernetes, networking between pods is not managed directly by the core system. Instead, it is delegated to external components called Container Network Interface (CNI) plugins. These plugins are responsible for providing fundamental networking capabilities such as:

pod-to-pod communication
traffic routing
IP address management
enforcement of network policies

Over the years, several widely used CNI solutions have emerged, including Flannel, Calico, Weave, and Canal.

Cilium is one of the most modern and innovative CNIs in this ecosystem. Unlike many traditional solutions that rely heavily on iptables, Cilium uses eBPF, a Linux kernel technology that enables networking, security, and observability to be implemented in a far more efficient and flexible way.

Thanks to this approach, Cilium does much more than simply provide networking between pods. It introduces advanced capabilities such as:

identity-based workload security
advanced network observability
service mesh integration
multi-cluster networking

In recent years, Cilium has become increasingly popular in the Kubernetes ecosystem and is now widely used in many cloud platforms and production environments.

It is a very rich and interesting tool that certainly deserves a deeper exploration (perhaps in a dedicated article). In this article, however, we will focus on one of its most powerful features: Cluster Mesh.

What Is Cilium Cluster Mesh?

Cluster Mesh is a Cilium feature that allows multiple Kubernetes clusters to be connected, creating a single logical multi-cluster network.

The core idea is quite simple:

clusters remain independent
but workloads can discover and communicate with each other

From the perspective of applications, pods distributed across different clusters can behave as if they were running in the same networking environment.

What Cluster Mesh Actually Enables

Communication Between Pods Across Different Clusters

With Cluster Mesh, pods can communicate even when they are running in different clusters. A pod in cluster A can directly reach a pod in cluster B.

This works because clusters share information about:

pod endpoints
nodes
available services

From the application’s perspective, the infrastructure simply expands, rather than becoming more complex.

Multi-Cluster Service Discovery

Cluster Mesh allows clusters to share information about services.

Each cluster can know:

which services exist in other clusters
which endpoints make up those services
where they are located in the network

This makes it possible to implement distributed service discovery across clusters.

Global Services and Cross-Cluster Load Balancing

One of the most interesting features enabled by Cluster Mesh is the ability to distribute traffic across multiple clusters.

To understand how this works, we first need to introduce the concept of a Global Service.

In Kubernetes, a Service is normally limited to the cluster in which it is defined. This means that traffic is load balanced only to pods running within the same cluster.

Cilium introduces the concept of a Global Service: a Service that can aggregate endpoints coming from multiple clusters within the mesh.

When a Service is configured as global, each cluster can see not only the local backends, but also those running in other clusters.
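As a minimal sketch of what this looks like in practice, a Service is marked as global with the service.cilium.io/global annotation. The Service name, namespace, selector, and ports below are illustrative assumptions; the same Service (same name and namespace) is expected to exist in every cluster that should contribute backends.

```yaml
# Apply the same manifest in every cluster that should share this Service.
apiVersion: v1
kind: Service
metadata:
  name: backend              # hypothetical name; must be identical in each cluster
  namespace: default         # must be identical in each cluster
  annotations:
    service.cilium.io/global: "true"   # marks the Service as a Global Service
spec:
  type: ClusterIP
  selector:
    app: my-app              # illustrative label for the backend pods
  ports:
  - port: 80                 # illustrative ports
    targetPort: 8080
```

Without the annotation, the Service behaves like any regular Kubernetes Service and only sees local backends.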

Cross-Cluster Load Balancing

Let’s imagine a service with backends distributed across two clusters.

Without Cluster Mesh, each cluster uses only its own backends.

With Cluster Mesh and Global Services, the service can distribute traffic to backends running in different clusters.
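How traffic is split between local and remote backends can be tuned per Service. The sketch below assumes the service.cilium.io/affinity annotation available in recent Cilium releases; the Service name, selector, and ports are hypothetical. With the value "local", traffic is expected to stay on healthy backends in the local cluster and to use remote backends only when no healthy local backend is available.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend              # hypothetical name
  namespace: default
  annotations:
    service.cilium.io/global: "true"     # share backends across the mesh
    service.cilium.io/affinity: "local"  # prefer local backends, fall back to remote
spec:
  selector:
    app: my-app              # illustrative label
  ports:
  - port: 80                 # illustrative ports
    targetPort: 8080
```

This is one way to get cross-cluster failover without sending every request across cluster boundaries.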

Requirements for Connecting Multiple Clusters

To create a Cluster Mesh, a few prerequisites must be met.

Non-Overlapping Pod CIDRs

Each cluster must use a pod IP range that does not overlap with the range of any other cluster in the mesh.

For example:

Cluster 1 → 10.1.0.0/16
Cluster 2 → 10.2.0.0/16

This is necessary to avoid routing conflicts.
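As a rough illustration, when Cilium is installed with Helm and the cluster-pool IPAM mode, the pod range can be set per cluster in the Helm values. The ranges below are illustrative private ranges chosen for this sketch, and the right keys depend on the IPAM mode you actually use.

```yaml
# values-cluster-1.yaml (hypothetical file name)
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4PodCIDRList:
    - 10.1.0.0/16            # pod range for cluster 1

# The second cluster would use the same keys with a different range,
# for example 10.2.0.0/16, so that the two never overlap.
```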

Connectivity Between Nodes

Nodes from different clusters must be able to reach each other over the network.

Cluster Mesh does not create the underlying connectivity; it relies on connectivity that already exists between the clusters, typically established through VPN tunnels or network peering.

Unique Cluster Identity

Each cluster must have:

a cluster name
a cluster ID

This allows Cilium components to identify the origin of the traffic.
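With a Helm-based installation, both values are set in the chart's cluster section. The name below is a hypothetical example; the ID must be a unique integer per cluster (1 to 255 in the default configuration).

```yaml
# values-cluster-1.yaml (hypothetical file name)
cluster:
  name: cluster-one          # unique, human-readable cluster name
  id: 1                      # unique numeric ID within the mesh

# A second cluster would use, for example, name: cluster-two and id: 2.
```

The cluster name set here is also the value that appears in the io.cilium.k8s.policy.cluster label used by network policies.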

When Cluster Mesh is enabled, Cilium deploys a component called the Cluster Mesh API Server (clustermesh-apiserver) in each cluster.

This component observes the state of the cluster and collects information about:

services
endpoints
nodes
workload identities

This information is then shared with the other clusters in the mesh.

Thanks to this mechanism, each cluster can have visibility into the other clusters.

The Scalability Problem of the Initial Model

The first Cluster Mesh model worked well in small environments.

However, as the number of clusters and nodes increased, a problem began to emerge.

Let’s imagine the following scenario:

3 clusters
100 nodes per cluster
1 Cilium agent per node

This means we have 300 agents.

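In the original model, each agent had to connect directly to every other cluster to obtain information about services, endpoints, and identities. In this scenario, each of the 300 agents connects to 2 remote clusters, for a total of 600 cross-cluster connections, and every cluster has to serve 200 remote agents.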

This generated a large number of synchronization connections, leading to issues such as:

increased load on etcd
higher latency
scalability limitations

A different approach was needed.

The Solution: KV Store Mesh

To solve this problem, KV Store Mesh was introduced. The idea is to move the synchronization process from individual agents to the clusters themselves.

Before:
agent → remote clusters

After:
agent → local cluster

Agents communicate only with their local datastore, while synchronization happens between the clusters’ datastores.

This drastically reduces:

the number of connections
synchronization traffic
the load on etcd

The result is a much more scalable architecture.
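In Cilium's Helm chart this feature is spelled KVStoreMesh and is configured under the Cluster Mesh API Server settings. The fragment below is a minimal sketch; recent Cilium releases may already enable it by default, so check the chart values for your version before relying on it.

```yaml
clustermesh:
  useAPIServer: true         # deploy the Cluster Mesh API Server
  apiserver:
    kvstoremesh:
      enabled: true          # cache remote cluster state in the local datastore
```

For the three-cluster scenario above, this means each cluster opens connections to the 2 other clusters, so roughly 6 cross-cluster synchronization links replace the 600 agent-level connections.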

Cross-Cluster Security

In a multi-cluster environment, it’s essential that security policies can distinguish the origin of traffic across different clusters.

Cilium’s networking system makes it possible to enforce cluster-aware network policies by using the label:

io.cilium.k8s.policy.cluster

This label identifies which cluster the traffic comes from and can be used inside CiliumNetworkPolicy resources to control communication between workloads distributed across different clusters.

An important aspect of policies in Cluster Mesh is that they are not global: a policy applies only to the cluster where it is created. If you want to enforce the same rule across multiple clusters, you must distribute the policy manually to each cluster.

Example 1 – Allow Traffic Only from the Frontend

In the following example, we define a policy that allows traffic to the backend (pods labeled app=my-app) only from pods with the label app=frontend.


apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend
spec:
  # Select the backend pods this policy protects
  endpointSelector:
    matchLabels:
      app: my-app
  ingress:
  # Allow ingress only from pods labeled app=frontend
  - fromEndpoints:
    - matchLabels:
        app: frontend

With this configuration:

frontend pods can communicate with the backend
all other pods are automatically blocked

This behavior comes from Cilium’s default-deny model: once a policy with ingress rules selects an endpoint, all ingress traffic to that endpoint that is not explicitly allowed is denied.

Example 2 – Allow Traffic Only from the Frontend of a Specific Cluster

You can make the policy even more restrictive by also filtering based on the origin cluster of the traffic.


apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-from-cluster-one
spec:
  # Select the backend pods this policy protects
  endpointSelector:
    matchLabels:
      app: my-app
  ingress:
  # Allow ingress only from frontend pods running in cluster-one
  - fromEndpoints:
    - matchLabels:
        app: frontend
        io.cilium.k8s.policy.cluster: cluster-one

With this configuration:

the frontend in cluster-one can communicate with the backend
frontends running in other clusters are blocked

This enables granular security controls in multi-cluster scenarios.

Enforcing Cross-Cluster Security

Using the label io.cilium.k8s.policy.cluster makes it possible to define policies that take into account the origin cluster of the traffic.

This enables you to:

limit which clusters can communicate with a given service
isolate different environments (for example, production and staging)
implement zero-trust security models even in multi-cluster architectures
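As a sketch of environment isolation, the policy below selects every pod in a namespace and admits ingress only from pods running in one named cluster. The cluster name cluster-prod, the policy name, and the namespace are hypothetical. Note that in a namespaced CiliumNetworkPolicy, peer selectors are generally scoped to the policy's own namespace unless a namespace label is specified, so this sketch only covers traffic within one namespace.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-only-prod-cluster   # hypothetical name
  namespace: default              # hypothetical namespace
spec:
  # An empty selector matches every pod in the policy's namespace
  endpointSelector: {}
  ingress:
  # Admit traffic only from pods that run in the cluster named cluster-prod
  - fromEndpoints:
    - matchLabels:
        io.cilium.k8s.policy.cluster: cluster-prod
```

Because policies are not global, the same manifest would need to be applied in each cluster where the restriction should hold.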

Why Cluster Mesh Matters

Cluster Mesh is not just a networking feature.
It’s a real architectural enabler.

It makes it possible to build distributed Kubernetes platforms where:

applications can scale across multiple clusters
traffic can be distributed across regions
failover can happen automatically
security remains consistent

All while keeping clusters operationally independent.

Conclusion

As Kubernetes platforms continue to grow, multi-cluster architectures are becoming increasingly common.

Cilium Cluster Mesh offers an elegant way to connect these clusters while maintaining consistent networking across distributed workloads.

Thanks to features such as Global Services and KV Store Mesh, multiple Kubernetes clusters can be treated as a single distributed logical environment.

And that’s exactly the kind of infrastructure modern cloud-native platforms increasingly need.