OpenShift SDN - Additional Concepts | Architecture

Overview
Design on Masters
Design on Nodes
- Packet Flow
- External Access to the Cluster Network

Overview

OpenShift uses a software-defined networking (SDN) approach to provide a unified cluster network that enables communication between containers across the OpenShift cluster. This cluster network is established and maintained by the OpenShift SDN, which configures an overlay network using Open vSwitch (OVS).

OpenShift SDN includes the ovssubnet SDN plug-in for configuring the network, which provides a "flat" pod network where every pod can communicate with every other pod and service.

Following is a detailed discussion of the design and operation of OpenShift SDN, which may be useful for troubleshooting.

Design on Masters

On an OpenShift master, OpenShift SDN maintains a registry of nodes, stored in etcd. When the system administrator registers a node, OpenShift SDN allocates an unused subnet from the cluster network and stores this subnet in the registry. When a node is deleted, OpenShift SDN deletes the subnet from the registry and considers the subnet available to be allocated again.

In the default configuration, the cluster network is the 10.1.0.0/16 class B network, and nodes are allocated /24 subnets (i.e., 10.1.0.0/24, 10.1.1.0/24, 10.1.2.0/24, and so on). This means that the cluster network has 256 subnets available to assign to nodes, and a given node is allocated 254 addresses that it can assign to the containers running on it. The size and address range of the cluster network are configurable, as is the host subnet size.

Note that OpenShift SDN on a master does not configure the local (master) host to have access to any cluster network. Consequently, a master host does not have access to containers via the cluster network, unless it is also running as a node.

Design on Nodes

On a node, OpenShift SDN first registers the local host with the SDN master in the aforementioned registry so that the master allocates a subnet to the node.

Next, OpenShift SDN creates and configures six network devices:

br0, the OVS bridge device that containers will be attached to. OpenShift SDN also configures a set of non-subnet-specific flow rules on this bridge. The ovssubnet plug-in waits to do so until the SDN master announces the creation of the new node subnet.
lbr0, a Linux bridge device, which is configured as Docker’s bridge and given the cluster subnet gateway address (eg, 10.1.x.1/24).
tun0, an OVS internal port (port 2 on br0). This also gets assigned the cluster subnet gateway address, and is used for external network access. OpenShift SDN configures netfilter and routing rules to enable access from the cluster subnet to the external network via NAT.
vlinuxbr and vovsbr, two Linux peer virtual Ethernet interfaces. vlinuxbr is added to lbr0 and vovsbr is added to br0 (port 9), to provide connectivity for containers created directly with Docker outside of OpenShift.
vxlan0, the OVS VXLAN device (port 1 on br0), which provides access to containers on remote nodes.

Each time a pod is started on the host, OpenShift SDN:

moves the host side of the pod’s veth interface pair from the lbr0 bridge (where Docker placed it when starting the container) to the OVS bridge br0.
adds OpenFlow rules to the OVS database to route traffic addressed to the new pod to the correct OVS port.

The pod is allocated an IP address in the cluster subnet by Docker itself because Docker is told to use the lbr0 bridge, which OpenShift SDN has assigned the cluster gateway address (eg. 10.1.x.1/24). Note that the tun0 is also assigned the cluster gateway IP address because it is the default gateway for all traffic destined for external networks, but these two interfaces do not conflict because the lbr0 interface is only used for IPAM and no OpenShift SDN pods are connected to it.

OpenShift SDN nodes also watch for subnet updates from the SDN master. When a new subnet is added, the node adds OpenFlow rules on br0 so that packets with a destination IP address the remote subnet go to vxlan0 (port 1 on br0) and thus out onto the network.

Packet Flow

Suppose we have two containers A and B where the peer virtual Ethernet device for container A’s eth0 is named vethA and the peer for container B’s eth0 is named vethB.

If Docker’s use of peer virtual Ethernet devices is not already familiar to you, review Docker’s advanced networking documentation.

Now suppose first that container A is on the local host and container B is also on the local host. Then the flow of packets from container A to container B is as follows:

eth0 (in A’s netns) → vethA → br0 → vethB → eth0 (in B’s netns)

Next, suppose instead that container A is on the local host and container B is on a remote host on the cluster network. Then the flow of packets from container A to container B is as follows:

eth0 (in A’s netns) → vethA → br0 → vxlan0 → network ^[1] → vxlan0 → br0 → vethB → eth0 (in B’s netns)

Finally, if container A connects to an external host, the traffic looks like:

eth0 (in A’s netns) → vethA → br0 → tun0 → (NAT) → eth0 (physical device) → Internet

Almost all packet delivery decisions are performed with OpenFlow rules in the OVS bridge br0, which simplifies the plug-in network architecture and provides flexible routing.

External Access to the Cluster Network

If a host that is external to OpenShift requires access to the cluster network, you have two options:

Configure the host as an OpenShift node but mark it unschedulable so that the master does not schedule containers on it.
Create a tunnel between your host and a host that is on the cluster network.

Both options are presented as part of a practical use-case in the documentation for configuring routing from an edge load-balancer to containers within OpenShift SDN.

1. After this point, device names refer to devices on container B’s host.