Routing from Edge Load Balancers | Administration

Overview
Including the Load Balancer in the SDN
Establishing a Tunnel Using a Ramp Node
- Configuring a Highly-Available Ramp Node

Overview

Pods inside of an OpenShift cluster are only reachable via their IP addresses on the cluster network. An edge load balancer can be used to accept traffic from outside networks and proxy the traffic to pods inside the OpenShift cluster. In cases where the load balancer is not part of the cluster network, routing becomes a hurdle as the internal cluster network is not accessible to the edge load balancer.

To solve this problem where the OpenShift cluster is using OpenShift SDN as the cluster networking solution, there are two ways to achieve network access to the pods.

Including the Load Balancer in the SDN

If possible, run an OpenShift node instance on the load balancer itself that uses OpenShift SDN as the networking plug-in. This way, the edge machine gets its own Open vSwitch bridge that the SDN automatically configures to provide access to the pods and nodes that reside in the cluster. The routing table is dynamically configured by the SDN as pods are created and deleted, and thus the routing software is able to reach the pods.

Mark the load balancer machine as an unschedulable node so that no pods end up on the load balancer itself:

$ oadm manage-node <load_balancer_hostname> --schedulable=false

If the load balancer comes packaged as a Docker container, then it is even easier to integrate with OpenShift: Simply run the load balancer as a pod with the host port exposed. The pre-packaged HAProxy router in OpenShift runs in precisely this fashion.

Establishing a Tunnel Using a Ramp Node

In some cases, the previous solution is not possible. For example, an F5 BIG-IP® host cannot run an OpenShift node instance or the OpenShift SDN because F5® uses a custom, incompatible Linux kernel and distribution.

Instead, to enable F5 BIG-IP® to reach pods, you can choose an existing node within the cluster network as a ramp node and establish a tunnel between the F5 BIG-IP® host and the designated ramp node. Because it is otherwise an ordinary OpenShift node, the ramp node has the necessary configuration to route traffic to any pod on any node in the cluster network. The ramp node thus assumes the role of a gateway through which the F5 BIG-IP® host has access to the entire cluster network.

Following is an example of establishing an ipip tunnel between an F5 BIG-IP® host and a designated ramp node.

On the F5 BIG-IP® host:

Set the following variables:

# F5_IP=10.3.89.66 (1)
# RAMP_IP=10.3.89.89 (1)
# TUNNEL_IP1=10.3.91.216 (2)
# CLUSTER_NETWORK=10.1.0.0/16 (3)

1	The `F5_IP` and `RAMP_IP` variables refer to the F5 BIG-IP® host’s and the ramp node’s IP addresses, respectively, on a shared, internal network.
2	An arbitrary, non-conflicting IP address for the F5® host’s end of the ipip tunnel.
3	The overlay network CIDR that the OpenShift SDN uses to assign addresses to pods.

Delete any old route, self, tunnel and SNAT pool:

# tmsh delete net route $CLUSTER_NETWORK || true
# tmsh delete net self SDN || true
# tmsh delete net tunnels tunnel SDN || true
# tmsh delete ltm snatpool SDN_snatpool || true

Create the new tunnel, self, route and SNAT pool and use the SNAT pool in the virtual servers:

# tmsh create net tunnels tunnel SDN \
    \{ description "OpenShift SDN" local-address \
    $F5_IP profile ipip remote-address $RAMP_IP \}
# tmsh create net self SDN \{ address \
    ${TUNNEL_IP1}/24 allow-service all vlan SDN \}
# tmsh create net route $CLUSTER_NETWORK interface SDN
# tmsh create ltm snatpool SDN_snatpool members add { $TUNNEL_IP1 }
# tmsh modify ltm virtual  ose-vserver source-address-translation { type snat pool SDN_snatpool }
# tmsh modify ltm virtual  https-ose-vserver source-address-translation { type snat pool SDN_snatpool }

On the ramp node:

Set the following variables:
# F5_IP=10.3.89.66 # TUNNEL_IP1=10.3.91.216 # TUNNEL_IP2=10.3.91.217 (1)
1 A second, arbitrary IP address for the ramp node’s end of the ipip tunnel.
Delete any old tunnel:
# ip tunnel del tun1 || true

Create the ipip tunnel on the ramp node, using a suitable L2-connected interface (e.g., eth0):

# ip tunnel add tun1 mode ipip \
    remote $F5_IP dev eth0
# ip addr add $TUNNEL_IP2 dev tun1
# ip link set tun1 up
# ip route add $TUNNEL_IP1 dev tun1
# ping -c 5 $TUNNEL_IP1

SNAT the tunnel IP with an unused IP from the SDN subnet:

# source /etc/openshift-sdn/config.env
# subaddr=$(echo $OPENSHIFT_SDN_TAP1_ADDR | cut -d "." -f 1,2,3)
# export RAMP_SDN_IP=${subaddr}.254

Assign this RAMP_SDN_IP as an additional address to tun0 (the local SDN’s gateway):
# ip addr add ${RAMP_SDN_IP} dev tun0

Modify the OVS rules for SNAT:

# ovs-ofctl -O OpenFlow13 add-flow br0 \
    "cookie=0x999,ip,nw_src=${TUNNEL_IP1},actions=mod_nw_src:${RAMP_SDN_IP},resubmit(,0)"
# ovs-ofctl -O OpenFlow13 add-flow br0 \
    "cookie=0x999,ip,nw_dst=${RAMP_SDN_IP},actions=mod_nw_dst:${TUNNEL_IP1},resubmit(,0)"
# ovs-ofctl -O OpenFlow13 add-flow br0 \
    "cookie=0x999, table=0, arp, arp_tpa=${RAMP_SDN_IP}, actions=output:2"

Mark the ramp node as an unschedulable node so that no pods end up on the ramp node itself:
$ oadm manage-node <ramp_node_hostname> --schedulable=false

Configuring a Highly-Available Ramp Node

You can use OpenShift’s ipfailover feature, which uses keepalived internally, to make the ramp node highly available from F5 BIG-IP®'s point of view. To do so, first bring up two nodes, for example called ramp-node-1 and ramp-node-2, on the same L2 subnet.

Then, choose some unassigned IP address from within the same subnet to use for your virtual IP, or VIP. This will be set as the RAMP_IP variable with which you will configure your tunnel on F5 BIG-IP®.

For example, suppose you are using the 10.20.30.0/24 subnet for your ramp nodes, and you have assigned 10.20.30.2 to ramp-node-1 and 10.20.30.3 to ramp-node-2. For your VIP, choose some unassigned address from the same 10.20.30.0/24 subnet, for example 10.20.30.4. Then, to configure ipfailover, mark both nodes with a label, such as f5rampnode:

$ oc label node ramp-node-1 f5rampnode=true
$ oc label node ramp-node-2 f5rampnode=true

Similar to instructions from the ipfailover documentation, you must now create a service account and add it to the privileged SCC. First, create the f5ipfailover service account:

$ echo '
    { "kind": "ServiceAccount",
      "apiVersion": "v1",
      "metadata": { "name": "f5ipfailover" }
    }
  ' | oc create -f -

Next, you can manually edit the privileged SCC and add the f5ipfailover service account, or you can script editing the privileged SCC if you have jq installed. To manually edit the privileged SCC, run:

$ oc edit scc privileged

Then add the f5ipfailover service account in form system:serviceaccount:<project>:<name> to the users section:

...
users:
- system:serviceaccount:openshift-infra:build-controller
- system:serviceaccount:default:router
- system:serviceaccount:default:f5ipfailover

Alternatively, to script editing privileged SCC if you have jq installed, run:

$ oc get scc privileged -o json |
    jq '.users |= .+ ["system:serviceaccount:default:f5ipfailover"]' |
      oc replace scc -f -

Finally, configure ipfailover using your chosen VIP (the RAMP_IP variable) and the f5ipfailover service account, assigning the VIP to your two nodes using the f5rampnode label you set earlier:

# RAMP_IP=10.20.30.4
# IFNAME=eth0 (1)
# oadm ipfailover <name-tag> \
    --virtual-ips=$RAMP_IP \
    --interface=$IFNAME \
    --watch-port=0 \
    --replicas=2 \
    --service-account=f5ipfailover  \
    --selector='f5rampnode=true'

1	The interface where `RAMP_IP` should be configured.

With the above setup, the VIP (the RAMP_IP variable) is automatically re-assigned when the ramp node host that currently has it assigned fails.