Open vSwitch datapath developer documentation
=============================================

The Open vSwitch kernel module allows flexible userspace control over
flow-level packet processing on selected network devices.  It can be
used to implement a plain Ethernet switch, network device bonding,
VLAN processing, network access control, flow-based network control,
and so on.

The kernel module implements multiple "datapaths" (analogous to
bridges), each of which can have multiple "vports" (analogous to ports
within a bridge).  Each datapath also has associated with it a "flow
table" that userspace populates with "flows" that map from keys based
on packet headers and metadata to sets of actions.  The most common
action forwards the packet to another vport; other actions are also
implemented.

When a packet arrives on a vport, the kernel module processes it by
extracting its flow key and looking it up in the flow table.  If there
is a matching flow, it executes the associated actions.  If there is
no match, it queues the packet to userspace for processing (as part of
its processing, userspace will likely set up a flow to handle further
packets of the same type entirely in-kernel).
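
As a rough illustration of the lookup-or-upcall behavior described
above, the following Python sketch models a datapath whose flow table
is a dictionary.  It is only a toy model: the names "Datapath",
"receive", "install_flow", and "extract_key", and the tuple-based flow
key, are assumptions made for this example and do not reflect the
kernel module's actual data structures.

```python
# Toy model of the lookup-or-upcall path; all names here are hypothetical.

class Datapath:
    def __init__(self):
        self.flow_table = {}   # flow key -> list of actions
        self.upcalls = []      # (key, packet) pairs queued for userspace

    def install_flow(self, key, actions):
        """Userspace populates the flow table."""
        self.flow_table[key] = list(actions)

    def receive(self, in_port, packet):
        """Process a packet that arrived on vport `in_port`."""
        key = extract_key(in_port, packet)
        actions = self.flow_table.get(key)
        if actions is None:
            # No match: queue the packet, with its key, to userspace.
            self.upcalls.append((key, packet))
            return None
        return actions


def extract_key(in_port, packet):
    """Stand-in for header parsing: the packet is already a dict of fields."""
    return (("in_port", in_port),) + tuple(sorted(packet.items()))


dp = Datapath()
pkt = {"eth_type": 0x0800, "ip_proto": 6}

first = dp.receive(1, pkt)             # miss: queued for userspace
key, _ = dp.upcalls[0]
dp.install_flow(key, ["output:2"])     # userspace sets up a flow
second = dp.receive(1, pkt)            # hit: handled by the flow table
```

Only the first packet is queued to userspace; once the flow is
installed, later packets with the same key are matched directly.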


Flow key compatibility
----------------------

Network protocols evolve over time.  New protocols become important
and existing protocols lose their prominence.  For the Open vSwitch
kernel module to remain relevant, it must be possible for newer
versions to parse additional protocols as part of the flow key.  It
might even be desirable, someday, to drop support for parsing
protocols that have become obsolete.  Therefore, the Netlink interface
to Open vSwitch is designed to allow carefully written userspace
applications to work with any version of the flow key, past or future.

To support this forward and backward compatibility, whenever the
kernel module passes a packet to userspace, it also passes along the
flow key that it parsed from the packet.  Userspace then extracts its
own notion of a flow key from the packet and compares it against the
kernel-provided version:

    - If userspace's notion of the flow key for the packet matches the
      kernel's, then nothing special is necessary.

    - If the kernel's flow key includes more fields than the userspace
      version of the flow key, for example if the kernel decoded IPv6
      headers but userspace stopped at the Ethernet type (because it
      does not understand IPv6), then again nothing special is
      necessary.  Userspace can still set up a flow in the usual way,
      as long as it uses the kernel-provided flow key to do it.

    - If the userspace flow key includes more fields than the
      kernel's, for example if userspace decoded an IPv6 header but
      the kernel stopped at the Ethernet type, then userspace can
      forward the packet manually, without setting up a flow in the
      kernel.  This case is bad for performance because every packet
      that the kernel considers part of the flow must go to userspace,
      but the forwarding behavior is correct.  (If userspace can
      determine that the values of the extra fields would not affect
      forwarding behavior, then it could set up a flow anyway.)
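
The three cases above can be sketched as a single decision function.
This is a minimal sketch under stated assumptions: flow keys are
modelled as Python dicts from attribute name to value, only the
attribute names are compared, and the function name "decide" and the
"extras_irrelevant" parameter are hypothetical.

```python
# Hypothetical helper: flow keys are modelled as dicts that map an
# attribute name to its value.  Only attribute names are compared.

def decide(kernel_key, user_key, extras_irrelevant=False):
    """Return (decision, key_to_install) for one packet sent to userspace."""
    extra_in_user = set(user_key) - set(kernel_key)
    if not extra_in_user or extras_irrelevant:
        # Cases 1 and 2, or extra fields that do not affect forwarding:
        # set up a flow using the kernel-provided key.
        return ("install_flow", kernel_key)
    # Case 3: userspace parsed more than the kernel did.
    return ("forward_in_userspace", None)


l2_only = {"in_port": 1, "eth": "...", "eth_type": 0x86dd}
with_ipv6 = dict(l2_only, ipv6="...")

same = decide(l2_only, l2_only)
kernel_knows_more = decide(with_ipv6, l2_only)
user_knows_more = decide(l2_only, with_ipv6)
```

Note that whenever a flow is installed, it is the kernel-provided key
that is used, never the key that userspace derived on its own.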

How flow keys evolve over time is important to making this work, so
the following sections go into detail.


Flow key format
---------------

A flow key is passed over a Netlink socket as a sequence of Netlink
attributes.  Some attributes represent packet metadata, defined as any
information about a packet that cannot be extracted from the packet
itself, e.g. the vport on which the packet was received.  Most
attributes, however, are extracted from headers within the packet,
e.g. source and destination addresses from Ethernet, IP, or TCP
headers.

The <linux/openvswitch.h> header file defines the exact format of the
flow key attributes.  For informal explanatory purposes here, we write
them as comma-separated strings, with parentheses indicating arguments
and nesting.  For example, the following could represent a flow key
corresponding to a TCP packet that arrived on vport 1:

    in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
    eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=6, tos=0,
    frag=no), tcp(src=49163, dst=80)

Often we ellipsize arguments not important to the discussion, e.g.:

    in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
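
To make the informal notation concrete, here is a small Python sketch
that renders a list of attributes in this style.  The representation
(a list of name/value pairs, with a dict for named arguments and a
list for nested attributes) and the function names "fmt_key" and
"fmt_attr" are assumptions chosen for illustration; the real wire
format is the Netlink encoding defined in <linux/openvswitch.h>.

```python
# Hypothetical pretty-printer for the informal flow key notation.

def fmt_key(attrs):
    """Render a list of (name, value) attributes as a comma-separated string."""
    return ", ".join(fmt_attr(name, value) for name, value in attrs)


def fmt_attr(name, value):
    if isinstance(value, dict):
        # Named arguments, e.g. tcp(src=49163, dst=80).
        body = ", ".join(f"{k}={v}" for k, v in value.items())
    elif isinstance(value, list):
        # Nested attributes, rendered recursively.
        body = fmt_key(value)
    else:
        body = str(value)
    return f"{name}({body})"


key = [
    ("in_port", 1),
    ("eth_type", "0x0800"),
    ("ipv4", {"src": "172.16.0.20", "dst": "172.18.0.52", "proto": 6}),
    ("tcp", {"src": 49163, "dst": 80}),
]
text = fmt_key(key)
nested_text = fmt_key([("outer", [("inner", 1)])])
```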


Basic rule for evolving flow keys
---------------------------------

Some care is needed to really maintain forward and backward
compatibility for applications that follow the rules listed under
"Flow key compatibility" above.

The basic rule is obvious:

    ------------------------------------------------------------------
    New network protocol support must only supplement existing flow
    key attributes.  It must not change the meaning of already defined
    flow key attributes.
    ------------------------------------------------------------------

This rule does have less-obvious consequences, so it is worth working
through a few examples.  Suppose, for example, that the kernel module
did not already implement VLAN parsing.  Instead, it just interpreted
the 802.1Q TPID (0x8100) as the Ethertype and then stopped parsing the
packet.  The flow key for any packet with an 802.1Q header would look
essentially like this, ignoring metadata:

    eth(...), eth_type(0x8100)

Naively, to add VLAN support, it makes sense to add a new "vlan" flow
key attribute to contain the VLAN tag, then continue to decode the
encapsulated headers beyond the VLAN tag using the existing field
definitions.  With this change, a TCP packet in VLAN 10 would have a
flow key much like this:

    eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)

But this change would negatively affect a userspace application that
has not been updated to understand the new "vlan" flow key attribute.
The application could, following the flow compatibility rules above,
ignore the "vlan" attribute that it does not understand and therefore
assume that the flow contained IP packets.  This is a bad assumption
(the flow only contains IP packets if one parses and skips over the
802.1Q header) and it could cause the application's behavior to change
across kernel versions even though it follows the compatibility rules.

The solution is to use a set of nested attributes.  This is, for
example, why 802.1Q support uses nested attributes.  A TCP packet in
VLAN 10 is actually expressed as:

    eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
    ip(proto=6, ...), tcp(...))

Notice how the "eth_type", "ip", and "tcp" flow key attributes are
nested inside the "encap" attribute.  Thus, an application that does
not understand the "vlan" key will not see any of those attributes
and therefore will not misinterpret them.  (Also, the outer eth_type
is still 0x8100, not changed to 0x0800.)
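
The difference between the two encodings can be seen by modelling an
application that predates VLAN support and skips any top-level
attribute it does not recognize.  This is a hedged sketch: the set
"KNOWN_TO_OLD_APP", the function "visible_to_old_app", and the list
representation of a flow key are assumptions for illustration only.

```python
# Hypothetical model of an application written before "vlan" and "encap"
# existed.  It skips unknown top-level attributes, payload included.

KNOWN_TO_OLD_APP = {"eth", "eth_type", "ip", "tcp"}


def visible_to_old_app(key):
    """Top-level attributes that the old application understands."""
    return [(name, value) for name, value in key if name in KNOWN_TO_OLD_APP]


# Naive encoding: inner headers sit next to the new "vlan" attribute.
naive = [("eth", "..."), ("vlan", {"vid": 10, "pcp": 0}),
         ("eth_type", 0x0800), ("ip", {"proto": 6}), ("tcp", "...")]

# Nested encoding: inner headers are hidden inside "encap".
nested = [("eth", "..."), ("eth_type", 0x8100),
          ("vlan", {"vid": 10, "pcp": 0}),
          ("encap", [("eth_type", 0x0800), ("ip", {"proto": 6}),
                     ("tcp", "...")])]

naive_view = visible_to_old_app(naive)    # looks like an untagged IP flow
nested_view = visible_to_old_app(nested)  # only eth and eth_type(0x8100)
```

With the naive encoding the old application believes it is looking at
an untagged TCP/IP flow.  With the nested encoding it sees the same
key it would have seen before VLAN parsing was added.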

Handling malformed packets
--------------------------

Don't drop packets in the kernel for malformed protocol headers, bad
checksums, etc.  This would prevent userspace from implementing a
simple Ethernet switch that forwards every packet.

Instead, in such a case, include an attribute with "empty" content.
It doesn't matter if the empty content could be valid protocol values,
as long as those values are rarely seen in practice, because userspace
can always arrange for all packets with those values to be sent to
userspace and handle them individually.

For example, consider a packet that contains an IP header that
indicates protocol 6 for TCP, but which is truncated just after the IP
header, so that the TCP header is missing.  The flow key for this
packet would include a tcp attribute with all-zero src and dst, like
this:

    eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)

As another example, consider a packet with an Ethernet type of 0x8100,
indicating that a VLAN TCI should follow, but which is truncated just
after the Ethernet type.  The flow key for this packet would include
an all-zero-bits vlan and an empty encap attribute, like this:

    eth(...), eth_type(0x8100), vlan(0), encap()

Unlike a TCP packet with source and destination ports 0, an
all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
attribute expressly to allow this situation to be distinguished.
Thus, the flow key in this second example unambiguously indicates a
missing or malformed VLAN TCI.
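
The VLAN case can be sketched in Python as follows.  This is an
illustrative model, not the kernel's parser: the function name
"vlan_attrs" is hypothetical, the frame is assumed to hold at least a
full 14-byte Ethernet header, and treating any frame shorter than 18
bytes as truncated is a simplification.  The CFI bit is bit 12
(0x1000) of the 16-bit TCI.

```python
import struct

ETH_P_8021Q = 0x8100
CFI_BIT = 0x1000  # bit 12 of the TCI, set to mark "a tag was present"


def vlan_attrs(frame):
    """Return the flow key attributes that follow eth(...) for a frame."""
    (eth_type,) = struct.unpack_from("!H", frame, 12)
    attrs = [("eth_type", eth_type)]
    if eth_type != ETH_P_8021Q:
        return attrs
    if len(frame) < 18:
        # Truncated after the Ethernet type: all-zero vlan, empty encap.
        return attrs + [("vlan", 0), ("encap", [])]
    tci, inner_type = struct.unpack_from("!HH", frame, 14)
    # Setting the CFI bit keeps a real all-zero TCI distinct from vlan(0).
    return attrs + [("vlan", tci | CFI_BIT),
                    ("encap", [("eth_type", inner_type)])]


# Frame with a valid but all-zero TCI, then an inner Ethertype of 0x0800.
zero_tci = bytes(12) + b"\x81\x00" + b"\x00\x00" + b"\x08\x00"
# Frame that ends right after the 0x8100 Ethernet type.
truncated = bytes(12) + b"\x81\x00"

present = vlan_attrs(zero_tci)
missing = vlan_attrs(truncated)
```

A well-formed tag with an all-zero TCI yields a vlan value of 0x1000,
while the truncated frame yields 0, so the two cases never collide.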

Other rules
-----------

The other rules for flow keys are much less subtle:

    - Duplicate attributes are not allowed at a given nesting level.

    - Ordering of attributes is not significant.

    - When the kernel sends a given flow key to userspace, it always
      composes it the same way.  This allows userspace to hash and
      compare entire flow keys that it may not be able to fully
      interpret.
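
Two of these rules lend themselves to a short sketch.  The code below
is a hypothetical illustration: "has_duplicates" checks the
duplicate-attribute rule on a list-of-pairs model of a flow key, and
the dictionary shows how a key received from the kernel could be
stored and looked up as opaque bytes.  The byte string is a
placeholder, not a real Netlink-encoded key.

```python
# Hypothetical sketch; the byte strings are placeholders, not real
# Netlink-encoded flow keys.

def has_duplicates(attrs):
    """True if any nesting level repeats an attribute name."""
    names = [name for name, _ in attrs]
    if len(names) != len(set(names)):
        return True
    return any(isinstance(value, list) and has_duplicates(value)
               for _, value in attrs)


# "eth_type" appears twice, but at different nesting levels: allowed.
vlan_key = [("eth", "..."), ("eth_type", 0x8100), ("vlan", 10),
            ("encap", [("eth_type", 0x0800), ("tcp", "...")])]
# "eth_type" appears twice at the same level: not allowed.
bad_key = [("eth_type", 0x8100), ("eth_type", 0x0800)]

# Because the kernel always composes a given key the same way, the raw
# bytes can serve directly as a dictionary key, even when userspace
# cannot interpret every attribute in them.
flows_by_raw_key = {}
raw_key = b"\x01\x02opaque-attribute-bytes"
flows_by_raw_key[raw_key] = "flow A"
same_flow = flows_by_raw_key.get(bytes(raw_key))
```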