===============
MacVTap support
===============
.. contents:: :depth: 3
This is a design document detailing the implementation of `MacVTap`
support in Ganeti. The initial implementation targets the KVM
hypervisor, but it is intended to be ported to the Xen hypervisor as
well.
Current state and shortcomings
==============================
Currently, Ganeti provides the following modes for networking a virtual
machine: ``bridged``, ``routed``, and ``openvswitch``. ``MacVTap`` is
another virtual network interface in Linux that Ganeti does not support
yet and that could be added to the currently supported solutions. It
acts as a regular TUN/TAP device, so QEMU supports it transparently.
Because of its design, it can greatly simplify Ganeti setups that use
bridged instances.
In brief, the MacVTap interface is based on the ``MacVLan`` Linux
driver, which allows a single physical interface to be associated with
multiple IP and MAC addresses. It is meant to replace the combination of
the TUN/TAP and bridge drivers with a more lightweight setup that
doesn't require any extra configuration on the host. The MacVTap driver
is expected to be more efficient than a regular bridge: unlike a bridge,
it doesn't need to run STP or to discover/learn the MAC addresses of
other connected devices on a given domain, because it knows every MAC
address it can receive. In effect, it provides bridge-like behavior for
virtual machines without the need for a real bridge setup on the host.
Instead, each virtual interface extends an existing network device by
attaching directly to it, having its own MAC address, and providing a
separate virtual interface to be used by userspace processes. The
MacVTap MAC address is used on the external network, and the guest OS
cannot spoof or change that address.
Background
==========
This section provides some background on the MacVTap interface that we
took into account in the rest of this design document.
MacVTap modes of operation
--------------------------
A MacVTap device can operate in one of four modes, just like the MacVLan
driver. These modes determine how the tap endpoints communicate with
each other and provide different levels of isolation between them. The
modes are the following:
* `VEPA (Virtual Ethernet Port Aggregator) mode`: The default mode,
  which is compatible with virtualization-enabled switches.
  Communication between endpoints on the same lower device happens
  through the external switch.
* `Bridge mode`: It works almost like a traditional bridge, connecting
  all endpoints directly to each other.
* `Private mode`: An endpoint in this mode can never communicate with
  any other endpoint on the same lower device.
* `Passthru mode`: This mode was added later to work around some
  limitations of MacVLans (more details here_).
MacVTap internals
-----------------
A MacVTap device is *not* created by opening the `/dev/net/tun` device
and issuing the corresponding `ioctl()` to register a network device, as
is done for tap devices. Instead, there are two ways to create a MacVTap
device. The first is to use the `rtnetlink(7)` interface directly, as
the `libvirt` and `iproute2` utilities do; the second is to use the
high-level `ip-link` command. Since creating a MacVTap interface
programmatically through the netlink protocol is more complicated than
creating a normal TUN/TAP device, we propose using the `ip-link` tool
for MacVTap handling: it is simpler and more straightforward to use, and
it fulfills all our needs. Additionally, since Ganeti already depends on
`iproute2` being installed on the system, this does not introduce an
extra dependency.
The following example uses the `ip-link` tool to create a MacVTap device
named `macvtap0`, operating in `bridge` mode and using `eth0` as its
lower device:
::

  ip link add link eth0 name macvtap0 address 1a:36:1b:aa:b3:77 type macvtap mode bridge
Once a MacVTap interface is created, an actual character device appears
under `/dev`, called ``/dev/tapXX``, where ``XX`` is the interface index
of the device.
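As a rough illustration of the two steps above, the following Python
sketch builds the same ``ip link`` command line and derives the
character device path from an interface index. The helper names
``build_macvtap_add_cmd``, ``tap_device_path`` and
``tap_device_path_for`` are hypothetical and are not part of Ganeti;
only the ``ip link`` syntax and the ``/dev/tapXX`` naming are taken from
the text above.

```python
import socket

# Modes accepted by the macvtap link type, as listed in the section on
# MacVTap modes of operation.
MACVTAP_MODES = frozenset(["vepa", "bridge", "private", "passthru"])


def build_macvtap_add_cmd(link, name, mac, mode="vepa"):
    """Return the argv list that creates a MacVTap device (hypothetical helper)."""
    if mode not in MACVTAP_MODES:
        raise ValueError("unknown MacVTap mode: %r" % (mode,))
    return ["ip", "link", "add", "link", link, "name", name,
            "address", mac, "type", "macvtap", "mode", mode]


def tap_device_path(ifindex):
    """Map an interface index to its character device path."""
    return "/dev/tap%d" % ifindex


def tap_device_path_for(name):
    """Resolve the index of an existing interface and return its device path."""
    return tap_device_path(socket.if_nametoindex(name))
```

Keeping the command construction separate from its execution makes the
sketch easy to check without root privileges or a real lower device.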
Proposed changes
================
In order to create instances using the MacVTap device driver, we propose
some modifications that affect the ``nicparams`` slot of Ganeti's
configuration ``NIC`` object, as well as the code for the KVM
hypervisor, as detailed in the following sections.
Configuration changes
---------------------
The nicparams ``mode`` attribute will be extended to support the
``macvtap`` mode. When using the MacVTap mode, the ``link`` attribute
will specify the network device to which the MacVTap interfaces will be
attached, the *lower device*. Note that the lower device must exist,
otherwise the operation will fail. If no link is specified, the
cluster-wide default NIC `link` param will be used instead.
We propose the MacVTap mode to be configurable, and so the nicparams
object will be extended with an extra slot named ``mvtap_mode``. This
parameter will only be used if the network mode is set to MacVTap since
it does not make sense in other modes, similarly to the `vlan` slot of
the `openvswitch` mode.
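The rules above can be sketched as a small validation routine. This is
only an illustration under stated assumptions: the function name
``check_nicparams``, the plain dictionary representation, and the choice
to reject a missing ``mvtap_mode`` are hypothetical and do not describe
Ganeti's actual parameter checking.

```python
NIC_MODES = frozenset(["bridged", "routed", "openvswitch", "macvtap"])
MACVTAP_MODES = frozenset(["vepa", "bridge", "private", "passthru"])


def check_nicparams(params, default_link=None):
    """Return a normalized copy of ``params`` or raise ValueError.

    ``params`` is a plain dict here (an assumption made for this sketch).
    """
    mode = params.get("mode")
    if mode not in NIC_MODES:
        raise ValueError("unknown NIC mode: %r" % (mode,))
    result = dict(params)
    if mode == "macvtap":
        # Fall back to the cluster-wide default link when none is given.
        link = result.get("link") or default_link
        if not link:
            raise ValueError("macvtap mode needs a lower device (link)")
        result["link"] = link
        # Assumption: the MacVTap mode must be given explicitly.
        if result.get("mvtap_mode") not in MACVTAP_MODES:
            raise ValueError("invalid mvtap_mode: %r"
                             % (result.get("mvtap_mode"),))
    # In every other mode mvtap_mode is simply not consulted.
    return result
```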
The snippets below show the output of some of the ``gnt-network``
commands:
Network connection
~~~~~~~~~~~~~~~~~~
::

  gnt-network connect -N mode=macvtap,link=eth0,mvtap_mode=bridge vtap-net vtap_group
Network listing
~~~~~~~~~~~~~~~
::

  gnt-network list
  Network  Subnet           Gateway       MacPrefix GroupList
  br-net   10.48.1.0/24     10.48.1.254   -         default (bridged, br0, , )
  vtap-net 192.168.100.0/24 192.168.100.1 -         vtap_group (macvtap, eth0, , bridge)
Network information
~~~~~~~~~~~~~~~~~~~
::

  gnt-network info
  Network name: vtap-net
    UUID: 4f139b48-3f08-46b1-911f-d37de7e12dcf
    Serial number: 1
    Subnet: 192.168.100.0/28
    Gateway: 192.168.100.1
    IPv6 Subnet: 2001:db8:2ffc::/64
    IPv6 Gateway: 2001:db8:2ffc::1
    Mac Prefix: None
    size: 16
    free: 10 (62.50%)
    usage map:
      0 XXXXX..........X 63
        (X) used (.) free
    externally reserved IPs:
      192.168.100.0, 192.168.100.1, 192.168.100.15
    connected to node groups:
      vtap_group (mode:macvtap link:eth0 vlan: mvtap_mode:bridge)
    used by 2 instances:
      inst1.example.com: 0:192.168.100.2
      inst2.example.com: 0:192.168.100.3
Hypervisor changes
------------------
A new method named ``OpenVTap`` will be introduced in KVM's `netdev.py`
module. Similar to the ``OpenTap`` method, it will be responsible for
creating a MacVTap device using the `ip-link` command and returning its
file descriptor. The ``OpenVTap`` method will receive as arguments the
network's `link`, the mode of the MacVTap device (``mvtap_mode``), and
the ``interface name`` of the device to be created; without the name we
would not be able to look up the created device and open it.
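A minimal sketch of what such an ``OpenVTap`` method could look like is
given below. It is not the proposed implementation: the function name
``open_vtap``, the injectable ``run``, ``nametoindex`` and ``opener``
parameters, and the cleanup on failure are assumptions made so that the
flow can be exercised without root privileges.

```python
import os
import socket
import subprocess


def open_vtap(link, mvtap_mode, name,
              run=subprocess.check_call,
              nametoindex=socket.if_nametoindex,
              opener=os.open):
    """Create a MacVTap device and return a file descriptor for it.

    The callables are injectable (an assumption of this sketch) so the
    logic can be tested with fakes.
    """
    run(["ip", "link", "add", "link", link, "name", name,
         "type", "macvtap", "mode", mvtap_mode])
    try:
        # The interface name is needed here: it is the only handle we
        # have for finding the index, and hence /dev/tap<index>.
        ifindex = nametoindex(name)
        return opener("/dev/tap%d" % ifindex, os.O_RDWR)
    except OSError:
        # Do not leave a half-configured device behind.
        run(["ip", "link", "delete", name])
        raise
```

The sketch shows why the interface name has to be passed in: the
character device can only be located through the index of the named
interface.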
Since we want MacVTap device names to be unique on the same node, we
will reuse the existing ``_GenerateKvmTapName`` method to generate
device names, with some modifications to adapt it to our needs. This
method is a wrapper over the ``GenerateTapName`` method, which is
currently used to generate TAP interface names, with the ``gnt.com``
prefix, for NICs meant to be used in instance communication. We propose
extending this method to generate names for MacVTap interfaces too,
using the ``vtap`` prefix. To do so, we could add an extra boolean
argument to that method, named `inst_comm`, to differentiate the two
cases, so that the method returns the appropriate name depending on its
usage. This argument will be optional and will default to `True`, so as
not to affect the existing API.
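The proposed `inst_comm` switch could look roughly like the sketch
below. The function name ``generate_tap_name``, the explicit ``existing``
argument, and the ``<prefix>.<index>`` name format are assumptions made
for illustration; the real ``GenerateTapName`` method may gather and
format names differently.

```python
import re


def generate_tap_name(existing, inst_comm=True):
    """Return the first unused name for the selected prefix.

    ``existing`` is an iterable of interface names already present on
    the node (passed in explicitly here to keep the sketch testable).
    """
    prefix = "gnt.com" if inst_comm else "vtap"
    pattern = re.compile(r"^%s\.(\d+)$" % re.escape(prefix))
    used = set()
    for name in existing:
        match = pattern.match(name)
        if match:
            used.add(int(match.group(1)))
    # Pick the lowest free index so names stay short and predictable.
    index = 0
    while index in used:
        index += 1
    return "%s.%d" % (prefix, index)
```

Because `inst_comm` defaults to `True`, existing callers that do not
pass the argument keep getting ``gnt.com`` names.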
Currently, the `OpenTap` method handles the `vhost-net`, `mq`, and
`vnet_hdr` features. The `vhost-net` feature will be supported as usual
for MacVTap devices, and so will the `multiqueue` feature, which can be
enabled using the `numrxqueues` and `numtxqueues` parameters of the
`ip-link` command. The only drawback concerns the `vnet_hdr` feature:
for a MacVTap device this flag is enabled by default, and it cannot be
disabled even if a user requests it.
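To illustrate how the queue parameters fit into the device creation
command, the sketch below adds them to the ``ip link add`` argument
list. The helper name ``build_multiqueue_add_cmd`` and the choice to use
the same count for both directions are assumptions of this example.

```python
def build_multiqueue_add_cmd(link, name, mode, queues=1):
    """Return the argv list for creating a MacVTap device with N queues."""
    if queues < 1:
        raise ValueError("queues must be a positive integer")
    cmd = ["ip", "link", "add", "link", link, "name", name]
    if queues > 1:
        # Generic link options go before the "type" keyword.
        cmd += ["numtxqueues", str(queues), "numrxqueues", str(queues)]
    cmd += ["type", "macvtap", "mode", mode]
    return cmd
```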
A last hypervisor change will be the introduction of a new method named
``_RemoveStaleMacvtapDevs`` that will remove any remaining MacVTap
devices, and which is detailed in the following section.
Tools changes
-------------
Some of the Ganeti tools should also be extended to support MacVTap
devices. Those are the ``kvm-ifup`` and ``net-common`` scripts. These
modifications will include a new method named ``setup_macvtap`` that
will simply change the device status to `UP` just before an instance is
started:
::

  ip link set $INTERFACE up
As mentioned in the `Background` section, MacVTap devices are
persistent. So, we have to manually delete the MacVTap device after an
instance shutdown. To do so, we propose creating a ``kvm-ifdown``
script that will be invoked after an instance shutdown in order to
remove the relevant MacVTap devices. The ``kvm-ifdown`` script should
explicitly call the following commands and, for now, will be functional
for MacVTap NICs only:
::

  ip link set $INTERFACE down
  ip link delete $INTERFACE
To be able to call the `kvm-ifdown` script, we should extend KVM's
``_ConfigureNIC`` method with an extra argument giving the name of the
script to be invoked, instead of always calling the `kvm-ifup` script,
as currently happens.
The invocation of the `kvm-ifdown` script will be made through a
separate method that we will create, named ``_RemoveStaleMacvtapDevs``.
This method will read the NIC runtime files of an instance and will
remove any devices using the MacVTap interface. This method will be
included in the ``CleanupInstance`` method in order to cover all the
cases where an instance using MacVTap NICs needs to be cleaned up.
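The cleanup logic described above can be sketched as follows. The
function name ``stale_macvtap_cleanup_cmds`` and the dictionary layout
of the NIC records (``mode`` and ``ifname`` keys) are hypothetical; the
actual runtime files are serialized by Ganeti and are not reproduced
here. The sketch returns the commands instead of running them, which in
the proposed design would be done through the `kvm-ifdown` script.

```python
def stale_macvtap_cleanup_cmds(nics):
    """Return the ip-link commands that remove leftover MacVTap devices.

    ``nics`` is a list of dicts with ``mode`` and ``ifname`` keys, a
    simplified stand-in for an instance's NIC runtime data.
    """
    cmds = []
    for nic in nics:
        if nic.get("mode") != "macvtap":
            continue  # only MacVTap devices persist and need removal
        ifname = nic.get("ifname")
        if not ifname:
            continue  # nothing was created for this NIC
        cmds.append(["ip", "link", "set", ifname, "down"])
        cmds.append(["ip", "link", "delete", ifname])
    return cmds
```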
Besides the instance shutdown, there are a couple of other cases where
MacVTap NICs will need to be cleaned up. In case of an internal instance
shutdown where ``kvmd`` is not enabled, the instance will be in the
``ERROR_DOWN`` state. In that case, when the instance is started either
by the `ganeti-watcher` or by the admin, the ``CleanupInstance`` method,
and consequently the `kvm-ifdown` script, will not be called, and so the
MacVTap NICs would have to be deleted manually. Otherwise, starting the
instance would result in more than one MacVTap device using the same MAC
address. Instance migration is another case, where removing the instance
from the source node leaves stale MacVTap devices behind. In order to
solve these potential issues, we will explicitly call the
``_RemoveStaleMacvtapDevs`` method on the source node after a successful
instance migration, and also before creating a new device for a NIC that
uses the MacVTap interface, to remove any stale devices.
.. _here: http://thread.gmane.org/gmane.comp.emulators.kvm.devel/61824/
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: