blob: fd14b2be86bc2f7e1ae4b50a18e1f0c1ead8fc77 [file] [log] [blame]
============================
Network Management (revised)
============================
.. contents:: :depth: 4
This is a design document detailing how to extend the existing network
management and make it more flexible and able to deal with more generic
use cases.
Current state and shortcomings
------------------------------
Currently in Ganeti, networks are tightly connected with IP pools,
since creation of a network implies the existence of one subnet
and the corresponding IP pool. This design does not allow common
scenarios like:
- L2 only networks
- IPv6 only networks
- Networks without an IP pool
- Networks with an IPv6 pool
- Networks with multiple IP pools (alternative to externally reserving
IPs)
Additionally one cannot have multiple IP pools inside one network.
Finally, from the instance perspective, a NIC cannot get more than one
IPs (v4 and v6).
Proposed changes
----------------
In order to deal with the above shortcomings, we propose to extend
the existing networks in Ganeti and support:
a) Networks with multiple subnets
b) Subnets with multiple IP pools
c) NICs with multiple IPs from various subnets of a single network
These changes bring up some design and implementation issues that we
discuss in the following sections.
Semantics
++++++++++
Quoting the initial network management design doc "an IP pool consists
of two bitarrays. Specifically the ``reservations`` bitarray which holds
all IP addresses reserved by Ganeti instances and the ``external
reservations`` bitarray with all IPs that are excluded from the IP pool
and cannot be assigned automatically by Ganeti to instances (via
ip=pool)".
Without violating those semantics, here, we clarify the following
definitions.
**network**: A cluster level taggable configuration object with a
user-provider name, (e.g. network1, network2), UUID and MAC prefix.
**L2**: The `mode` and `link` with which we connect a network to a
nodegroup. A NIC attached to a network will inherit this info, just like
connecting an Ethernet cable to a physical NIC. In this sense we only
have one L2 info per NIC.
**L3**: A CIDR and a gateway related to the network. Since a NIC can
have multiple IPs on the same cable each network can have multiple L3
info with the restriction that they do not overlap with each other.
The gateway is optional (just like with current implementation). No
gateway can be used for private networks that do not have a default
route.
**subnet**: A subnet is the above L3 info plus some additional information
(see below).
**ip**: A valid IP should reside in a network's subnet, and should not
be used by more than one instance. An IP can be either obtained dynamically
from a pool or requested explicitly from a subnet (or a pool).
**range**: Sequential IPs inside one subnet calculated either from the
first IP and a size (e.g. start=192.0.2.10, size=10) or the first IP and
the last IP (e.g. start=192.0.2.10, end=192.0.2.19). A single IP can
also be thought of as an IP range with size=1 (see configuration
changes).
**reservations**: All IPs that are used by instances in the cluster at
any time.
**external reservations**: All IPs that are supposed to be reserved
by the admin for either some external component or specific instances.
If one instance requests an external IP explicitly (ip=192.0.2.100),
Ganeti will allow the operation only if ``--force`` is given. Still, the
admin can externally reserve an IP that is already in use by an
instance, as happens now. This helps to reserve an IP for future use and
at the same time prevent any possible race between the instance that
releases this IP and another that tries to retrieve it.
**pool**: A (range, reservations, name) tuple from which instances can
dynamically obtain an IP. Reservations is a bitarray with
length the size of the range, and is needed so that we know which IPs
are available at any time without querying all instances. The use of
name is explained below. A subnet can have multiple pools.
Split L2 from L3
++++++++++++++++
Currently networks in Ganeti do not separate L2 from L3. This means
that one cannot use L2 only networks. The reason is because the CIDR
(passed currently with the ``--network`` option) and the derived IP pool
are mandatory. This design makes L3 info optional. This way we can have
an L2 only network just by connecting a Ganeti network to a nodegroup
with the desired `mode` and `link`. Then one could add one or more subnets
to the existing network.
Multiple Subnets per Network
++++++++++++++++++++++++++++
Currently the IPv4 CIDR is mandatory for a network. Also a network can
obtain at most one IPv4 CIDR and one IPv6 CIDR. These restrictions will
be lifted.
This design doc introduces support for multiple subnets per network. The
L3 info will be moved inside the subnet. A subnet will have a `name` and
a `uuid` just like NIC and Disk config objects. Additionally it will contain
the `dhcp` flag which is explained below, and the `pools` and `external`
fields which are mentioned in the next section. Only the `cidr` will be
mandatory.
Any subnet related actions will be done via the new ``--subnet`` option.
Its syntax will be similar to ``--net``.
The network's subnets must not overlap with each other. Logic will
validate any operations related to reserving/releasing of IPs and check
whether a requested IP is included inside one of the network's subnets.
Just like currently, the L3 info will be exported to NIC configuration
hooks and scripts as environment variables. The example below adds
subnets to a network:
::
gnt-network modify --subnet add:cidr=10.0.0.0/24,gateway=10.0.0.1,dhcp=true net1
gnt-network modify --subnet add:cidr=2001::/64,gateway=2001::1,dhcp=true net1
To remove a subnet from a network one should use:
::
gnt-network modify --subnet some-ident:remove network1
where some-ident can be either a CIDR, a name or a UUID. Ganeti will
allow this operation only if no instances use IPs from this subnet.
Since DHCP is allowed only for a single CIDR on the same cable, the
subnet must have a `dhcp` flag. Logic must not allow more that one
subnets of the same version (4 or 6) in the same network to have DHCP enabled.
To modify a subnet's name or the dhcp flag one could use:
::
gnt-network modify --subnet some-ident:modify,dhcp=false,name=foo network1
This would search for a registered subnet that matches the identifier,
disable DHCP on it and change its name.
The ``dhcp`` parameter is used only for validation purposes and does not
make Ganeti starting a DHCP service. It will just be exported to
external scripts (ifup and hooks) and handled accordingly.
Changing the CIDR or the gateway of a subnet should also be supported.
::
gnt-network modify --subnet some-ident:modify,cidr=192.0.2.0/22 net1
gnt-network modify --subnet some-ident:modify,cidr=192.0.2.32/28 net1
gnt-network modify --subnet some-ident:modify,gateway=192.0.2.40 net1
Before expanding a subnet logic should should check for overlapping
subnets. Shrinking the subnet should be allowed only if the ranges
that are about to be trimmed are not included either in pool
reservations or external ranges.
Multiple IP pools per Subnet
++++++++++++++++++++++++++++
Currently IP pools are automatically created during network creation and
include the whole subnet. Some IPs can be excluded from the pool by
passing them explicitly with ``--add-reserved-ips`` option.
Still for IPv6 subnets or even big IPv4 ones this might be insufficient.
It is impossible to have two bitarrays for a /64 prefix. Even for IPv4
networks a /20 subnet currently requires 8K long bitarrays. And the
second 4K is practically useless since the external reservations are way
less than the actual reservations.
This design extract IP pool management from the network logic, and pools
will become optional. Currently the pool is created based on the
network's CIDR. With multiple subnets per network, we should be able to
create and add IP pools to a network (and eventually to the
corresponding subnet). Each pool will have an optional user friendly
`name` so that the end user can refer to it (see instance related
operations).
The user will be able to obtain dynamically an IP only if we have
already defined a pool for a network's subnet. One would use ``ip=pool``
for the first available IP of the first available pool, or
``ip=some-pool-name`` for the first available IP of a specific pool.
Any pool related actions will be done via the new ``--pool`` option.
In order to add a pool a relevant subnet should pre-exist. Overlapping
pools won't be allowed. For example:
::
gnt-network modify --pool add:192.0.2.10-192.0.2.100,name=pool1 net1
gnt-network modify --pool add:10.0.0.7-10.0.0.20 net1
gnt-network modify --pool add:10.0.0.100 net1
will first parse and find the ranges. Then for each range, Ganeti will
try to find a matching subnet meaning that a pool must be a subrange of
the subnet. If found, the range with empty reservations will be appended
to the list of the subnet's pools. Moreover, logic must be added to
reserve the IPs that are currently in use by instances of this network.
Adding a pool can be easier if we associate it directly with a subnet.
For example on could use the following shortcuts:
::
gnt-network modify --subnet add:cidr=10.0.0.0/27,pool net1
gnt-network modify --pool add:subnet=some-ident
gnt-network modify --pool add:10.0.0.0/27 net1
During pool removal, logic should be added to split pools if ranges
given overlap existing ones. For example:
::
gnt-network modify --pool remove:192.0.2.20-192.0.2.50 net1
will split the pool previously added (10-100) into two new ones;
10-19 and 51-100. The corresponding bitarrays will be trimmed
accordingly. The name will be preserved.
The same things apply to external reservations. Just like now,
modifications will take place via the ``--add|remove-reserved-ips``
option. Logic must be added to support IP ranges.
::
gnt-network modify --add-reserved-ips 192.0.2.20-192.0.2.50 net1
Based on the aforementioned we propose the following changes:
1) Change the IP pool representation in config data.
Existing `reservations` and `external_reservations` bitarrays will be
removed. Instead, for each subnet we will have:
* `pools`: List of (IP range, reservations bitarray) tuples.
* `external`: List of IP ranges
For external ranges the reservations bitarray is not needed
since this will be all 1's.
A configuration example could be::
net1 {
subnets [
uuid1 {
name: subnet1
cidr: 192.0.2.0/24
pools: [
{range:Range(192.0.2.10, 192.0.2.15), reservations: 00000, name:pool1}
]
reserved: [192.0.2.15]
}
uuid2 {
name: subnet2
cidr: 10.0.0.0/24
pools: [
{range:10.0.0.8/29, reservations: 00000000, name:pool3}
{range:10.0.0.40-10.0.0.45, reservations: 000000, name:pool3}
]
reserved: [Range(10.0.0.8, 10.0.0.15), 10.2.4.5]
}
]
}
Range(start, end) will be some json representation of an IPRange().
We decide not to store external reservations as pools (and in the
same list) since we get the following advantages:
- Keep the existing semantics for pools and external reservations.
- Each list has similar entries: one has pools the other has ranges.
The pool must have a bitarray, and has an optional name. It is
meaningless to add a name and a bitarray to external ranges.
- Each list must not have overlapping ranges. Still external
reservations can overlap with pools.
- The --pool option supports add|remove|modify command just like
`--net` and `--disk` and operate on single entities (a restriction that
is not needed for external reservations).
- Another thing, and probably the most important, is that in order to
get the first available IP, only the reserved list must be checked for
conflicts. The ipaddr.summarize_address_range(first, last) could be very
helpful.
2) Change the network module logic.
The above changes should be done in the network module and be transparent
to the rest of the Ganeti code. If a random IP from the networks is
requested, Ganeti searches for an available IP from the first pool of
the first subnet. If it is full it gets to the next pool. Then to the
next subnet and so on. Of course the `external` IP ranges will be
excluded. If an IP is explicitly requested, Ganeti will try to find a
matching subnet. Its pools and external will be checked for
availability. All this logic will be extracted in a separate class
with helper methods for easier manipulation of IP ranges and
bitarrays.
Bitarray processing can be optimized too. The usage of bitarrays will
be reduced since (a) we no longer have `external_reservations` and (b)
pools will have shorter bitarrays (i.e. will not have to cover the whole
subnet). Besides that, we could keep the bitarrays in memory, so that
in most cases (e.g. adding/removing reservations, querying), we don't
keep converting strings to bitarrays and vice versa. Also, the Haskell
code could as well keep this in memory as a bitarray, and validate it
on load.
3) Changes in config module.
We should not have instances with the same IP inside the same network.
We introduce _AllIPs() helper config method that will hold all existing
(IP, network) tuples. Config logic will check this list as well
before passing it to TemporaryReservationManager.
4) Change the query mechanism.
Since we have more that one subnets the new `subnets` field will
include a list of:
* cidr: IPv4 or IPv6 CIDR
* gateway: IPv4 or IPv6 address
* dhcp: True or False
* name: The user friendly name for the subnet
Since we want to support small pools inside big subnets, current query
results are not practical as far as the `map` field is concerned. It
should be replaced with the new `pools` field for each subnet, which will
contain:
* start: The first IP of the pool
* end: The last IP of the pool
* map: A string with 'X' for reserved IPs (either external or not) and
with '.' for all available ones inside the pool
Multiple IPs per NIC
++++++++++++++++++++
Currently IP is a simple string inside the NIC object and there is a
one-to-one mapping between the `ip` and the `network` slots. The whole
logic behind this is that a NIC belongs to a network (cable) and
inherits its mode and link. This rational will not change.
Since this design adds support for multiple subnets per network, a NIC
must be able to obtain multiple IPs from various subnets of the same
network. Thus we change the `ip` slot into list.
We introduce a new `ipX` attribute. For backwards compatibility `ip`
will denote `ip0`.
During instance related operations one could use something like:
::
gnt-instance add --net 0:ip0=192.0.2.4,ip1=pool,ip2=some-pool-name,network=network1 inst1
gnt-instance add --net 0:ip=pool,network1 inst1
This will be parsed, converted to a proper list (e.g. ip = [192.0.2.4,
"pool", "some-pool-name"]) and finally passed to the corresponding opcode.
Based on the previous example, here the first IP will match subnet1, the
second IP will be retrieved from the first available pool of the first
available subnet, and the third from the pool with the some-pool name.
During instance modification, the `ip` option will refer to the first IP
of the NIC, whereas the `ipX` will refer to the X'th IP. As with NICs
we start counting from 0 so `ip1` will refer to the second IP. For example
one should pass:
::
--net 0:modify,ip1=1.2.3.10
to change the second IP of the first NIC to 1.2.3.10,
::
--net -1:add,ip0=pool,ip1=1.2.3.4,network=test
to add a new NIC with two IPs, and
::
--net 1:modify,ip1=none
to remove the second IP of the second NIC.
Configuration changes
---------------------
IPRange config object:
Introduce new config object that will hold ranges needed by pools, and
reservations. It will be either a tuple of (start, size, end) or a
simple string. The `end` is redundant and can derive from start and
size in runtime, but will appear in the representation for readability
and debug reasons.
Pool config object:
Introduce new config object to represent a single subnet's pool. It
will have the `range`, `reservations`, `name` slots. The range slot
will be an IPRange config object, the reservations a bitarray and the
name a simple string.
Subnet config object:
Introduce new config object with slots: `name`, `uuid`, `cidr`,
`gateway`, `dhcp`, `pools`, `external`. Pools is a list of Pool config
objects. External is a list of IPRange config objects. All ranges must
reside inside the subnet's CIDR. Only `cidr` will be mandatory. The
`dhcp` attribute will be False by default.
Network config objects:
The L3 and the IP pool representation will change. Specifically all
slots besides `name`, `mac_prefix`, and `tag` will be removed. Instead
the slot `subnets` with a list of Subnet config objects will be added.
NIC config objects:
NIC's network slot will be removed and the `ip` slot will be modified
to a list of strings.
KVM runtime files:
Any change done in config data must be done also in KVM runtime files.
For this purpose the existing _UpgradeSerializedRuntime() can be used.
Exported variables
------------------
The exported variables during instance related operations will be just
like Linux uses aliases for interfaces. Specifically:
``IP:i`` for the ith IP.
``NETWORK_*:i`` for the ith subnet. * is SUBNET, GATEWAY, DHCP.
In case of hooks those variables will be prefixed with ``INSTANCE_NICn``
for the nth NIC.
Backwards Compatibility
-----------------------
The existing networks representation will be internally modified.
They will obtain one subnet, and one pool with range the whole subnet.
During `gnt-network add` if the deprecated ``--network`` option is passed
will still create a network with one subnet, and one IP pool with the
size of the subnet. Otherwise ``--subnet`` and ``--pool`` options
will be needed.
The query mechanism will also include the deprecated `map` field. For the
newly created network this will contain only the mapping of the first
pool. The deprecated `network`, `gateway`, `network6`, `gateway6` fields
will point to the first IPv4 and IPv6 subnet accordingly.
During instance related operation the `ip` argument of the ``--net``
option will refer to the first IP of the NIC.
Hooks and scripts will still have the same environment exported in case
of single IP per NIC.
This design allows more fine-grained configurations which in turn yields
more flexibility and a wider coverage of use cases. Still basic cases
(the ones that are currently available) should be easy to set up.
Documentation will be enriched with examples for both typical and
advanced use cases of gnt-network.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: