| =============================== |
| Ganeti OS installation redesign |
| =============================== |
| |
| .. contents:: :depth: 3 |
| |
| This is a design document detailing a new OS installation procedure, which is |
| more secure, provides more features, and is easier to use for many common |
| tasks than the current one. |
| |
| Current state and shortcomings |
| ============================== |
| |
| As of Ganeti 2.10, each instance is associated with an OS definition. An OS |
| definition is a set of scripts (i.e., ``create``, ``export``, ``import``, |
| ``rename``) that are executed with root privileges on the primary host of the |
| instance. These scripts are responsible for performing all the OS-related |
| tasks, namely, creating an instance, setting up an operating system on the |
| instance's disks, exporting/importing the instance, and renaming the instance. |
| |
| These scripts receive, through environment variables, a fixed set of instance |
| parameters (such as, the hypervisor, the name of the instance, the number of |
| disks and their location) and a set of user defined parameters. Both the |
| instance and user defined parameters are written in the configuration file of |
| Ganeti, to allow future reinstalls of the instance, and in various log files, |
| namely: |
| |
| * node daemon log file: contains DEBUG strings of the ``/os_validate``, |
| ``/instance_os_add`` and ``/instance_start`` RPC calls. |
| |
| * master daemon log file: DEBUG strings related to the same RPC calls are stored |
| here as well. |
| |
| * commands log: the CLI commands that create a new instance, including their |
| parameters, are logged here. |
| |
| * RAPI log: the RAPI commands that create a new instance, including their |
| parameters, are logged here. |
| |
| * job logs: the job files stored in the job queue, or in its archive, contain |
| the parameters. |
| |
| The current situation presents a number of shortcomings: |
| |
| * Having the installation scripts run as root on the nodes does not allow |
| user-defined OS scripts, as they would pose a huge security risk. |
| Furthermore, even a script without malicious intentions might end up |
| disrupting a node because of a bug. |
| |
| * Ganeti cannot be used to create instances starting from user provided disk |
| images: even in the (hypothetical) case in which the scripts are completely |
| secure and run not by root but by an unprivileged user with only the power to |
| mount arbitrary files as disk images, this is still a security issue. It has |
| been proven that a carefully crafted file system might exploit kernel |
| vulnerabilities to gain control of the system. Therefore, directly mounting |
| images on the Ganeti nodes is not an option. |
| |
| * There is no way to inject files into an existing disk image. A common use case |
| is for the system administrator to provide a standard image of the system, to |
| be later personalized with the network configuration, private keys identifying |
| the machine, ssh keys of the users, and so on. A possible workaround would be |
| for the scripts to mount the image (only if this is trusted!) and to receive |
| the configurations and ssh keys as user defined OS parameters. Unfortunately, |
| this is also not an option for security sensitive material (such as the ssh |
| keys) because the OS parameters are stored in many places on the system, as |
| already described above. |
| |
| * Most other virtualization software allows only instance images, but no |
| installation scripts. This difference makes the interaction between Ganeti and |
| other software difficult. |
| |
| Proposed changes |
| ================ |
| |
| In order to fix the shortcomings of the current state, we plan to introduce the |
| following changes. |
| |
| OS parameter categories |
| +++++++++++++++++++++++ |
| |
| Change the OS parameters to have three categories: |
| |
| * ``public``: the current behavior. The parameter is logged and stored freely. |
| |
| * ``private``: the parameter is saved inside the Ganeti configuration (to allow |
| for instance reinstall) but it is not shown in logs, job logs, or passed back |
| via RAPI. |
| |
| * ``secret``: the parameter is not saved inside the Ganeti configuration. |
| Reinstalls are impossible unless the data is passed again. The parameter will |
| not appear in any log file. When a functionality is performed jointly by |
| multiple daemons (such as MasterD and LuxiD), currently Ganeti sometimes |
| serializes jobs on disk and later reloads them. Secret parameters will not be |
| serialized to disk. They will be passed around as part of the LUXI calls |
| exchanged by the daemons, and only kept in memory, in order to reduce their |
| accessibility as much as possible. In case of failure of the master node, |
| these parameters will be lost and cannot be recovered because they are not |
| serialized. As a result, the job cannot be taken over by the new master. This |
| is an expected and accepted side effect of jobs with secret parameters: if |
| they fail, they'll have to be restarted manually. |
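| |
| The effect of the three categories can be illustrated with a minimal Python |
| sketch. It is not Ganeti code: the ``OsParam`` class, the helper names and the |
| ``<redacted>`` placeholder are assumptions made only for this example. |

```python
from dataclasses import dataclass

PUBLIC, PRIVATE, SECRET = "public", "private", "secret"


@dataclass(frozen=True)
class OsParam:
    """Hypothetical container pairing a parameter value with its category."""
    value: str
    visibility: str


def for_logging(params):
    """Only public values may reach a log; everything else is masked."""
    return {
        name: (p.value if p.visibility == PUBLIC else "<redacted>")
        for name, p in params.items()
    }


def for_config(params):
    """Public and private values are saved; secret ones are never written."""
    return {
        name: p.value
        for name, p in params.items()
        if p.visibility != SECRET
    }


params = {
    "dhcp": OsParam("yes", PUBLIC),
    "root_password": OsParam("hunter2", PRIVATE),
    "disk_key": OsParam("s3cr3t", SECRET),
}

print(for_logging(params))
print(for_config(params))
```

| Whether non-public parameters are masked or omitted entirely from logs is an |
| implementation choice that this design does not fix; the sketch masks them. |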
| |
| Metadata |
| ++++++++ |
| |
| In order to allow metadata to be sent inside the instance, a communication |
| mechanism between the instance and the host will be created. This mechanism |
| will be bidirectional (e.g.: to allow the setup process going on inside the |
| instance to communicate its progress to the host). Each instance will have |
| access exclusively to its own metadata, and it will be only able to communicate |
| with its host over this channel. This is the approach followed by the |
| ``cloud-init`` tool and more details will be provided in the `Communication |
| mechanism`_ and `Metadata service`_ sections. |
| |
| Installation procedure |
| ++++++++++++++++++++++ |
| |
| A new installation procedure will be introduced. There will be two sets of |
| parameters, namely, installation parameters, which are used mainly for installs |
| and reinstalls, and execution parameters, which are used in all the other runs |
| that are not part of an installation procedure. Also, it will be possible to |
| use an installation medium and/or run the OS scripts in an optional virtualized |
| environment, and optionally use a personalization package. This section details |
| all of these options. |
| |
| The installation parameters will make it possible, for example, to attach an |
| installation floppy/cdrom/network, change the boot device order, or specify a |
| disk image to be used. Through this set of parameters, the administrator will |
| have to provide the hypervisor a location for an installation medium for the |
| instance (e.g., a boot disk, a network image, etc). This medium will carry out |
| the installation of the instance onto the instance's disks and will then be |
| responsible for getting the parameters for configuring the instance, such as, |
| network interfaces, IP address, and hostname. These parameters are taken from |
| the metadata. The installation parameters will be stored in the configuration |
| of Ganeti and used in future reinstalls, but not during normal execution. |
| |
| The instance is reinstalled using the same installation parameters from the |
| first installation. However, it will be the administrator's responsibility to |
| ensure that the installation medium is still available at the proper location |
| when a reinstall occurs. |
| |
| The parameter ``--os-parameters`` can still be used to specify the OS |
| parameters. However, without OS scripts, Ganeti cannot do more than a syntactic |
| check to validate the supplied OS parameter string. As a result, this string |
| will be passed directly to the instance as part of the metadata. If OS scripts |
| are used and the installation procedure is running inside a virtualized |
| environment, Ganeti will take these parameters from the metadata and pass them |
| to the OS scripts as environment variables. |
| |
| Ganeti allows the following installation options: |
| |
| * Use a disk image: |
| |
| Currently, it is already possible to specify an installation medium, such as, |
| a cdrom, but not a disk image. Therefore, a new parameter ``--os-image`` will |
| be used to specify the location of a disk image which will be dumped to the |
| instance's first disk before the instance is started. The location of the |
| image can be a URL and, if this is the case, Ganeti will download this image. |
| |
| * Run OS scripts: |
| |
| The parameter ``--os-type`` (short version: ``-o``) is currently used to |
| specify the OS scripts. This parameter will still be used to specify the OS |
| scripts with the difference that these scripts may optionally run inside a |
| virtualized environment for safety reasons, depending on whether they are |
| trusted or not. For more details on trusted and untrusted OS scripts, refer |
| to the `Installation process in a virtualized environment`_ section. Note |
| that this parameter will become optional thus allowing a user to create an |
| instance specifying only, for example, a disk image or a cdrom image to boot |
| from. |
| |
| * Personalization package |
| |
| As part of the instance creation command, it will be possible to indicate a |
| URL for a "personalization package", which is an archive containing a set of |
| files meant to be overlaid on top of the OS file system at the end of the |
| setup process and before the VM is started for the first time in normal mode. |
| Ganeti will provide a mechanism for receiving and unpacking this archive, |
| independently of whether the installation is being performed inside the |
| virtualized environment or not. |
| |
| The archive will be in TAR-GZIP format (with extension ``.tar.gz`` or |
| ``.tgz``) and contain the files according to the directory structure that will |
| be recreated on the installation disk. Files contained in this archive will |
| overwrite files with the same path created during the installation procedure |
| (if any). The URL of the "personalization package" will have to specify an |
| extension to identify the file format (in order to allow for more formats to |
| be supported in the future). The URL will be stored as part of the |
| configuration of the instance (therefore, the URL should not contain |
| confidential information, but the files available there can). |
| |
| It is up to the system administrator to ensure that a package is actually |
| available at that URL at install and reinstall time. The contents of the |
| package are allowed to change. E.g.: a system administrator might create a |
| package containing the private keys of the instance being created. When the |
| instance is reinstalled, a new package with new keys can be made available |
| there, thus allowing instance reinstall without the need to store keys. A |
| username and a password can be specified together with the URL. If the URL is |
| an HTTP(S) URL, they will be used as basic access authentication credentials to |
| access that URL. The username and password will not be saved in the config, |
| and will have to be provided again in case a reinstall is requested. |
| |
| The downloaded personalization package will not be stored locally on the node |
| for longer than it is needed while unpacking it and adding its files to the |
| instance being created. The personalization package will be overlaid on top |
| of the instance filesystem after the scripts that created it have been |
| executed. In order for the files in the package to be automatically overlaid |
| on top of the instance filesystem, it is required that the appliance is |
| actually able to mount the instance's disks. As a result, this will not work |
| for every filesystem. |
| |
| * Combine a disk image, OS scripts, and a personalization package |
| |
| It will be possible to combine a disk image, OS scripts, and a personalization |
| package, either with or without a virtualized environment (see the exception |
| below). At least one of an installation medium or OS scripts should be specified. |
| |
| The disk image of the actual virtual appliance, which bootstraps the virtual |
| environment used in the installation procedure, will be read only, so that a |
| pristine copy of the appliance can be started every time a new instance needs |
| to be created and to further increase security. The data the instance needs |
| to write at runtime will only be stored in RAM and disappear as soon as the |
| instance is stopped. |
| |
| The parameter ``--enable-safe-install=yes|no`` will be used to give the |
| administrator control over whether to use a virtualized environment for the |
| installation procedure. By default, a virtualized environment will be used. |
| Note that some feature combinations, such as, using untrusted scripts, will |
| require the virtualized environment. In this case, Ganeti will not allow |
| disabling the virtualized environment. |
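| |
| The constraints on combining these options can be summarized in a short Python |
| sketch. The function and argument names are hypothetical and do not correspond |
| to actual Ganeti code; only the two rules being checked come from this section. |

```python
def check_install_options(install_medium=None, os_scripts=None,
                          scripts_trusted=False, safe_install=True):
    """Validate a combination of installation options.

    Returns the effective safe-install setting, or raises ValueError when
    the combination is not allowed. All names here are illustrative.
    """
    # Rule 1: at least an installation medium or OS scripts is needed.
    if install_medium is None and os_scripts is None:
        raise ValueError("specify an installation medium or OS scripts")
    # Rule 2: untrusted scripts may only run in the virtualized environment.
    if os_scripts is not None and not scripts_trusted and not safe_install:
        raise ValueError("untrusted OS scripts require the virtualized "
                         "environment")
    return safe_install


# A disk image alone may be installed without the virtualized environment.
print(check_install_options(install_medium="http://example.com/disk.img",
                            safe_install=False))
```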
| |
| Implementation |
| ============== |
| |
| The implementation of this design will happen as an ordered sequence of steps, |
| of increasing impact on the system and, in some cases, dependent on each other: |
| |
| #. Private and secret instance parameters |
| #. Communication mechanism between host and instance |
| #. Metadata service |
| #. Personalization package (inside a virtualized environment) |
| #. Instance creation via a disk image |
| #. Instance creation inside a virtualized environment |
| |
| Some of these steps need to be more deeply specified w.r.t. what is already |
| written in the `Proposed changes`_ section. Extra details will be provided in |
| the following subsections. |
| |
| Communication mechanism |
| +++++++++++++++++++++++ |
| |
| The communication mechanism will be an exclusive, generic, bidirectional |
| communication channel between Ganeti hosts and guests. |
| |
| exclusive |
| The communication mechanism allows communication between a guest and its host, |
| but it does not allow a guest to communicate with other guests or reach the |
| outside world. |
| |
| generic |
| The communication mechanism allows a guest to reach any service on the host, |
| not just the metadata service. Examples of valid communication include, but |
| are not limited to, access to the metadata service, send commands to Ganeti, |
| request changes to parameters, such as, those related to the distribution |
| upgrades, and let Ganeti control a helper instance, such as, the one for |
| performing OS installs inside a safe environment. |
| |
| bidirectional |
| The communication mechanism allows communication to be initiated from either |
| party, namely, from a host to a guest or guest to host. |
| |
| Note that Ganeti will allow communication with any service (e.g., daemon) running |
| on the host and, as a result, Ganeti will not be responsible for ensuring that |
| only the metadata service is reachable. It is the responsibility of each system |
| administrator to ensure that the extra firewalling and routing rules specified |
| on the host provide the necessary protection on a given Ganeti installation and, |
| at the same time, do not accidentally override the behaviour hereby described |
| which makes the communication between the host and the guest exclusive, generic, |
| and bidirectional, unless intended. |
| |
| The communication mechanism will be enabled automatically during an installation |
| procedure that requires a virtualized environment, but, for backwards |
| compatibility, it will be disabled when the instance is running normally, unless |
| explicitly requested. Specifically, a new parameter ``--communication=yes|no`` |
| (short version: ``-C``) will be added to ``gnt-instance add`` and ``gnt-instance |
| modify``. This parameter will determine whether the communication mechanism is |
| enabled for a particular instance. The value of this parameter will be saved as |
| part of the instance's configuration. |
| |
| The communication mechanism will be implemented through network interfaces on |
| the host and the guest, and Ganeti will be responsible for the host side, |
| namely, creating a TAP interface for each guest and configuring these interfaces |
| to have name ``gnt.com.%d``, where ``%d`` is a unique number within the host |
| (e.g., ``gnt.com.0`` and ``gnt.com.1``), IP address ``169.254.169.254``, and |
| netmask ``255.255.255.255``. The interface's name allows DHCP servers to |
| recognize which interfaces are part of the communication mechanism. |
| |
| This network interface will be connected to the guest's last network interface, |
| which is meant to be used exclusively for the communication mechanism and is |
| defined after all the user-defined interfaces. The last interface was chosen |
| (as opposed to the first one, for example) because the first interface is |
| generally understood as the main gateway out, and also because it minimizes the |
| impact on existing systems, for example, in a scenario where the system |
| administrator has a running cluster and wants to enable the communication |
| mechanism for already existing instances, which might have been created with |
| older versions of Ganeti. Further, DBus should assist in keeping the guest |
| network interfaces more stable. |
| |
| On the guest side, each instance will have its own MAC address and IP address. |
| Both the guest's MAC address and IP address must be unique within a single |
| cluster. An IP is unique within a single cluster, and not within a single host, |
| in order to minimize disruption of connectivity, for example, during live |
| migration, in particular since an instance is not aware when it changes host. |
| Unfortunately, a side-effect of this decision is that a cluster can have at |
| most as many instances (with communication enabled) as there are addresses in |
| a ``/16`` network. If necessary to overcome this limit, it should be possible |
| to allow different networks to be configured link-local only. |
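| |
| As a rough illustration of cluster-wide address uniqueness and of the resulting |
| limit, the following Python sketch allocates guest addresses from |
| ``169.254.0.0/16``. The allocator and the way capacity is counted are |
| assumptions made for this example, not part of the design. |

```python
import ipaddress

NETWORK = ipaddress.ip_network("169.254.0.0/16")
# Address used by the host side of every TAP interface.
HOST_ADDRESS = ipaddress.ip_address("169.254.169.254")


def allocate_guest_ip(used):
    """Return the first address that is free cluster-wide (hypothetical)."""
    for addr in NETWORK.hosts():
        if addr != HOST_ADDRESS and addr not in used:
            return addr
    raise RuntimeError("no free address left in %s" % NETWORK)


used = set()
first = allocate_guest_ip(used)
used.add(first)
second = allocate_guest_ip(used)

# Network and broadcast addresses plus the host address are excluded here.
capacity = NETWORK.num_addresses - 2 - 1
print(first, second, capacity)
```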
| |
| The guest will use the DHCP protocol on its last network interface to contact a |
| DHCP server running on the host and thus determine its IP address. The DHCP |
| server is configured, started, and stopped, by Ganeti and it will be listening |
| exclusively on the TAP network interfaces of the guests in order not to |
| interfere with a potential DHCP server running on the same host. Furthermore, |
| the DHCP server will only recognize MAC and IP address pairs that have been |
| approved by Ganeti. |
| |
| The TAP network interfaces created for each guest share the same IP address. |
| Therefore, it will be necessary to extend the routing table with rules specific |
| to each guest. This can be achieved with the following command, which takes the |
| guest's unique IP address and its TAP interface:: |
| |
| route add -host <ip> dev <ifname> |
| |
| This rule has the additional advantage of preventing guests from trying to lease |
| IP addresses from the DHCP server other than the one that has been assigned to |
| them by Ganeti. The guest could lie about its MAC address to the DHCP server |
| and try to steal another guest's IP address; however, this routing rule will |
| block traffic (i.e., IP packets carrying the wrong IP) from the DHCP server to |
| the malicious guest. Similarly, the guest could lie about its IP address (i.e., |
| simply assign a predefined IP address, perhaps from another guest); however, |
| replies from the host will not be routed to the malicious guest. |
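| |
| A minimal Python sketch of how the per-guest routing rules could be generated |
| is given below. It only builds the argument lists for the ``route`` command |
| shown above and does not execute them; the helper name and the sample |
| interface-to-address mapping are hypothetical. |

```python
import ipaddress
import re

# Interface naming scheme described for the communication mechanism.
IFNAME_RE = re.compile(r"gnt\.com\.\d+")


def host_route_command(guest_ip, ifname):
    """Build the 'route add -host <ip> dev <ifname>' argument list."""
    if not IFNAME_RE.fullmatch(ifname):
        raise ValueError("unexpected interface name: %r" % ifname)
    # Raises a ValueError subclass if guest_ip is not a valid IPv4 address.
    ipaddress.IPv4Address(guest_ip)
    return ["route", "add", "-host", guest_ip, "dev", ifname]


# Example mapping of TAP interfaces to the guests' unique IP addresses.
guests = {"gnt.com.0": "169.254.0.1", "gnt.com.1": "169.254.0.2"}
commands = [host_route_command(ip, ifname)
            for ifname, ip in sorted(guests.items())]
for cmd in commands:
    print(" ".join(cmd))
```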
| |
| This routing rule ensures that the communication channel is exclusive but, as |
| mentioned before, it will not prevent guests from accessing any service on the |
| host. It is the system administrator's responsibility to employ the necessary |
| ``iptables`` rules. In order to achieve this, Ganeti will provide ``ifup`` |
| hooks associated with the guest network interfaces which will give system |
| administrators the opportunity to customize their own ``iptables``, if |
| necessary. Ganeti will also provide examples of such hooks. However, these are |
| meant to be personalized to each Ganeti installation and not to be taken as |
| production ready scripts. |
| |
| For KVM, an instance will be started with a unique MAC address and the file |
| descriptor for the TAP network interface meant to be used by the communication |
| mechanism. Ganeti will be responsible for generating a unique MAC address for |
| the guest, opening the TAP interface, and passing its file descriptor to KVM:: |
| |
| kvm -net nic,macaddr=<mac> -net tap,fd=<tap-fd> ... |
| |
| For Xen, a network interface will be created on the host (using the ``vif`` |
| parameter of the Xen configuration file). Each instance will have its |
| corresponding ``vif`` network interface on the host. The ``vif-route`` script |
| of Xen might be helpful in implementing this. |
| |
| dnsmasq |
| +++++++ |
| |
| The previous section describes the communication mechanism and explains the role |
| of the DHCP server. Note that any DHCP server can be used in the implementation |
| of the communication mechanism. However, the DHCP server employed should not |
| violate the properties described in the previous section, which state that the |
| communication mechanism should be exclusive, generic, and bidirectional, unless |
| this is intentional. |
| |
| In our experiments, we have used dnsmasq. In this section, we describe how to |
| properly configure dnsmasq to work on a given Ganeti installation. This is |
| particularly important if, in this Ganeti installation, dnsmasq will share the |
| node with one or more DHCP servers running in parallel. |
| |
| First, it is important to become familiar with the operational modes of dnsmasq, |
| which are well explained in the `FAQ |
| <http://www.thekelleys.org.uk/dnsmasq/docs/FAQ>`_ under the question ``What are |
| these strange "bind-interface" and "bind-dynamic" options?``. The rest of this |
| section assumes the reader is familiar with these operational modes. |
| |
| bind-dynamic |
| dnsmasq SHOULD be configured in the ``bind-dynamic`` mode (if supported) in |
| order to allow other DHCP servers to run on the same node. In this mode, |
| dnsmasq can listen on the TAP interfaces for the communication mechanism by |
| listening on the TAP interfaces that match the pattern ``gnt.com.*`` (e.g., |
| ``interface=gnt.com.*``). For extra safety, interfaces matching the pattern |
| ``eth*`` and the name ``lo`` should be configured such that dnsmasq will |
| always ignore them (e.g., ``except-interface=eth*`` and |
| ``except-interface=lo``). |
| |
| bind-interfaces |
| dnsmasq MAY be configured in the ``bind-interfaces`` mode (if supported) in |
| order to allow other DHCP servers to run on the same node. Unfortunately, |
| because dnsmasq cannot dynamically adjust to TAP interfaces that are created |
| and destroyed by the system, dnsmasq must be restarted with a new |
| configuration file each time an instance is created or destroyed. |
| |
| Also, the interfaces cannot be patterns, such as, ``gnt.com.*``. Instead, the |
| interfaces must be explicitly specified, for example, |
| ``interface=gnt.com.0,gnt.com.1``. Moreover, dnsmasq cannot bind to the TAP |
| interfaces if they all have the same IPv4 address. As a result, it is |
| necessary to configure these TAP interfaces to enable IPv6 and an IPv6 address |
| must be assigned to them. |
| |
| wildcard |
| dnsmasq CANNOT be configured in the ``wildcard`` mode if there is |
| (at least) another DHCP server running on the same node. |
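| |
| For the recommended ``bind-dynamic`` mode, a configuration fragment could be |
| generated along the lines of the following Python sketch. The function name and |
| the sample MAC/IP pairs are hypothetical, and the use of one ``dhcp-host`` line |
| per approved pair is an assumption about how the "approved by Ganeti" |
| restriction might be expressed. A working setup would need further options |
| (for example, a DHCP range), which are deliberately left out here. |

```python
def dnsmasq_config(approved_pairs):
    """Render a dnsmasq fragment for the bind-dynamic mode.

    approved_pairs maps a guest MAC address to the IP address that has
    been assigned to it (illustrative input, not a Ganeti data structure).
    """
    lines = [
        "bind-dynamic",
        # Listen only on the TAP interfaces of the communication mechanism.
        "interface=gnt.com.*",
        # Extra safety: never serve on regular or loopback interfaces.
        "except-interface=eth*",
        "except-interface=lo",
    ]
    for mac, ip in sorted(approved_pairs.items()):
        lines.append("dhcp-host=%s,%s" % (mac, ip))
    return "\n".join(lines) + "\n"


config = dnsmasq_config({
    "aa:00:00:00:00:02": "169.254.0.2",
    "aa:00:00:00:00:01": "169.254.0.1",
})
print(config, end="")
```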
| |
| Metadata service |
| ++++++++++++++++ |
| |
| An instance will be able to reach the metadata service on ``169.254.169.254:80`` in |
| order to, for example, retrieve its metadata. This IP address and port were |
| chosen for compatibility with the OpenStack and Amazon EC2 metadata services. |
| The metadata service will be provided by a single daemon, which will determine |
| the source instance for a given request and reply with the metadata pertaining |
| to that instance. |
| |
| Where possible, the metadata will be provided in a way compatible with Amazon |
| EC2, at:: |
| |
| http://169.254.169.254/<version>/meta-data/* |
| |
| Ganeti-specific metadata, that does not fit this structure, will be provided |
| at:: |
| |
| http://169.254.169.254/ganeti/<version>/meta_data.json |
| |
| where ``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to |
| indicate the most recent available protocol version. |
| |
| If needed in the future, this structure also allows us to support OpenStack's |
| metadata at:: |
| |
| http://169.254.169.254/openstack/<version>/meta_data.json |
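| |
| The URL scheme above can be sketched in Python as follows. The helper names |
| are hypothetical, and treating ``openstack`` as an accepted namespace reflects |
| only the possible future extension mentioned above. |

```python
import re
from datetime import datetime

BASE_URL = "http://169.254.169.254"
NAMESPACES = ("ganeti", "openstack")


def is_valid_version(version):
    """Accept 'latest' or a real calendar date in YYYY-MM-DD format."""
    if version == "latest":
        return True
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", version):
        return False
    try:
        datetime.strptime(version, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def metadata_url(namespace, version):
    """Build the meta_data.json URL for a namespace and protocol version."""
    if namespace not in NAMESPACES:
        raise ValueError("unknown namespace: %r" % namespace)
    if not is_valid_version(version):
        raise ValueError("invalid version: %r" % version)
    return "%s/%s/%s/meta_data.json" % (BASE_URL, namespace, version)


print(metadata_url("ganeti", "latest"))
```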
| |
| A bi-directional, pipe-like communication channel will also be provided. The |
| instance will be able to receive data from the host by a GET request at:: |
| |
| http://169.254.169.254/ganeti/<version>/read |
| |
| and to send data to the host by a POST request at:: |
| |
| http://169.254.169.254/ganeti/<version>/write |
| |
| As in a pipe, once the data are read, they will not be in the buffer anymore, so |
| subsequent GET requests to ``read`` will not return the same data. However, |
| unlike a pipe, it will not be possible to perform blocking I/O operations. |
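| |
| The intended read semantics can be modelled with a small in-memory buffer, as |
| in the Python sketch below. The ``MessageBuffer`` class is a hypothetical |
| stand-in for whatever the metadata daemon keeps per instance; it shows only |
| that data is consumed on read and that a read never blocks. |

```python
from collections import deque


class MessageBuffer:
    """Consume-on-read buffer modelling one direction of the channel."""

    def __init__(self):
        self._chunks = deque()

    def write(self, data):
        """Queue data, as a POST to 'write' would on the host side."""
        self._chunks.append(data)

    def read(self):
        """Return and discard everything queued so far.

        Never blocks: with nothing queued, an empty result is returned.
        """
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data


buf = MessageBuffer()
buf.write(b"step 1 done\n")
buf.write(b"step 2 done\n")
print(buf.read())   # both chunks, in order
print(buf.read())   # empty: the data was consumed by the first read
```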
| |
| The OS parameters will be accessible through a GET request at:: |
| |
| http://169.254.169.254/ganeti/<version>/os/parameters.json |
| |
| as a JSON serialized dictionary having the parameter name as the key, and the |
| pair ``(<value>, <visibility>)`` as the value, where ``<value>`` is the |
| user-provided value of the parameter, and ``<visibility>`` is either ``public``, |
| ``private`` or ``secret``. |
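| |
| As an illustration, the Python sketch below parses such a dictionary and turns |
| it into environment variables for the OS scripts. The sample document is made |
| up, JSON arrays are assumed to represent the pairs, and the ``OSP_`` prefix |
| with upper-cased names is an assumption modelled on the existing OS interface. |

```python
import json

VISIBILITIES = ("public", "private", "secret")

# Hypothetical response body of .../os/parameters.json
document = """
{
  "dhcp": ["yes", "public"],
  "root_password": ["hunter2", "private"]
}
"""


def to_environment(raw):
    """Convert the JSON parameter dictionary into environment variables."""
    env = {}
    for name, (value, visibility) in json.loads(raw).items():
        if visibility not in VISIBILITIES:
            raise ValueError("unknown visibility: %r" % visibility)
        # Assumed naming convention: OSP_<NAME>.
        env["OSP_" + name.upper()] = value
    return env


print(to_environment(document))
```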
| |
| The installation scripts to be run inside the virtualized environment will be |
| available at:: |
| |
| http://169.254.169.254/ganeti/<version>/os/scripts/<script_name> |
| |
| where ``<script_name>`` is the name of the script. |
| |
| Rationale |
| --------- |
| |
| The choice of using a network interface for instance-host communication, as |
| opposed to VirtIO, XenBus or other methods, is motivated by the desire to have a |
| generic, hypervisor-independent way of creating a communication channel that |
| doesn't require unusual (para)virtualization drivers. |
| At the same time, a network interface was preferred over solutions involving |
| virtual floppy or USB devices because the latter tend to be detected and |
| configured by the guest operating systems, sometimes even in prominent positions |
| in the user interface, whereas it is fairly common to have an unconfigured |
| network interface in a system, usually without any negative side effects. |
| |
| Installation process in a virtualized environment |
| +++++++++++++++++++++++++++++++++++++++++++++++++ |
| |
| In the new OS installation scenario, we distinguish between trusted and |
| untrusted code. |
| |
| The trusted installation code maintains the behavior of the current one and |
| requires no modifications, with the scripts running on the node the instance is |
| being created on. The untrusted code is stored in a subdirectory of the OS |
| definition called ``untrusted``. This directory contains scripts that are |
| equivalent to the already existing ones (``create``, ``export``, ``import``, |
| ``rename``) but that will be run inside a virtualized environment, to protect |
| the host from malicious tampering. |
| |
| The ``untrusted`` code is meant to either be untrusted itself, or to be trusted |
| code running operations that might be dangerous (such as mounting a |
| user-provided image). |
| |
| By default, all new OS definitions will have to be explicitly marked as trusted |
| by the cluster administrator (with a new ``gnt-os modify`` command) before they |
| can run code on the host. Otherwise, only the untrusted part of the code will be |
| allowed to run, inside the virtual appliance. For backwards compatibility |
| reasons, when upgrading an existing cluster, all the installed OSes will be |
| marked as trusted, so that they can keep running with no changes. |
| |
| In order to allow for the highest flexibility, if both a trusted and an |
| untrusted script are provided for the same operation (e.g., ``create``), both of |
| them will be executed at the same time, one on the host, and one inside the |
| installation appliance. They will be allowed to communicate with each other |
| through the already described communication mechanism, in order to orchestrate |
| their execution (e.g.: the untrusted code might execute the installation, while |
| the trusted one receives status updates from it and delivers them to a user |
| interface). |
| |
| The cluster administrator will have an option to completely disable scripts |
| running on the host, leaving only the ones running in the VM. |
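| |
| The rules deciding where the scripts for a single operation run can be |
| summarized by the following Python sketch. The function and its arguments are |
| hypothetical; they only encode the behaviour described in this section. |

```python
def plan_scripts(has_trusted, has_untrusted, os_marked_trusted,
                 host_scripts_disabled=False):
    """Return where the scripts for one operation (e.g. create) will run.

    has_trusted / has_untrusted: whether the OS definition ships a script
    for the operation at top level / in its 'untrusted' subdirectory.
    os_marked_trusted: whether the administrator marked the OS as trusted.
    host_scripts_disabled: cluster option forbidding scripts on the host.
    """
    locations = set()
    if has_untrusted:
        # Untrusted code always runs inside the installation appliance.
        locations.add("appliance")
    if has_trusted and os_marked_trusted and not host_scripts_disabled:
        # Only explicitly trusted OS definitions may run code on the host.
        locations.add("host")
    return locations


# Both scripts present and the OS is trusted: they run side by side.
print(sorted(plan_scripts(True, True, os_marked_trusted=True)))
```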
| |
| Ganeti will provide a script to be run at install time that can be used to |
| create the virtualized environment that will perform the OS installation of new |
| instances. |
| This script will build a debootstrapped basic Debian system including software |
| that will read the metadata, set up the environment variables and launch the |
| installation scripts inside the virtualized environment. The script will also |
| provide hooks for personalization. |
| |
| It will also be possible to use other self-made virtualized environments, as |
| long as they connect to Ganeti over the described communication mechanism and |
| they know how to read and use the provided metadata to create a new instance. |
| |
| While performing an installation in the virtualized environment, a customizable |
| timeout will be used to detect possible problems with the installation process, |
| and to kill the virtualized environment. The timeout will be optional and set on |
| a cluster basis by the administrator. If set, it will be the total time allowed |
| to set up an instance inside the appliance. It is mainly meant as a safety |
| measure to prevent an instance taken over by malicious scripts from being |
| available for a long time. |
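| |
| The timeout logic can be sketched in Python with an ordinary child process |
| standing in for the virtualized environment. This is only an analogy: the real |
| implementation would stop a helper VM, and the function name is hypothetical. |

```python
import subprocess
import sys


def run_with_timeout(argv, timeout=None):
    """Run a command, killing it if it exceeds the optional timeout.

    Returns the exit code, or None if the process had to be killed.
    A timeout of None means no limit, mirroring the optional setting.
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed process
        return None


# A stand-in "installation" that would take far longer than allowed.
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(slow, timeout=0.5))
```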
| |
| Alternatives to design and implementation |
| ========================================= |
| |
| This section lists alternatives to design and implementation, which came up |
| during the development of this design document, that will not be implemented. |
| Please read carefully through the limitations and security concerns of each of |
| these alternatives. |
| |
| Port forwarding in KVM |
| ++++++++++++++++++++++ |
| |
| The communication mechanism could have been implemented in KVM using guest port |
| forwarding, as opposed to network interfaces. There are two alternatives in |
| KVM's guest port forwarding, namely, creating a forwarding device, such as, a |
| TCP/IP connection, or executing a command. However, we have determined that |
| neither of these options is viable. |
| |
| A TCP/IP forwarding device can be created through the following KVM invocation:: |
| |
| kvm -net nic -net \ |
| user,restrict=on,net=169.254.0.0/16,host=169.254.169.253,guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080 ... |
| |
| This invocation even has the advantage that it can block undesired traffic |
| (i.e., traffic that is not explicitly specified in the arguments) and it can |
| remap ports, which would have allowed the metadata service daemon to run on port |
| 8080 instead of 80. However, in this scheme, KVM opens the TCP connection only |
| once, when it is started, and, if the connection breaks, KVM will not |
| reestablish the connection. Furthermore, opening the TCP connection only once |
| interferes with the HTTP protocol, which needs to dynamically establish and |
| close connections. |
| |
| The alternative to the TCP/IP forwarding device is to execute a command. The |
| KVM invocation for this is, for example, the following:: |
| |
| kvm -net nic -net \ |
| "user,restrict=on,net=169.254.0.0/16,host=169.254.169.253,guestfwd=tcp:169.254.169.254:80-netcat 127.0.0.1 8080" ... |
| |
| The advantage of this approach is that the command is executed each time the |
| guest initiates a connection. This is the ideal situation; however, it is only |
| supported in KVM 1.2 and above, and, therefore, not viable because we want to |
| provide support for at least KVM version 1.0, which is the version provided by |
| Ubuntu LTS. |
| |
| Alternatives to the DHCP server |
| +++++++++++++++++++++++++++++++ |
| |
| There are alternatives to using the DHCP server, for example, by assigning a |
| fixed IP address to guests, such as, the IP address ``169.254.169.253``. |
| However, this introduces a routing problem, namely, how to route incoming |
| packets from the same source IP to the host. This problem can be overcome in a |
| number of ways. |
| |
| The first solution is to use NAT to translate the incoming guest IP address, for |
| example, ``169.254.169.253``, to a unique IP address, for example, |
| ``169.254.0.1``. Given that NAT through ``ip rule`` is deprecated, users can |
| resort to ``iptables``. Note that this has not yet been tested. |
| |
| Another option, which has been tested, but only in a prototype, is to connect |
| the TAP network interfaces of the guests to a bridge. The bridge takes the |
| configuration from the TAP network interfaces, namely, IP address |
| ``169.254.169.254`` and netmask ``255.255.255.255``, thus leaving those |
| interfaces without an IP address. Note that in this setting, guests will be |
| able to reach each other; therefore, if necessary, additional ``iptables`` rules |
| can be put in place to prevent it. |