doc/security.rst - ganeti - Git at Google

 Security in Ganeti
 ==================

 Documents Ganeti version 2.16

 Ganeti was developed to run on internal, trusted systems. As such, the
 security model is all-or-nothing.

 Up to version 2.3 all Ganeti code ran as root. Since version 2.4 it is
 possible to run all daemons except the node daemon and the monitoring daemon
 as non-root users by specifying user names and groups at build time.
 The node daemon continues to require root privileges to create logical volumes,
 DRBD devices, start instances, etc. Cluster commands can be run as root or by
 users in a group specified at build time. The monitoring daemon requires root
 privileges in order to be able to access and present information that are only
 avilable to root (such as the output of the ``xm`` command of Xen).

 Host issues
 -----------

 For a host on which the Ganeti software has been installed, but not
 joined to a cluster, there are no changes to the system.

 For a host that has been joined to the cluster, there are very important
 changes:

 - The host will have its SSH host key replaced with the one of the
   cluster (which is the one the initial node had at the cluster
   creation)
 - A new public key will be added to root's ``authorized_keys`` file,
   granting root access to all nodes of the cluster. The private part of
   the key is also distributed to all nodes. Old files are renamed.
 - Communication between nodes is encrypted using SSL/TLS. A common key
   and certificate combo is shared between all nodes of the cluster.  At
   this time, no CA is used.
 - The Ganeti node daemon will accept RPC requests from any host that is
   master candidate within the cluster, and the operations it will
   do as a result of these requests are:

   - running commands under the ``/etc/ganeti/hooks`` directory
   - creating DRBD disks between it and the IP it has been told
   - overwrite a defined list of files on the host

 As you can see, as soon as a node is joined, it becomes equal to all
 other nodes in the cluster wrt to SSH and equal to all non-master
 candidate nodes wrt to RPC, and the security of the cluster is
 determined by the weakest node.

 Note that only the SSH key will allow other machines to run any command
 on this node; the RPC method will run only:

 - well defined commands to create, remove, activate logical volumes,
   drbd devices, start/stop instances, etc;
 - run well-defined SSH commands on other nodes in the cluster
 - scripts under the ``/etc/ganeti/hooks`` directory
 - scripts under the ``/etc/ganeti/restricted-commands`` directory, if
   this feature has been enabled at build time (see below)

 It is therefore important to make sure that the contents of the
 ``/etc/ganeti/hooks`` and ``/etc/ganeti/restricted-commands``
 directories are supervised and only trusted sources can populate them.

 Restricted commands
 ~~~~~~~~~~~~~~~~~~~

 The restricted commands feature is new in Ganeti 2.7. It enables the
 administrator to run any commands in the
 ``/etc/ganeti/restricted-commands`` directory, if the feature has been
 enabled at build time, subject to the following restrictions:

 - No parameters may be passed
 - No absolute or relative path may be passed, only a filename
 - The ``/etc/ganeti/restricted-commands`` directory must
   be owned by root:root and have mode 0755 or stricter
 - Executables must be regular files or symlinks, and must be executable
   by root:root

 Note that it's not possible to list the contents of the directory, and
 there is an intentional delay when trying to execute a non-existing
 command (to slow-down dictionary attacks).

 Since for Ganeti itself this functionality is not needed, and is only
 provided as a way to help administrate or recover nodes, it is a local
 site decision whether to enable or not the restricted commands feature.

 By default, this feature is disabled.


 Cluster issues
 --------------

 As mentioned above, there are multiple ways of communication between
 cluster nodes:

 - SSH-based, for high-volume traffic like image dumps or for low-level
   command, e.g. restarting the Ganeti node daemon
 - RPC communication between master and nodes
 - DRBD real-time disk replication traffic

 The SSH traffic is protected (after the initial login to a new node) by
 the cluster-wide shared SSH key.

 RPC communication between the master and nodes is protected using
 SSL/TLS encryption. The server must have must have the cluster-wide
 shared SSL/TLS certificate. When acting as a client, the nodes use an
 individual SSL/TLS certificate. On incoming requests, the server checks
 whether the client's certificate is that of a master candidate by
 verifying its finterprint to a list of known master candidate
 certificates. We decided not to use a CA (yet) to simplify the key
 handling.

 The DRBD traffic is not protected by encryption, as DRBD does not
 support this. It's therefore recommended to implement host-level
 firewalling or to use a separate range of IP addresses for the DRBD
 traffic (this is supported in Ganeti through the use of a secondary
 interface) which is not routed outside the cluster. DRBD connections are
 protected from erroneous connections to other machines (as may happen
 due to software issues), and from accepting connections from other
 machines, by using a shared secret, exchanged via RPC requests from the
 master to the nodes when configuring the device.

 Master daemon
 -------------

 The command-line tools to master daemon communication is done via a
 UNIX socket, whose permissions are reset to ``0660`` after listening but
 before serving requests. This permission-based protection is documented
 and works on Linux, but is not-portable; however, Ganeti doesn't work on
 non-Linux system at the moment.

 Luxi daemon
 -----------

 The ``luxid`` daemon (automatically enabled if ``confd`` is enabled at
 build time) serves local (UNIX socket) queries about the run-time
 configuration. Answering these means talking to other cluster nodes,
 exactly as ``masterd`` does. See the notes for ``masterd`` regarding
 permission-based protection.

 Conf daemon
 -----------

 In Ganeti 2.8, the ``confd`` daemon (if enabled at build time), serves
 network-originated queries about parts of the static cluster
 configuration.

 If Ganeti is not configured (at build time) to use separate users,
 ``confd`` has access to all Ganeti related files (including internal RPC
 SSL certificates). This makes it a bit more sensitive to bugs (a remote
 attacker could get direct access to the intra-cluster RPC), so to harden
 security it's recommended to:

 - disable confd at build time if it (and ``luxid``) is not needed in
   your setup.
 - configure Ganeti (at build time) to use separate users, so that the
   confd daemon doesn't also have access to the server SSL/TLS
   certificates.
 - add firewall rules to protect the ``confd`` port or bind it to a
   trusted address. Make sure that all nodes can access the daemon, as
   the monitoring daemon requires it.

 Monitoring daemon
 -----------------

 The monitoring daemon provides information about the status and the
 performance of the cluster over HTTP.
 It is currently unencrypted and non-authenticated, therefore it is strongly
 advised to set proper firewalling rules to prevent unwanted access.

 The monitoring daemon runs as root, because it needs to be able to access
 privileged information (such as the state of the instances as provided by
 the Xen hypervisor). Nevertheless, the security implications are mitigated
 by the fact that the agent only provides reporting functionalities,
 without the ability to actually modify the state of the cluster.

 Remote API
 ----------

 Starting with Ganeti 2.0, Remote API traffic is encrypted using SSL/TLS
 by default. It supports Basic authentication as per :rfc:`2617`. Users
 can be granted different capabilities. Details can be found in the
 :ref:`RAPI documentation <rapi-users>`.

 Paths for certificate, private key and CA files required for SSL/TLS
 will be set at source configure time. Symlinks or command line
 parameters may be used to use different files.

 The RAPI binds to all interfaces by default, and allows read-only
 requests without the need for authentication. In the case that one of
 the interfaces RAPI binds to is publicly exposed, this will allow
 anyone in the world to read the state of the cluster, divulging
 potentially useful data such as the names of instances, their IP
 addresses, etc. Since the RAPI daemon resides on the master node as
 well, DoS attacks can result in Ganeti outages or issues with instances
 located on the master node.

 We recommend that you reduce the attack surface by either placing RAPI
 in an environment where you can control access to it, or should you
 need to expose it publicly, use various RAPI daemon options to lock
 functionality down to only what you need. RAPI daemon options are best
 added to ``/etc/default/ganeti``, the ``RAPI_ARGS`` variable. Some
 examples of situations where you might want to expose the RAPI are
 cross-cluster instance moves, which can be done only via the RAPI.

 If you do not use RAPI at all, we recommend that you lock it down by
 binding it to the loopback interface. This can be done by passing the
 ``-b 127.0.0.1`` parameter to the RAPI daemon. Preventing the RAPI
 from starting or making it unreachable on the master node is not
 recommended, as the watcher performs health checks and will attempt to
 restart the daemon repeatedly.

 If you intend to use the RAPI and to expose it to the public, make sure
 to use the ``--require-authentication`` flag, disabling anonymous HTTP
 requests.

 Ganeti currently cannot protect users adequately from DoS attacks based
 on client-side HTTPS parameter renegotiation due to the Python OpenSSL
 library lacking necessary features. To protect yourself from these, the
 use of a HTTPS proxy handling this correctly is needed (e.g. nginx).
 Useful options for setting RAPI up for cooperation with the proxy are:

 - ``-p PORT`` for allowing the default RAPI port to be used by the
   proxy
 - ``--no-ssl`` to disable SSL as it will be handled by the proxy anyway

 Inter-cluster instance moves
 ----------------------------

 To move instances between clusters, different clusters must be able to
 communicate with each other over a secure channel. Up to and including
 Ganeti 2.1, clusters were self-contained entities and had no knowledge
 of other clusters. With Ganeti 2.2, clusters can exchange data if tokens
 (an encryption certificate) was exchanged by a trusted third party
 before.

 KVM Security
 ------------

 When running KVM instances under Ganeti three security models ara
 available: "none", "user" and "pool".

 Under security model "none" instances run by default as root. This means
 that, if an instance gets jail broken, it will be able to own the host
 node, and thus the ganeti cluster. This is the default model, and the
 only one available before Ganeti 2.1.2.

 Under security model "user" an instance is run as the user specified by
 the hypervisor parameter "security_domain". This makes it easy to run
 all instances as non privileged users, and allows one to manually
 allocate specific users to specific instances or sets of instances. If
 the specified user doesn't have permissions a jail broken instance will
 need some local privilege escalation before being able to take over the
 node and the cluster. It's possible though for a jail broken instance to
 affect other ones running under the same user.

 Under security model "pool" a global cluster-level uid pool is used to
 start each instance on the same node under a different user. The uids in
 the cluster pool can be set with ``gnt-cluster init`` and ``gnt-cluster
 modify``, and must correspond to existing users on all nodes. Ganeti
 will then allocate one to each instance, as needed. This way a jail
 broken instance won't be able to affect any other. Since the users are
 handed out by ganeti in a per-node randomized way, in this mode there is
 no way to make sure a particular instance is always run as a certain
 user. Use mode "user" for that.

 In addition to these precautions, if you want to avoid instances sending
 traffic on your node network, you can use an iptables rule such as::

   iptables -A OUTPUT -m owner --uid-owner <uid>[-<uid>] -j LOG \
     --log-prefix "ganeti uid pool user network traffic"
   iptables -A OUTPUT -m owner --uid-owner <uid>[-<uid>] -j DROP

 This won't affect regular instance traffic (that comes out of the tapX
 allocated to the instance, and can be filtered or subject to appropriate
 policy routes) but will stop any user generated traffic that might come
 from a jailbroken instance.

 .. vim: set textwidth=72 :
 .. Local Variables:
 .. mode: rst
 .. fill-column: 72
 .. End:
	Security in Ganeti
	==================

	Documents Ganeti version 2.16

	Ganeti was developed to run on internal, trusted systems. As such, the
	security model is all-or-nothing.

	Up to version 2.3 all Ganeti code ran as root. Since version 2.4 it is
	possible to run all daemons except the node daemon and the monitoring daemon
	as non-root users by specifying user names and groups at build time.
	The node daemon continues to require root privileges to create logical volumes,
	DRBD devices, start instances, etc. Cluster commands can be run as root or by
	users in a group specified at build time. The monitoring daemon requires root
	privileges in order to be able to access and present information that are only
	avilable to root (such as the output of the ``xm`` command of Xen).

	Host issues
	-----------

	For a host on which the Ganeti software has been installed, but not
	joined to a cluster, there are no changes to the system.

	For a host that has been joined to the cluster, there are very important
	changes:

	- The host will have its SSH host key replaced with the one of the
	cluster (which is the one the initial node had at the cluster
	creation)
	- A new public key will be added to root's ``authorized_keys`` file,
	granting root access to all nodes of the cluster. The private part of
	the key is also distributed to all nodes. Old files are renamed.
	- Communication between nodes is encrypted using SSL/TLS. A common key
	and certificate combo is shared between all nodes of the cluster. At
	this time, no CA is used.
	- The Ganeti node daemon will accept RPC requests from any host that is
	master candidate within the cluster, and the operations it will
	do as a result of these requests are:

	- running commands under the ``/etc/ganeti/hooks`` directory
	- creating DRBD disks between it and the IP it has been told
	- overwrite a defined list of files on the host

	As you can see, as soon as a node is joined, it becomes equal to all
	other nodes in the cluster wrt to SSH and equal to all non-master
	candidate nodes wrt to RPC, and the security of the cluster is
	determined by the weakest node.

	Note that only the SSH key will allow other machines to run any command
	on this node; the RPC method will run only:

	- well defined commands to create, remove, activate logical volumes,
	drbd devices, start/stop instances, etc;
	- run well-defined SSH commands on other nodes in the cluster
	- scripts under the ``/etc/ganeti/hooks`` directory
	- scripts under the ``/etc/ganeti/restricted-commands`` directory, if
	this feature has been enabled at build time (see below)

	It is therefore important to make sure that the contents of the
	``/etc/ganeti/hooks`` and ``/etc/ganeti/restricted-commands``
	directories are supervised and only trusted sources can populate them.

	Restricted commands
	~~~~~~~~~~~~~~~~~~~

	The restricted commands feature is new in Ganeti 2.7. It enables the
	administrator to run any commands in the
	``/etc/ganeti/restricted-commands`` directory, if the feature has been
	enabled at build time, subject to the following restrictions:

	- No parameters may be passed
	- No absolute or relative path may be passed, only a filename
	- The ``/etc/ganeti/restricted-commands`` directory must
	be owned by root:root and have mode 0755 or stricter
	- Executables must be regular files or symlinks, and must be executable
	by root:root

	Note that it's not possible to list the contents of the directory, and
	there is an intentional delay when trying to execute a non-existing
	command (to slow-down dictionary attacks).

	Since for Ganeti itself this functionality is not needed, and is only
	provided as a way to help administrate or recover nodes, it is a local
	site decision whether to enable or not the restricted commands feature.

	By default, this feature is disabled.


	Cluster issues
	--------------

	As mentioned above, there are multiple ways of communication between
	cluster nodes:

	- SSH-based, for high-volume traffic like image dumps or for low-level
	command, e.g. restarting the Ganeti node daemon
	- RPC communication between master and nodes
	- DRBD real-time disk replication traffic

	The SSH traffic is protected (after the initial login to a new node) by
	the cluster-wide shared SSH key.

	RPC communication between the master and nodes is protected using
	SSL/TLS encryption. The server must have must have the cluster-wide
	shared SSL/TLS certificate. When acting as a client, the nodes use an
	individual SSL/TLS certificate. On incoming requests, the server checks
	whether the client's certificate is that of a master candidate by
	verifying its finterprint to a list of known master candidate
	certificates. We decided not to use a CA (yet) to simplify the key
	handling.

	The DRBD traffic is not protected by encryption, as DRBD does not
	support this. It's therefore recommended to implement host-level
	firewalling or to use a separate range of IP addresses for the DRBD
	traffic (this is supported in Ganeti through the use of a secondary
	interface) which is not routed outside the cluster. DRBD connections are
	protected from erroneous connections to other machines (as may happen
	due to software issues), and from accepting connections from other
	machines, by using a shared secret, exchanged via RPC requests from the
	master to the nodes when configuring the device.

	Master daemon
	-------------

	The command-line tools to master daemon communication is done via a
	UNIX socket, whose permissions are reset to ``0660`` after listening but
	before serving requests. This permission-based protection is documented
	and works on Linux, but is not-portable; however, Ganeti doesn't work on
	non-Linux system at the moment.

	Luxi daemon
	-----------

	The ``luxid`` daemon (automatically enabled if ``confd`` is enabled at
	build time) serves local (UNIX socket) queries about the run-time
	configuration. Answering these means talking to other cluster nodes,
	exactly as ``masterd`` does. See the notes for ``masterd`` regarding
	permission-based protection.

	Conf daemon
	-----------

	In Ganeti 2.8, the ``confd`` daemon (if enabled at build time), serves
	network-originated queries about parts of the static cluster
	configuration.

	If Ganeti is not configured (at build time) to use separate users,
	``confd`` has access to all Ganeti related files (including internal RPC
	SSL certificates). This makes it a bit more sensitive to bugs (a remote
	attacker could get direct access to the intra-cluster RPC), so to harden
	security it's recommended to:

	- disable confd at build time if it (and ``luxid``) is not needed in
	your setup.
	- configure Ganeti (at build time) to use separate users, so that the
	confd daemon doesn't also have access to the server SSL/TLS
	certificates.
	- add firewall rules to protect the ``confd`` port or bind it to a
	trusted address. Make sure that all nodes can access the daemon, as
	the monitoring daemon requires it.

	Monitoring daemon
	-----------------

	The monitoring daemon provides information about the status and the
	performance of the cluster over HTTP.
	It is currently unencrypted and non-authenticated, therefore it is strongly
	advised to set proper firewalling rules to prevent unwanted access.

	The monitoring daemon runs as root, because it needs to be able to access
	privileged information (such as the state of the instances as provided by
	the Xen hypervisor). Nevertheless, the security implications are mitigated
	by the fact that the agent only provides reporting functionalities,
	without the ability to actually modify the state of the cluster.

	Remote API
	----------

	Starting with Ganeti 2.0, Remote API traffic is encrypted using SSL/TLS
	by default. It supports Basic authentication as per :rfc:`2617`. Users
	can be granted different capabilities. Details can be found in the
	:ref:`RAPI documentation <rapi-users>`.

	Paths for certificate, private key and CA files required for SSL/TLS
	will be set at source configure time. Symlinks or command line
	parameters may be used to use different files.

	The RAPI binds to all interfaces by default, and allows read-only
	requests without the need for authentication. In the case that one of
	the interfaces RAPI binds to is publicly exposed, this will allow
	anyone in the world to read the state of the cluster, divulging
	potentially useful data such as the names of instances, their IP
	addresses, etc. Since the RAPI daemon resides on the master node as
	well, DoS attacks can result in Ganeti outages or issues with instances
	located on the master node.

	We recommend that you reduce the attack surface by either placing RAPI
	in an environment where you can control access to it, or should you
	need to expose it publicly, use various RAPI daemon options to lock
	functionality down to only what you need. RAPI daemon options are best
	added to ``/etc/default/ganeti``, the ``RAPI_ARGS`` variable. Some
	examples of situations where you might want to expose the RAPI are
	cross-cluster instance moves, which can be done only via the RAPI.

	If you do not use RAPI at all, we recommend that you lock it down by
	binding it to the loopback interface. This can be done by passing the
	``-b 127.0.0.1`` parameter to the RAPI daemon. Preventing the RAPI
	from starting or making it unreachable on the master node is not
	recommended, as the watcher performs health checks and will attempt to
	restart the daemon repeatedly.

	If you intend to use the RAPI and to expose it to the public, make sure
	to use the ``--require-authentication`` flag, disabling anonymous HTTP
	requests.

	Ganeti currently cannot protect users adequately from DoS attacks based
	on client-side HTTPS parameter renegotiation due to the Python OpenSSL
	library lacking necessary features. To protect yourself from these, the
	use of a HTTPS proxy handling this correctly is needed (e.g. nginx).
	Useful options for setting RAPI up for cooperation with the proxy are:

	- ``-p PORT`` for allowing the default RAPI port to be used by the
	proxy
	- ``--no-ssl`` to disable SSL as it will be handled by the proxy anyway

	Inter-cluster instance moves
	----------------------------

	To move instances between clusters, different clusters must be able to
	communicate with each other over a secure channel. Up to and including
	Ganeti 2.1, clusters were self-contained entities and had no knowledge
	of other clusters. With Ganeti 2.2, clusters can exchange data if tokens
	(an encryption certificate) was exchanged by a trusted third party
	before.

	KVM Security
	------------

	When running KVM instances under Ganeti three security models ara
	available: "none", "user" and "pool".

	Under security model "none" instances run by default as root. This means
	that, if an instance gets jail broken, it will be able to own the host
	node, and thus the ganeti cluster. This is the default model, and the
	only one available before Ganeti 2.1.2.

	Under security model "user" an instance is run as the user specified by
	the hypervisor parameter "security_domain". This makes it easy to run
	all instances as non privileged users, and allows one to manually
	allocate specific users to specific instances or sets of instances. If
	the specified user doesn't have permissions a jail broken instance will
	need some local privilege escalation before being able to take over the
	node and the cluster. It's possible though for a jail broken instance to
	affect other ones running under the same user.

	Under security model "pool" a global cluster-level uid pool is used to
	start each instance on the same node under a different user. The uids in
	the cluster pool can be set with ``gnt-cluster init`` and ``gnt-cluster
	modify``, and must correspond to existing users on all nodes. Ganeti
	will then allocate one to each instance, as needed. This way a jail
	broken instance won't be able to affect any other. Since the users are
	handed out by ganeti in a per-node randomized way, in this mode there is
	no way to make sure a particular instance is always run as a certain
	user. Use mode "user" for that.

	In addition to these precautions, if you want to avoid instances sending
	traffic on your node network, you can use an iptables rule such as::

	iptables -A OUTPUT -m owner --uid-owner <uid>[-<uid>] -j LOG \
	--log-prefix "ganeti uid pool user network traffic"
	iptables -A OUTPUT -m owner --uid-owner <uid>[-<uid>] -j DROP

	This won't affect regular instance traffic (that comes out of the tapX
	allocated to the instance, and can be filtered or subject to appropriate
	policy routes) but will stop any user generated traffic that might come
	from a jailbroken instance.

	.. vim: set textwidth=72 :
	.. Local Variables:
	.. mode: rst
	.. fill-column: 72
	.. End: