 ===========================
 Checking for N+M redundancy
 ===========================

 .. contents:: :depth: 4

 This document describes how the level of redundancy is estimated
 in Ganeti.


 Current state and shortcomings
 ==============================

 Ganeti keeps the cluster N+1 redundant, also taking into account
 :doc:`design-shared-storage-redundancy`. In other words, Ganeti
 tries to keep the cluster in a state where, after the failure of any
 single node, all instances can be started immediately.
 However, for purposes such as planning maintenance, it is sometimes
 desirable to know how many node losses the cluster can recover from.
 This information is also useful when operating big clusters where
 hardware repairs are expected to take a long time.


 Proposed changes
 ================

 Higher redundancy as a sequential concept
 -----------------------------------------

 The intuitive meaning of an N+M redundant cluster is that M nodes can
 fail without instances being lost. However, when DRBD is used, the
 failure of just 2 nodes can already cause the complete loss of an instance.
 Therefore, the best we can hope for is to be able to recover from M
 sequential failures. The intuition that a cluster is N+M redundant if M
 nodes can fail one by one, leaving enough time for a rebalance in between,
 without losing instances, is formalized in the next definition.

 Definition of N+M redundancy
 ----------------------------

 We keep the definition of :doc:`design-shared-storage-redundancy`. Moreover,
 for M a non-negative integer, we define a cluster to be N+(M+2) redundant
 if, after draining any node, the standard rebalancing procedure (as
 provided, e.g., by ``hbal``) will fully evacuate that node and result in an
 N+(M+1) redundant cluster.
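
 The recursive shape of this definition can be sketched as follows. This is
 only an illustration, not Ganeti code: the ``Cluster`` model, the function
 ``is_n_plus_m_redundant`` and the two callbacks are hypothetical names. The
 N+1 check and the drain-and-rebalance step are passed in by the caller, and
 the toy stand-ins at the end exist only to make the sketch runnable.

 .. code-block:: python

    from typing import Callable, FrozenSet, Optional

    # Hypothetical model: a cluster is just a set of node names.
    Cluster = FrozenSet[str]


    def is_n_plus_m_redundant(
        cluster: Cluster,
        m: int,
        is_n_plus_1: Callable[[Cluster], bool],
        drain_and_rebalance: Callable[[Cluster, str], Optional[Cluster]],
    ) -> bool:
        """Check N+m redundancy (m >= 1) by following the recursive definition."""
        if m < 1:
            raise ValueError("m must be at least 1")
        if m == 1:
            # Base case: the existing N+1 notion, supplied by the caller.
            return is_n_plus_1(cluster)
        for node in cluster:
            # None signals that the node could not be fully evacuated.
            rebalanced = drain_and_rebalance(cluster, node)
            if rebalanced is None:
                return False
            if not is_n_plus_m_redundant(
                rebalanced, m - 1, is_n_plus_1, drain_and_rebalance
            ):
                return False
        return True


    # Toy stand-ins (assumptions, not Ganeti behaviour): the cluster counts
    # as N+1 redundant when it has at least three nodes, and draining a
    # node always succeeds.
    def toy_is_n_plus_1(cluster: Cluster) -> bool:
        return len(cluster) >= 3


    def toy_drain(cluster: Cluster, node: str) -> Optional[Cluster]:
        return cluster - {node}

 With these toy stand-ins, a four-node cluster is N+1 and N+2 redundant, but
 not N+3 redundant. Note that the loop visits every node at every level of
 the recursion, so the number of cases grows quickly with M.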

 Independence of Groups
 ----------------------

 Immediately from the definition, we see that the redundancy level, i.e.,
 the maximal M such that the cluster is N+M redundant, can be computed
 in a group-by-group manner: the standard balancing algorithm will never
 move instances between node groups. The redundancy level of the cluster
 is then the minimum of the redundancy levels of the individual groups.
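
 As a minimal sketch of this aggregation, assuming the per-group levels have
 already been computed (the function name and the group names below are
 hypothetical):

 .. code-block:: python

    from typing import Mapping


    def cluster_redundancy_level(group_levels: Mapping[str, int]) -> int:
        """Return the cluster-wide level as the minimum over all node groups."""
        if not group_levels:
            raise ValueError("at least one node group is required")
        return min(group_levels.values())

 For example, a cluster whose groups have levels 3 and 1 has an overall
 redundancy level of 1, as the weakest group determines the result.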

 Estimation of the redundancy level
 ----------------------------------

 The definition of N+M redundancy requires considering M failures in
 arbitrary order, and hence super-exponentially many cases for
 large M. However, as balancing moves instances anyway, the redundancy
 level mainly depends on the amount of node resources available to the
 instances in a node group. So we can get a good approximation of the
 redundancy level of a node group by only considering the draining of one
 largest node in that group. This is how Ganeti will estimate the redundancy
 level.
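
 The idea behind this estimate can be illustrated with a deliberately
 simplified model. The sketch below is not the ``hcheck`` implementation. It
 assumes that a single resource (say, memory) matters, that instances can be
 packed perfectly, and that a group is N+1 redundant exactly when the
 instances still fit after the largest node is lost. It also reads the
 approximation as draining a largest node at each step of the recursive
 definition. The function name and parameters are hypothetical.

 .. code-block:: python

    from typing import Sequence


    def estimate_redundancy_level(capacities: Sequence[int], demand: int) -> int:
        """Estimate the redundancy level of one node group in a toy model.

        capacities: per-node capacity of a single resource (assumption).
        demand: total amount of that resource needed by all instances.
        """
        remaining = sorted(capacities)  # largest node is last
        level = 0
        # Toy N+1 check: at least one node survives the failure, and the
        # nodes other than a largest one can hold the whole demand.
        while len(remaining) >= 2 and sum(remaining[:-1]) >= demand:
            level += 1
            remaining.pop()  # drain a largest node and repeat
        return level

 For four nodes of capacity 8 and a total demand of 10, this yields an
 estimated level of 2: the instances fit on three nodes and on two nodes,
 but not on a single one. Only one drain is examined per step, so the cost
 is linear in the number of nodes rather than super-exponential.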

 Modifications to existing tools
 -------------------------------

 As redundancy levels higher than N+1 are mainly about planning capacity,
 the level of redundancy only needs to be computed on demand. Hence, we
 keep the tool changes minimal.

 - ``hcheck`` will report the level of redundancy for each node group as
   a new output parameter

 The rest of Ganeti will not be changed.