 ====================================
 Moving instances across node groups
 ====================================

 This design document explains the changes needed in Ganeti to perform
 instance moves across node groups. Reader familiarity with the following
 existing documents is advised:

 - :doc:`Current IAllocator specification <iallocator>`
 - :doc:`Shared storage model in 2.3+ <design-shared-storage>`

 Motivation and design proposal
 ==============================

 At the moment, moving instances away from their primary or secondary
 nodes with the ``relocate`` and ``multi-evacuate`` IAllocator calls
 restricts target nodes to those on the same node group. This ensures a
 mobility domain is never crossed, and allows normal operation of each
 node group to be confined within itself.

 It is desirable, however, to have a way of moving instances across node
 groups so that, for example, it is possible to move a set of instances
 to another group for policy reasons, or completely empty a given group
 to perform maintenance operations.

 To implement this, we propose the addition of new IAllocator calls to
 compute inter-group instance moves and group-aware node evacuation,
 taking into account mobility domains as appropriate. The interface
 proposed below should be enough to cover the use cases mentioned above.

 With the implementation of this design proposal, the previous
 ``multi-evacuate`` mode will be deprecated.

 .. _multi-reloc-detailed-design:

 Detailed design
 ===============

 All requests honor the groups' ``alloc_policy`` attribute.

 Changing instance's groups
 --------------------------

 Takes a list of instances and a list of node group UUIDs; the instances
 will be moved away from their current group to any of the groups in the
 target list. All instances must have their primary node in the same
 group, and that group must not be one of the target groups. If the
 target group list is empty, the request is simply "change group" and
 the instances are placed in any group but their original one.
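
 The constraints on a change-group request can be sketched as a small
 validation helper. This is only an illustration: the function name
 ``resolve_target_groups`` and its argument layout are hypothetical and
 not part of the IAllocator protocol.

```python
def resolve_target_groups(instance_groups, all_groups, target_groups):
    """Return the set of group UUIDs the instances may be moved to.

    instance_groups: dict of instance name -> UUID of the group that
        holds the instance's primary node (hypothetical input layout)
    all_groups: iterable of all node group UUIDs in the cluster
    target_groups: requested target group UUIDs, possibly empty
    """
    source_groups = set(instance_groups.values())
    # All instances must have their primary node in the same group.
    if len(source_groups) != 1:
        raise ValueError("primary nodes must share one group, got %s"
                         % sorted(source_groups))
    (source,) = source_groups
    # The current group must not be listed as a target.
    if source in target_groups:
        raise ValueError("group %s is both source and target" % source)
    if target_groups:
        return set(target_groups)
    # Empty target list: plain "change group", any group but the original.
    return set(all_groups) - {source}


groups = ["uuid-a", "uuid-b", "uuid-c"]
placement = {"inst1.example.com": "uuid-a", "inst2.example.com": "uuid-a"}
print(sorted(resolve_target_groups(placement, groups, [])))
```

 With an empty target list the helper falls back to every group except
 the one the instances currently live in.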

 Node evacuation
 ---------------

 Evacuates instances off their primary nodes. The evacuation mode
 can be given as ``primary-only``, ``secondary-only`` or
 ``all``. The call is given a list of instances whose primary nodes need
 to be in the same node group. The returned nodes need to be in the same
 group as the original primary node.
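
 A minimal sketch of the two checks implied above, assuming the caller
 has a mapping from node name to group UUID. The helper names
 ``evacuation_group`` and ``stays_in_group`` are made up for this
 example.

```python
EVAC_MODES = ("primary-only", "secondary-only", "all")


def evacuation_group(mode, instance_primary, node_group):
    """Validate an evacuation request and return the group it is bound to.

    instance_primary: dict of instance name -> primary node name
    node_group: dict of node name -> group UUID (hypothetical layout)
    """
    if mode not in EVAC_MODES:
        raise ValueError("unknown evacuation mode: %r" % (mode,))
    groups = {node_group[node] for node in instance_primary.values()}
    # The primary nodes of all given instances must be in one group.
    if len(groups) != 1:
        raise ValueError("primary nodes must share one node group")
    return groups.pop()


def stays_in_group(chosen_nodes, node_group, group):
    """Tell whether every node chosen by the allocator is in ``group``."""
    return all(node_group[node] == group for node in chosen_nodes)


node_group = {"node1": "uuid-a", "node2": "uuid-a", "node3": "uuid-b"}
group = evacuation_group("all", {"inst1": "node1"}, node_group)
print(stays_in_group(["node2"], node_group, group))
```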

 .. _multi-reloc-result:

 Result
 ------

 In all storage models, an inter-group move can be modeled as a sequence
 of **replace secondary**, **migration** and **failover** operations
 (when shared storage is used, they will all be failover or migration
 operations within the corresponding mobility domain).

 The result of the operations described above must contain two lists of
 instances and a list of jobs (each of which is a list of serialized
 opcodes) to actually execute the operation. :doc:`Job dependencies
 <design-chained-jobs>` can be used to force jobs to run in a certain
 order while still making use of parallelism.

 The two lists of instances describe which instances could be
 moved/migrated and which couldn't for some reason ("unsuccessful"). The
 union of the instances in the two lists must be equal to the set of
 instances given in the original request. The successful list of
 instances contains elements as follows::

   (instance name, target group name, [chosen node names])

 Names are used purely for readability (for example, Ganeti could log
 the computed solution in the job information) and so that the generated
 opcodes can be checked manually against the intended target groups and
 nodes. Note that the node-evacuate operation does not change the group,
 but the group should still be returned (it is easier to have the same
 return type for both operations).

 The unsuccessful list of instances contains elements as follows::

   (instance name, explanation)

 where ``explanation`` is a string describing why the plugin was not able
 to relocate the instance.
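
 The requirement that both lists together cover the original request
 can be checked mechanically. The following sketch assumes the tuple
 layouts shown above; the function name ``check_result`` is
 hypothetical.

```python
def check_result(requested, moved, failed):
    """Raise ValueError unless moved and failed cover the request.

    requested: iterable of instance names from the original request
    moved: list of (instance name, target group name, [node names])
    failed: list of (instance name, explanation)
    """
    reported = {name for (name, _group, _nodes) in moved}
    reported |= {name for (name, _explanation) in failed}
    requested = set(requested)
    # The union of both lists must equal the requested set.
    if reported != requested:
        raise ValueError("missing: %s, unexpected: %s"
                         % (sorted(requested - reported),
                            sorted(reported - requested)))


moved = [("inst1.example.com", "group2", ["node4.example.com"])]
failed = [("inst2.example.com", "not enough memory in target groups")]
check_result(["inst1.example.com", "inst2.example.com"], moved, failed)
```

 The explanation string in the example is invented; the design does not
 prescribe its wording.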

 The client is given a list of job IDs (see the :doc:`design for
 LU-generated jobs <design-lu-generated-jobs>`) which it can watch.
 Failures should be reported to the user.

 .. highlight:: python

 Example job list::

   [
     # First job
     [
       { "OP_ID": "OP_INSTANCE_MIGRATE",
         "instance_name": "inst1.example.com",
       },
       { "OP_ID": "OP_INSTANCE_MIGRATE",
         "instance_name": "inst2.example.com",
       },
     ],
     # Second job
     [
       { "OP_ID": "OP_INSTANCE_REPLACE_DISKS",
         "depends": [
           [-1, ["success"]],
           ],
         "instance_name": "inst2.example.com",
         "mode": "replace_new_secondary",
         "remote_node": "node4.example.com",
       },
     ],
     # Third job
     [
       { "OP_ID": "OP_INSTANCE_FAILOVER",
         "depends": [
           [-2, []],
           ],
         "instance_name": "inst8.example.com",
       },
     ],
   ]

 Accepted opcodes:

 - ``OP_INSTANCE_FAILOVER``
 - ``OP_INSTANCE_MIGRATE``
 - ``OP_INSTANCE_REPLACE_DISKS``
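
 A client-side sanity check of a returned job list might look like the
 sketch below. It assumes that a negative entry in ``depends`` is a
 relative job ID counted back from the job's own position in the list
 (so ``-1`` is the job directly before it); the function name
 ``check_jobs`` is hypothetical.

```python
ACCEPTED_OPCODES = frozenset([
    "OP_INSTANCE_FAILOVER",
    "OP_INSTANCE_MIGRATE",
    "OP_INSTANCE_REPLACE_DISKS",
])


def check_jobs(jobs):
    """Validate opcodes; return a dict of job index -> indices it waits on."""
    resolved = {}
    for job_idx, job in enumerate(jobs):
        for op in job:
            if op["OP_ID"] not in ACCEPTED_OPCODES:
                raise ValueError("opcode not accepted: %s" % op["OP_ID"])
            for (dep, _statuses) in op.get("depends", []):
                target = job_idx + dep
                # Assumption: only relative (negative) IDs are used here,
                # and they must point at an earlier job in the same list.
                if dep >= 0 or target < 0:
                    raise ValueError("bad dependency %r in job %d"
                                     % (dep, job_idx))
                resolved.setdefault(job_idx, []).append(target)
    return resolved


jobs = [
    [{"OP_ID": "OP_INSTANCE_MIGRATE", "instance_name": "inst1.example.com"}],
    [{"OP_ID": "OP_INSTANCE_REPLACE_DISKS",
      "depends": [[-1, ["success"]]],
      "instance_name": "inst2.example.com",
      "mode": "replace_new_secondary",
      "remote_node": "node4.example.com"}],
]
print(check_jobs(jobs))
```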

 .. vim: set textwidth=72 :
 .. Local Variables:
 .. mode: rst
 .. fill-column: 72
 .. End: