| ======================================== |
| Automatized Upgrade Procedure for Ganeti |
| ======================================== |
| |
| .. contents:: :depth: 4 |
| |
| This is a design document detailing the proposed changes to the |
| upgrade process, in order to allow it to be more automatic. |
| |
| |
| Current state and shortcomings |
| ============================== |
| |
| Ganeti requires the same version of Ganeti to run on all nodes of a |
| cluster, and this requirement is unlikely to go away in the |
| foreseeable future. Also, the configuration may change between minor |
| versions (and in the past has proven to do so). This requires a quite |
| involved manual upgrade process of draining the queue, stopping |
| ganeti, changing the binaries, upgrading the configuration, starting |
| ganeti, distributing the configuration, and undraining the queue. |
| |
| |
| Proposed changes |
| ================ |
| |
| While we will not remove the requirement of the same Ganeti |
| version running on all nodes, the transition from one version |
| to the other will be made more automatic. It will be possible |
| to install new binaries ahead of time, and the actual switch |
| between versions will be a single command. |
| |
| While changing the file layout anyway, we install the python |
| code, which is architecture independent, under ``${prefix}/share``, |
| in a way that properly separates the Ganeti libraries of the |
| various versions. |
| |
| Path changes to allow multiple versions installed |
| ------------------------------------------------- |
| |
| Currently, Ganeti installs to ``${PREFIX}/bin``, ``${PREFIX}/sbin``, |
| and so on, as well as to ``${pythondir}/ganeti``. |
| |
| These paths will be changed in the following way. |
| |
| - The python package will be installed to |
| ``${PREFIX}/share/ganeti/${VERSION}/ganeti``. |
| Here ``${VERSION}`` is, depending on configure options, either the fully |
| qualified version number, consisting of major, minor, revision, and suffix, |
| or just a major.minor pair. All python executables will be installed under |
| ``${PREFIX}/share/ganeti/${VERSION}`` so that they see their respective |
| Ganeti library. ``${PREFIX}/share/ganeti/default`` is a symbolic link to |
| ``${sysconfdir}/ganeti/share`` which, in turn, is a symbolic link to |
| ``${PREFIX}/share/ganeti/${VERSION}``. For all python executables (like |
| ``gnt-cluster``, ``gnt-node``, etc.) symbolic links going through |
| ``${PREFIX}/share/ganeti/default`` are added under ``${PREFIX}/sbin``. |
| |
| - All other files will be installed to the corresponding path under |
| ``${libdir}/ganeti/${VERSION}`` instead of under ``${PREFIX}`` |
| directly, where ``${libdir}`` defaults to ``${PREFIX}/lib``. |
| ``${libdir}/ganeti/default`` will be a symlink to ``${sysconfdir}/ganeti/lib`` |
| which, in turn, is a symlink to ``${libdir}/ganeti/${VERSION}``. |
| Symbolic links to the files installed under ``${libdir}/ganeti/${VERSION}`` |
| will be added under ``${PREFIX}/bin``, ``${PREFIX}/sbin``, and so on. These |
| symbolic links will go through ``${libdir}/ganeti/default`` so that the |
| version can easily be changed by updating the symbolic link in |
| ``${sysconfdir}``. |
| |
| The set of links for ganeti binaries might change between the versions. |
| However, as the file structure under ``${libdir}/ganeti/${VERSION}`` reflects |
| that of ``/``, two links of different versions will never conflict. Similarly, |
| the symbolic links for the python executables will never conflict, as they |
| always point to a file with the same basename directly under |
| ``${PREFIX}/share/ganeti/default``. Therefore, each version will make sure that |
| enough symbolic links are present in ``${PREFIX}/bin``, ``${PREFIX}/sbin`` and |
| so on, even though some might be dangling if a different version of ganeti is |
| currently active. |
| |
| The extra indirection through ``${sysconfdir}`` allows installations that choose |
| to have ``${sysconfdir}`` and ``${localstatedir}`` outside ``${PREFIX}`` to |
| mount ``${PREFIX}`` read-only. The latter is important for systems that choose |
| ``/usr`` as ``${PREFIX}`` and are following the Filesystem Hierarchy Standard. |
| For example, choosing ``/usr`` as ``${PREFIX}`` and ``/etc`` as ``${sysconfdir}``, |
| the layout for version 2.10 will look as follows. |
| :: |
| |
|   / |
|   | |
|   +-- etc |
|   |   | |
|   |   +-- ganeti |
|   |       | |
|   |       +-- lib -> /usr/lib/ganeti/2.10 |
|   |       | |
|   |       +-- share -> /usr/share/ganeti/2.10 |
|   +-- usr |
|       | |
|       +-- bin |
|       |   | |
|       |   +-- harep -> /usr/lib/ganeti/default/usr/bin/harep |
|       |   | |
|       |   ... |
|       | |
|       +-- sbin |
|       |   | |
|       |   +-- gnt-cluster -> /usr/share/ganeti/default/gnt-cluster |
|       |   | |
|       |   ... |
|       | |
|       +-- ... |
|       | |
|       +-- lib |
|       |   | |
|       |   +-- ganeti |
|       |       | |
|       |       +-- default -> /etc/ganeti/lib |
|       |       | |
|       |       +-- 2.10 |
|       |           | |
|       |           +-- usr |
|       |               | |
|       |               +-- bin |
|       |               |   | |
|       |               |   +-- htools |
|       |               |   | |
|       |               |   +-- harep -> htools |
|       |               |   | |
|       |               |   ... |
|       |               ... |
|       | |
|       +-- share |
|           | |
|           +-- ganeti |
|               | |
|               +-- default -> /etc/ganeti/share |
|               | |
|               +-- 2.10 |
|                   | |
|                   +-- gnt-cluster |
|                   | |
|                   +-- gnt-node |
|                   | |
|                   +-- ... |
|                   | |
|                   +-- ganeti |
|                       | |
|                       +-- backend.py |
|                       | |
|                       +-- ... |
|                       | |
|                       +-- cmdlib |
|                       |   | |
|                       |   ... |
|                       ... |
| |
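The double indirection in the layout above can be tried out in a scratch
directory. The following is a minimal sketch, not part of the proposed
implementation: the function names are hypothetical, a text file stands in
for the real ``gnt-cluster`` executable, and swapping the link via a rename
is an assumption of this example rather than something the design prescribes.
It assumes a POSIX system where symbolic links are available.

```python
import os
import tempfile


def install_version(prefix, version):
    """Create a stand-in versioned directory holding one 'executable'."""
    version_dir = os.path.join(prefix, "share", "ganeti", version)
    os.makedirs(version_dir)
    with open(os.path.join(version_dir, "gnt-cluster"), "w") as handle:
        handle.write("gnt-cluster from %s\n" % version)


def create_static_links(prefix, sysconfdir):
    """Create the links that stay the same when the version is switched."""
    os.makedirs(os.path.join(prefix, "sbin"), exist_ok=True)
    os.makedirs(os.path.join(prefix, "share", "ganeti"), exist_ok=True)
    os.makedirs(os.path.join(sysconfdir, "ganeti"), exist_ok=True)
    default = os.path.join(prefix, "share", "ganeti", "default")
    # ${PREFIX}/share/ganeti/default -> ${sysconfdir}/ganeti/share
    os.symlink(os.path.join(sysconfdir, "ganeti", "share"), default)
    # ${PREFIX}/sbin/gnt-cluster -> ${PREFIX}/share/ganeti/default/gnt-cluster
    os.symlink(os.path.join(default, "gnt-cluster"),
               os.path.join(prefix, "sbin", "gnt-cluster"))


def activate_version(prefix, sysconfdir, version):
    """Repoint ${sysconfdir}/ganeti/share; nothing under prefix is written."""
    link = os.path.join(sysconfdir, "ganeti", "share")
    temporary = link + ".new"
    os.symlink(os.path.join(prefix, "share", "ganeti", version), temporary)
    os.replace(temporary, link)  # rename the new link over the old one


def demo():
    """Switch between two installed versions and read through the links."""
    with tempfile.TemporaryDirectory() as root:
        prefix = os.path.join(root, "usr")
        sysconfdir = os.path.join(root, "etc")
        install_version(prefix, "2.10")
        install_version(prefix, "2.11")
        create_static_links(prefix, sysconfdir)
        binary = os.path.join(prefix, "sbin", "gnt-cluster")
        activate_version(prefix, sysconfdir, "2.10")
        with open(binary) as handle:
            before = handle.read()
        activate_version(prefix, sysconfdir, "2.11")
        with open(binary) as handle:
            after = handle.read()
        return before, after
```

Only the link under ``${sysconfdir}`` is rewritten when the active version
changes, which is why ``${PREFIX}`` can stay mounted read-only.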
| |
| |
| gnt-cluster upgrade |
| ------------------- |
| |
| The actual upgrade process will be done by a new command ``upgrade`` to |
| ``gnt-cluster``. It is called with the option ``--to``, which takes precisely |
| one argument: the version to upgrade (or downgrade) to, given as a full |
| string with major, minor, revision, and suffix. To be compatible with current |
| configuration upgrade and downgrade procedures, the new version must be of |
| the same major version and either an equal or higher minor version, or |
| precisely the previous minor version. |
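The version constraint can be written down as a small predicate. This is a
sketch under assumptions: the helper names are hypothetical, and the version
string format (``major.minor.revision`` followed by an optional suffix such as
``~rc1``) is assumed for illustration.

```python
import re

# Assumed format: major.minor.revision, then an arbitrary suffix.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(.*)$")


def parse_version(text):
    """Split a full version string into (major, minor, revision, suffix)."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError("not a full version string: %r" % text)
    major, minor, revision, suffix = match.groups()
    return int(major), int(minor), int(revision), suffix


def is_allowed_target(current, target):
    """Return True if switching from current to target is permitted."""
    cur_major, cur_minor = parse_version(current)[:2]
    new_major, new_minor = parse_version(target)[:2]
    if new_major != cur_major:
        return False
    # Same or higher minor version, or exactly the previous minor version.
    return new_minor >= cur_minor or new_minor == cur_minor - 1
```

With this rule, going from ``2.10.1`` to ``2.13.0~rc1`` or back to ``2.9.4``
is accepted, while ``2.8.0`` and ``3.0.0`` are rejected.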
| |
| When executed, ``gnt-cluster upgrade --to=<version>`` will perform the |
| following actions. |
| |
| - It verifies that the version to change to is installed on all nodes |
| of the cluster that are not marked as offline. If this is not the |
| case it aborts with an error. This initial testing is an |
| optimization to allow for early feedback. |
| |
| - An intent-to-upgrade file is created that contains the current |
| version of ganeti, the version to change to, and the process ID of |
| the ``gnt-cluster upgrade`` process. The latter is not used automatically, |
| but allows manual detection if the upgrade process died |
| unintentionally. The intent-to-upgrade file is persisted to disk |
| before continuing. |
| |
| - The Ganeti job queue is drained, and the executable waits until there |
| are no more jobs in the queue. Once :doc:`design-optables` is |
| implemented, for upgrades, and only for upgrades, all jobs are paused |
| instead (in the sense that the currently running opcode continues, |
| but the next opcode is not started) and the upgrade continues once all |
| jobs are fully paused. |
| |
| - All ganeti daemons on the master node are stopped. |
| |
| - It is verified again that all nodes at this moment not marked as |
| offline have the new version installed. If this is not the case, |
| then all changes so far (stopping ganeti daemons and draining the |
| queue) are undone and failure is reported. This second verification |
| is necessary, as the set of online nodes might have changed during |
| the draining period. |
| |
| - All ganeti daemons on all remaining (non-offline) nodes are stopped. |
| |
| - A backup of all Ganeti-related status information is created for |
| manual rollbacks. While the normal way of rolling back after an |
| upgrade should be calling ``gnt-cluster upgrade`` from the newer version |
| with the older version as argument, a full backup provides an |
| additional safety net, especially for jump-upgrades (skipping |
| intermediate minor versions). |
| |
| - If the action is a downgrade to the previous minor version, the |
| configuration is downgraded now, using ``cfgupgrade --downgrade``. |
| |
| - The ``${sysconfdir}/ganeti/lib`` and ``${sysconfdir}/ganeti/share`` |
| symbolic links are updated. |
| |
| - If the action is an upgrade to a higher minor version, the configuration |
| is upgraded now, using ``cfgupgrade``. |
| |
| - ``ensure-dirs --full-run`` is run on all nodes. |
| |
| - All daemons are started on all nodes. |
| |
| - ``gnt-cluster redist-conf`` is run on the master node. |
| |
| - All daemons are restarted on all nodes. |
| |
| - The Ganeti job queue is undrained. |
| |
| - The intent-to-upgrade file is removed. |
| |
| - ``post-upgrade`` is run with the original version as argument. |
| |
| - ``gnt-cluster verify`` is run and the result reported. |
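The intent-to-upgrade file from the list above has to reach the disk before
any further step is taken. One possible way to do that is sketched below.
The file format (three whitespace-separated fields on one line) and the
function names are assumptions of this example and are not specified by the
design.

```python
import os
import tempfile


def write_intent_file(path, current_version, target_version):
    """Write old version, new version and our PID, and force them to disk."""
    directory = os.path.dirname(path) or "."
    contents = "%s %s %d\n" % (current_version, target_version, os.getpid())
    fd, temporary = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(contents)
        handle.flush()
        os.fsync(handle.fileno())  # file data is on disk
    os.replace(temporary, path)    # the file appears under its final name
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)           # the rename itself is on disk
    finally:
        os.close(dir_fd)


def read_intent_file(path):
    """Return (old_version, new_version, pid), or None if no file exists."""
    try:
        with open(path) as handle:
            old_version, new_version, pid = handle.read().split()
    except FileNotFoundError:
        return None
    return old_version, new_version, int(pid)
```

Writing to a temporary file and renaming it means a reader never sees a
partially written intent file, which matters if the master node reboots in
the middle of this step.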
| |
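The relative order of the configuration change and the symbolic link update
in the procedure depends on the direction of the version change. The sketch
below only restates that ordering; the function name and the step labels are
hypothetical.

```python
def config_and_link_steps(current_minor, target_minor):
    """Return the ordered steps around the symbolic link update."""
    steps = []
    if target_minor == current_minor - 1:
        # Downgrade while the links still point at the newer version.
        steps.append("cfgupgrade --downgrade")
    steps.append("update symbolic links")
    if target_minor > current_minor:
        # Upgrade once the links point at the newer version.
        steps.append("cfgupgrade")
    return steps
```

In both directions the configuration is converted while the newer of the two
versions is the active one. A change within the same minor version only
updates the links.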
| |
| Considerations on unintended reboots of the master node |
| ======================================================= |
| |
| During the upgrade procedure, the only ganeti process still running is |
| the one instance of ``gnt-cluster upgrade``. This process is also responsible |
| for eventually removing the queue drain. Therefore, we have to provide |
| means to resume this process, if it dies unintentionally. The process |
| itself will handle SIGTERM gracefully by either undoing all changes |
| done so far, or by ignoring the signal altogether and continuing to |
| the end; the choice between these behaviors depends on whether the change |
| of the configuration has already started (in which case it goes |
| through to the end), or not (in which case the actions done so far are |
| rolled back). |
| |
| To achieve this, ``gnt-cluster upgrade`` will support a ``--resume`` |
| option. It is recommended to have ``gnt-cluster upgrade --resume`` as an |
| at-reboot task in the crontab. |
| The ``gnt-cluster upgrade --resume`` command first verifies that |
| it is running on the master node, using the same requirement as for |
| starting the master daemon, i.e., confirmed by a majority of all |
| nodes. If it is not the master node, it will remove any possibly |
| existing intent-to-upgrade file and exit. If it is running on the |
| master node, it will check for the existence of an intent-to-upgrade |
| file. If no such file is found, it will simply exit. If found, it will |
| resume at the appropriate stage. |
| |
| - If the configuration file still is at the initial version, |
| ``gnt-cluster upgrade`` is resumed at the step immediately following the |
| writing of the intent-to-upgrade file. It should be noted that |
| all steps before changing the configuration are idempotent, so |
| redoing them does not do any harm. |
| |
| - If the configuration is already at the new version, all daemons on |
| all nodes are stopped (as they might have been started again due |
| to a reboot) and then it is resumed at the step immediately |
| following the configuration change. All actions following the |
| configuration change can be repeated without bringing the cluster |
| into a worse state. |
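The decisions taken by ``--resume`` can be summarized as a small function.
This is an illustrative sketch only: the enum, the function name, and the
idea of comparing the configuration version to the versions recorded in the
intent-to-upgrade file as plain strings are assumptions of this example.

```python
import enum


class ResumeAction(enum.Enum):
    REMOVE_INTENT_AND_EXIT = "not master: remove intent file and exit"
    EXIT = "no intent file: nothing to do"
    REDO_FROM_INTENT = "resume right after writing the intent file"
    STOP_DAEMONS_AND_CONTINUE = "stop daemons, resume after config change"


def choose_resume_action(is_master, intent, config_version):
    """Decide how to resume.

    intent is None or a pair (old_version, new_version) read from the
    intent-to-upgrade file; config_version is the version the
    configuration file is currently at.
    """
    if not is_master:
        return ResumeAction.REMOVE_INTENT_AND_EXIT
    if intent is None:
        return ResumeAction.EXIT
    old_version, new_version = intent
    if config_version == old_version:
        # Steps before the configuration change are idempotent.
        return ResumeAction.REDO_FROM_INTENT
    if config_version == new_version:
        # Daemons may have come back after a reboot, so stop them first.
        return ResumeAction.STOP_DAEMONS_AND_CONTINUE
    raise ValueError("configuration version matches neither side")
```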
| |
| |
| Caveats |
| ======= |
| |
| Since ``gnt-cluster upgrade`` drains the queue and undrains it later, any |
| information about a previous drain gets lost. This problem will |
| disappear once :doc:`design-optables` is implemented, as the |
| undrain will then be restricted to filters by gnt-upgrade. |
| |
| |
| Requirement of opcode backwards compatibility |
| ============================================== |
| |
| Since for upgrades we only pause jobs and do not fully drain the |
| queue, we need to be able to transform the job queue into a queue for |
| the new version. The way this is achieved is by keeping the |
| serialization format backwards compatible. This is in line with |
| current practice that opcodes do not change between versions, and at |
| most new fields are added. Whenever we add a new field to an opcode, |
| we will make sure that the deserialization function will provide a |
| default value if the field is not present. |
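The default-on-missing-field rule can be illustrated with a toy opcode class.
The class ``OpExample``, its fields, and the ``from_dict`` helper are
hypothetical and do not reflect Ganeti's actual opcode classes or
serialization code.

```python
_REQUIRED = object()  # marker for fields that have no default


class OpExample:
    """Toy opcode; shutdown_timeout stands for a field added later."""

    FIELDS = {
        "instance_name": _REQUIRED,
        "shutdown_timeout": 120,  # default used for jobs queued before the upgrade
    }

    def __init__(self, **values):
        for name, value in values.items():
            setattr(self, name, value)

    @classmethod
    def from_dict(cls, data):
        """Deserialize, filling in defaults for fields that are absent."""
        values = {}
        for name, default in cls.FIELDS.items():
            if name in data:
                values[name] = data[name]
            elif default is _REQUIRED:
                raise ValueError("missing required field %r" % name)
            else:
                values[name] = default
        return cls(**values)
```

A job serialized by the older version, which does not know about
``shutdown_timeout``, still deserializes and simply receives the default.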
| |
| |