============================
RADOS/Ceph support in Ganeti
============================

.. contents:: :depth: 4

Objective
=========

The project aims to improve Ceph RBD support in Ganeti. It can be
divided into the following primary tasks.

- Use the Qemu/KVM RBD driver to provide instances with direct RBD
  support. [implemented as of Ganeti 2.10]
- Allow the configuration of Ceph RBDs through Ganeti. [unimplemented]
- Write a data collector to monitor Ceph nodes. [unimplemented]

Background
==========

Ceph RBD
--------

Ceph is a distributed storage system which provides data access as
files, objects and blocks. As part of this project, we're interested in
integrating Ceph's block device (RBD) directly with Qemu/KVM.

The primary components/daemons of Ceph are:

- Monitor - serves as the authentication point for clients.
- Metadata server - stores all the filesystem metadata (not configured
  here, as it is not required for RBD).
- OSD - object storage daemons, one per drive/location.

RBD support in Ganeti
---------------------

Currently, Ganeti supports RBD volumes on a pre-configured Ceph
cluster. This is enabled through the RBD disk template, which accesses
the volumes via the RBD Linux kernel driver: each volume is mapped to
the host as a local block device, which is then attached to the
instance. This method incurs additional overhead. We plan to remove
that overhead by using Qemu's RBD driver to give KVM instances direct
access to RBD volumes.

In addition, since Ganeti currently relies on a pre-configured Ceph
cluster, allowing the configuration of Ceph nodes through Ganeti itself
would be a valuable addition to its core features.


Qemu/KVM Direct RBD Integration
===============================

A new disk parameter ``access`` is introduced. It is added at
cluster/node-group level to simplify the prototype implementation. It
specifies the access method, either ``userspace`` or ``kernelspace``,
and is accessible to StartInstance() in hv_kvm.py. The device path,
``rbd:<pool>/<vol_name>``, is generated by RADOSBlockDevice and added
to the params dictionary as ``kvm_dev_path``.
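
As an illustration, here is a minimal Python sketch of how the device
path could be chosen based on the ``access`` parameter. The helper
name and the ``disk_params`` dictionary are hypothetical, not the
actual hv_kvm.py code::

  def _GetKVMDiskPath(disk_params):
    """Return the path KVM should use for a single instance disk."""
    if disk_params.get("access") == "userspace":
      # Path produced by RADOSBlockDevice, e.g. "rbd:<pool>/<vol_name>",
      # passed through the params dictionary as "kvm_dev_path".
      return disk_params["kvm_dev_path"]
    # kernelspace (default): use the local block device mapped through
    # the RBD kernel driver, i.e. the existing behaviour.
    return disk_params["dev_path"]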

This approach ensures that no disk-template-specific changes are
required in hv_kvm.py, allowing easy integration of other distributed
storage systems (like Gluster).

Note that the RBD volume is mapped as a local block device as before.
The local mapping won't be used during instance operation in the
``userspace`` access mode, but can be used by administrators and OS
scripts.

Updated commands
----------------

::

  $ gnt-instance info

``access:userspace/kernelspace`` will be added to the Disks category.
This output applies to KVM-based instances only.

Ceph configuration on Ganeti nodes
==================================

This document proposes the configuration of a distributed storage
pool (Ceph or Gluster) through Ganeti. Currently, this design document
focuses on configuring a Ceph cluster. A prerequisite of this setup is
the installation of the Ceph packages on all the nodes concerned.

At Ganeti cluster init, the user will set distributed-storage-specific
options which will be stored at cluster level. The storage cluster
will be initialized using ``gnt-storage``. For the prototype, only a
single storage pool/node-group is configured.
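
The exact format of the cluster-level options is still open; as a
minimal sketch (hypothetical helper, not actual Ganeti code), the
``-S ceph:disk=/dev/sdb,option=value`` string shown under "Updated
Commands" below could be parsed into a dictionary stored in the
cluster configuration::

  def ParseDistributedStorageOption(value):
    """Split "<type>:key=val,key=val,..." into (type, options dict)."""
    storage_type, _, rest = value.partition(":")
    options = {}
    for item in filter(None, rest.split(",")):
      key, _, val = item.partition("=")
      options[key] = val
    return storage_type, options

  # ParseDistributedStorageOption("ceph:disk=/dev/sdb,option=value")
  # -> ("ceph", {"disk": "/dev/sdb", "option": "value"})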

The following steps take place when a node-group is initialized as a
storage cluster (a minimal sketch of the flow follows the list).

- Check for an existing Ceph cluster through the /etc/ceph/ceph.conf
  file on each node.
- Fetch the cluster configuration parameters and create a distributed
  storage object accordingly.
- Issue an 'init distributed storage' RPC to the group nodes (if any).
- On each node, the ``ceph`` CLI tool will start the appropriate
  services.
- Mark the nodes as well as the node-group as
  distributed-storage-enabled.
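
The following Python sketch of this flow uses hypothetical function
and RPC names (``call_file_exists``, ``call_init_distributed_storage``)
and only illustrates the steps above, not the final implementation::

  def InitDistributedStorage(nodegroup, nodes, rpc, storage_params):
    """Configure Ceph on the nodes of a node-group."""
    # Step 1: refuse nodes that already belong to a Ceph cluster.
    for node in nodes:
      if rpc.call_file_exists(node, "/etc/ceph/ceph.conf"):
        raise RuntimeError("node %s already runs Ceph" % node)
    # Steps 2-4: the distributed storage object built from the
    # cluster-level parameters is passed in as ``storage_params`` and
    # sent to each node daemon, which runs the ``ceph`` CLI tool.
    for node in nodes:
      rpc.call_init_distributed_storage(node, storage_params)
      node.storage_enabled = True
    # Step 5: mark the node-group as distributed-storage-enabled.
    nodegroup.storage_enabled = True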

The storage cluster will operate at node-group level. The Ceph
cluster will be initialized using ``gnt-storage``, to which a new
sub-command ``init-distributed-storage`` will be added.

The configuration of the nodes will be handled through an init function
called by the node daemons running on the respective nodes. A new RPC
is introduced to handle the calls.

A new object will be created to send the storage parameters to the
node: ``storage_type``, ``devices``, ``node_role`` (mon/osd), etc. A
minimal sketch of such an object follows.
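
The class below is a hypothetical illustration of the parameters the
new RPC could carry, not an existing Ganeti object; any field beyond
those listed above is an assumption::

  class DistributedStorageParams(object):
    """Parameters sent to a node joining the storage cluster."""
    def __init__(self, storage_type, devices, node_role, options=None):
      self.storage_type = storage_type  # e.g. "ceph"
      self.devices = devices            # e.g. ["/dev/sdb"]
      self.node_role = node_role        # "mon" or "osd"
      self.options = options or {}      # extra key=value options

  # Example: DistributedStorageParams("ceph", ["/dev/sdb"], "osd")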

A new node can be directly assigned to a storage-enabled node-group.
During the 'gnt-node add' process, the required Ceph daemons will be
started and the node will be added to the Ceph cluster.

Only an offline node can be assigned to a storage-enabled node-group.
``gnt-node add --readd`` needs to be performed to issue the RPCs that
spawn the appropriate services on the newly assigned node.

Updated Commands
----------------

The affected commands are the following::

  $ gnt-cluster init -S ceph:disk=/dev/sdb,option=value...

During cluster initialization, Ceph-specific options are provided,
which apply at cluster level::

  $ gnt-cluster modify -S ceph:option=value2...

For now, cluster modification will only be allowed when there is no
initialized storage cluster::

  $ gnt-storage init-distributed-storage -s|--storage-type ceph \
      <node-group>

This ensures that no other node-group is configured as a distributed
storage cluster and configures Ceph on the specified node-group. If
there is no node in the node-group, it will only be marked as
distributed-storage-enabled and no action will be taken::

  $ gnt-group assign-nodes <group> <node>

This ensures that the node is offline if the specified node-group is
distributed-storage-capable. Ceph configuration on the newly assigned
node is not performed at this step::

  $ gnt-node modify --offline yes

If the node is part of a storage node-group, marking it offline will
stop/remove the Ceph daemons::

  $ gnt-node add --readd

If the node is now part of the storage node-group, the init
distributed storage RPC is issued to the respective node. This step is
required after assigning a node to the storage-enabled node-group::

  $ gnt-node remove

A warning will be issued stating that the node is part of a
distributed storage cluster and must be marked offline before removal.

Data collector for Ceph
-----------------------

TBD

Future Work
-----------

Due to the loopback bug in Ceph, one may run into daemon hang issues
while performing writes to an RBD volume through the block device
mapping. This bug occurs only when the RBD volume is stored on an OSD
running on the local node. In order to mitigate this issue, we can
create storage pools on different node-groups and access RBD
volumes on different pools.
http://tracker.ceph.com/issues/3076

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: