| ======================= |
| Ganeti monitoring agent |
| ======================= |
| |
| .. contents:: :depth: 4 |
| |
This is a design document detailing the implementation of a Ganeti
monitoring agent report system, which can be queried by a monitoring
system to calculate health information for a Ganeti cluster.
| |
| Current state and shortcomings |
| ============================== |
| |
| There is currently no monitoring support in Ganeti. While we don't want |
| to build something like Nagios or Pacemaker as part of Ganeti, it would |
| be useful if such tools could easily extract information from a Ganeti |
| machine in order to take actions (example actions include logging an |
| outage for future reporting or alerting a person or system about it). |
| |
| Proposed changes |
| ================ |
| |
| Each Ganeti node should export a status page that can be queried by a |
monitoring system. Such a status page will be exported on a network
port and will be encoded in JSON (simple text) over HTTP.
| |
| The choice of JSON is obvious as we already depend on it in Ganeti and |
| thus we don't need to add extra libraries to use it, as opposed to what |
| would happen for XML or some other markup format. |
| |
| Location of agent report |
| ------------------------ |
| |
The report will be available from all nodes, and will be concerned with
all node-local resources. This allows more real-time information to be
available, at the cost of having to query all nodes.
| |
| Information reported |
| -------------------- |
| |
| The monitoring agent system will report on the following basic information: |
| |
| - Instance status |
| - Instance disk status |
| - Status of storage for instances |
| - Ganeti daemons status, CPU usage, memory footprint |
| - Hypervisor resources report (memory, CPU, network interfaces) |
| - Node OS resources report (memory, CPU, network interfaces) |
| - Node OS CPU load average report |
| - Information from a plugin system |
| |
| Format of the report |
| -------------------- |
| |
The report will be in JSON format, and it will present an array of
report objects.
| Each report object will be produced by a specific data collector. |
| Each report object includes some mandatory fields, to be provided by all |
| the data collectors: |
| |
| ``name`` |
| The name of the data collector that produced this part of the report. |
| It is supposed to be unique inside a report. |
| |
| ``version`` |
| The version of the data collector that produces this part of the |
| report. Built-in data collectors (as opposed to those implemented as |
| plugins) should have "B" as the version number. |
| |
| ``format_version`` |
  The format of what is represented in the ``data`` field for each data
  collector might change over time. Every time this happens, the
  ``format_version`` should be changed, so that whoever reads the report
  knows what format to expect, and how to correctly interpret it.
| |
| ``timestamp`` |
| The time when the reported data were gathered. It has to be expressed |
| in nanoseconds since the unix epoch (0:00:00 January 01, 1970). If not |
| enough precision is available (or needed) it can be padded with |
| zeroes. If a report object needs multiple timestamps, it can add more |
| and/or override this one inside its own "data" section. |
| |
| ``category`` |
  A collector can belong to a given category of collectors (e.g.
  storage collectors, daemon collectors). This means that it will have
  to provide a minimum set of prescribed fields, as documented for each
  category.
  This field will contain the name of the category the collector belongs
  to, if any, or just the ``null`` value.
| |
| ``kind`` |
| Two kinds of collectors are possible: |
| `Performance reporting collectors`_ and `Status reporting collectors`_. |
| The respective paragraphs will describe them and the value of this field. |
| |
| ``data`` |
  This field contains all the data generated by the specific data
  collector, in its own independently defined format. The monitoring
  agent can check this field syntactically (according to the JSON
  specifications) but not semantically.
| |
| Here follows a minimal example of a report:: |
| |
| [ |
| { |
| "name" : "TheCollectorIdentifier", |
| "version" : "1.2", |
| "format_version" : 1, |
| "timestamp" : 1351607182000000000, |
| "category" : null, |
| "kind" : 0, |
| "data" : { "plugin_specific_data" : "go_here" } |
| }, |
| { |
| "name" : "AnotherDataCollector", |
| "version" : "B", |
| "format_version" : 7, |
| "timestamp" : 1351609526123854000, |
| "category" : "storage", |
| "kind" : 1, |
| "data" : { "status" : { "code" : 1, |
| "message" : "Error on disk 2" |
| }, |
| "plugin_specific" : "data", |
| "some_late_data" : { "timestamp" : 1351609526123942720, |
| ... |
| } |
| } |
| } |
| ] |
| |
| Performance reporting collectors |
| ++++++++++++++++++++++++++++++++ |
| |
These collectors only provide data about some component of the system,
without giving any interpretation of its meaning.
| |
| The value of the ``kind`` field of the report will be ``0``. |
| |
| Status reporting collectors |
| +++++++++++++++++++++++++++ |
| |
These collectors will provide information about the status of some
component of Ganeti, or of some component managed by Ganeti.
| |
| The value of their ``kind`` field will be ``1``. |
| |
The rationale behind this kind of collector is that there are
situations where exporting data about the underlying subsystems would
expose potential issues, but if Ganeti itself is able to (and going to)
fix the problem, conflicts might arise between Ganeti and something or
somebody else trying to fix the same problem.

Also, some external monitoring systems might not be aware of the
internals of a particular subsystem (e.g. DRBD) and might only use the
high-level response of its data collector, alerting an administrator if
anything is wrong.

Still, completely hiding the underlying data is not a good idea, as it
might still be of use in some cases. Therefore, status reporting
collectors will provide two output modes: one exporting only high-level
information about the status, and one also exporting all the data they
gathered. The default output mode will be the status-only one. The
verbose output mode, providing all the data, can be selected through a
command line parameter (for stand-alone data collectors) or through the
HTTP request to the monitoring agent (when collectors are executed as
part of it).
| |
When exporting just the status, each status reporting collector will
provide, in its ``data`` section, at least the following field:
| |
| ``status`` |
| summarizes the status of the component being monitored and consists of two |
| subfields: |
| |
| ``code`` |
    It assumes a numeric value, encoded in such a way as to allow using
    a bit set to easily distinguish which states are currently present
    in the whole cluster. If the bitwise OR of all the ``status`` fields
    is 0, the cluster is completely healthy.
    The status codes are as follows:
| |
| ``0`` |
| The collector can determine that everything is working as |
| intended. |
| |
| ``1`` |
      Something is temporarily wrong, but it is being automatically
      fixed by Ganeti; there is no need for external intervention.
| |
| ``2`` |
| The collector has failed to understand whether the status is good or |
| bad. Further analysis is required. Interpret this status as a |
| potentially dangerous situation. |
| |
| ``4`` |
| The collector can determine that something is wrong and Ganeti has no |
| way to fix it autonomously. External intervention is required. |
| |
| ``message`` |
| A message to better explain the reason of the status. |
| The exact format of the message string is data collector dependent. |
| |
| The field is mandatory, but the content can be an empty string if the |
| ``code`` is ``0`` (working as intended) or ``1`` (being fixed |
| automatically). |
| |
    If the status code is ``2``, the message should explain why it was
    not possible to determine a proper status.
    If the status code is ``4``, the message should specify what has
    gone wrong.
| |
| The ``data`` section will also contain all the fields describing the gathered |
| data, according to a collector-specific format. |
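
As an illustration, the ``data`` section of a hypothetical status
reporting collector, running in status-only mode, could look like the
following (the status code and message are made up)::

  { "status" : { "code" : 4,
                 "message" : "Device lost redundancy, administrator intervention required"
               }
  }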
| |
| Instance status |
| +++++++++++++++ |
| |
At the moment each node knows which instances are running on it and
which instances it is primary for, but not the reason why an instance
might not be running. On the other hand we don't want to distribute
full instance "admin" status information to all nodes, because of the
performance impact this would have.
| |
| As such we propose that: |
| |
| - Any operation that can affect instance status will have an optional |
| "reason" attached to it (at opcode level). This can be used for |
  example to distinguish an admin request from a scheduled maintenance
  or an automated tool's work. If this reason is not passed, Ganeti will
| just use the information it has about the source of the request. |
| This reason information will be structured according to the |
| :doc:`Ganeti reason trail <design-reason-trail>` design document. |
- RPCs that affect the instance status will be changed so that the
  "reason" and the version of the config object they ran on are passed
  to them. They will then export the new expected instance status,
  together with the associated reason and object version, to the status
  report system, which will in turn export them.
| |
| Monitoring and auditing systems can then use the reason to understand |
| the cause of an instance status, and they can use the timestamp to |
| understand the freshness of their data even in the absence of an atomic |
cross-node reporting: for example, if they see an instance "up" on a
node after seeing it running on a previous one, they can compare these
values to understand which data is freshest, and repoll the "older"
node. Of course, if they keep seeing this status, it represents an
error (either an instance continuously "flapping" between nodes, or an
instance constantly up on more than one node), which should be reported
and acted upon.
| |
The instance status will be reported by each node, for the instances it
is primary for. The ``data`` section of the report will contain a list
of instances, named ``instances``, with at least the following fields
for each instance:
| |
| ``name`` |
| The name of the instance. |
| |
| ``uuid`` |
| The UUID of the instance (stable on name change). |
| |
| ``admin_state`` |
| The status of the instance (up/down/offline) as requested by the admin. |
| |
| ``actual_state`` |
| The actual status of the instance. It can be ``up``, ``down``, or |
| ``hung`` if the instance is up but it appears to be completely stuck. |
| |
| ``uptime`` |
  The uptime of the instance (if it is up, ``null`` otherwise).
| |
| ``mtime`` |
| The timestamp of the last known change to the instance state. |
| |
| ``state_reason`` |
| The last known reason for state change of the instance, described according |
| to the JSON representation of a reason trail, as detailed in the :doc:`reason |
| trail design document <design-reason-trail>`. |
| |
| ``status`` |
| It represents the status of the instance, and its format is the same as that |
| of the ``status`` field of `Status reporting collectors`_. |
| |
Each hypervisor should provide its own instance status data collector,
possibly adding more hypervisor-specific fields.
| The ``category`` field of all of them will be ``instance``. |
| The ``kind`` field will be ``1``. |
| |
Note that as soon as a node knows it is no longer the primary for an
instance, it will stop reporting status for it: this means the instance
will either disappear, if it has been deleted, or appear on another
node, if it has been moved.
| |
| The ``code`` of the ``status`` field of the report of the Instance status data |
| collector will be: |
| |
| ``0`` |
| if ``status`` is ``0`` for all the instances it is reporting about. |
| |
| ``1`` |
| otherwise. |
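
Here follows a purely illustrative sketch of the ``data`` section of
such a collector. The instance name, UUID, timestamps and values are
made up, and the ``state_reason`` value is only a placeholder for the
JSON representation described in the reason trail design document::

  { "instances" : [
      { "name" : "instance1.example.com",
        "uuid" : "2f1e0f97-a655-4a19-a335-9ab0c4394c91",
        "admin_state" : "up",
        "actual_state" : "up",
        "uptime" : 1820.5,
        "mtime" : 1351607182000000000,
        "state_reason" : [ [ "gnt:client:gnt-instance", "user start",
                             1351605000000000000 ] ],
        "status" : { "code" : 0, "message" : "" }
      }
    ]
  }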
| |
| Storage collectors |
| ++++++++++++++++++ |
| |
The storage collectors will be a series of data collectors that will
gather data about storage for the current node. The collection will be
performed at different granularities and abstraction levels, from
physical disks, through partitions and logical volumes, up to the
specific storage types used by Ganeti itself (drbd, rbd, plain, file).
| |
The ``name`` of each of these collectors will reflect the storage type
it refers to.
| |
The ``category`` field of these collectors will be ``storage``.
| |
| The ``kind`` field will depend on the specific collector. |
| |
| Each ``storage`` collector's ``data`` section will provide collector-specific |
| fields. |
| |
The various storage collectors will provide keys (e.g. device names or
instance names) to join the data they provide, in order to allow the
user to get a better understanding of the system.
| |
| Diskstats collector |
| ******************* |
| |
This storage data collector will gather information about the status of
the disks installed in the system, as listed in the ``/proc/diskstats``
file. This means that not only physical hard drives, but also ramdisks
and loopback devices, will be listed.
| |
| Its ``kind`` in the report will be ``0`` (`Performance reporting collectors`_). |
| |
| Its ``category`` field in the report will contain the value ``storage``. |
| |
| When executed in verbose mode, the ``data`` section of the report of this |
| collector will be a list of items, each representing one disk, each providing |
| the following fields: |
| |
| ``major`` |
| The major number of the device. |
| |
| ``minor`` |
| The minor number of the device. |
| |
| ``name`` |
| The name of the device. |
| |
| ``readsNum`` |
| This is the total number of reads completed successfully. |
| |
| ``mergedReads`` |
| Reads which are adjacent to each other may be merged for efficiency. Thus |
| two 4K reads may become one 8K read before it is ultimately handed to the |
| disk, and so it will be counted (and queued) as only one I/O. This field |
| specifies how often this was done. |
| |
| ``secRead`` |
| This is the total number of sectors read successfully. |
| |
| ``timeRead`` |
| This is the total number of milliseconds spent by all reads. |
| |
| ``writes`` |
| This is the total number of writes completed successfully. |
| |
| ``mergedWrites`` |
  Writes which are adjacent to each other may be merged for efficiency.
  Thus two 4K writes may become one 8K write before it is ultimately
  handed to the disk, and so it will be counted (and queued) as only one
  I/O. This field specifies how often this was done.
| |
| ``secWritten`` |
| This is the total number of sectors written successfully. |
| |
| ``timeWrite`` |
| This is the total number of milliseconds spent by all writes. |
| |
| ``ios`` |
  The number of I/Os currently in progress.
  This is the only field that should go to zero; it is incremented as
  requests are given to the appropriate ``struct request_queue`` and
  decremented as they finish.
| |
| ``timeIO`` |
  The number of milliseconds spent doing I/Os. This field increases as
  long as field ``ios`` is nonzero.
| |
| ``wIOmillis`` |
| The weighted number of milliseconds spent doing I/Os. |
  This field is incremented at each I/O start, I/O completion, I/O
  merge, or read of these stats, by the number of I/Os in progress
  (field ``ios``) times the number of milliseconds spent doing I/O since
  the last update of this field. This can provide an easy measure of
  both I/O completion time and the backlog that may be accumulating.
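
Here follows an example of one such entry, just to show the layout of
the fields described above; all the values are entirely made up::

  { "major" : 8,
    "minor" : 0,
    "name" : "sda",
    "readsNum" : 120391,
    "mergedReads" : 12042,
    "secRead" : 2941840,
    "timeRead" : 92831,
    "writes" : 84024,
    "mergedWrites" : 9543,
    "secWritten" : 3932160,
    "timeWrite" : 123014,
    "ios" : 0,
    "timeIO" : 175620,
    "wIOmillis" : 215845
  }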
| |
| Logical Volume collector |
| ************************ |
| |
| This data collector will gather information about the attributes of logical |
| volumes present in the system. |
| |
| Its ``kind`` in the report will be ``0`` (`Performance reporting collectors`_). |
| |
| Its ``category`` field in the report will contain the value ``storage``. |
| |
| The ``data`` section of the report of this collector will be a list of items, |
| each representing one logical volume and providing the following fields: |
| |
| ``uuid`` |
| The UUID of the logical volume. |
| |
| ``name`` |
| The name of the logical volume. |
| |
| ``attr`` |
| The attributes of the logical volume. |
| |
| ``major`` |
| Persistent major number or -1 if not persistent. |
| |
| ``minor`` |
| Persistent minor number or -1 if not persistent. |
| |
| ``kernel_major`` |
| Currently assigned major number or -1 if LV is not active. |
| |
| ``kernel_minor`` |
| Currently assigned minor number or -1 if LV is not active. |
| |
| ``size`` |
| Size of LV in bytes. |
| |
| ``seg_count`` |
| Number of segments in LV. |
| |
| ``tags`` |
| Tags, if any. |
| |
| ``modules`` |
| Kernel device-mapper modules required for this LV, if any. |
| |
| ``vg_uuid`` |
| Unique identifier of the volume group. |
| |
| ``vg_name`` |
| Name of the volume group. |
| |
| ``segtype`` |
| Type of LV segment. |
| |
| ``seg_start`` |
  Offset within the LV to the start of the segment in bytes.
| |
| ``seg_start_pe`` |
| Offset within the LV to the start of the segment in physical extents. |
| |
| ``seg_size`` |
| Size of the segment in bytes. |
| |
| ``seg_tags`` |
| Tags for the segment, if any. |
| |
| ``seg_pe_ranges`` |
| Ranges of Physical Extents of underlying devices in lvs command line format. |
| |
| ``devices`` |
| Underlying devices used with starting extent numbers. |
| |
| ``instance`` |
| The name of the instance this LV is used by, or ``null`` if it was not |
| possible to determine it. |
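
Here follows a sketch of one such item; all the values (including the
volume, group and instance names) are made up, and are only meant to
show the general shape of an entry::

  { "uuid" : "ConvlW-FhBu-E90j-Vj5U-Jg2M-kKxY-jVdEQ2",
    "name" : "a1f2b9e0-6f3a-4b5c-9d8e-7c6b5a4d3e2f.disk0",
    "attr" : "-wi-ao",
    "major" : -1,
    "minor" : -1,
    "kernel_major" : 253,
    "kernel_minor" : 0,
    "size" : 1073741824,
    "seg_count" : 1,
    "tags" : "",
    "modules" : "",
    "vg_uuid" : "0QyEf0-fmUI-oza7-0Dqv-D7Yk-S6nT-8sdcUJ",
    "vg_name" : "xenvg",
    "segtype" : "linear",
    "seg_start" : 0,
    "seg_start_pe" : 0,
    "seg_size" : 1073741824,
    "seg_tags" : "",
    "seg_pe_ranges" : "/dev/sda3:0-255",
    "devices" : "/dev/sda3(0)",
    "instance" : "instance1.example.com"
  }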
| |
| DRBD status |
| *********** |
| |
| This data collector will run only on nodes where DRBD is actually |
| present and it will gather information about DRBD devices. |
| |
| Its ``kind`` in the report will be ``1`` (`Status reporting collectors`_). |
| |
| Its ``category`` field in the report will contain the value ``storage``. |
| |
| When executed in verbose mode, the ``data`` section of the report of this |
| collector will provide the following fields: |
| |
| ``versionInfo`` |
| Information about the DRBD version number, given by a combination of |
| any (but at least one) of the following fields: |
| |
| ``version`` |
| The DRBD driver version. |
| |
| ``api`` |
| The API version number. |
| |
| ``proto`` |
| The protocol version. |
| |
| ``srcversion`` |
| The version of the source files. |
| |
| ``gitHash`` |
| Git hash of the source files. |
| |
| ``buildBy`` |
| Who built the binary, and, optionally, when. |
| |
| ``device`` |
| A list of structures, each describing a DRBD device (a minor) and containing |
| the following fields: |
| |
| ``minor`` |
| The device minor number. |
| |
| ``connectionState`` |
| The state of the connection. If it is "Unconfigured", all the following |
| fields are not present. |
| |
| ``localRole`` |
| The role of the local resource. |
| |
| ``remoteRole`` |
| The role of the remote resource. |
| |
| ``localState`` |
| The status of the local disk. |
| |
| ``remoteState`` |
| The status of the remote disk. |
| |
| ``replicationProtocol`` |
| The replication protocol being used. |
| |
| ``ioFlags`` |
| The input/output flags. |
| |
| ``perfIndicators`` |
| The performance indicators. This field will contain the following |
| sub-fields: |
| |
| ``networkSend`` |
| KiB of data sent on the network. |
| |
| ``networkReceive`` |
| KiB of data received from the network. |
| |
| ``diskWrite`` |
| KiB of data written on local disk. |
| |
| ``diskRead`` |
      KiB of data read from the local disk.
| |
| ``activityLog`` |
| Number of updates of the activity log. |
| |
| ``bitMap`` |
| Number of updates to the bitmap area of the metadata. |
| |
| ``localCount`` |
| Number of open requests to the local I/O subsystem. |
| |
| ``pending`` |
| Number of requests sent to the partner but not yet answered. |
| |
| ``unacknowledged`` |
| Number of requests received by the partner but still to be answered. |
| |
| ``applicationPending`` |
      Number of block input/output requests forwarded to DRBD that have
      not yet been answered.
| |
| ``epochs`` |
| (Optional) Number of epoch objects. Not provided by all DRBD versions. |
| |
| ``writeOrder`` |
| (Optional) Currently used write ordering method. Not provided by all DRBD |
| versions. |
| |
| ``outOfSync`` |
| (Optional) KiB of storage currently out of sync. Not provided by all DRBD |
| versions. |
| |
| ``syncStatus`` |
| (Optional) The status of the synchronization of the disk. This is present |
| only if the disk is being synchronized, and includes the following fields: |
| |
| ``percentage`` |
| The percentage of synchronized data. |
| |
| ``progress`` |
| How far the synchronization is. Written as "x/y", where x and y are |
| integer numbers expressed in the measurement unit stated in |
| ``progressUnit`` |
| |
| ``progressUnit`` |
| The measurement unit for the progress indicator. |
| |
| ``timeToFinish`` |
| The expected time before finishing the synchronization. |
| |
| ``speed`` |
| The speed of the synchronization. |
| |
| ``want`` |
      The desired speed of the synchronization.
| |
| ``speedUnit`` |
| The measurement unit of the ``speed`` and ``want`` values. Expressed |
| as "size/time". |
| |
| ``instance`` |
    The name of the Ganeti instance this disk is associated with.
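
For illustration, a shortened verbose report of this collector, for a
node with a single synchronizing DRBD minor, could look like the
following. All values are invented, some optional fields are omitted,
and the exact set of fields present will depend on the DRBD version::

  { "versionInfo" : { "version" : "8.3.11",
                      "api" : 88,
                      "proto" : "86-96"
                    },
    "device" : [
      { "minor" : 0,
        "connectionState" : "SyncSource",
        "localRole" : "Primary",
        "remoteRole" : "Secondary",
        "localState" : "UpToDate",
        "remoteState" : "Inconsistent",
        "replicationProtocol" : "C",
        "ioFlags" : "r----",
        "perfIndicators" : { "networkSend" : 102400,
                             "networkReceive" : 0,
                             "diskWrite" : 102400,
                             "diskRead" : 204800,
                             "activityLog" : 25,
                             "bitMap" : 6,
                             "localCount" : 0,
                             "pending" : 4,
                             "unacknowledged" : 0,
                             "applicationPending" : 0
                           },
        "syncStatus" : { "percentage" : 48.5,
                         "progress" : "49152/102400",
                         "progressUnit" : "KiB",
                         "timeToFinish" : "0:01:25",
                         "speed" : 61440,
                         "want" : 61440,
                         "speedUnit" : "KiB/s"
                       },
        "instance" : "instance1.example.com"
      }
    ]
  }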
| |
| |
| Ganeti daemons status |
| +++++++++++++++++++++ |
| |
| Ganeti will report what information it has about its own daemons. |
| This should allow identifying possible problems with the Ganeti system itself: |
| for example memory leaks, crashes and high resource utilization should be |
| evident by analyzing this information. |
| |
| The ``kind`` field will be ``1`` (`Status reporting collectors`_). |
| |
| Each daemon will have its own data collector, and each of them will have |
| a ``category`` field valued ``daemon``. |
| |
| When executed in verbose mode, their data section will include at least: |
| |
| ``memory`` |
| The amount of used memory. |
| |
| ``size_unit`` |
| The measurement unit used for the memory. |
| |
| ``uptime`` |
| The uptime of the daemon. |
| |
| ``CPU usage`` |
  How much CPU the daemon is using (as a percentage).
| |
| Any other daemon-specific information can be included as well in the ``data`` |
| section. |
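
As an example, the ``data`` section of the collector monitoring a
hypothetical Ganeti daemon could be something like the following; the
values, and the choice of units, are only an illustrative assumption::

  { "memory" : 14840,
    "size_unit" : "KiB",
    "uptime" : 86312.5,
    "CPU usage" : 0.7
  }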
| |
| Hypervisor resources report |
| +++++++++++++++++++++++++++ |
| |
Each hypervisor has a view of system resources that sometimes is
different from the one the OS sees (for example in Xen the Node OS,
running as Dom0, has access to only part of those resources). In this
section we'll report all the information we can in a "non hypervisor
specific" way. Each hypervisor can then add extra, specific information
that is not generic enough to be abstracted.
| |
| The ``kind`` field will be ``0`` (`Performance reporting collectors`_). |
| |
Each of the hypervisor data collectors will have ``hypervisor`` as its
``category``.
| |
| Node OS resources report |
| ++++++++++++++++++++++++ |
| |
| Since Ganeti assumes it's running on Linux, it's useful to export some |
| basic information as seen by the host system. |
| |
| The ``category`` field of the report will be ``null``. |
| |
| The ``kind`` field will be ``0`` (`Performance reporting collectors`_). |
| |
| The ``data`` section will include: |
| |
| ``cpu_number`` |
| The number of available cpus. |
| |
| ``cpus`` |
| A list with one element per cpu, showing its average load. |
| |
| ``memory`` |
| The current view of memory (free, used, cached, etc.) |
| |
| ``filesystem`` |
| A list with one element per filesystem, showing a summary of the |
| total/available space. |
| |
| ``NICs`` |
| A list with one element per network interface, showing the amount of |
| sent/received data, error rate, IP address of the interface, etc. |
| |
| ``versions`` |
  A map using the name of a component Ganeti interacts with (Linux,
  drbd, hypervisor, etc.) as the key and its version number as the
  value.
| |
Note that we won't go into any hardware-specific details (e.g. querying
a node's RAID is outside the scope of this, and can be implemented as a
plugin) but we can easily just report the information above, since it's
standard enough across all systems.
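
The exact layout of the nested values above is not prescribed by this
design. As a purely illustrative sketch (all sub-field names and values
are made up), the ``data`` section could look like::

  { "cpu_number" : 4,
    "cpus" : [ 0.15, 0.22, 0.07, 0.31 ],
    "memory" : { "free" : 2048, "used" : 5120, "cached" : 1024,
                 "size_unit" : "MiB" },
    "filesystem" : [ { "mountpoint" : "/", "total" : 51200,
                       "available" : 20480, "size_unit" : "MiB" } ],
    "NICs" : [ { "name" : "eth0", "ip" : "192.0.2.10",
                 "sent" : 123456, "received" : 654321, "errors" : 0 } ],
    "versions" : { "linux" : "3.2.0", "drbd" : "8.3.11", "xen" : "4.1.2" }
  }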
| |
| Node OS CPU load average report |
| +++++++++++++++++++++++++++++++ |
| |
| This data collector will export CPU load statistics as seen by the host |
| system. Apart from using the data from an external monitoring system we |
| can also use the data to improve instance allocation and/or the Ganeti |
| cluster balance. To compute the CPU load average we will use a number of |
| values collected inside a time window. The collection process will be |
| done by an independent thread (see `Mode of Operation`_). |
| |
| This report is a subset of the previous report (`Node OS resources |
| report`_) and they might eventually get merged, once reporting for the |
| other fields (memory, filesystem, NICs) gets implemented too. |
| |
| Specifically: |
| |
| The ``category`` field of the report will be ``null``. |
| |
| The ``kind`` field will be ``0`` (`Performance reporting collectors`_). |
| |
| The ``data`` section will include: |
| |
| ``cpu_number`` |
| The number of available cpus. |
| |
| ``cpus`` |
| A list with one element per cpu, showing its average load. |
| |
| ``cpu_total`` |
  The total CPU load average as a sum of all the separate cpus.
| |
The CPU load report function will get N values, collected by the CPU
load collection function, and calculate the above averages. Please see
the section `Mode of Operation`_ for more information on how the two
functions of the data collector interact.
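
A minimal sketch of the ``data`` section for a hypothetical node with
four cpus could therefore be (all values are made up)::

  { "cpu_number" : 4,
    "cpus" : [ 0.15, 0.22, 0.07, 0.31 ],
    "cpu_total" : 0.75
  }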
| |
| Format of the query |
| ------------------- |
| |
| .. include:: monitoring-query-format.rst |
| |
| Instance disk status propagation |
| -------------------------------- |
| |
As for the instance status, Ganeti currently has only partial
information about its instance disks: in particular, each node is
unaware of the disk-to-instance mapping, which exists only on the
master.
| |
For this design doc we plan to fix this by changing all RPCs that
create a backend storage or that put an already existing one in use,
passing the relevant instance to the node. The node can then export
these to the status reporting tool.
| |
| While we haven't implemented these RPC changes yet, we'll use Confd to |
| fetch this information in the data collectors. |
| |
| Plugin system |
| ------------- |
| |
The monitoring system will be equipped with a plugin system, so that
specific local information can be exported through it.
| |
| The plugin system is expected to be used by local installations to |
| export any installation specific information that they want to be |
| monitored, about either hardware or software on their systems. |
| |
The plugins will take the form of either scripts or binaries whose
output will be inserted into the report.
| |
| Eventually support for other kinds of plugins might be added as well, such as |
| plain text files which will be inserted into the report, or local unix or |
| network sockets from which the information has to be read. This should allow |
| most flexibility for implementing an efficient system, while being able to keep |
| it as simple as possible. |
| |
| Data collectors |
| --------------- |
| |
| In order to ease testing as well as to make it simple to reuse this |
| subsystem it will be possible to run just the "data collectors" on each |
| node without passing through the agent daemon. |
| |
| If a data collector is run independently, it should print on stdout its |
| report, according to the format corresponding to a single data collector |
| report object, as described in the previous paragraphs. |
| |
| Mode of operation |
| ----------------- |
| |
| In order to be able to report information fast the monitoring agent |
| daemon will keep an in-memory or on-disk cache of the status, which will |
| be returned when queries are made. The status system will then |
| periodically check resources to make sure the status is up to date. |
| |
Different parts of the report will be queried at different speeds. These
will depend on:

- how often they vary (or we expect them to vary)
- how fast they are to query
- how important their freshness is
| |
Of course the last parameter is installation specific, and while we'll
try to have sensible defaults, it will be configurable. The first two,
instead, can be used adaptively, to query a certain resource faster or
slower depending on them.
| |
When run as stand-alone binaries, the data collectors will not use any
caching system, and will just fetch and return the data immediately.
| |
Since some performance collectors have to operate on a number of values
collected over time, we need a mechanism, independent of the data
collectors, that will trigger the collection of those values and also
store them, so that they are available for calculation by the data
collectors.
| |
| To collect data periodically, a thread will be created by the monitoring |
| agent which will run the collection function of every data collector |
| that provides one. The values returned by the collection function of |
| the data collector will be saved in an appropriate map, associating each |
| value to the corresponding collector, using the collector's name as the |
| key of the map. This map will be stored in mond's memory. |
| |
| For example: the collection function of the CPU load collector will |
| collect a CPU load value and save it in the map mentioned above. The |
| collection function will be called by the collector thread every t |
| milliseconds. When the report function of the collector is called, it |
| will process the last N values of the map and calculate the |
| corresponding average. |
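
Schematically, the buffered values can be thought of as a map from
collector names to timestamped samples, as in the following sketch (the
collector name, sample layout and values are only an illustrative
assumption, not a prescribed format)::

  { "cpu-avg-load" : [ { "timestamp" : 1351609526000000000, "value" : 0.23 },
                       { "timestamp" : 1351609527000000000, "value" : 0.31 } ]
  }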
| |
| Implementation place |
| -------------------- |
| |
| The status daemon will be implemented as a standalone Haskell daemon. In |
| the future it should be easy to merge multiple daemons into one with |
| multiple entry points, should we find out it saves resources and doesn't |
| impact functionality. |
| |
The libekg library should be looked at for easily providing metrics in
JSON format.
| |
| Implementation order |
| -------------------- |
| |
| We will implement the agent system in this order: |
| |
- initial example data collectors (e.g. for DRBD and instance status)
- initial daemon for exporting data, integrating the existing collectors
- plugin system
- RPC updates for instance status reasons and disk to instance mapping
- cache layer for the daemon
- more data collectors
| |
| |
| Future work |
| =========== |
| |
As a future step it could be useful to "centralize" all this reporting
data in a single place. This could be, for example, just the master
node, or all the master candidates. We will evaluate doing this after
the first node-local version has been developed and tested.
| |
Another possible change is replacing the "read-only" RPCs with queries
to the agent system, thus having only one way of collecting information
from the nodes, both for a monitoring system and for Ganeti itself.
| |
One extra feature we may need is a way to query for only sub-parts of
the report (e.g. instance status only). This can be done by passing
arguments to the HTTP GET request, which will be defined when we get to
this functionality.
| |
Finally, the :doc:`autorepair system <design-autorepair>` can be
expanded to use the monitoring agent system as a source of information
to decide which repairs it can perform.
| |
| .. vim: set textwidth=72 : |
| .. Local Variables: |
| .. mode: rst |
| .. fill-column: 72 |
| .. End: |