| =========================================== |
| Splitting the query and job execution paths |
| =========================================== |
| |
| |
| Introduction |
| ============ |
| |
| Currently, the master daemon does two main roles: |
| |
| - execute jobs that change the cluster state |
| - respond to queries |
| |
| Due to the technical details of the implementation, the job execution |
| and query paths interact with each other, and for example the "masterd |
| hang" issue that we had late in the 2.5 release cycle was due to the |
| interaction between job queries and job execution. |
| |
| Furthermore, also because technical implementations (Python lacking |
| read-only variables being one example), we can't share internal data |
| structures for jobs; instead, in the query path, we read them from |
| disk in order to not block job execution due to locks. |
| |
| All these point to the fact that the integration of both queries and |
| job execution in the same process (multi-threaded) creates more |
| problems than advantages, and hence we should look into separating |
| them. |
| |
| |
| Proposed design |
| =============== |
| |
| In Ganeti 2.7, we will introduce a separate, optional daemon to handle |
| queries (note: whether this is an actual "new" daemon, or its |
| functionality is folded into confd, remains to be seen). |
| |
| This daemon will expose exactly the same Luxi interface as masterd, |
| except that job submission will be disabled. If so configured (at |
| build time), clients will be changed to: |
| |
| - keep sending REQ_SUBMIT_JOB, REQ_SUBMIT_MANY_JOBS, and all requests |
| except REQ_QUERY_* to the masterd socket (but also QR_LOCK) |
| - redirect all REQ_QUERY_* requests to the new Luxi socket of the new |
| daemon (except generic query with QR_LOCK) |
| |
| This new daemon will serve both pure configuration queries (which |
| confd can already serve), and run-time queries (which currently only |
| masterd can serve). Since the RPC can be done from any node to any |
| node, the new daemon can run on all master candidates, not only on the |
| master node. This means that all gnt-* list options can be now run on |
| other nodes than the master node. If we implement this as a separate |
| daemon that talks to confd, then we could actually run this on all |
| nodes of the cluster (to be decided). |
| |
| During the 2.7 release, masterd will still respond to queries itself, |
| but it will log all such queries for identification of "misbehaving" |
| clients. |
| |
| Advantages |
| ---------- |
| |
| As far as I can see, this will bring some significant advantages. |
| |
| First, we remove any interaction between the job execution and cluster |
| query state. This means that bugs in the locking code (job execution) |
| will not impact the query of the cluster state, nor the query of the |
| job execution itself. Furthermore, we will be able to have different |
| tuning parameters between job execution (e.g. 25 threads for job |
| execution) versus query (since these are transient, we could |
| practically have unlimited numbers of query threads). |
| |
| As a result of the above split, we move from the current model, where |
| shutdown of the master daemon practically "breaks" the entire Ganeti |
| functionality (no job execution nor queries, not even connecting to |
| the instance console), to a split model: |
| |
| - if just masterd is stopped, then other cluster functionality remains |
| available: listing instances, connecting to the console of an |
| instance, etc. |
| - if just "luxid" is stopped, masterd can still process jobs, and one |
| can furthermore run queries from other nodes (MCs) |
| - only if both are stopped, we end up with the previous state |
| |
| This will help, for example, in the case where the master node has |
| crashed and we haven't failed it over yet: querying and investigating |
| the cluster state will still be possible from other master candidates |
| (on small clusters, this will mean from all nodes). |
| |
| A last advantage is that we finally will be able to reduce the |
| footprint of masterd; instead of previous discussion of splitting |
| individual jobs, which requires duplication of all the base |
| functionality, this will just split the queries, a more trivial piece |
| of code than job execution. This should be a reasonable work effort, |
| with a much smaller impact in case of failure (we can still run |
| masterd as before). |
| |
| Disadvantages |
| ------------- |
| |
| We might get increased inconsistency during queries, as there will be |
| a delay between masterd saving an updated configuration and |
| confd/query loading and parsing it. However, this could be compensated |
| by the fact that queries will only look at "snapshots" of the |
| configuration, whereas before it could also look at "in-progress" |
| modifications (due to the non-atomic updates). I think these will |
| cancel each other out, we will have to see in practice how it works. |
| |
| Another disadvantage *might* be that we have a more complex setup, due |
| to the introduction of a new daemon. However, the query path will be |
| much simpler, and when we remove the query functionality from masterd |
| we should have a more robust system. |
| |
| Finally, we have QR_LOCK, which is an internal query related to the |
| master daemon, using the same infrastructure as the other queries |
| (related to cluster state). This is unfortunate, and will require |
| untangling in order to keep code duplication low. |
| |
| Long-term plans |
| =============== |
| |
| If this works well, the plan would be (tentatively) to disable the |
| query functionality in masterd completely in Ganeti 2.8, in order to |
| remove the duplication. This might change based on how/if we split the |
| configuration/locking daemon out, or not. |
| |
| Once we split this out, there is not technical reason why we can't |
| execute any query from any node; except maybe practical reasons |
| (network topology, remote nodes, etc.) or security reasons (if/whether |
| we want to change the cluster security model). In any case, it should |
| be possible to do this in a reliable way from all master candidates. |
| |
| Some implementation details |
| --------------------------- |
| |
| We will fold this in confd, at least initially, to reduce the |
| proliferation of daemons. Haskell will limit (if used properly) any too |
| deep integration between the old "confd" functionality and the new query |
| one. As advantages, we'll have a single daemons that handles |
| configuration queries. |
| |
| The redirection of Luxi requests can be easily done based on the |
| request type, if we have both sockets open, or if we open on demand. |
| |
| We don't want the masterd to talk to the luxid itself (hidden |
| redirection), since we want to be able to run queries while masterd is |
| down. |
| |
| During the 2.7 release cycle, we can test all queries against both |
| masterd and luxid in QA, so we know we have exactly the same |
| interface and it is consistent. |
| |
| .. vim: set textwidth=72 : |
| .. Local Variables: |
| .. mode: rst |
| .. fill-column: 72 |
| .. End: |