move supported operating system into cluster using RPC

Bug #1319143 reported by Blake Rouse
Affects: MAAS
Status: Fix Released
Importance: Critical
Assigned to: Gavin Panella

Bug Description

The OperatingSystemRegistry is directly imported into maasserver; this is incorrect. It should use RPC to retrieve the available operating systems from each cluster.
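
For illustration, here is a minimal sketch of what such a call could look like over Twisted AMP, the protocol the region-cluster RPC uses. The command name, field names, and the stand-in registry are assumptions for this sketch, not a final schema:

    from twisted.protocols import amp

    class ListOperatingSystems(amp.Command):
        """Ask a cluster which operating systems it supports.

        Hypothetical schema; the real command may differ.
        """
        arguments = []
        response = [
            (b"osystems", amp.AmpList([
                (b"name", amp.Unicode()),
                (b"title", amp.Unicode()),
            ])),
        ]

    # Stand-in for the cluster's OperatingSystemRegistry contents.
    LOCAL_OSYSTEMS = [("ubuntu", "Ubuntu"), ("centos", "CentOS")]

    class Cluster(amp.AMP):
        """Cluster side: answer the region from the local registry."""

        @ListOperatingSystems.responder
        def list_operating_systems(self):
            return {
                "osystems": [
                    {"name": name, "title": title}
                    for name, title in LOCAL_OSYSTEMS
                ],
            }

The region would then issue callRemote(ListOperatingSystems) over its existing connection to each cluster.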

Tags: server-hwe


Revision history for this message
Julian Edwards (julian-edwards) wrote :

I'm treating this as a critical problem as it breaks our cluster mesh assumptions.

Changed in maas:
status: Confirmed → Triaged
importance: High → Critical
Revision history for this message
Blake Rouse (blake-rouse) wrote :

Now that the OperatingSystemRegistry is being used throughout maasserver, it is clear that it should never have been placed in pserv; it should live in maasserver instead. Here is a breakdown of where the OperatingSystemRegistry is used in each project.

maasserver:
  - boot image selection
  - preseed selection
  - preseed generation
  - api/webui for node
  - api/webui for license keys
  - api/webui for other settings

provisioningserver:
  - boot image purposes

Looking at these uses, it seems that using an RPC call for these operations would be a waste, as this logic could live directly in maasserver, where it is used the most. Of course we still need to handle the case where it is used in provisioningserver.

The pserv logic could be moved into maasserver as well. Since RPC goes both ways, pserv could use RPC to request which purposes should be supported. Alternatively, when the boot images are reported, a list of the files that exist in each directory could be reported as well, so maasserver can determine which purposes should be enabled for that boot image.

Revision history for this message
Gavin Panella (allenap) wrote :

The clusters are meant to be the reference for what operating systems might be possible to install, and then what operating systems can actually be installed based on what boot resources are available. These could differ between clusters because of version differences, or because of what resources each has been asked to download, or what they've actually downloaded.

RPC calls are also cheap, comparable to a query to the database, because the connections between the region and each cluster are persistent, and because the protocol is lightweight (there's no HTTP overhead for example).

In the longer term I think moving things like preseeds over to the cluster makes sense too.

In short, I think OperatingSystemRegistry should stay in the cluster, and be queried via RPC.

Revision history for this message
Blake Rouse (blake-rouse) wrote :

Since none of the operating system work has been released, there is no reason the region could not dictate what operating systems are supported. As long as the cluster reports that it has boot images supporting an operating system, that operating system can be used for that cluster.

RPC might be cheap, but making an RPC call every time the region needs to get a title for an operating system, or to list the available operating systems, seems like more network traffic than needed. This could also become a scaling issue, as many clusters would need to be contacted for something as simple as which operating systems we support.

On the note about moving the preseeds over to the cluster, is there really a benefit? If a user modifies a preseed, it would need to be modified on each cluster, meaning more work for the user, and it would be easy for them to miss one or make a mistake. To remove direct traffic from the nodes to the region, the cluster could easily proxy the requests to the region.

I see little benefit in keeping the OperatingSystemRegistry in the cluster. What is truly gained from this implementation?

Revision history for this message
Andres Rodriguez (andreserl) wrote :

From my perspective, I think we have to differentiate two things:

1. The knowledge of which Operating Systems we support (this should live in the Region).
2. The knowledge of which Operating Systems are available (which is what currently happens).

It seems to me that the appropriate place for MAAS to know about the supported OS's is the Region Controller. Now, it is my understanding that the Cluster Controllers report back to the Region what available images they have. This would mean the Region has knowledge of all the operating systems and, based on the information from each Cluster Controller, would also know which OS's are currently available in each cluster.

So based on that, I think the correct place for the OperatingSystemRegistry could be the Region Controller, as it is the one that knows all the currently *supported* Operating Systems, whereas it is not the one that knows about the currently *available* OS's per Cluster.

Now, the question is: what happens if a machine is requested by juju for operating system X that is not supported in cluster Y, where the allocated node belongs?

Now, as far as preseeds go, I agree with Blake. Preseeds should remain in the region controller and be passed to the clusters. If we had 100 cluster controllers, we wouldn't want to be making modifications to preseeds in all of them, so they should live in the Region.

Thoughts?

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Gavin is right, more of this logic should live in the cluster.

There's no reason that the region should be the exclusive fount of all knowledge regarding what OSes can be installed; it exists purely to act as a coordinator between user actions and cluster actions. All of the knowledge for actually dealing with hardware needs to be based in the cluster, so that new clusters that deal with new types of hardware can be added at any time. The clusters are responsible for reporting their capabilities to the region using the protocols available in our RPC versions.

Blake said:
> RPC might be cheap, but making an RPC call every time the region needs
> to get a title for an operating system, or to list the available
> operating systems, seems like more network traffic than needed. This
> could also become a scaling issue, as many clusters would need to be
> contacted for something as simple as which operating systems we support.

We have yet to see this be an actual problem in practice. Given the huge amount of network traffic that MAAS already generates, a few RPCs are nothing. If it becomes an actual problem in the future, we can add a caching layer that persists information for the duration of a transaction in the region.
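
As an entirely hypothetical sketch of such a cache: keep the merged answer on a thread-local for the duration of the transaction, so the first caller pays for the RPC fan-out and later callers reuse the result:

    import threading

    _cache = threading.local()

    def get_osystems(query_clusters):
        # `query_clusters` is an assumed callable that performs the RPC
        # fan-out to all clusters and returns the merged OS list.
        osystems = getattr(_cache, "osystems", None)
        if osystems is None:
            osystems = _cache.osystems = query_clusters()
        return osystems

    def clear_osystems_cache():
        # To be called when the transaction ends, commit or rollback.
        if hasattr(_cache, "osystems"):
            del _cache.osystems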

> If a user modifies a preseed, it would need to be modified on each
> cluster, meaning more work for the user, and it would be easy for them
> to miss one or make a mistake. To remove direct traffic from the nodes
> to the region, the cluster could easily proxy the requests to the
> region.

Preseeds should originate in the clusters but be stored in the database as versioned records. That way we keep a history of preseeds, they are easier to edit via the UI/API, changes are easier to reconcile against shipped versions, and new clusters can supply special preseeds for their hardware.
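
As a rough sketch, such a versioned record could look something like this Django model (hypothetical fields, not MAAS's actual schema):

    from django.db import models

    class PreseedTemplate(models.Model):
        """One row per edit of a named preseed, so history is kept."""
        name = models.CharField(max_length=255)    # which preseed this is
        version = models.PositiveIntegerField()    # bumped on every edit
        origin = models.CharField(max_length=255)  # cluster that supplied it
        content = models.TextField()
        created = models.DateTimeField(auto_now_add=True)

        class Meta:
            unique_together = ("name", "version")
            get_latest_by = "version"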

Andres said:
> Now, the question is: what happens if a machine is requested by juju
> for operating system X that is not supported in cluster Y, where the
> allocated node belongs?

The OS should be in the acquisition constraints. It currently isn't.

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Julian/Gavin,

Let me ask you: why should the knowledge of the OS's that MAAS supports be on the Cluster instead of the Region?

It seems to me that the knowledge of which Operating Systems we *support* (Ubuntu XYZ, OperatingSystem2 XYZ, OperatingSystem3 XYZ) belongs to the Region controller, whereas the knowledge of which Operating Systems we can or cannot *install* belongs to the Cluster. The Cluster Controller should only care about the images it has.

So in terms of [1], the approach of keeping the OperatingSystemRegistry on the Cluster controller would mean that:

1. In order for us to input License Keys, the Region would need to RPC the Cluster to obtain that information.
2. Once the RPC completes, the Region would know which OS's we support, and hence we would only be able to input a License Key for the OS's the Cluster knows about.

This approach leaves open questions:

1. What happens when we try to input the License Keys we know we will need in order to generate the images for the Cluster to deploy? What happens if the Cluster Controller is not yet available, or is unreachable? We won't be able to input Keys for those OS's.

2. Are we going to have to RPC all of the available Clusters to figure out what we support, and then pull together all of the RPC responses into a single answer so the Region knows which OS's we *support*? (I think this is expensive.)

Based on this it seems to me that the right place for us to have the knowledge of what we *support* is the Region and *not* the Cluster. The Cluster would remain the place where we figure out what OS' we *can* install.

[1]: https://code.launchpad.net/~blake-rouse/maas/license-key-views/+merge/224808

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Oh... and keep in mind that before the OperatingSystemRegistry came along, the knowledge of the *Supported* Operating Systems was in maasserver/enum.py. We should continue to have this knowledge in the Region Controller. Note that a *supported* Operating System should not always be shown as a *currently available* Operating System; the Commissioning Release, for example, is one such case.

Revision history for this message
Gavin Panella (allenap) wrote : Re: [Bug 1319143] Re: move supported operating system into cluster using RPC

On 3 July 2014 16:20, Andres Rodriguez <email address hidden> wrote:
> Julian/Gavin,
>
> Let me ask you: why should the knowledge of the OS's that MAAS
> supports be on the Cluster instead of the Region?

Ultimately what a MAAS installation can claim to support and what it
can support right now are both functions of what the clusters are able
to do.

Which versions of Ubuntu a cluster can install, and what kernels it
can put with them, is now data-driven. Previously the versions were
hard-coded, like CentOS and Windows are now (iirc). In time I suspect
those will also become data-driven. The cluster is the consumer for
that data, and the region asks the cluster what's possible.

This also opens the door to custom cluster controllers, ones
specialised to their environments, perhaps shipped with hardware,
working with out-of-the-box region controllers.

...
> 1. What happens when we try to input the License Keys we know we will
> need in order to generate the images for the Cluster to deploy? What
> happens if the Cluster Controller is not yet available, or is
> unreachable? We won't be able to input Keys for those OS's.

Making the cluster controller reachable is one option. What use is that
license key if a cluster that supports the target OS is not available
or reachable?

Also, the handling and distribution of license keys is conceptually a
separate service. It doesn't necessarily need to use the operating
system registry.

>
> 2. Are we going to have to RPC all of the available Clusters to figure
> out what we support, and then pull together all of the RPC responses
> into a single answer so the Region knows which OS's we *support*? (I
> think this is expensive.)

Yes, and it's not expensive. It's really quick actually, because
connections are already established, AMP is lightweight, and IO can be
done concurrently. There are helpers already in the codebase to make
this work nicely from a Django request, for example. In addition, as
Julian has said, we can optimise if this really does become an issue,
with caching for example, but I want to see that happen first.
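
To sketch what that concurrent IO looks like with plain Twisted primitives (not the actual helpers; `clients` and the ListOperatingSystems command sketched in the bug description are assumptions here):

    from twisted.internet.defer import gatherResults

    def query_all_clusters(clients, command):
        # Fan one RPC out to every connected cluster in parallel;
        # `clients` are AMP connections with a callRemote method.
        calls = [client.callRemote(command) for client in clients]

        def merge(responses):
            # Union the per-cluster answers, collapsing duplicates by name.
            merged = {}
            for response in responses:
                for osystem in response["osystems"]:
                    merged[osystem["name"]] = osystem
            return list(merged.values())

        return gatherResults(calls, consumeErrors=True).addCallback(merge)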

>
> Based on this it seems to me that the right place for us to have the
> knowledge of what we *support* is the Region and *not* the Cluster. The
> Cluster would remain the place where we figure out what OS' we *can*
> install.

I think the region should model users and networks and other
region-wide things. What operating systems we support is mainly down
to what the clusters can do.

We have spent a lot of energy in thinking and planning about where to
place responsibilities between the region and cluster controllers, to
simplify conceptually, to spread load around, to maybe even get to the
point where a cluster controller can do most things independently.
Moving the OS registry back into the region is the opposite of where
more than two years of work on MAAS has led us.

Revision history for this message
Gavin Panella (allenap) wrote :

On 3 July 2014 18:00, Andres Rodriguez <email address hidden> wrote:
> Oh... and keep in mind that before the OperatingSystemRegistry came
> along, the knowledge of the *Supported* Operating Systems was in
> maasserver/enum.py. We should continue to have this knowledge in the
> Region Controller. Note that a *supported* Operating System should not
> always be shown as a *currently available* Operating System; the
> Commissioning Release, for example, is one such case.

Yeah, we ought to kill that with fire. The cluster should report those
operating systems and releases that it can support. For Ubuntu this
means reading the simplestreams data, while Windows and CentOS will be
hard-coded for now.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On 04/07/14 03:01, Gavin Panella wrote:
> On 3 July 2014 16:20, Andres Rodriguez <email address hidden> wrote:
>> Julian/Gavin,
>>
>> Let me ask you: why should the knowledge of the OS's that MAAS
>> supports be on the Cluster instead of the Region?

> Ultimately what a MAAS installation can claim to support and what it
> can support right now are both functions of what the clusters are able
> to do.

This is a key point. Clusters are not just things that do PXE/TFTP and
store images. They are a source of information.

> Which versions of Ubuntu a cluster can install, and what kernels it
> can put with them, is now data-driven. Previously the versions were
> hard-coded, like CentOS and Windows are now (iirc). In time I suspect
> those will also become data-driven. The cluster is the consumer for
> that data, and the region asks the cluster what's possible.
>
> This also opens the door to custom cluster controllers, ones
> specialised to their environments, perhaps shipped with hardware,
> working with out-of-the-box region controllers.

This is the main driver for the architecture we're pushing for. We want
to be able to drop in clusters that have specialist knowledge. If
everything is originated in the region, that is impossible.

> ...
>> 1. What happens when we try to input the License Keys we know we will
>> need in order to generate the images for the Cluster to deploy? What
>> happens if the Cluster Controller is not yet available, or is
>> unreachable? We won't be able to input Keys for those OS's.

If the CC is unavailable, not being able to input licence keys is the
least of your worries - you won't be able to enlist, commission and
start nodes anyway.

Preventing input and alerting the user of this problem at an early stage
is desirable, rather than leaving it until it's too late.

>
> Making the cluster controller reachable is one option. What use is that
> license key if a cluster that supports the target OS is not available
> or reachable?
>
> Also, the handling and distribution of license keys is conceptually a
> separate service. It doesn't necessarily need to use the operating
> system registry.
>
>>
>> 2. Are we going to have to RPC all of the available Clusters to figure
>> out what we support, and then pull together all of the RPC responses
>> into a single answer so the Region knows which OS's we *support*? (I
>> think this is expensive.)
>
> Yes, and it's not expensive. It's really quick actually, because
> connections are already established, AMP is lightweight, and IO can be
> done concurrently. There are helpers already in the codebase to make
> this work nicely from a Django request, for example. In addition, as
> Julian has said, we can optimise if this really does become an issue,
> with caching for example, but I want to see that happen first.

Exactly as Gavin says - we already do RPC calls to all clusters to get
critical info for the UI. Nobody has complained that it's slow yet.

>
>>
>> Based on this it seems to me that the right place for us to have the
>> knowledge of what we *support* is the Region and *not* the Cluster. The
>> Cluster would remain the place where we figure out what...


Changed in maas:
assignee: Blake Rouse (blake-rouse) → Gavin Panella (allenap)
Revision history for this message
Julian Edwards (julian-edwards) wrote :

This was released in 1.6.0.

Changed in maas:
status: Triaged → Fix Released