schema validation not in line with specification

Bug #1983306 reported by Peter Surda
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
cloud-init
Fix Released
High
Alberto Contreras

Bug Description

Related to #1983303. My user-data begins with #include, as it's not a "Cloud Config Data" but an "Include File" as described in the official documentation. However, this causes the validator `cloud-init schema --system` to complain that

```
Error:
Cloud config schema errors: format-l1.c1: File None needs to begin with "#cloud-config"
```

Ok I thought, I just manually add "#cloud-config" at the top and re-test:

```
Error:
Cloud-config is not a YAML dict.
```

Well, it's not a YAML dict because it's not a cloud config data but an include file, which isn't in the YAML format.

See the specification: https://cloudinit.readthedocs.io/en/latest/topics/format.html

Also look at the implementation in `user_data.py`, function `_do_include`. As you can see, this file isn't processed as YAML but parsed line by line. So the specification and implementation agree, but the schema validator doesn't and thinks it should process it as YAML.

This wouldn't be a practical problem for me, but due to #1983303 I get mangled logs and can't work around it.

description: updated
Revision history for this message
Alberto Contreras (aciba) wrote :

Reproduced with:

```bash
cat > /tmp/lp1983306 <<EOF
#include
http://www.canonical.com
EOF

lxc launch ubuntu-daily:kinetic lp1983306
lxc file push /tmp/lp1983306 lp1983306/root/
lxc exec lp1983306 -- cloud-init status --wait

$ lxc exec lp1983306 -- cloud-init --version
/usr/bin/cloud-init 22.2-64-g1fcd55d6-0ubuntu1~22.10.1

$ lxc exec lp1983306 -- cloud-init schema -c /root/lp1983306
Error:
Cloud config schema errors: format-l1.c1: File /root/lp1983306 needs to begin with "#cloud-config"

$ lxc exec lp1983306 -- cloud-init schema -c /root/lp1983306 --annotate
#include # E1
http://www.google.com

# Errors: -------------
# E1: File /root/lp1983306 needs to begin with "#cloud-config"
```

Note that the query command does also not resolve include directives:

```bash
$ lxc exec lp1983306 -- cloud-init query -u /root/lp1983306 userdata.asdf
Traceback (most recent call last):
  File "/usr/bin/cloud-init", line 33, in <module>
    sys.exit(load_entry_point('cloud-init==22.2', 'console_scripts', 'cloud-init')())
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 1063, in main
    retval = util.log_time(
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2621, in log_time
    ret = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/query.py", line 291, in handle_args
    response = _find_instance_data_leaf_by_varname_path(
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/query.py", line 230, in _find_instance_data_leaf_by_varname_path
    jinja_vars_with_aliases = jinja_vars_with_aliases[key_path_part]
TypeError: string indices must be integers
```

Changed in cloud-init:
status: New → Confirmed
Changed in cloud-init:
status: Confirmed → Triaged
importance: Undecided → Medium
Revision history for this message
Brett Holman (holmanb) wrote :

Hi Peter,

Thanks for reporting. I wouldn't expect validation to succeed on include format user-data for the following reason:

Per the man page[1] and user docs[2], the schema command is for validating cloud-config files. In the future we could potentially add validation for include or other user-data formats, but for now it doesn't support anything besides cloud-config files. If you see anywhere that the docs claim that the schema validator supports all user-data formats, please let us know so we can correct it. I don't see anywhere at this time, but it's possible I missed something in the docs.

> This wouldn't be a practical problem for me, but due to #1983303 I get mangled logs and can't work around it.

Can you expand on this statement please? How does this block you?

I'm marking this as Incomplete until we get more information on how mangled logs cannot be worked around.

[1] https://github.com/canonical/cloud-init/blob/main/doc/man/cloud-init.1
[2] https://cloudinit.readthedocs.io/en/latest/topics/cli.html?highlight=schema#schema

Changed in cloud-init:
status: Triaged → Incomplete
Revision history for this message
Peter Surda (surda) wrote :

> Can you expand on this statement please? How does this block you?

If you use #include, it results in truncated logs. This happens automatically and there is no obvious way to disable it.

Revision history for this message
Chad Smith (chad.smith) wrote :

Thanks a lot Peter for the bugs.

I agree with Brett on the sentiment that the following cmd should fail because the `-c` option is specifically for validating strict `#cloud-config` files.

   cloud-init schema -c <some_cloud_config_file>

But two things look like they are a bug here:
 - Our schema validation is working against userdata_raw instead of the full rendered userdata
 - `cloud-init schema --system` is supposed to be a helper/alias that should have smarts to look at the fully rendered system user-data instead of the raw content provided to the instance (which could be #include's for pulling things from a remote source)

Both touch points should be fixed. As mentioned by Peter, this will break anyone using non `#cloud-config` user-data parts to the instance.

# Because cloud-init

lxc exec lp1983306 -- cloud-init schema -c /root/lp1983306 --annotate

Changed in cloud-init:
importance: Medium → High
Revision history for this message
Chad Smith (chad.smith) wrote :

I'm going to bump this to high so we can resolve it before cloud-init 22.3

Revision history for this message
Peter Surda (surda) wrote :

I found out that while running in a terminal, the inconsistency occurs as I described above. However that appears to be an insufficient condition to cause mangled logs as well, that needs additional errors (in my case jinja2 template errors) while rendering the full cloud config. The message therefore may be confusing. In my case, it suggests it's the "#include" causing the problems, but it appears that it's an invalid jinja2 template (well, technically a missing variable resulting in a rendering error) causing the problem (mangled logs).

To summarise the point I'm trying to make, for me the main problem is the **practical** confusion in messages between components of cloud-init, not the **theoretical** inconsistency between specification and validation, as I thought originally.

In case I am confusing, here is a practical example:

> root@worker-z-3:~# cloud-init schema --system
> Error:
> Cloud config schema errors: format-l1.c1: File None needs to begin with "#cloud-config"

This output is the same on a system that has mangled logs, and one that doesn't have mangled logs. Someone who has mangled logs (e.g. me) may erroneously conclude that it's the validator causing his mangled logs. The system that has mangled logs has actually a different cause for the problems, but the validator doesn't show this cause as it doesn't get past the "schema errors" message above.

Revision history for this message
Chad Smith (chad.smith) wrote :
Download full text (3.3 KiB)

This makes sense. I think we can minimally fix the messaging seen on the commandline when running `sudo cloud-init schema --system` to actually have it consume the rendered #cloud-config or #template: jinja user-data instead of the raw `#include <your-url>` as the potential source for invalid user-data.

Minimally I think we need to fix https://github.com/canoncloud-config ical/cloud-init/blob/main/cloudinit/config/schema.py#L621-L623 to validated the processed/rendered userdata instead of "userdata_raw" so that `sudo cloud-init schema --system` provides a concise error about what the procesed invalid user-data is.

I can validate that we get a less than desirable message from `sudo cloud-init schema --system` when we launch a container with a valid #include containing valid #cloud-config.

reproducer:
# terminal 1 on host system 192.168.1.8
$ mkdir -p instance-data
$ cd instance-data
$ cat > user-data <<EOF
#cloud-config
ssh_import_id: [chad.smith]
EOF

$ python3 -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...

# terminal 2 on host system 192.168.1.8
$ cat > include.yaml <<EOF
#include http://192.168.1.8:8000/user-data
EOF
$ lxc launch ubuntu-daily:kinetic include-k -c user.user-data="$(cat include.yaml)"

$ lxc exec include-k bash
root@include-k:~#

# See raw userdata (this is ok/desired)
root@include-k:~# cloud-init query userdata
#include http://192.168.1.8:8000/user-data

# verify cloud-init schema --system alias for rendered user-data (not correct)
root@include-k:~# cloud-init schema --system --annotate
#include http://192.168.1.8:8000/user-data # E1
# Errors: -------------
# E1: File None needs to begin with "#cloud-config"

What I think we want:
root@include-k:~# cloud-init schema --system --annotate
Valid cloud-config: system userdata

This comes with something like the following diff, but needs tweaking and unittest coverage
diff --git a/cloudinit/config/schema.py b/cloudinit/config/schema.py
index d62073d0..aa4ec3a1 100644
--- a/cloudinit/config/schema.py
+++ b/cloudinit/config/schema.py
@@ -18,8 +17,8 @@ from typing import TYPE_CHECKING, List, NamedTuple, Optional, Type, Union, cast
 import yaml

 from cloudinit import importer, safeyaml
-from cloudinit.cmd.devel import read_cfg_paths
-from cloudinit.util import error, get_modules_from_dir, load_file
+from cloudinit.stages import Init
+from cloudinit.util import encode_text, error, get_modules_from_dir, load_file

 try:
     from jsonschema import ValidationError as _ValidationError
@@ -617,9 +616,17 @@ def validate_cloudconfig_file(config_path, schema, annotate=False):
                 "Unable to read system userdata as non-root user."
                 " Try using sudo"
             )
- paths = read_cfg_paths()
- user_data_file = paths.get_ipath_cur("userdata_raw")
- content = load_file(user_data_file, decode=False)
+ init = Init(ds_deps=[])
+ ds = init.fetch("trust")
+ ud = ds.get_userdata()
+ content = None
+ for part in ud.walk():
+ if part.get_content_type() == "text/cloud-config":
+ content = encode_text(part.get_payload())
+ break
+ ...

Read more...

Changed in cloud-init:
assignee: nobody → Alberto Contreras (aciba)
status: Incomplete → In Progress
Revision history for this message
Alberto Contreras (aciba) wrote :

Thanks, chad.smith, for the proposed patch. I have added unit tests to it and created a PR here: https://github.com/canonical/cloud-init/pull/1644

James Falcon (falcojr)
Changed in cloud-init:
status: In Progress → Fix Committed
Revision history for this message
Brett Holman (holmanb) wrote : Fixed in cloud-init version 22.3.

This bug is believed to be fixed in cloud-init in version 22.3. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released
Revision history for this message
James Falcon (falcojr) wrote :
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.