'chown failed: failed to look up user stack'

Bug #1887708 reported by John Fulton
This bug affects 3 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: John Fulton

Bug Description

If you deploy with a separate undercloud and separate overcloud nodes, rather than with the standalone installer, the Ceph deployment fails on the following task:

- name: genereate create_ceph_ansible_remote_tmp playbook

The logs in /home/stack/config-download/config-download-latest/ceph-ansible show that the overcloud nodes tried to change ownership of the temp directory to the stack user, which doesn't exist on the overcloud nodes [1].

This bug was introduced by:

 https://review.opendev.org/#/c/739521/

We didn't see it in CI because the standalone hides this issue.

[1]

[CentOS-8.2 - stack@undercloud ceph-ansible]$ head -15 create_ceph_ansible_remote_tmp.log
2020-07-15 17:56:49,992 p=399413 u=root n=ansible | [WARNING]: Skipping key (deprecated) in group (overcloud) as it is not a
mapping, it is a <class 'ansible.parsing.yaml.objects.AnsibleUnicode'>

2020-07-15 17:56:50,294 p=399413 u=root n=ansible | PLAY [all] *********************************************************************
2020-07-15 17:56:50,306 p=399413 u=root n=ansible | TASK [create ceph_ansible_remote_tmp on all nodes with necessary ownership] ****
2020-07-15 17:56:50,306 p=399413 u=root n=ansible | Wednesday 15 July 2020 17:56:50 +0000 (0:00:00.026) 0:00:00.026 ********
2020-07-15 17:56:50,643 p=399413 u=root n=ansible | changed: [undercloud]
2020-07-15 17:56:51,089 p=399413 u=root n=ansible | fatal: [oc0-controller-2]: FAILED! => changed=false
  gid: 0
  group: root
  mode: '0755'
  msg: 'chown failed: failed to look up user stack'
  owner: root
  path: /tmp/ceph_ansible_tmp
  secontext: unconfined_u:object_r:user_tmp_t:s0
[CentOS-8.2 - stack@undercloud ceph-ansible]$
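
To confirm which nodes lack the account before rerunning the deployment, a check along these lines could be run against the same inventory. This is only a sketch and not part of TripleO or config-download: the play itself and the debug message are illustrative, and it assumes the stock Ansible `getent` module, which (with `fail_key: false`) records a null entry instead of failing when the key is missing.

```yaml
# Hypothetical diagnostic play; not part of config-download.
- hosts: all
  gather_facts: false
  tasks:
    # fail_key: false keeps the task from failing on nodes without the user;
    # the missing key is then stored as null in getent_passwd.
    - name: look up the stack user without failing when it is absent
      getent:
        database: passwd
        key: stack
        fail_key: false

    - name: report whether the stack user exists on this node
      debug:
        msg: "stack user present: {{ getent_passwd['stack'] is not none }}"
```

On the deployment in this report, the undercloud would be expected to report the user as present and the overcloud nodes as absent, matching the chown failure above.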

Revision history for this message
John Fulton (jfulton-org) wrote :

This is not something we want to run on all overcloud nodes:

[CentOS-8.2 - stack@undercloud ceph-ansible]$ cat create_ceph_ansible_remote_tmp.yml
- hosts: all
  gather_facts: no
  tasks:
    # Avoiding the following by creating directory owned by user who will
    # SSH into nodes (not root). When root needs to write to this directory
    # it will not have permission problems by definition. As per ansible:
    # """
    # Module remote_tmp /tmp/ceph_ansible_tmp did not exist and was created
    # with a mode of 0700, this may cause issues when running as another user.
    # To avoid this, create the remote_tmp dir with the correct permissions
    # manually.
    # """
    - name: create ceph_ansible_remote_tmp on all nodes with necessary ownership
      become: true
      file:
        path: "/tmp/ceph_ansible_tmp"
        owner: "stack"
        group: "stack"
        mode: "700"
        state: directory
[CentOS-8.2 - stack@undercloud ceph-ansible]$

Revision history for this message
John Fulton (jfulton-org) wrote :

1884816 [1] begat 1886497 [2] who begat 1887708 [3]

root cause of [1] was ansible 2.9.9 vs 2.9.10
root cause of [2] was standalone not having tripleo-admin user
root cause of [3] was overcloud node not having the stack user

The fix to this bug should fix [2] correctly and fix [3].

We need to be careful with this part of the code, which determines which user has permission to what. The standalone masks these problems, so don't use it to verify any changes involving the users that ceph-ansible runs as.

[1] https://bugs.launchpad.net/tripleo/+bug/1884816
[2] https://bugs.launchpad.net/tripleo/+bug/1886497
[3] https://bugs.launchpad.net/tripleo/+bug/1887708

Revision history for this message
John Fulton (jfulton-org) wrote :

There's a process to create the user who runs ansible on the overcloud [1]. We should use the same user. Does the standalone run this process?

[1] https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_create_admin/tasks/create_user.yml#L18

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/741505

Changed in tripleo:
assignee: John Fulton (jfulton-org) → Juan Badia Payno (jbadiapa)
status: Triaged → In Progress
Revision history for this message
wes hayutin (weshayutin) wrote :

Any updates here? Thanks

Revision history for this message
John Fulton (jfulton-org) wrote :

Working on it. Hope to have a patch up today.

Changed in tripleo:
assignee: Juan Badia Payno (jbadiapa) → John Fulton (jfulton-org)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/742213

Changed in tripleo:
assignee: John Fulton (jfulton-org) → Juan Badia Payno (jbadiapa)
Changed in tripleo:
assignee: Juan Badia Payno (jbadiapa) → John Fulton (jfulton-org)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/742287

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/742291

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/742293

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ansible (master)

Change abandoned by Juan Badia Payno (<email address hidden>) on branch: master
Review: https://review.opendev.org/741505
Reason: In favor of https://review.opendev.org/#/c/742287

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/742405

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/train)

Change abandoned by John Fulton (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/742293
Reason: this test served its purpose

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (stable/train)

Reviewed: https://review.opendev.org/742291
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=42b7a23f04831e360c89d7423854b11090626629
Submitter: Zuul
Branch: stable/train

commit 42b7a23f04831e360c89d7423854b11090626629
Author: John Fulton <email address hidden>
Date: Tue Jul 21 21:27:18 2020 +0000

    Use ansible_user only if ANSIBLE_REMOTE_USER is unset

    In a non-standalone deployment config-download is called
    with the ANSIBLE_REMOTE_USER defined. This variable is
    undefined when using standalone, which is what caused the
    related bug. Unfortunately the fix for the related bug
    introduced the bug this patch closes. This patch should
    work for both non-standalone and standalone deployments.

    Change-Id: Ie7234bf113204e8cf847257554b88b899b45d5ee
    Related-Bug: #1886497
    Closes-Bug: #1887708
    (cherry picked from commit 67c4a4f58e9b8af07c277b1f6c20fa7af116eaf7)
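
The fallback described in the commit message could be sketched roughly as below. This is not the merged tripleo-ansible change, only an illustration of the idea under stated assumptions: the variable name `remote_tmp_owner` is hypothetical, the environment variable is read on the controller with the `env` lookup, and a group with the same name as the user is assumed to exist on every node.

```yaml
# Illustrative only; the real fix lives in tripleo-ansible (see the review above).
- hosts: all
  gather_facts: false
  vars:
    # Hypothetical variable. The env lookup returns an empty string when
    # ANSIBLE_REMOTE_USER is unset, and default(..., true) then falls back
    # to ansible_user. ansible_user must be defined in the inventory for
    # the fallback to work.
    remote_tmp_owner: "{{ lookup('env', 'ANSIBLE_REMOTE_USER') | default(ansible_user, true) }}"
  tasks:
    - name: create ceph_ansible_remote_tmp owned by the user who connects over SSH
      become: true
      file:
        path: /tmp/ceph_ansible_tmp
        owner: "{{ remote_tmp_owner }}"
        # Assumes a group named after the user exists on the node.
        group: "{{ remote_tmp_owner }}"
        mode: "0700"
        state: directory
```

Because the owner is derived from whichever user Ansible actually connects as, the task no longer depends on a stack account existing on the overcloud nodes.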

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (stable/ussuri)

Reviewed: https://review.opendev.org/742405
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=f114ce08e290d23f709a0d9b56c5b9084949dd3e
Submitter: Zuul
Branch: stable/ussuri

commit f114ce08e290d23f709a0d9b56c5b9084949dd3e
Author: John Fulton <email address hidden>
Date: Tue Jul 21 21:27:18 2020 +0000

    Use ansible_user only if ANSIBLE_REMOTE_USER is unset

    In a non-standalone deployment config-download is called
    with the ANSIBLE_REMOTE_USER defined. This variable is
    undefined when using standalone, which is what caused the
    related bug. Unfortunately the fix for the related bug
    introduced the bug this patch closes. This patch should
    work for both non-standalone and standalone deployments.

    Change-Id: Ie7234bf113204e8cf847257554b88b899b45d5ee
    Related-Bug: #1886497
    Closes-Bug: #1887708
    (cherry picked from commit 67c4a4f58e9b8af07c277b1f6c20fa7af116eaf7)

tags: added: in-stable-ussuri
Changed in tripleo:
milestone: victoria-1 → victoria-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/742287
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=67c4a4f58e9b8af07c277b1f6c20fa7af116eaf7
Submitter: Zuul
Branch: master

commit 67c4a4f58e9b8af07c277b1f6c20fa7af116eaf7
Author: John Fulton <email address hidden>
Date: Tue Jul 21 21:27:18 2020 +0000

    Use ansible_user only if ANSIBLE_REMOTE_USER is unset

    In a non-standalone deployment config-download is called
    with the ANSIBLE_REMOTE_USER defined. This variable is
    undefined when using standalone, which is what caused the
    related bug. Unfortunately the fix for the related bug
    introduced the bug this patch closes. This patch should
    work for both non-standalone and standalone deployments.

    Change-Id: Ie7234bf113204e8cf847257554b88b899b45d5ee
    Related-Bug: #1886497
    Closes-Bug: #1887708

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-ansible 0.6.0

This issue was fixed in the openstack/tripleo-ansible 0.6.0 release.
