Comment 2 for bug 1834191

Mark Goddard (mgoddard) wrote :

Another issue was found. During the upgrade, Rocky and Stein MariaDB containers can be running at the same time. In Stein we switched from xtrabackup to mariabackup for the Galera state transfer (SST), which means that Stein and Rocky containers cannot sync with each other. I didn't hit this locally, but it was seen in CI. Here are the relevant error messages from the primary node at that time:

2019-06-25 19:05:50 140049019632384 [Note] WSREP: sst_donor_thread signaled with 0
2019-06-25 19:05:50 140044555761408 [Note] WSREP: async IST sender starting to serve tcp://10.209.96.149:4568 sending 8524-8557
sh: wsrep_sst_mariabackup: command not found
2019-06-25 19:05:50 140044564154112 [ERROR] WSREP: Failed to read from: wsrep_sst_mariabackup --role 'donor' --address '10.209.96.149:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --binlog 'mysql-bin' --gtid 'b99680cf-9773-11e9-b90b-e6ca413f0ef1:8523' --gtid-domain-id '0' --bypass
2019-06-25 19:05:50 140044564154112 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'donor' --address '10.209.96.149:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --binlog 'mysql-bin' --gtid 'b99680cf-9773-11e9-b90b-e6ca413f0ef1:8523' --gtid-domain-id '0' --bypass: 2 (No such file or directory)
2019-06-25 19:05:50 140044564154112 [ERROR] WSREP: Command did not run: wsrep_sst_mariabackup --role 'donor' --address '10.209.96.149:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --binlog 'mysql-bin' --gtid 'b99680cf-9773-11e9-b90b-e6ca413f0ef1:8523' --gtid-domain-id '0' --bypass
2019-06-25 19:05:50 140049068836608 [Warning] WSREP: 1.0 (secondary1): State transfer to 0.0 (secondary2) failed: -2 (No such file or directory)

http://logs.openstack.org/63/667363/3/check/kolla-ansible-centos-source-upgrade-ceph/479cd15/secondary1/logs/kolla/mariadb/mariadb.txt.gz#_2019-06-25_19_05_50
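
To spot this failure in other CI runs, the log could be scanned for the "Process completed with error" line shown above. The following is only a rough sketch: the function name and the regular expression are my own, written against the message format in this one excerpt, and may not match other MariaDB versions.

```python
import re

# Assumption: the pattern is derived only from the "[ERROR] WSREP: Process
# completed with error: <script> <args>: <code> (<reason>)" line quoted above.
SST_ERROR = re.compile(
    r"\[ERROR\] WSREP: Process completed with error: "
    r"(?P<script>wsrep_sst_\w+) .*: (?P<code>\d+) \((?P<reason>[^)]*)\)\s*$"
)


def find_failed_sst_scripts(log_lines):
    """Return (script, exit_code, reason) for each failed SST script call.

    Hypothetical helper, not part of kolla-ansible or MariaDB.
    """
    failures = []
    for line in log_lines:
        match = SST_ERROR.search(line)
        if match:
            failures.append(
                (match["script"], int(match["code"]), match["reason"])
            )
    return failures
```

Feeding it the lines of the mariadb log linked above should report wsrep_sst_mariabackup with exit code 2, which is the symptom of a donor container that does not ship the mariabackup SST script.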

I think we need to shut down all nodes and perform a recovery in this case.