Default Kafka replication factor is always 1
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| kolla-ansible | Fix Released | Medium | Doug Szumski | |
| Rocky | New | Medium | Unassigned | |
| Stein | Fix Released | Medium | Mark Goddard | |
| Train | Fix Released | Medium | Mark Goddard | |
| Ussuri | Fix Released | Medium | Mark Goddard | |
| Victoria | Fix Released | Medium | Doug Szumski | |
Bug Description
By default, the replication factor for topics automatically created in Kafka is 1. Monasca relies on this automatic topic creation.
This means that there is only ever one 'in-sync replica' for each partition: the partition leader itself. When Kafka is deployed in a clustered configuration, the replication factor should be increased so that every partition has at least one additional replica, allowing a single node in the cluster to fail without the topic becoming unavailable.
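On the broker side, the relevant settings are default.replication.factor (applied to auto-created topics such as Monasca's) and min.insync.replicas. A minimal sketch of what a 3-node clustered deployment would want in the Kafka server.properties, assuming standard Kafka broker options (the exact values and override mechanism used by kolla-ansible may differ):

```
# Hedged sketch for a 3-node Kafka cluster; not the exact kolla-ansible defaults.
# Auto-created topics (e.g. Monasca's) get replicas in addition to the leader.
default.replication.factor=3
# Producers using acks=all need at least 2 in-sync replicas per partition,
# so a single broker can fail without writes being rejected.
min.insync.replicas=2
```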
This problem didn't show up before because the Kafka client previously used by Monasca ignored the minimum in-sync replica setting. Now that we use the Confluent Kafka client, we see errors like the following in the Monasca logs when deploying Monasca in a clustered configuration (3 nodes):
```
2020-07-22 13:54:04.446 40 ERROR monasca_
2020-07-22 13:54:04.446 40 ERROR monasca_
2020-07-22 13:54:04.446 40 ERROR monasca_
2020-07-22 13:54:04.446 40 ERROR monasca_
2020-07-22 13:54:04.446 40 ERROR monasca_
2020-07-22 13:54:04.446 40 ERROR monasca_
2020-07-22 13:54:04.446 40 ERROR monasca_
2020-07-22 13:54:04.446 40 ERROR monasca_
2020-07-22 13:54:04.446 40 ERROR monasca_
```
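For context, the Confluent Kafka client surfaces such conditions through its delivery callbacks. A minimal, hypothetical Python sketch of the producer path (not Monasca's actual code; the broker addresses and payload are made up, and the 'metrics' topic name follows the example below) showing where errors like the ones above would appear:

```python
# Hypothetical sketch of a producer using the Confluent Kafka client;
# not Monasca's actual code. Broker addresses and payload are illustrative.
from confluent_kafka import Producer

producer = Producer({
    'bootstrap.servers': 'control01:9092,control02:9092,control03:9092',
    # With acks=all, writes are rejected when a partition's in-sync replica
    # set shrinks below min.insync.replicas or its only replica is down.
    'acks': 'all',
})

def on_delivery(err, msg):
    # Delivery errors such as those in the Monasca log excerpt arrive here.
    if err is not None:
        print(f'Delivery to {msg.topic()} failed: {err}')

producer.produce('metrics', value=b'{"name": "cpu.idle_perc"}',
                 on_delivery=on_delivery)
producer.flush(10)
```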
Changed in kolla-ansible:
assignee: nobody → Doug Szumski (dszumski)
Example of how the failure of 1 node out of 3 could look (the cluster should keep on working):
```
(kafka) [kafka@control02 /opt/kafka/bin]$ ./kafka-topics.sh --describe --zookeeper localhost --topic metrics
Topic:metrics PartitionCount:30 ReplicationFactor:2 Configs:
Topic: metrics Partition: 0 Leader: 1002 Replicas: 1002,1003 Isr: 1002
Topic: metrics Partition: 1 Leader: 1001 Replicas: 1003,1001 Isr: 1001
Topic: metrics Partition: 2 Leader: 1001 Replicas: 1001,1002 Isr: 1001,1002
Topic: metrics Partition: 3 Leader: 1002 Replicas: 1002,1001 Isr: 1002,1001
Topic: metrics Partition: 4 Leader: 1002 Replicas: 1003,1002 Isr: 1002
Topic: metrics Partition: 5 Leader: 1001 Replicas: 1001,1003 Isr: 1001
Topic: metrics Partition: 6 Leader: 1002 Replicas: 1002,1003 Isr: 1002
Topic: metrics Partition: 7 Leader: 1001 Replicas: 1003,1001 Isr: 1001
Topic: metrics Partition: 8 Leader: 1001 Replicas: 1001,1002 Isr: 1001,1002
Topic: metrics Partition: 9 Leader: 1002 Replicas: 1002,1001 Isr: 1002,1001
Topic: metrics Partition: 10 Leader: 1002 Replicas: 1003,1002 Isr: 1002
Topic: metrics Partition: 11 Leader: 1001 Replicas: 1001,1003 Isr: 1001
Topic: metrics Partition: 12 Leader: 1002 Replicas: 1002,1003 Isr: 1002
Topic: metrics Partition: 13 Leader: 1001 Replicas: 1003,1001 Isr: 1001
Topic: metrics Partition: 14 Leader: 1001 Replicas: 1001,1002 Isr: 1001,1002
Topic: metrics Partition: 15 Leader: 1002 Replicas: 1002,1001 Isr: 1002,1001
Topic: metrics Partition: 16 Leader: 1002 Replicas: 1003,1002 Isr: 1002
Topic: metrics Partition: 17 Leader: 1001 Replicas: 1001,1003 Isr: 1001
Topic: metrics Partition: 18 Leader: 1002 Replicas: 1002,1003 Isr: 1002
Topic: metrics Partition: 19 Leader: 1001 Replicas: 1003,1001 Isr: 1001
Topic: metrics Partition: 20 Leader: 1001 Replicas: 1001,1002 Isr: 1001,1002
Topic: metrics Partition: 21 Leader: 1002 Replicas: 1002,1001 Isr: 1002,1001
Topic: metrics Partition: 22 Leader: 1002 Replicas: 1003,1002 Isr: 1002
Topic: metrics Partition: 23 Leader: 1001 Replicas: 1001,1003 Isr: 1001
Topic: metrics Partition: 24 Leader: 1002 Replicas: 1002,1003 Isr: 1002
Topic: metrics Partition: 25 Leader: 1001 Replicas: 1003,1001 Isr: 1001
Topic: metrics Partition: 26 Leader: 1001 Replicas: 1001,1002 I...