Crash on invalid partition in Metadata #132

Closed
laxpio opened this issue Aug 4, 2014 · 27 comments

@laxpio

laxpio commented Aug 4, 2014

Core dump info as follows:
(gdb) bt
#0 0x00007f4440f93885 in raise () from /lib64/libc.so.6
#1 0x00007f4440f95065 in abort () from /lib64/libc.so.6
#2 0x00000000004759bd in rd_kafka_crash (file=, line=, function=, rk=0x7f440c000f80, reason=<value optimized out>) at rdkafka.c:1609
#3 0x000000000048601d in rd_kafka_topic_leader_update (rkb=0x7f440c005c30, mdt=0x7f436400179d) at rdkafka_topic.c:642
#4 rd_kafka_topic_metadata_update (rkb=0x7f440c005c30, mdt=0x7f436400179d) at rdkafka_topic.c:1001
#5 0x000000000047cb10 in rd_kafka_metadata_handle (rkb=0x7f440c005c30, err=0, reply=0x7f4364000940, request=0x7f4364000d20, opaque=<value optimized out>) at rdkafka_broker.c:937
#6 rd_kafka_broker_metadata_reply (rkb=0x7f440c005c30, err=0, reply=0x7f4364000940, request=0x7f4364000d20, opaque=<value optimized out>) at rdkafka_broker.c:988
#7 0x0000000000480477 in rd_kafka_req_response (rkb=0x7f440c005c30) at rdkafka_broker.c:1265
#8 rd_kafka_recv (rkb=0x7f440c005c30) at rdkafka_broker.c:1457
#9 0x0000000000480a20 in rd_kafka_broker_io_serve (rkb=0x7f440c005c30) at rdkafka_broker.c:2351
#10 0x0000000000481e2a in rd_kafka_broker_ua_idle (arg=) at rdkafka_broker.c:2370
#11 rd_kafka_broker_thread_main (arg=) at rdkafka_broker.c:3904
#12 0x00007f4441ea47f1 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f4441046ccd in clone () from /lib64/libc.so.6

@edenhill
Contributor

edenhill commented Aug 4, 2014

Can you try the following commands in gdb:

  frame 3
  p *mdt
  p *rkt
  up
  p j
  p mdt->partitions[j]

@edenhill edenhill added the bug label Aug 4, 2014
@laxpio
Author

laxpio commented Aug 4, 2014

(gdb) frame 3
#3 0x000000000048601d in rd_kafka_topic_leader_update (rkb=0x7f12b0007070, mdt=0x7f1318000ff5) at rdkafka_topic.c:642
642 rdkafka_topic.c: No such file or directory.
in rdkafka_topic.c
(gdb) p *mdt
No symbol "mdt" in current context.
(gdb) p *rkt
value has been optimized out
(gdb) up
#4 rd_kafka_topic_metadata_update (rkb=0x7f12b0007070, mdt=0x7f1318000ff5) at rdkafka_topic.c:1001
1001 in rdkafka_topic.c
(gdb) p j
$1 =
(gdb) p mdt->parttions[j]
There is no member named parttions.

I cannot get any more info.

@laxpio
Author

laxpio commented Aug 4, 2014

(gdb) frame 3
#3 0x000000000048601d in rd_kafka_topic_leader_update (rkb=0x7fee6c005c30, mdt=0x7fee1800a34d) at rdkafka_topic.c:642
642 in rdkafka_topic.c
(gdb) info f
Stack level 3, frame at 0x7fee5aff7e10:
rip = 0x48601d in rd_kafka_topic_leader_update (rdkafka_topic.c:642); saved rip 0x47cb10
inlined into frame 4, caller of frame at 0x7fee5aff7d10
source language c.
Arglist at unknown address.
Locals at unknown address, Previous frame's sp is 0x7fee5aff7d10
Saved registers:
rbx at 0x7fee5aff7cf8, rbp at 0x7fee5aff7d00, rip at 0x7fee5aff7d08

@edenhill
Contributor

edenhill commented Aug 4, 2014

Which version are you on? (git revision)

@laxpio
Author

laxpio commented Aug 4, 2014

Is the following output helpful for this issue?

(gdb) up
#4 rd_kafka_topic_metadata_update (rkb=0x7fee6c005c30, mdt=0x7fee1800a34d) at rdkafka_topic.c:1001
1001 in rdkafka_topic.c
(gdb) info args
rkb = 0x7fee6c005c30
mdt = 0x7fee1800a34d
(gdb) p *rkb
$3 = {rkb_link = {tqe_next = 0x7fee180032f0, tqe_prev = 0x7fee6c0056c0}, rkb_nodeid = -1, rkb_rsal = 0x7fee18000910,
rkb_t_rsal_last = 0, rkb_s = 23, rkb_pfd = {fd = 23, events = 1, revents = 1}, rkb_corrid = 2, rkb_ops = {rkq_lock = {__data = {
__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = '\000' <repeats 39 times>, __align = 0}, rkq_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0,
__wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
__size = '\000' <repeats 47 times>, __align = 0}, rkq_q = {tqh_first = 0x0, tqh_last = 0x7fee6c005cc0}, rkq_qlen = 0,
rkq_qsize = 0, rkq_refcnt = 1, rkq_flags = 0}, rkb_toppars = {tqh_first = 0x0, tqh_last = 0x7fee6c005ce8}, rkb_toppar_lock = {
__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 0,
__nr_writers_queued = 0, __writer = 0, __shared = 0, __pad1 = 0, __pad2 = 0, __flags = 0},
__size = '\000' <repeats 55 times>, __align = 0}, rkb_toppar_cnt = 0, rkb_ts_fetch_backoff = 0, rkb_fetching = 0,
rkb_state = RD_KAFKA_BROKER_STATE_UP, rkb_source = RD_KAFKA_CONFIGURED, rkb_c = {tx_bytes = 50, tx = 2, tx_err = 0,
tx_retries = 0, rx_bytes = 3844, rx = 2, rx_err = 0, rx_corrid_err = 0}, rkb_ts_metadata_poll = 1233949896202,
rkb_metadata_fast_poll_cnt = 0, rkb_lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0,
__spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0},
rkb_thread = 140661705643776, rkb_refcnt = 1, rkb_rk = 0x7fee6c000f80, rkb_err = {msg = '\000' <repeats 511 times>, err = 0},
rkb_recv_buf = 0x0, rkb_outbufs = {rkbq_bufs = {tqh_first = 0x0, tqh_last = 0x7fee6c005ff0}, rkbq_cnt = 0}, rkb_waitresps = {
rkbq_bufs = {tqh_first = 0x0, tqh_last = 0x7fee6c006008}, rkbq_cnt = 0}, rkb_retrybufs = {rkbq_bufs = {tqh_first = 0x0,
tqh_last = 0x7fee6c006020}, rkbq_cnt = 0}, rkb_avg_rtt = {ra_v = {maxv = 46540, minv = 38792, avg = 0, sum = 85332, cnt = 2,
start = 1233929808664}, ra_lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0,
__list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, ra_type = RD_KAFKA_AVG_GAUGE},
rkb_name = "10.153.133.206:9092/bootstrap", '\000' <repeats 98 times>,
rkb_nodename = "10.153.133.206:9092", '\000' <repeats 108 times>}
(gdb) p *mdt
$4 = {topic = 0x7fee1800a40d "RealStatis_test", partition_cnt = 8, partitions = 0x7fee1800a41f, err = RD_KAFKA_RESP_ERR_NO_ERROR}
(gdb) p mdt->partitions[8]
$5 = {id = 7, err = RD_KAFKA_RESP_ERR_INVALID_MSG_SIZE, leader = 5, replica_cnt = 7, replicas = 0x600000001, isr_cnt = 7,
isrs = 0x3}

@edenhill
Contributor

edenhill commented Aug 4, 2014

 frame 4
 p j

@laxpio
Author

laxpio commented Aug 4, 2014

Kafka 0.8.0
rdkafka master

@laxpio
Author

laxpio commented Aug 4, 2014

frame 4
p j
$1 =

@edenhill
Contributor

edenhill commented Aug 4, 2014

Is this crash reproducible?

@laxpio
Author

laxpio commented Aug 4, 2014

Yes. I write to 2 topics: topic_1 is OK, the other topic crashes.

@edenhill
Contributor

edenhill commented Aug 4, 2014

Can you reproduce this with debug = "topic,metadata" enabled?
rd_kafka_conf_set(rk_conf, "debug", "topic,metadata", errstr, sizeof(errstr));

@laxpio
Author

laxpio commented Aug 4, 2014

From the core:
(gdb) p *mdt
$1 = {topic = 0x7ff17c006b8d "RealStatis_test", partition_cnt = 8, partitions = 0x7ff17c006b9f, err = RD_KAFKA_RESP_ERR_NO_ERROR}
(gdb) p mdt->partitions[8]
$2 = {id = 7, err = RD_KAFKA_RESP_ERR_INVALID_MSG_SIZE, leader = 5, replica_cnt = 7, replicas = 0x600000001, isr_cnt = 7,
isrs = 0x3}
The partition id is 7 and the leader is 5,

but in Kafka:
topic: RealStatis_test partition: 7 leader: 3 replicas: 6,3,4 isr: 3

The info does not match.

@edenhill
Contributor

edenhill commented Aug 4, 2014

I think you should print p mdt->partitions[7], not 8: partition_cnt is 8, so the last valid index is 7.

@laxpio
Author

laxpio commented Aug 4, 2014

p mdt->partitions[7]
$2 = {id = 15, err = RD_KAFKA_RESP_ERR_NO_ERROR, leader = 0, replica_cnt = 3, replicas = 0x7ff17c006d4f, isr_cnt = 1,
isrs = 0x7ff17c006d5b}

Kafka info:
topic: RealStatis_test partition: 15 leader: 0 replicas: 6,0,1 isr: 0

The info is OK.
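
The two dumps above show that the array index and the partition id are different things: with partition_cnt = 8 only indexes 0 to 7 are valid, and index 7 holds partition id 15. A minimal sketch of looking a partition up by id rather than by index follows. The struct and function names (`partition_md`, `topic_md`, `find_partition`) are hypothetical, simplified stand-ins for illustration, not librdkafka's own types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the structs printed in gdb;
 * only the fields needed for the lookup are kept. */
typedef struct {
    int32_t id;     /* partition id reported by the broker */
    int32_t leader; /* broker id of the partition leader */
} partition_md;

typedef struct {
    const char *topic;
    int32_t partition_cnt;          /* number of entries in partitions[] */
    const partition_md *partitions;
} topic_md;

/* Look a partition up by id instead of using the id as an array index.
 * Only indexes 0 .. partition_cnt - 1 are read; returns NULL if absent. */
const partition_md *find_partition(const topic_md *mdt, int32_t id) {
    for (int32_t i = 0; i < mdt->partition_cnt; i++) {
        if (mdt->partitions[i].id == id)
            return &mdt->partitions[i];
    }
    return NULL;
}
```

With a lookup like this, asking for an id that the array does not contain gives NULL instead of reading whatever memory happens to sit past the last entry.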

@edenhill
Contributor

edenhill commented Aug 4, 2014

Very strange, please do this:

 p mdt->partition_cnt
 p mdt->partitions[0]
 p mdt->partitions[1]
 p mdt->partitions[2]
 p mdt->partitions[3]
 p mdt->partitions[4]
 ... continue until index mdt->partition_cnt - 1

@laxpio
Author

laxpio commented Aug 4, 2014

Is partition_cnt the number of partitions?

@laxpio
Author

laxpio commented Aug 4, 2014

(gdb) p mdt->partition_cnt
$9 = 8
(gdb) p mdt->partitions[0]
$10 = {id = 0, err = RD_KAFKA_RESP_ERR_NO_ERROR, leader = 7, replica_cnt = 3, replicas = 0x7ff17c006cdf, isr_cnt = 1,
isrs = 0x7ff17c006ceb}
(gdb) p mdt->partitions[1]
$11 = {id = 2, err = RD_KAFKA_RESP_ERR_NO_ERROR, leader = 6, replica_cnt = 3, replicas = 0x7ff17c006cef, isr_cnt = 1,
isrs = 0x7ff17c006cfb}
(gdb) p mdt->partitions[2]
$12 = {id = 4, err = RD_KAFKA_RESP_ERR_NO_ERROR, leader = 3, replica_cnt = 3, replicas = 0x7ff17c006cff, isr_cnt = 1,
isrs = 0x7ff17c006d0b}
(gdb) p mdt->partitions[3]
$13 = {id = 5, err = RD_KAFKA_RESP_ERR_NO_ERROR, leader = 4, replica_cnt = 3, replicas = 0x7ff17c006d0f, isr_cnt = 1,
isrs = 0x7ff17c006d1b}
(gdb) p mdt->partitions[4]
$14 = {id = 8, err = RD_KAFKA_RESP_ERR_NO_ERROR, leader = 7, replica_cnt = 3, replicas = 0x7ff17c006d1f, isr_cnt = 1,
isrs = 0x7ff17c006d2b}
(gdb) p mdt->partitions[5]
$15 = {id = 10, err = RD_KAFKA_RESP_ERR_NO_ERROR, leader = 3, replica_cnt = 3, replicas = 0x7ff17c006d2f, isr_cnt = 1,
isrs = 0x7ff17c006d3b}
(gdb) p mdt->partitions[6]
$16 = {id = 13, err = RD_KAFKA_RESP_ERR_NO_ERROR, leader = 7, replica_cnt = 3, replicas = 0x7ff17c006d3f, isr_cnt = 1,
isrs = 0x7ff17c006d4b}
(gdb) p mdt->partitions[7]
$17 = {id = 15, err = RD_KAFKA_RESP_ERR_NO_ERROR, leader = 0, replica_cnt = 3, replicas = 0x7ff17c006d4f, isr_cnt = 1,
isrs = 0x7ff17c006d5b}

@edenhill
Contributor

edenhill commented Aug 4, 2014

Yes, number of partitions

@edenhill
Contributor

edenhill commented Aug 4, 2014

The partition ids are not consecutive; I haven't seen this before.
Can you do this:
bin/kafka-topics.sh --zookeeper <zookeeper-address> --describe --topic <topicname>

If you have rdkafka_example available (in librdkafka/examples directory), please do this too:
rdkafka_example -b <broker-address> -L -t <topicname>

Thanks.
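
The "not consecutive" observation can be checked mechanically. The sketch below is only an illustration under two assumptions: that a healthy response lists ids 0 to cnt - 1, and that the entries are sorted by id, as they are in the gdb dump above (real metadata responses are not guaranteed to be ordered). The function name `first_nonconsecutive_index` is hypothetical.

```c
#include <stdint.h>

/* Return the first index i where ids[i] != i, or -1 if ids[] is exactly
 * 0, 1, ..., cnt - 1. Assumes the entries are sorted by id. */
int32_t first_nonconsecutive_index(const int32_t *ids, int32_t cnt) {
    for (int32_t i = 0; i < cnt; i++) {
        if (ids[i] != i)
            return i;
    }
    return -1;
}
```

For the ids in the dump above (0, 2, 4, 5, 8, 10, 13, 15) this returns 1, since index 1 holds id 2.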

@laxpio
Author

laxpio commented Aug 4, 2014

My topic has 16 partitions; why does rdkafka only get 8?

@edenhill
Contributor

edenhill commented Aug 4, 2014

Can you provide the output of the two above commands?

@edenhill
Contributor

edenhill commented Aug 4, 2014

Oh, and run the rdkafka_example .. command for each broker you have.

I think one of the brokers is reporting a corrupted partition set

@laxpio
Author

laxpio commented Aug 4, 2014

./kafka-list-topic.sh --zookeeper 10.135.3.34 --topic RealStatis_test
topic: RealStatis_test partition: 0 leader: 0 replicas: 0,4,5 isr: 5,4,0
topic: RealStatis_test partition: 1 leader: 0 replicas: 0,5,6 isr: 0,1
topic: RealStatis_test partition: 2 leader: 6 replicas: 2,6,7 isr: 6,7,2
topic: RealStatis_test partition: 3 leader: 2 replicas: 2,7,0 isr: 2
topic: RealStatis_test partition: 4 leader: 3 replicas: 3,0,1 isr: 3,0,4
topic: RealStatis_test partition: 5 leader: 4 replicas: 4,1,2 isr: 4,2,5
topic: RealStatis_test partition: 6 leader: 2 replicas: 5,2,3 isr: 2
topic: RealStatis_test partition: 7 leader: 3 replicas: 6,3,4 isr: 3,7
topic: RealStatis_test partition: 8 leader: 0 replicas: 0,5,6 isr: 0,5,6
topic: RealStatis_test partition: 9 leader: 0 replicas: 0,2,3 isr: 0,3
topic: RealStatis_test partition: 10 leader: 2 replicas: 2,7,0 isr: 0,2,7
topic: RealStatis_test partition: 11 leader: 3 replicas: 3,0,1 isr: 0,1,3
topic: RealStatis_test partition: 12 leader: 4 replicas: 4,1,2 isr: 2,4,1
topic: RealStatis_test partition: 13 leader: 5 replicas: 5,2,3 isr: 3,5,2
topic: RealStatis_test partition: 14 leader: 6 replicas: 6,3,4 isr: 6,4,3
topic: RealStatis_test partition: 15 leader: 0 replicas: 6,0,1 isr: 0

./rdkafka_example -b "10.135.3.34:9092,10.135.35.227:9092" -L -t RealStatis_test
Metadata for RealStatis_test (from broker -1: 10.135.3.34:9092/bootstrap):
8 brokers:
broker 0 at 10.135.3.34:9092
broker 5 at 10.153.133.204:9092
broker 1 at 10.135.35.227:9092
broker 6 at 10.153.133.205:9092
broker 2 at 10.135.4.232:9092
broker 7 at 10.153.133.206:9092
broker 3 at 10.135.4.83:9092
broker 4 at 10.153.133.203:9092
1 topics:
topic "RealStatis_test" with 16 partitions:
partition 0, leader 0, replicas: 7,4,5,0, isrs: 7,5,4,0
partition 1, leader 0, replicas: 0,5,6, isrs: 0
partition 2, leader 6, replicas: 1,6,7,2, isrs: 6
partition 3, leader 2, replicas: 2,7,0, isrs: 2,0
partition 4, leader 3, replicas: 3,0,1, isrs: 3
partition 5, leader 4, replicas: 4,1,2, isrs: 4
partition 6, leader 2, replicas: 5,2,3, isrs: 2
partition 7, leader 3, replicas: 6,3,4, isrs: 3
partition 8, leader 0, replicas: 2,1,6,5,7,0, isrs: 0,5,6,7
partition 9, leader 0, replicas: 0,2,3, isrs: 0,2
partition 10, leader 2, replicas: 2,1,4,3,7,0, isrs: 0,2,7,4
partition 11, leader 3, replicas: 2,1,4,5,3,0, isrs: 0,1,3
partition 12, leader 4, replicas: 2,1,6,4,5,3, isrs: 2,4,1
partition 13, leader 5, replicas: 2,6,4,5,3,7, isrs: 5,6,2,7,3,4
partition 14, leader 6, replicas: 6,4,5,3,7,0, isrs: 6,4,3
partition 15, leader 0, replicas: 6,0,1, isrs: 0
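
The two listings disagree for several partitions. For partition 0, for example, ZooKeeper reports replicas 0,4,5 while the broker metadata reports 7,4,5,0. A small sketch of comparing two broker-id lists as sets, ignoring order, is shown below. The helper names (`id_in_list`, `same_broker_set`) are hypothetical, and the listings may simply have been taken at different moments, so a mismatch is a hint rather than proof of corruption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if id occurs anywhere in ids[0 .. n-1]. */
bool id_in_list(const int32_t *ids, size_t n, int32_t id) {
    for (size_t i = 0; i < n; i++) {
        if (ids[i] == id)
            return true;
    }
    return false;
}

/* Return true if a[] and b[] contain the same broker ids, treating both
 * as sets: order and repeated entries are ignored. */
bool same_broker_set(const int32_t *a, size_t na,
                     const int32_t *b, size_t nb) {
    for (size_t i = 0; i < na; i++) {
        if (!id_in_list(b, nb, a[i]))
            return false;
    }
    for (size_t i = 0; i < nb; i++) {
        if (!id_in_list(a, na, b[i]))
            return false;
    }
    return true;
}
```

Applied to the output above, partition 0 (0,4,5 against 7,4,5,0) compares as different, while partition 1 (0,5,6 in both listings) compares as equal.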

@laxpio
Author

laxpio commented Aug 4, 2014

I ran the command 'kafka-reassign-partitions.sh'.

@laxpio
Author

laxpio commented Aug 4, 2014

So strange: after I ran the Kafka reassign tool, the server is OK.
I think I need to keep an eye on it for a while.
Thanks.

@edenhill
Contributor

edenhill commented Aug 4, 2014

I think one of the brokers was out of sync, the reassign tool fixed it.
Would've been interesting to see which one and why.
But I'll make sure rdkafka does not assert on such corrupt data.
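
One way to avoid asserting on such data is to validate the whole partition list first and reject it with an error code, leaving the existing state untouched. The sketch below illustrates that pattern only; it is not the change that went into librdkafka, and the names (`partition_md`, `md_result`, `apply_leaders`) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical, simplified partition entry for this sketch. */
typedef struct {
    int32_t id;     /* partition id reported by the broker */
    int32_t leader; /* broker id of the partition leader */
} partition_md;

/* Hypothetical result codes for this sketch. */
typedef enum { MD_APPLIED = 0, MD_REJECTED = -1 } md_result;

/* Store each leader at leaders[id], but only if every id fits in
 * 0 .. cnt - 1. On a bad id nothing is written and MD_REJECTED is
 * returned, so the caller can log it and wait for the next refresh.
 * Duplicate ids are not detected here. */
md_result apply_leaders(int32_t *leaders, const partition_md *parts,
                        int32_t cnt) {
    for (int32_t i = 0; i < cnt; i++) {
        if (parts[i].id < 0 || parts[i].id >= cnt)
            return MD_REJECTED;
    }
    for (int32_t i = 0; i < cnt; i++)
        leaders[parts[i].id] = parts[i].leader;
    return MD_APPLIED;
}
```

Checking everything before writing anything means a half-corrupt response cannot leave the leader table partially updated.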

@edenhill edenhill changed the title from "crash when write msg to kafka" to "Crash on invalid partition in Metadata" Aug 4, 2014
@edenhill
Contributor

edenhill commented Aug 5, 2014

If you have any more information about this problem (broker logs or similar), please post it to this issue.

Thanks.
