Segfault in BG Compaction #5418

Open
redmeadowman opened this issue Jun 5, 2019 · 8 comments

Comments

@redmeadowman

I frequently see a SIGSEGV in the background (BG) thread that performs compaction. This is on version 6.2.0 (I pulled the latest code on 6/4/2019).

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7feb863a5700 (LWP 3638)]
0x00007ffff4ce3b59 in std::string::assign(std::string const&) ()
from /lib64/libstdc++.so.6
(gdb) up
#1 0x00007ffff734d465 in operator= (__str=..., this=0x7feb863a1b90)
at /usr/include/c++/4.9.2/bits/basic_string.h:555
555 { return this->assign(__str); }

The problem is that compaction reads inputs[0] when the inputs array has no elements (it's empty). Indexing an empty vector is undefined behavior, and here it triggers the segfault.

#3 rocksdb::CompactionPicker::GetRange (this=0x7feb04015f40, inputs=...,
smallest=smallest@entry=0x7feb863a1b40, largest=largest@entry=0x7feb863a1b90)
at db/compaction/compaction_picker.cc:175
175 *largest = inputs[inputs.size() - 1]->largest;
(gdb) l
170 }
171 }
172 }
173 } else {
174 *smallest = inputs[0]->smallest; // dnevil - need to make sure input[] is not empty
175 *largest = inputs[inputs.size() - 1]->largest;
176 }
177 }

Here is the backtrace.

(gdb) bt
#0 0x00007ffff4ce3b59 in std::string::assign(std::string const&) ()
from /lib64/libstdc++.so.6
#1 0x00007ffff734d465 in operator= (__str=..., this=0x7feb863a1b90)
at /usr/include/c++/4.9.2/bits/basic_string.h:555
#2 operator= (this=0x7feb863a1b90) at db/dbformat.h:204
#3 rocksdb::CompactionPicker::GetRange (this=0x7feb04015f40, inputs=...,
smallest=smallest@entry=0x7feb863a1b40, largest=largest@entry=0x7feb863a1b90)
at db/compaction/compaction_picker.cc:175
#4 0x00007ffff7357753 in PickFileToCompact (this=0x7feb863a1c30)
at db/compaction/compaction_picker_level.cc:511
#5 SetupInitialFiles (this=)
at db/compaction/compaction_picker_level.cc:213
#6 PickCompaction (this=0x7feb863a1c30) at db/compaction/compaction_picker_level.cc:351
#7 rocksdb::LevelCompactionPicker::PickCompaction (this=, cf_name=...,
mutable_cf_options=..., vstorage=, log_buffer=)
at db/compaction/compaction_picker_level.cc:555
#8 0x00007ffff732707c in rocksdb::ColumnFamilyData::PickCompaction (
this=this@entry=0x7feb0400f3d0, mutable_options=...,
log_buffer=log_buffer@entry=0x7feb863a2780) at db/column_family.cc:927
#9 0x00007ffff73939a3 in rocksdb::DBImpl::BackgroundCompaction (this=this@entry=
0x7feb04000b70, made_progress=made_progress@entry=0x7feb863a258e,
job_context=job_context@entry=0x7feb863a25b0,
log_buffer=log_buffer@entry=0x7feb863a2780,
prepicked_compaction=prepicked_compaction@entry=0x0, thread_pri=rocksdb::Env::LOW)
at db/db_impl/db_impl_compaction_flush.cc:2474
#10 0x00007ffff739ace3 in rocksdb::DBImpl::BackgroundCallCompaction (
this=this@entry=0x7feb04000b70, prepicked_compaction=prepicked_compaction@entry=0x0,
bg_thread_pri=bg_thread_pri@entry=rocksdb::Env::LOW)
at db/db_impl/db_impl_compaction_flush.cc:2245
#11 0x00007ffff739b2da in rocksdb::DBImpl::BGWorkCompaction (arg=)
at db/db_impl/db_impl_compaction_flush.cc:2021
#12 0x00007ffff75a51b7 in rocksdb::ThreadPoolImpl::Impl::BGThread (
this=this@entry=0xda3eb90, thread_id=thread_id@entry=0) at util/threadpool_imp.cc:266
#13 0x00007ffff75a540d in rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper (
arg=0x7feb800020e0) at util/threadpool_imp.cc:307
#14 0x00007ffff4cd9e00 in ?? () from /lib64/libstdc++.so.6
#15 0x00007ffff595f52a in start_thread () from /lib64/libpthread.so.0
#16 0x00007ffff444322d in clone () from /lib64/libc.so.6
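The gdb listing above shows GetRange indexing inputs[0] and inputs[inputs.size() - 1] with no emptiness check (compaction_picker.cc:174-175). As a minimal sketch only, and not RocksDB's actual code or fix, this is what a guard in front of that access could look like. `FileMeta` and `GetRangeChecked` are hypothetical stand-ins: the real function works on `FileMetaData` objects whose bounds are `InternalKey`s, and a real fix would also need the caller to handle the "no range" outcome.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for per-file metadata. Real RocksDB stores
// InternalKey bounds here, not plain strings.
struct FileMeta {
  std::string smallest;
  std::string largest;
};

// Sketch of the non-overlapping branch of GetRange with an explicit
// empty check. Files are assumed sorted by key, so the range runs from
// the first file's smallest key to the last file's largest key.
// Returns false and leaves the outputs untouched when inputs is empty.
bool GetRangeChecked(const std::vector<const FileMeta*>& inputs,
                     std::string* smallest, std::string* largest) {
  if (inputs.empty()) {
    return false;  // inputs[0] would be undefined behavior here
  }
  *smallest = inputs.front()->smallest;
  *largest = inputs.back()->largest;
  return true;
}
```

Returning a status instead of indexing blindly turns the undefined behavior into an explicit "nothing to compact" result that the picker could act on.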

@maysamyabandeh
Contributor

Thanks @redmeadowman for the report. You seem to have hit a bug, but I cannot figure out the root cause from the stack trace alone. Can you run your code with ASAN (AddressSanitizer)? That should give more details about the bug. You can enable ASAN by setting the environment variable COMPILE_WITH_ASAN=1 before running make.

@redmeadowman
Author

Hi @maysamyabandeh, I will try to do that in the next few days and post the result.
In the meantime, here is another clue: I found that if I set kCompactionStyleUniversal before opening the database, I don't see the crash.

// The default compaction style causes a SIGSEGV, so change to this one
m_db_options.compaction_style = rocksdb::kCompactionStyleUniversal;

@gdyang1990

Is there any progress? We have encountered the same situation.

@Connor1996
Contributor

Hi @redmeadowman, does the crash happen right after opening the database, and does it then recover on its own after several segfault-and-restart cycles?

@redmeadowman
Author

No, my code loads some large tables, then the balancing/compaction starts and it crashes during that phase.

@yiwu-arbug
Contributor

yiwu-arbug commented Jul 24, 2019

We hit the same issue in one of our use cases. The core dump shows it is caused by compaction picking a file with smallest_key > largest_key, so ExpandInputsToCleanCut here (https://github.com/facebook/rocksdb/blob/v5.18.3/db/compaction_picker.cc#L1488) clears the start_level_inputs_ array, which causes the segfault here (https://github.com/facebook/rocksdb/blob/v5.18.3/db/compaction_picker.cc#L171). We are still investigating how smallest_key can be larger than largest_key in this case. This is a custom RocksDB fork based on 5.18, using the default BytewiseComparator.

Update: the database was wiped, so we are unable to investigate further.
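yiwu-arbug's analysis points at an SST file whose recorded smallest key sorts after its largest key. The sketch below flags such inverted ranges, assuming you already have each file's user-key bounds as plain strings (for example copied from the `smallestkey`/`largestkey` fields that `DB::GetLiveFilesMetaData()` reports; that wiring is not shown and is an assumption). `FileBounds`, `FindInvertedRanges`, and the file names used in testing are hypothetical. `std::string::compare` orders by unsigned byte value, so this only mirrors a plain bytewise order; with a custom comparator you would have to call that comparator instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-file record holding plain user-key bounds.
struct FileBounds {
  std::string name;
  std::string smallest_key;
  std::string largest_key;
};

// Returns the names of files whose range is inverted (smallest > largest).
// std::string::compare is lexicographic over unsigned bytes, which matches
// a bytewise key order but NOT an arbitrary custom comparator.
std::vector<std::string> FindInvertedRanges(
    const std::vector<FileBounds>& files) {
  std::vector<std::string> bad;
  for (const auto& f : files) {
    if (f.smallest_key.compare(f.largest_key) > 0) {
      bad.push_back(f.name);
    }
  }
  return bad;
}
```

A file with equal bounds (a single key) is valid and is not reported; only strictly inverted ranges are.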

@veryl

veryl commented Nov 4, 2019

I hit the same SIGSEGV with RocksDB 5.10.2 when using TwoPartComparator.

@phoenixchinar

This happened to us as well. In our case, however, the cause was that some keys written to the DB were not in the format our custom comparator expected, and the comparator did not handle them correctly. Removing the offending SST file containing the bad records and then repairing the DB allowed us to recover it.
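The last two reports involve custom comparators meeting keys they did not expect. A comparator that gives inconsistent answers on malformed input can break ordering assumptions such as smallest <= largest; whether that is exactly what happened in each report is not established in this thread. As a minimal sketch under stated assumptions, here is a three-way compare for a hypothetical "<a>:<b>" key format (two decimal integers) that stays a consistent total order even on malformed keys. `ParseTwoPartKey` and `CompareTwoPart` are hypothetical; in real RocksDB this logic would live in a `Comparator::Compare` override taking `Slice`s, which is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Parses s[begin, end) as a decimal integer of 1..18 digits (always fits
// in uint64_t). Returns false on empty, too-long, or non-digit input.
bool ParseDigits(const std::string& s, size_t begin, size_t end,
                 uint64_t* out) {
  if (begin == end || end - begin > 18) return false;
  uint64_t v = 0;
  for (size_t i = begin; i < end; ++i) {
    const char c = s[i];
    if (c < '0' || c > '9') return false;
    v = v * 10 + static_cast<uint64_t>(c - '0');
  }
  *out = v;
  return true;
}

// Hypothetical key format "<a>:<b>". Anything else is malformed.
bool ParseTwoPartKey(const std::string& key, uint64_t* a, uint64_t* b) {
  const size_t colon = key.find(':');
  if (colon == std::string::npos) return false;
  return ParseDigits(key, 0, colon, a) &&
         ParseDigits(key, colon + 1, key.size(), b);
}

// Total order that never reads unparsed values:
//   1. well-formed keys sort before malformed ones;
//   2. well-formed keys compare numerically by (a, b);
//   3. remaining ties (incl. two malformed keys) fall back to bytewise,
//      so distinct byte strings never compare equal.
int CompareTwoPart(const std::string& x, const std::string& y) {
  uint64_t xa = 0, xb = 0, ya = 0, yb = 0;
  const bool x_ok = ParseTwoPartKey(x, &xa, &xb);
  const bool y_ok = ParseTwoPartKey(y, &ya, &yb);
  if (x_ok != y_ok) return x_ok ? -1 : 1;
  if (x_ok) {
    if (xa != ya) return xa < ya ? -1 : 1;
    if (xb != yb) return xb < yb ? -1 : 1;
  }
  const int c = x.compare(y);
  return c < 0 ? -1 : (c > 0 ? 1 : 0);
}
```

One caveat on the design: a comparator's order cannot simply be changed under an existing database, because SST files already on disk were sorted with the old order. A hardened comparator like this must order every previously valid key exactly as before.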
