-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add a function for washing data after upgrading version from 4.0.0 to 4.0.1 #2888
fix: add a function for washing data after upgrading version from 4.0.0 to 4.0.1 #2888
Conversation
WalkthroughThe changes introduce a new configuration option Changes
Sequence Diagram(s)sequenceDiagram
participant Main
participant PikaConf
participant DB
Main->>PikaConf: wash_data()
PikaConf-->>Main: returns true
Main->>DB: Get database instances
Main->>DB: WashData()
DB-->>Main: returns success or failure
Assessment against linked issues
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (3)
src/pika_db.cc (1)
57-86
: TheWashData()
method looks good overall, but consider the following improvements:
- Move the
suffix_len
calculation outside the loop as it remains constant across iterations. This will slightly improve performance.- Log more details about the specific write operation that failed in the error handling block. This will help with debugging if failures occur.
- Extract the complex condition in the if statement into a well-named function to improve readability.
Here's an example of how you can refactor the code:
bool DB::WashData() { rocksdb::ReadOptions read_options; rocksdb::Status s; + auto suffix_len = storage::ParsedBaseDataValue::GetkBaseDataValueSuffixLength(); for (int i = 0; i < g_pika_conf->db_instance_num(); i++) { rocksdb::WriteBatch batch; auto handle = storage_->GetHashCFHandles(i)[1]; auto db = storage_->GetDBByIndex(i); std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(read_options, handle)); for (it->SeekToFirst(); it->Valid(); it->Next()) { std::string key = it->key().ToString(); std::string value = it->value().ToString(); - auto suffix_len = storage::ParsedBaseDataValue::GetkBaseDataValueSuffixLength(); - if (value.size() >= suffix_len && - std::all_of(value.end() - static_cast<int>(suffix_len), value.end() - storage::kTimestampLength, - [](char c) { return c == 0; })) { + if (isValidValue(value, suffix_len)) { // right data, no wash } else { storage::BaseDataValue internal_value(value); batch.Delete(handle, key); batch.Put(handle, key, internal_value.Encode()); } } s = db->Write(storage_->GetDefaultWriteOptions(i), &batch); if (!s.ok()) { - LOG(ERROR) << "write batch error in WashData"; + LOG(ERROR) << "Write batch error in WashData, instance: " << i << ", status: " << s.ToString(); return false; } } return true; } +bool DB::isValidValue(const std::string& value, size_t suffix_len) { + return value.size() >= suffix_len && + std::all_of(value.end() - static_cast<int>(suffix_len), value.end() - storage::kTimestampLength, + [](char c) { return c == 0; }); +}src/storage/src/storage.cc (2)
92-94
: LGTM! But consider adding a bounds check for theidx
parameter.The code changes are approved. However, to prevent potential out-of-bounds access, consider adding a bounds check for the
idx
parameter before accessing theinsts_
array.Add a bounds check for
idx
like this:std::vector<rocksdb::ColumnFamilyHandle*> Storage::GetHashCFHandles(const int idx) { + if (idx < 0 || idx >= insts_.size()) { + // Return an empty vector or throw an exception based on your error handling strategy. + return {}; + } return insts_[idx]->GetHashCFHandles(); }
96-98
: LGTM! But consider adding a bounds check for theidx
parameter.The code changes are approved. However, to prevent potential out-of-bounds access, consider adding a bounds check for the
idx
parameter before accessing theinsts_
array.Add a bounds check for
idx
like this:rocksdb::WriteOptions Storage::GetDefaultWriteOptions(const int idx) const { + if (idx < 0 || idx >= insts_.size()) { + // Return a default constructed WriteOptions object or throw an exception based on your error handling strategy. + return rocksdb::WriteOptions(); + } return insts_[idx]->GetDefaultWriteOptions(); }
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (10)
- conf/pika.conf (1 hunks)
- include/pika_conf.h (2 hunks)
- include/pika_db.h (1 hunks)
- src/pika.cc (1 hunks)
- src/pika_conf.cc (1 hunks)
- src/pika_db.cc (1 hunks)
- src/storage/include/storage/storage.h (1 hunks)
- src/storage/src/base_data_value_format.h (1 hunks)
- src/storage/src/redis.h (1 hunks)
- src/storage/src/storage.cc (1 hunks)
Additional comments not posted (11)
src/storage/src/base_data_value_format.h (2)
107-107
: LGTM! The change to makekBaseDataValueSuffixLength
a static constant is appropriate.Making
kBaseDataValueSuffixLength
a static constant aligns with its nature as a fixed value that doesn't depend on any instance of the class. This change can potentially improve performance by avoiding unnecessary memory allocation for each instance of the class.
101-101
: LGTM! Adding theGetkBaseDataValueSuffixLength
static method is a good practice.The new
GetkBaseDataValueSuffixLength
static method enhances the encapsulation of the class by providing a dedicated method for retrieving the suffix length. It aligns well with the change to makekBaseDataValueSuffixLength
a static constant, as static methods can access static member variables. This improves the overall design of the class.include/pika_db.h (1)
93-97
: LGTM! The addition of theWashData()
method is a good approach for handling data integrity during version upgrades.The new
WashData()
method provides a clear and explicit mechanism for cleaning data when upgrading from version 4.0.0 to 4.0.1. By adding this method to theDB
class, it encapsulates the data washing functionality within the database abstraction, which is a logical place for it. The method signature is self-explanatory, making it easy to understand its purpose.This addition enhances the overall functionality of the
DB
class by providing a way to handle data integrity during version transitions.src/pika.cc (1)
235-243
: LGTM! The addition of the data washing step in themain
function is a good enhancement.The new conditional block in the
main
function adds a data integrity step before starting the server. It checks the configuration using thewash_data()
method ofg_pika_conf
to determine if data washing is necessary. If required, it retrieves the database instances fromg_pika_server
, iterates through them, and calls theWashData()
method on each database.This ensures that the data is cleaned up based on the configuration, enhancing the overall functionality and reliability of the system. The code follows the proper logic and error handling, returning an error code of
1
if any database fails to wash its data.This addition improves the startup process by incorporating a data integrity check and cleanup step.
src/storage/src/redis.h (1)
467-468
: LGTM!The
GetDefaultWriteOptions()
method looks good. It provides a simple way to access the default write options externally.conf/pika.conf (1)
657-660
: Thewash-data
configuration option looks good!The new
wash-data
option is appropriately added to handle data migration from version 4.0.0 to 4.0.1. Setting the default value totrue
ensures that data cleaning is performed by default during the upgrade process. The comments provide clear context and a link to the relevant GitHub issue.src/pika_conf.cc (1)
733-734
: LGTM! Verify the usage ofwash_data_
for data migration.The code changes to read the
wash-data
configuration option in theLoad
method are approved. However, ensure that thewash_data_
member variable is properly used in other parts of the codebase to facilitate the data migration from version 4.0.0 to 4.0.1.Run the following script to verify the usage of
wash_data_
:include/pika_conf.h (2)
335-338
: LGTM!The code changes to add the
wash_data()
method are approved. The method correctly acquires a shared lock and returns the value ofwash_data_
.
1077-1078
: LGTM!The code changes to declare the
wash_data_
member variable are approved. The variable is appropriately declared as a boolean, and the comment clearly indicates its purpose for data migration from version 4.0.0 to 4.0.1.src/storage/include/storage/storage.h (2)
1099-1100
: LGTM!The code changes to add the
GetHashCFHandles
method are approved. The method correctly takes an indexidx
and returns a vector ofrocksdb::ColumnFamilyHandle*
pointers, allowing access to specific column family handles for interacting with RocksDB's data structures.
1101-1102
: LGTM!The code changes to add the
GetDefaultWriteOptions
method are approved. The method correctly takes an indexidx
, returns arocksdb::WriteOptions
object, and is marked asconst
to ensure it does not modify the state of theStorage
class. This allows access to the default write options for the corresponding instance in theinsts_
vector.
src/pika_db.cc
Outdated
auto suffix_len = storage::ParsedBaseDataValue::GetkBaseDataValueSuffixLength(); | ||
if (value.size() >= suffix_len && | ||
std::all_of(value.end() - static_cast<int>(suffix_len), value.end() - storage::kTimestampLength, | ||
[](char c) { return c == 0; })) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
最后8 byte存储的是数据写入时间,internalvalue构造函数里设置的,所以不是0.
倒数24~倒数8 byte现在存的是0,你可以用这几个。
不过既然你是遍历的方式,我觉得直接判断是否数据‘0’ - ‘9’就可以了。只要有一个字符不在这个区间内,就认为这是一条正常写入的数据。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我数错了,忽略。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
确实,如果倒数 24 ~ 倒数 8 的字符,有任意一个为 \0,那么说明数据正常,否则就需要被洗一下,这里我就默认用倒数 24 位置的
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
而且,这块事实上只需要判断长度,因为 4.0.0 中只有 hincrby 中会产生错误数据,但是错误数据却一定是数字转来的,这里的数字却为 int64_t,所以错误数据长度不可能超过 24,所以只需要长度判断即可。
src/pika_db.cc
Outdated
// right data, no wash | ||
} else { | ||
storage::BaseDataValue internal_value(value); | ||
batch.Delete(handle, key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
下边有put了,上边的delete就不需要了吧。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
是
batch.Put(handle, key, internal_value.Encode()); | ||
} | ||
} | ||
s = db->Write(storage_->GetDefaultWriteOptions(i), &batch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个writeoptions我们好像并没有特殊的设置,所以直接用默认的就行。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
因为我看 default write option 是禁止 wal 的,所以我用了默认的
2. change judgement's way in whether old or new data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (1)
src/pika_db.cc (1)
57-81
: Optimize and refactor the function for better performance and maintainability.Consider the following improvements:
- Optimize the function to avoid creating a new
BaseDataValue
object for each value that needs to be modified. Instead, directly encode the value and update the entry in the database.- Refactor the function to extract the logic for processing a single database instance into a separate function. This will improve readability and maintainability.
- Improve error handling by logging error messages and providing context about the failure.
Here's a suggested refactor:
bool DB::WashData() { rocksdb::ReadOptions read_options; rocksdb::Status s; auto suffix_len = storage::ParsedBaseDataValue::GetkBaseDataValueSuffixLength(); + + auto wash_db = [&](int i) -> bool { + rocksdb::WriteBatch batch; + auto handle = storage_->GetHashCFHandles(i)[1]; + auto db = storage_->GetDBByIndex(i); + auto it(db->NewIterator(read_options, handle)); + for (it->SeekToFirst(); it->Valid(); it->Next()) { + std::string key = it->key().ToString(); + std::string value = it->value().ToString(); + if (value.size() < suffix_len) { + // need to wash - storage::BaseDataValue internal_value(value); - batch.Put(handle, key, internal_value.Encode()); + batch.Put(handle, key, storage::BaseDataValue::Encode(value)); + } + } + s = db->Write(storage_->GetDefaultWriteOptions(i), &batch); + if (!s.ok()) { + LOG(ERROR) << "Failed to write batch for db instance " << i << ": " << s.ToString(); + return false; + } + return true; + }; + for (int i = 0; i < g_pika_conf->db_instance_num(); i++) { - rocksdb::WriteBatch batch; - auto handle = storage_->GetHashCFHandles(i)[1]; - auto db = storage_->GetDBByIndex(i); - auto it(db->NewIterator(read_options, handle)); - for (it->SeekToFirst(); it->Valid(); it->Next()) { - std::string key = it->key().ToString(); - std::string value = it->value().ToString(); - if (value.size() < suffix_len) { - // need to wash - storage::BaseDataValue internal_value(value); - batch.Put(handle, key, internal_value.Encode()); - } - } - s = db->Write(storage_->GetDefaultWriteOptions(i), &batch); - if (!s.ok()) { + if (!wash_db(i)) { return false; } } return true; }
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- conf/pika.conf (1 hunks)
- include/pika_db.h (1 hunks)
- src/pika.cc (1 hunks)
- src/pika_db.cc (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- conf/pika.conf
- src/pika.cc
Additional comments not posted (1)
include/pika_db.h (1)
93-98
: LGTM! The method declaration looks good.The method is well-documented and designed to address a specific issue during the upgrade process. The boolean return value is a good choice for indicating the success or failure of the operation.
Verify if the method needs any input parameters by checking its implementation. If it does, consider updating the method signature accordingly.
Run the following script to find the method implementation:
Verification successful
The
WashData
method implementation confirms no input parameters are needed.The method processes data internally and aligns with its intended use during the upgrade process. The header file declaration is correct and does not require changes.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Find the implementation of the `WashData` method. # Test: Search for the method implementation. Expect: The implementation file path. rg --type cpp -l $'bool\s+DB::WashData\(\)'Length of output: 59
Script:
#!/bin/bash # Extract the implementation of the `WashData` method from `src/pika_db.cc`. ast-grep --lang cpp --pattern $'bool DB::WashData() {\n $$$\n}' src/pika_db.ccLength of output: 78
Script:
#!/bin/bash # Use `rg` to search for the `WashData` method implementation in `src/pika_db.cc`. rg 'bool\s+DB::WashData\(\)' -A 20 src/pika_db.ccLength of output: 893
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool DB::WashData() { | ||
rocksdb::ReadOptions read_options; | ||
rocksdb::Status s; | ||
auto suffix_len = storage::ParsedBaseDataValue::GetkBaseDataValueSuffixLength(); | ||
for (int i = 0; i < g_pika_conf->db_instance_num(); i++) { | ||
rocksdb::WriteBatch batch; | ||
auto handle = storage_->GetHashCFHandles(i)[1]; | ||
auto db = storage_->GetDBByIndex(i); | ||
auto it(db->NewIterator(read_options, handle)); | ||
for (it->SeekToFirst(); it->Valid(); it->Next()) { | ||
std::string key = it->key().ToString(); | ||
std::string value = it->value().ToString(); | ||
if (value.size() < suffix_len) { | ||
// need to wash | ||
storage::BaseDataValue internal_value(value); | ||
batch.Put(handle, key, internal_value.Encode()); | ||
} | ||
} | ||
delete it; | ||
s = db->Write(storage_->GetDefaultWriteOptions(i), &batch); | ||
if (!s.ok()) { | ||
return false; | ||
} | ||
} | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimize and refactor the WashData
function.
The WashData
function looks correct and follows the intended logic. However, consider the following improvements:
- Optimize the function to avoid creating a new
BaseDataValue
object for each value that needs to be modified. Instead, directly encode the value and update the entry in the database. - Refactor the function to extract the logic for processing a single database instance into a separate function. This will improve readability and maintainability.
- Improve error handling by logging an error message if a write operation fails, instead of just returning
false
.
Here's a refactored version of the function with the suggested improvements:
bool DB::WashData() {
rocksdb::ReadOptions read_options;
rocksdb::Status s;
auto suffix_len = storage::ParsedBaseDataValue::GetkBaseDataValueSuffixLength();
for (int i = 0; i < g_pika_conf->db_instance_num(); i++) {
s = WashDataForInstance(i, suffix_len);
if (!s.ok()) {
LOG(ERROR) << "Failed to wash data for instance " << i << ": " << s.ToString();
return false;
}
}
return true;
}
rocksdb::Status DB::WashDataForInstance(int instance_id, size_t suffix_len) {
rocksdb::WriteBatch batch;
auto handle = storage_->GetHashCFHandles(instance_id)[1];
auto db = storage_->GetDBByIndex(instance_id);
auto it(db->NewIterator(read_options, handle));
for (it->SeekToFirst(); it->Valid(); it->Next()) {
std::string key = it->key().ToString();
std::string value = it->value().ToString();
if (value.size() < suffix_len) {
// need to wash
std::string encoded_value = storage::BaseDataValue(value).Encode();
batch.Put(handle, key, encoded_value);
}
}
delete it;
return db->Write(storage_->GetDefaultWriteOptions(instance_id), &batch);
}
include/pika_db.h
Outdated
@@ -90,6 +90,13 @@ class DB : public std::enable_shared_from_this<DB>, public pstd::noncopyable { | |||
friend class PkClusterInfoCmd; | |||
friend class PikaServer; | |||
|
|||
/** | |||
* When it is the first time for upgrading version from 4.0.0 to 4.0.1, you should call | |||
* this function to wash data. false if successful, false otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里是想写true if successful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
失误失误,我改一下
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** | ||
* When it is the first time for upgrading version from 4.0.0 to 4.0.1, you should call | ||
* this function to wash data. true if successful, false otherwise. | ||
* @see https://github.com/OpenAtomFoundation/pika/issues/2886 | ||
*/ | ||
bool WashData(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve the method signature and documentation. Request implementation details.
The new WashData()
method is well-documented, explaining its purpose and linking to the related GitHub issue. The method signature and return type are appropriate.
However, the method has an empty implementation. It's unclear how the data washing operation will be performed.
Please provide the implementation details for the WashData()
method. Specifically, explain:
- What are the steps involved in the data washing operation?
- How will the method handle different data types and scenarios?
- How will the method ensure data integrity during the washing process?
可以简单测试:
|
You can simply test:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
….0 to 4.0.1 (OpenAtomFoundation#2888) * add a function for washing data after upgrading version from 4.0.0 to 4.0.1
fix #2886
因为 4.0.0 中错误格式的数据一定是在 hincrby 中产生的,那么 value 肯定为 '0' - '9' 的字符,所以一定可以与 4.0.0 中 reverse 中字符相区分。
当第一次从 4.0.0 升级到 4.0.1 时,建议开启 pika.conf 中 wash-data,清洗一遍数据。
Summary by CodeRabbit
New Features
wash-data
to enhance data migration handling.DB
class to ensure data integrity during version upgrades.Bug Fixes
Documentation