
libsql/core: Clean up sqlite3 on drop #293

Merged
merged 1 commit into from
Aug 17, 2023
Conversation

LucioFranco
Contributor

No description provided.

@LucioFranco LucioFranco added this pull request to the merge queue Aug 17, 2023
Merged via the queue into main with commit db893af Aug 17, 2023
@LucioFranco LucioFranco deleted the lucio/asan branch August 17, 2023 18:25
MarinPostma added a commit that referenced this pull request Oct 17, 2023
293: log compaction r=MarinPostma a=MarinPostma

# Log compaction and snapshotting

This PR enables log compaction and snapshotting.

## Motivation

The replication log follows the sqlite WAL and grows indefinitely. Fortunately, it contains a lot of duplicate data, so we can compress it by keeping only the most recent version of each page. This is what this PR does: whenever the replication log grows above some threshold, a new log is created and the old log is compacted. The swap is done atomically, so the compaction can happen in the background while we keep writing to the new log.

## Log compaction:

Log compaction is straightforward: we iterate backwards through the replication log and write the frames to the snapshot file. We keep track of which pages we have already seen, and ignore older versions of them. When the snapshot is finished, we remove the old log file.

Notice that the frames are in reverse order in the snapshot: starting with the most recent and ending with the oldest.
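The backward dedup pass can be sketched as follows. This is a minimal illustration, not the actual sqld code: `Frame` and `compact` are hypothetical names, and a real frame carries more metadata than a page number and its contents.

```rust
use std::collections::HashSet;

// Hypothetical frame type for illustration only: one version of one page.
#[derive(Clone, Debug, PartialEq)]
struct Frame {
    page_no: u32,
    data: Vec<u8>,
}

/// Walk the log from newest to oldest, keeping only the first occurrence
/// of each page (i.e. its most recent version). The result is the
/// snapshot body, ordered newest-first.
fn compact(log: &[Frame]) -> Vec<Frame> {
    let mut seen = HashSet::new();
    let mut snapshot = Vec::new();
    for frame in log.iter().rev() {
        // `insert` returns false if the page was already seen,
        // meaning a newer version of it is already in the snapshot.
        if seen.insert(frame.page_no) {
            snapshot.push(frame.clone());
        }
    }
    snapshot
}
```

For example, a log containing page 1 (v1), page 2 (v1), then page 1 (v2) compacts to just two frames: page 1 (v2) followed by page 2 (v1).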

## Snapshotting:

Whenever a replica asks for a frame that is not present in the current log (i.e., it asks for a frame index less than the log's starting frame id), the primary sends the replica an error, asking it to request a snapshot instead.

The replica receives this message and immediately requests a snapshot, sending over the frame id that got it rejected in the first place.
The primary looks for a snapshot containing the requested frame, and iterates through it until it reaches a frame index less than the requested one.

This mechanism allows us to send partial snapshots: the replica gets only the minimal set of frames required to get up to speed.
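Because the snapshot is ordered newest-first by frame index, serving a partial snapshot amounts to taking a prefix. A minimal sketch, with hypothetical names (`SnapshotFrame`, `frames_to_send`) rather than the actual sqld types:

```rust
// Illustrative snapshot frame: its replication index and the page it holds.
#[derive(Clone, Debug)]
struct SnapshotFrame {
    frame_no: u64,
    page_no: u32,
}

/// The snapshot is ordered newest-first, so the frames the replica is
/// missing form a prefix: everything with a frame index at or above the
/// frame id the replica was rejected on. We stop at the first older frame.
fn frames_to_send(snapshot: &[SnapshotFrame], requested: u64) -> Vec<SnapshotFrame> {
    snapshot
        .iter()
        .take_while(|f| f.frame_no >= requested)
        .cloned()
        .collect()
}
```

So a replica that already has everything up to frame 7 receives only the snapshot frames with indices 7 and above, not the whole snapshot.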

The replica writes the snapshot frames to a file, then `mmap`s that file to build a chained list of pages to append to the WAL.


## Future work:

This PR was getting a bit too long, so I left out some work for followup PRs:

- Even though a snapshot significantly compresses the size of the log, a new log is created that will in turn lead to a new snapshot. Those snapshots will pile up and we're back at square one. The next step is to merge snapshots into bigger snapshots and get rid of the older ones.
- Every query for a frame causes a read from the snapshot/log. To speed things up, let's add an MRU cache.
- Explore compression


Co-authored-by: ad hoc <[email protected]>