Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

Do not shell out for tar #19043

Merged
merged 1 commit into from
Aug 4, 2021
Merged

Do not shell out for tar #19043

merged 1 commit into from
Aug 4, 2021

Conversation

brooksprumo
Copy link
Contributor

@brooksprumo brooksprumo commented Aug 3, 2021

When making a snapshot archive, we used to shell out and call tar -S
for sparse file support. The tar crate supports sparse files, so no
need to do this anymore.

Fixes #10860

When making a snapshot archive, we used to shell out and call `tar -S`
for sparse file support.  The tar crate supports sparse files, so no
need to do this anymore.

Fixes solana-labs#10860
@sakridge
Copy link
Contributor

sakridge commented Aug 3, 2021

Sweet! I think maybe we can remove flushing the mmaps explicitly now as well since we are reading from in-process memory.

@brooksprumo
Copy link
Contributor Author

brooksprumo commented Aug 4, 2021

Sweet! I think maybe we can remove flushing the mmaps explicitly now as well since we are reading from in-process memory.

@sakridge One thing I'm not sure is how to actually test this change sufficiently. Is passing CI good enough? It looks like the tar crate added sparse file support, but I'm not sure if it's something that's automatically detected when tar-ing...

@t-nelson I know you posted the link in the original issue; do you happen to know off-hand?

@sakridge
Copy link
Contributor

sakridge commented Aug 4, 2021

Sweet! I think maybe we can remove flushing the mmaps explicitly now as well since we are reading from in-process memory.

@sakridge One thing I'm not sure is how to actually test this change sufficiently. Is passing CI good enough? It looks like the tar crate added sparse file support, but I'm not sure if it's something that's automatically detected when tar-ing...

@t-nelson I know you posted the link in the original issue; do you happen to know off-hand?

Like if you want to test that it doesn't kill the tar?

You can send a signal SIGUSR1 to the validator while it's making a snapshot.

@sakridge
Copy link
Contributor

sakridge commented Aug 4, 2021

@sakridge One thing I'm not sure is how to actually test this change sufficiently. Is passing CI good enough? It looks like the tar crate added sparse file support, but I'm not sure if it's something that's automatically detected when tar-ing...

For the sparse part, you can start a node and it will create snapshots by default, compare those snapshot sizes to a node which is producing snapshots through the normal path. I think if it's not sparse the size difference will be large. Can also take a new snapshot, extract it and see the on-disk size is comparable as before.

@t-nelson
Copy link
Contributor

t-nelson commented Aug 4, 2021

Sparse files should be detectable by comparing a file's total and allocated sizes, however browsing that commit I only see the read side of support added. Maybe the write side is already there?

I think a lighter test would be to take an existing snapshot, use ledger-tool from this PR to generate a new snapshot, then use ledger-tool from the source release to generate another snapshot from from the PR snapshot. If both snapshots are generated successfully and all three have similar sizes, we're probably good

@codecov
Copy link

codecov bot commented Aug 4, 2021

Codecov Report

Merging #19043 (1907ca3) into master (cde1461) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##           master   #19043   +/-   ##
=======================================
  Coverage    82.8%    82.8%           
=======================================
  Files         449      449           
  Lines      127997   127988    -9     
=======================================
+ Hits       105993   106007   +14     
+ Misses      22004    21981   -23     

@brooksprumo
Copy link
Contributor Author

I ran bootstrap-validator.sh locally for a while to generate some snapshots and compare. The sizes for both the .tar.zst and .tar files look the same. Here's the snapshots and their sizes:

this PR

There are two snapshot, and I've unzstd'ed them both. So you'll see the output from ls for the .tar.zst and the .tar next to it.

$ ls
total 2601128
drwxr-xr-x  9 brooks  staff   288B Aug  4 10:24 ./
drwx------@ 7 brooks  staff   224B Aug  4 09:52 ../
-rw-r--r--  1 brooks  staff   370M Aug  4 10:16 snapshot-49720-GVDd3Cj1GSsFPy6vNKXpGgGka29c6Uqg8pDvh7zuKvnB.tar
-rw-r--r--  1 brooks  staff    94M Aug  4 09:43 snapshot-49720-GVDd3Cj1GSsFPy6vNKXpGgGka29c6Uqg8pDvh7zuKvnB.tar.zst
-rw-r--r--  1 brooks  staff   380M Aug  4 10:16 snapshot-51120-G5KBFXknWuFrydhL6BpmTiLDmMHKC4yndkNyXyrX18PP.tar
-rw-r--r--  1 brooks  staff    95M Aug  4 09:52 snapshot-51120-G5KBFXknWuFrydhL6BpmTiLDmMHKC4yndkNyXyrX18PP.tar.zst

And here's the result from du, which should be spitting out the on disk size:

$ BLOCKSIZE=1 du *
370M	snapshot-49720-GVDd3Cj1GSsFPy6vNKXpGgGka29c6Uqg8pDvh7zuKvnB.tar
 94M	snapshot-49720-GVDd3Cj1GSsFPy6vNKXpGgGka29c6Uqg8pDvh7zuKvnB.tar.zst
380M	snapshot-51120-G5KBFXknWuFrydhL6BpmTiLDmMHKC4yndkNyXyrX18PP.tar
 95M	snapshot-51120-G5KBFXknWuFrydhL6BpmTiLDmMHKC4yndkNyXyrX18PP.tar.zst

master (cde1461):

Similarly, I have two snapshots for the comparison with master as well, just a few slots later.

$ ls
total 2654704
drwxr-xr-x  8 brooks  staff   256B Aug  4 10:25 ./
drwx------@ 7 brooks  staff   224B Aug  4 09:52 ../
-rw-r--r--  1 brooks  staff   382M Aug  4 10:25 snapshot-52121-44Nzn8ixNMiYSxzaVca8Mn17DcyvMk7HwzATbQVbzmwk.tar
-rw-r--r--  1 brooks  staff    95M Aug  4 10:22 snapshot-52121-44Nzn8ixNMiYSxzaVca8Mn17DcyvMk7HwzATbQVbzmwk.tar.zst
-rw-r--r--  1 brooks  staff   384M Aug  4 10:25 snapshot-52321-DD32fnn8nL8GVEAeyB1nkQ4yvPNctiWvGJrTFNDNxTVT.tar
-rw-r--r--  1 brooks  staff    95M Aug  4 10:23 snapshot-52321-DD32fnn8nL8GVEAeyB1nkQ4yvPNctiWvGJrTFNDNxTVT.tar.zst

and du:

$ BLOCKSIZE=1 du *
382M	snapshot-52121-44Nzn8ixNMiYSxzaVca8Mn17DcyvMk7HwzATbQVbzmwk.tar
 95M	snapshot-52121-44Nzn8ixNMiYSxzaVca8Mn17DcyvMk7HwzATbQVbzmwk.tar.zst
384M	snapshot-52321-DD32fnn8nL8GVEAeyB1nkQ4yvPNctiWvGJrTFNDNxTVT.tar
 95M	snapshot-52321-DD32fnn8nL8GVEAeyB1nkQ4yvPNctiWvGJrTFNDNxTVT.tar.zs

I think this means it's working? What do you think @sakridge @t-nelson? Is generating snapshots via bootstap-validator.sh sufficient?

@brooksprumo brooksprumo marked this pull request as ready for review August 4, 2021 15:48
@t-nelson
Copy link
Contributor

t-nelson commented Aug 4, 2021

I think this means it's working? What do you think @sakridge @t-nelson? Is generating snapshots via bootstap-validator.sh sufficient?

This is definitely a good sign! I'd like to see master load a PR-generated snapshot and vice versa so we can be confident in a mixed cluster during the SW upgrade that introduces this change.

@brooksprumo
Copy link
Contributor Author

This is definitely a good sign! I'd like to see master load a PR-generated snapshot and vice versa so we can be confident in a mixed cluster during the SW upgrade that introduces this change.

@t-nelson Gotcha! I ran bootstrap-validator 4 times, each for ~1000 slots so a few snapshots archives would be made. I did my PR, then master, then my PR again, and master once more. All runs successfully used the existing snapshot from the previous run (and therefore the opposite branch).

Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@t-nelson Gotcha! I ran bootstrap-validator 4 times, each for ~1000 slots so a few snapshots archives would be made. I did my PR, then master, then my PR again, and master once more. All runs successfully used the existing snapshot from the previous run (and therefore the opposite branch).

Awesome! Let's do it!

@brooksprumo brooksprumo merged commit 68cc714 into solana-labs:master Aug 4, 2021
@brooksprumo brooksprumo deleted the tar branch August 4, 2021 22:07
@nyetwurk
Copy link

nyetwurk commented Aug 25, 2021

This fixes this problem as well, where logrotate's SIGUSR causes solana to leave snapshot droppings

-rw-r--r--  1 solana solana 1111387853 Aug 20 00:00 tmp-snapshot-92444888.tar.zst
-rw-r--r--  1 solana solana 3358889776 Aug 21 00:00 tmp-snapshot-92597063.tar.zst
-rw-r--r--  1 solana solana 2238783974 Aug 23 00:00 tmp-snapshot-92897640.tar.zst
-rw-r--r--  1 solana solana  791530364 Aug 24 00:00 tmp-snapshot-93050059.tar.zst
-rw-r--r--  1 solana solana  806058774 Aug 25 00:00 tmp-snapshot-93201834.tar.zst

@t-nelson
Copy link
Contributor

@brooksprumo any reason not to backport this one?

@brooksprumo
Copy link
Contributor Author

@t-nelson I think it should be fine. I don't know what the criteria is to backport commits, so as long as the Rust tar package version is new enough in the previous branch(es) then we should be good.

@CriesofCarrots
Copy link
Contributor

tar on v1.7 is up to date with master, but v1.6 is not, and will I think require this to bump: #19216
Is v1.7 enough @t-nelson ?

@t-nelson
Copy link
Contributor

tar on v1.7 is up to date with master, but v1.6 is not, and will I think require this to bump: #19216
Is v1.7 enough @t-nelson ?

Could be. Motivation here is #19766, which I think is more costly than we really want to promote. I'd say v1.7 for sure, then look into v1.6 based on resource leak debug in v1.7 and prospects of an MB upgrade

@mvines
Copy link
Contributor

mvines commented Sep 11, 2021

My vote is v1.7 only, cause it should be easy. The second the resource leak on v1.7 is mitigated it's MB upgrade time!

@t-nelson
Copy link
Contributor

I'mma do it

@t-nelson t-nelson added the v1.7 label Oct 11, 2021
mergify bot pushed a commit that referenced this pull request Oct 11, 2021
When making a snapshot archive, we used to shell out and call `tar -S`
for sparse file support.  The tar crate supports sparse files, so no
need to do this anymore.

Fixes #10860

(cherry picked from commit 68cc714)

# Conflicts:
#	runtime/src/snapshot_utils.rs
@CriesofCarrots
Copy link
Contributor

Presumably v1.8 too?

@t-nelson
Copy link
Contributor

Presumably v1.8 too?

Yeah, but by hand to avoid resolving the conflicts twice

t-nelson pushed a commit to t-nelson/solana that referenced this pull request Oct 12, 2021
When making a snapshot archive, we used to shell out and call `tar -S`
for sparse file support.  The tar crate supports sparse files, so no
need to do this anymore.

Fixes solana-labs#10860

(cherry picked from commit 68cc714)

# Conflicts:
#	runtime/src/snapshot_utils.rs
t-nelson pushed a commit that referenced this pull request Oct 12, 2021
When making a snapshot archive, we used to shell out and call `tar -S`
for sparse file support.  The tar crate supports sparse files, so no
need to do this anymore.

Fixes #10860

(cherry picked from commit 68cc714)

# Conflicts:
#	runtime/src/snapshot_utils.rs
mergify bot pushed a commit that referenced this pull request Oct 12, 2021
When making a snapshot archive, we used to shell out and call `tar -S`
for sparse file support.  The tar crate supports sparse files, so no
need to do this anymore.

Fixes #10860

(cherry picked from commit 68cc714)

# Conflicts:
#	runtime/src/snapshot_utils.rs
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Snapshot creation fails sometimes
6 participants