"ipfs repo gc" is freezing IPFS node #6107

Closed
obo20 opened this issue Mar 20, 2019 · 2 comments
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@obo20

obo20 commented Mar 20, 2019

Version information:

go-ipfs version: 0.4.19- (I also tried downgrading to 0.4.18 and am still running into this issue)
Repo version: 7
System version: amd64/linux
Golang version: go1.11.5
Filesystem: Flatfs

Type:

panic

Description:

The most recent garbage collection I ran manually on one of our nodes froze the node completely, leaving it unresponsive to any attempt to add or pin content.

Upon restarting the node and attempting another garbage collection, I received the same result.

It seems I'm not the only one with this issue:
#5177
#5489

@Stebalien Stebalien added the kind/bug A bug in existing code (including security flaws) label Mar 21, 2019
@Stebalien
Member

Talking with @obo20 about this on IRC. Looks like something is deadlocking somewhere.

New information:

  • ~15GiB of data
  • ~130,000 blocks
  • ipfs repo verify passes

That means it's not a disk issue and we can definitely iterate over the entire collection. It looks like an issue traversing the graph or maybe some locking issue.


Current status: The repo is live, so we're cloning it to a new machine where we can leave a GC running for a while (on the order of minutes). We should then be able to pull a goroutine list to see what's stuck.
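
For readers unfamiliar with the "goroutine list" mentioned above: in any Go program, the standard `runtime/pprof` package can write out the stack of every live goroutine, which shows where each one is blocked. The following is a minimal sketch of that general technique. The function name `dumpGoroutines` is made up for illustration, and this is not necessarily how the dump was collected in this issue.

```go
package main

import (
	"bytes"
	"runtime/pprof"
)

// dumpGoroutines (hypothetical helper) returns the stack traces of all
// live goroutines. Debug level 2 requests the same human-readable format
// Go prints for an unrecovered panic, including each goroutine's wait
// state (for example "chan send" or "semacquire").
func dumpGoroutines() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```

In a stuck process, goroutines that have been waiting a long time on a channel operation or a lock are the usual suspects.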

Stebalien added a commit that referenced this issue Mar 21, 2019
Events:

1. User triggers a GC.
2. User aborts the GC.
3. We fail to delete a block when the output channel is already full.

This is really unlikely to happen in practice but it's still incorrect.

Could be related to #6107

License: MIT
Signed-off-by: Steven Allen <[email protected]>
@obo20
Author

obo20 commented Mar 21, 2019

A little update. After working with @Stebalien on this, once I got the machine cloned, the garbage collection on that machine ran in about 2 minutes. With this knowledge, I attempted another garbage collection on the main machine and it also ran in about 2 minutes.

We're still not sure what caused the original garbage collection to deadlock indefinitely, but the issue seems to have resolved itself for now.

I did learn something through this process, though: IPFS intentionally blocks operations such as adding files while a GC is running. I'd love to see this paradigm changed, as it could be pretty problematic at scale, especially for production nodes. I'm going to file a new issue on this so things don't get too mixed up.
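
To make the locking behaviour described above concrete, here is a minimal sketch of one way such a scheme can work, using a reader/writer lock: adds share the lock, while a GC takes it exclusively. This is an assumption for illustration, with hypothetical names (`gcLock`, `add`, `collect`), and is not a description of the actual go-ipfs implementation.

```go
package main

import "sync"

// gcLock is a hypothetical illustration of a GC lock.
type gcLock struct {
	mu sync.RWMutex
}

// add runs fn under the shared (read) lock, so many adds can proceed
// concurrently with each other.
func (l *gcLock) add(fn func()) {
	l.mu.RLock()
	defer l.mu.RUnlock()
	fn()
}

// collect runs fn under the exclusive (write) lock. It waits for
// in-flight adds to finish and blocks new adds until fn returns.
func (l *gcLock) collect(fn func()) {
	l.mu.Lock()
	defer l.mu.Unlock()
	fn()
}
```

Under a design like this, a GC that takes two minutes stalls every add for two minutes, and a GC that never finishes stalls them indefinitely, which is consistent with the freeze reported at the top of the thread.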

@obo20 obo20 closed this as completed Mar 21, 2019
2 participants