Failures using gocryptfs-on-NFS as ElasticSearch snapshot store #156
Should be possible without too much work; you can reuse the openfiletable infrastructure that already prevents concurrent writes to the same inode number. However, can you explain your stack a bit more? I don't understand where the races come from. Does it look like this: ext4 -> gocryptfs -> elasticsearch? Where are the other nodes?
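For readers who don't know that infrastructure: below is a hedged sketch of the kind of per-inode write serialization being referred to. The names are illustrative, not gocryptfs's actual internal/openfiletable API, and a table like this only coordinates writers inside one gocryptfs process, not across nodes -- which is exactly the gap the rest of the thread is about.

```go
package inodelock

import "sync"

// table hands out one lock per inode number so that at most one writer at a
// time modifies the content behind a given inode within this process.
type table struct {
	mu    sync.Mutex
	locks map[uint64]*sync.RWMutex
}

func newTable() *table {
	return &table{locks: make(map[uint64]*sync.RWMutex)}
}

// get returns the (lazily created) lock for an inode number.
func (t *table) get(ino uint64) *sync.RWMutex {
	t.mu.Lock()
	defer t.mu.Unlock()
	l, ok := t.locks[ino]
	if !ok {
		l = &sync.RWMutex{}
		t.locks[ino] = l
	}
	return l
}

// Example: a writer to inode 42 does
//   l := tbl.get(42); l.Lock(); defer l.Unlock(); ... write ...
// while concurrent readers of the same inode take l.RLock().
```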
(customer-maintained / black-box unknown infrastructure) -> NFSv4 -> gocryptfs -> elasticsearch ...so multiple nodes have the same NFSv4 store mounted, with separate gocryptfs daemons on each machine.
So with multiple nodes it is: one NFSv4 export mounted on several machines, each machine running its own gocryptfs instance on top of the NFS mount, with elasticsearch above that. So both gocryptfs instances would have to coordinate somehow? (Does it work without gocryptfs, just NFS and multiple nodes?)
That's a correct explanation of the architecture. Yes, it works with gocryptfs taken out of the loop.
...I'm proposing NFS's support for advisory locking as the means of coordinating between instances here. (Note that we have filename encryption turned off, so that's one layer of indirection/complexity avoided.)
Hmm. This sounds like a caching problem to me. There is no CLI option for it, but you could try setting the three cache timeouts to zero at https://github.com/rfjakob/gocryptfs/blob/master/mount.go#L263
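For context, here is a minimal sketch of what zeroing those timeouts means in go-fuse v1 terms (the library the linked mount.go is written against). A plain loopback filesystem stands in for the gocryptfs frontend, so this is illustrative rather than gocryptfs's actual mount code.

```go
package main

import (
	"log"
	"os"
	"time"

	"github.com/hanwen/go-fuse/fuse/nodefs"
	"github.com/hanwen/go-fuse/fuse/pathfs"
)

func main() {
	if len(os.Args) < 3 {
		log.Fatal("usage: nocache BACKING_DIR MOUNTPOINT")
	}
	backing, mountpoint := os.Args[1], os.Args[2]

	// A plain loopback filesystem stands in for the gocryptfs frontend here.
	loopback := pathfs.NewLoopbackFileSystem(backing)
	pathFs := pathfs.NewPathNodeFs(loopback, nil)

	// The three kernel cache timeouts; zeroing them forces every directory
	// entry lookup, attribute read and negative (ENOENT) lookup back to the
	// backing storage instead of being answered from the kernel cache.
	fsOpts := nodefs.Options{
		EntryTimeout:    0 * time.Second,
		AttrTimeout:     0 * time.Second,
		NegativeTimeout: 0 * time.Second,
	}

	srv, _, err := nodefs.MountRoot(mountpoint, pathFs.Root(), &fsOpts)
	if err != nil {
		log.Fatalf("mount failed: %v", err)
	}
	srv.Serve()
}
```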
Even running a patched gocryptfs with zeroed timeouts, I'm still seeing errors -- in particular, ENOENTs for files which are expected to exist (and do exist after the backup process is complete):
I'll run some traces to figure out exactly what's going on here.
Curiouser and curiouser. I'm starting to wonder if this really is a caching issue, and whether the change made (zeroing the cache timeouts) has taken effect at all. On the other nodes the snapshot file looks normal, but on the third, that same file shows up without any metadata and can't be opened (though everything else in the same directory seems fine).
When you get an ENOENT like that, it may be inode-related: the kernel talks to FUSE using file numbers (inode numbers). Can you try to disable ClientInodes in https://github.com/rfjakob/gocryptfs/blob/master/mount.go#L232 ? This takes hard link tracking (via inode numbers) out of the equation. I have seen inode number reuse cause problems for reverse mode (8c1b363).
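For reference, a hedged sketch of what turning ClientInodes off looks like against go-fuse v1's pathfs API; the package and function names are illustrative, not gocryptfs's actual mount code.

```go
package mountsketch

import "github.com/hanwen/go-fuse/fuse/pathfs"

// newPathFs builds the PathNodeFs with ClientInodes disabled, so hard-link
// tracking no longer relies on the inode numbers reported by the backing
// storage (which NFS may reuse when files are deleted and re-created).
func newPathFs(backing string) *pathfs.PathNodeFs {
	loopback := pathfs.NewLoopbackFileSystem(backing)
	opts := &pathfs.PathNodeFsOptions{
		// With ClientInodes: true, two paths that report the same backing
		// inode number are treated as hard links to one node; a reused
		// inode number then points the kernel at the wrong (stale) node.
		ClientInodes: false,
	}
	return pathfs.NewPathNodeFs(loopback, opts)
}
```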
Interesting -- it does look like it's trying to operate on a stale (reused) inode number. I'll let you know if the proposed patch fixes this.
Ok, good -- that matches what I have seen with the reverse mode problem.
After leaving our test cluster running for an hour (with a backup every five minutes), there's not a sign of any issue, whereas previously it was happening every two or three snapshots. So this issue looks to have been resolved. What are your thoughts on what to do here? A command-line argument to disable caching (leaving it on by default for the common case)?
Yes, the minimal solution would be a command-line switch along those lines. I'd like to check how libfuse filesystems handle this first; maybe it can be handled automatically.
Looks like the situation is worse in libfuse: vgough/encfs#452
At the moment, it does two things:
1. Disable stat() caching so changes to the backing storage show up immediately.
2. Disable hard link tracking, as the inode numbers on the backing storage are not stable when files are deleted and re-created behind our back. This would otherwise produce strange "file does not exist" and other errors.
Mitigates #156
I have added the -sharedstorage option (see the commit message quoted above). Automatically handling this seems pretty difficult, so it will stay a manual switch for now.
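As a rough illustration of how one switch can gate both work-arounds, here is a sketch in go-fuse v1 terms; the argument handling and option wiring are simplified stand-ins for gocryptfs's real CLI and mount code, not copies of it.

```go
package main

import (
	"flag"
	"fmt"
	"time"

	"github.com/hanwen/go-fuse/fuse/nodefs"
	"github.com/hanwen/go-fuse/fuse/pathfs"
)

// fuseOptions derives the FUSE options from the -sharedstorage switch: when
// another host may modify the backing files at any time, stop trusting inode
// numbers for hard-link tracking and stop caching in the kernel.
func fuseOptions(sharedstorage bool) (*pathfs.PathNodeFsOptions, *nodefs.Options) {
	pathOpts := &pathfs.PathNodeFsOptions{ClientInodes: true}
	fsOpts := &nodefs.Options{
		EntryTimeout:    1 * time.Second,
		AttrTimeout:     1 * time.Second,
		NegativeTimeout: 1 * time.Second,
	}
	if sharedstorage {
		pathOpts.ClientInodes = false // inode numbers are not stable
		fsOpts.EntryTimeout = 0       // no dentry caching
		fsOpts.AttrTimeout = 0        // no stat() caching
		fsOpts.NegativeTimeout = 0    // no caching of ENOENT results
	}
	return pathOpts, fsOpts
}

func main() {
	sharedstorage := flag.Bool("sharedstorage", false,
		"enable work-arounds for storage shared between multiple hosts")
	flag.Parse()

	pathOpts, fsOpts := fuseOptions(*sharedstorage)
	fmt.Printf("ClientInodes=%v EntryTimeout=%v AttrTimeout=%v NegativeTimeout=%v\n",
		pathOpts.ClientInodes, fsOpts.EntryTimeout, fsOpts.AttrTimeout, fsOpts.NegativeTimeout)
}
```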
If you're inclined to close this ticket, I consider the above an adequate fix.
Ok, let's close this for now. Thanks again for the report!
This is distinct from #39 because it proposes using advisory locks to simulate Windows single-writer filesystem semantics (which are safer in situations where a concurrent writer can invalidate reads), as opposed to letting applications make their own use of advisory locks.
I'm using gocryptfs to encrypt an ElasticSearch snapshot store.
This works perfectly when the ElasticSearch cluster consists of only one node. With multiple nodes, however, there are various race conditions and timing errors that take place as one node tries to read content that another has not fully written, or when a node attempts to delete a presently-invalid/partial file (resulting in the unlink() attempt returning ENOENT despite the file actually existing).
Since ElasticSearch's snapshot support is designed to work with Microsoft's filesystem semantics -- where only a single writer to a given file is allowed and no concurrent reads are permitted during writes -- it should be possible to avoid these errors by using POSIX advisory locks to prevent any node from reading or writing to a file which any other node is actively writing to, or to prevent a file from being opened for write while any other node has it opened for read.
Obviously this would not be appropriate as default behavior, but would be useful to provide as an available option.
Thoughts on feasibility? Guidance on where to start with an implementation?
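As one possible starting point for the locking idea, here is a hedged sketch of whole-file POSIX advisory record locks via fcntl, which NFSv4 propagates between clients. The file path and helper names are hypothetical, and in gocryptfs this would have to be wired into the open/create/unlink paths rather than live in a standalone program.

```go
package main

import (
	"os"

	"golang.org/x/sys/unix"
)

// lockWholeFile blocks until a whole-file advisory lock of the given type
// (unix.F_WRLCK for the single writer, unix.F_RDLCK for readers) is granted
// on the ciphertext file that lives on the shared NFS mount.
func lockWholeFile(f *os.File, lockType int16) error {
	lk := unix.Flock_t{
		Type:   lockType,
		Whence: 0, // SEEK_SET
		Start:  0,
		Len:    0, // 0 = to end of file, i.e. lock the whole file
	}
	// F_SETLKW waits until the lock can be acquired.
	return unix.FcntlFlock(f.Fd(), unix.F_SETLKW, &lk)
}

// unlockWholeFile drops the advisory lock again (it is also released
// automatically when the file descriptor is closed).
func unlockWholeFile(f *os.File) error {
	lk := unix.Flock_t{Type: unix.F_UNLCK, Whence: 0, Start: 0, Len: 0}
	return unix.FcntlFlock(f.Fd(), unix.F_SETLK, &lk)
}

func main() {
	// Example: emulate single-writer semantics for one snapshot segment.
	// The path is purely illustrative.
	f, err := os.OpenFile("/mnt/nfs/snapshots/segment", os.O_RDWR, 0)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := lockWholeFile(f, unix.F_WRLCK); err != nil { // exclusive writer
		panic(err)
	}
	// ... write the file; other nodes block in lockWholeFile until we finish.
	_ = unlockWholeFile(f)
}
```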