change inode conversion algo to reduce collision
trapexit committed Feb 13, 2020
1 parent f45a52d commit a646fe0
Showing 3 changed files with 30 additions and 14 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -1070,9 +1070,9 @@ Yes. While some users have reported problems it appears to always be related to

#### How are inodes calculated?

-mergerfs-inode = (original-inode | (device-id << 32))
+https://github.com/trapexit/mergerfs/blob/master/src/fs_inode.hpp

-While `ino_t` is 64 bits only a few filesystems use more than 32. Similarly, while `dev_t` is also 64 bits it was traditionally 16 bits. Bitwise or'ing them together should work most of the time. While totally unique inodes are preferred the overhead which would be needed does not seem to out weighted by the benefits.
+Originally tried to simply OR st_ino and (st_dev << 32) for 64bit systems. After a number of years someone finally ran into a collision that lead to some problems. Traditionally `dev_t` was 16bit and `ino_t` was 32bit so merging into one 64bit value worked but with both types being able to be up to 64bit that is no longer as simple. A proper hash seems like the best compromise. While totally unique inodes are preferred the overhead which would be needed does not seem to be outweighed by the benefits.

While atypical, yes, inodes can be reused and not refer to the same file. The internal id used to reference a file in FUSE is different from the inode value presented. The former is the `nodeid` and is actually a tuple of (nodeid,generation). That tuple is not user facing. The inode is merely metadata passed through the kernel and found using the `stat` family of calls or `readdir`.

@@ -1089,6 +1089,7 @@ Note that this does *not* affect the inode that libfuse
and the kernel use internally (also called the "nodeid").
```

+Generally collision, if it occurs, shouldn't be a problem. You can turn off the calculation by not using `use_ino`. In the future it might be worth creating different strategies for users to select from.

#### I notice massive slowdowns of writes over NFS

23 changes: 16 additions & 7 deletions man/mergerfs.1
@@ -2227,14 +2227,18 @@ While some users have reported problems it appears to always be related
to how Samba is setup in relation to permissions.
.SS How are inodes calculated?
.PP
-mergerfs\-inode = (original\-inode | (device\-id << 32))
-.PP
-While \f[C]ino_t\f[] is 64 bits only a few filesystems use more than 32.
-Similarly, while \f[C]dev_t\f[] is also 64 bits it was traditionally 16
-bits.
-Bitwise or\[aq]ing them together should work most of the time.
+https://github.com/trapexit/mergerfs/blob/master/src/fs_inode.hpp
+.PP
+Originally tried to simply OR st_ino and (st_dev << 32) for 64bit
+systems.
+After a number of years someone finally ran into a collision that lead
+to some problems.
+Traditionally \f[C]dev_t\f[] was 16bit and \f[C]ino_t\f[] was 32bit so
+merging into one 64bit value worked but with both types being able to be
+up to 64bit that is no longer as simple.
+A proper hash seems like the best compromise.
While totally unique inodes are preferred the overhead which would be
-needed does not seem to out weighted by the benefits.
+needed does not seem to be outweighed by the benefits.
.PP
While atypical, yes, inodes can be reused and not refer to the same
file.
@@ -2260,6 +2264,11 @@ Note\ that\ this\ does\ *not*\ affect\ the\ inode\ that\ libfuse
and\ the\ kernel\ use\ internally\ (also\ called\ the\ "nodeid").
\f[]
.fi
+.PP
+Generally collision, if it occurs, shouldn\[aq]t be a problem.
+You can turn off the calculation by not using \f[C]use_ino\f[].
+In the future it might be worth creating different strategies for users
+to select from.
.SS I notice massive slowdowns of writes over NFS
.PP
Due to how NFS works and interacts with FUSE when not using
16 changes: 11 additions & 5 deletions src/fs_inode.hpp
@@ -18,6 +18,8 @@

#pragma once

+#include "fasthash.h"
+
#include <stdint.h>
#include <sys/stat.h>

@@ -31,11 +33,15 @@ namespace fs
    void
    recompute(struct stat *st_)
    {
-      /* not ideal to do this at runtime but likely gets optimized out */
-      if(sizeof(st_->st_ino) == 4)
-        st_->st_ino |= ((uint32_t)st_->st_dev << 16);
-      else
-        st_->st_ino |= ((uint64_t)st_->st_dev << 32);
+      uint64_t buf[5];
+
+      buf[0] = st_->st_ino;
+      buf[1] = st_->st_dev;
+      buf[2] = buf[0] ^ buf[1];
+      buf[3] = buf[0] & buf[1];
+      buf[4] = buf[0] | buf[1];
+
+      st_->st_ino = fasthash64(&buf[0],sizeof(buf),MAGIC);
    }
  }
}
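
To illustrate why the change in `src/fs_inode.hpp` helps, here is a minimal standalone sketch of both schemes. It is not the mergerfs code: `old_inode`, `new_inode`, `standin_hash64`, and `HYPOTHETICAL_SEED` are names invented for this example, the hash is a plain FNV-1a stand-in rather than `fasthash64`, and the seed is an assumed value because the value of `MAGIC` is not shown in this diff. Only the buffer layout follows the new `recompute()` in the commit.

```cpp
#include <cstddef>
#include <cstdint>

// Old scheme (64-bit branch of the replaced code): OR the device id,
// shifted left by 32 bits, into the inode number.
inline uint64_t old_inode(uint64_t ino, uint64_t dev)
{
  return ino | (dev << 32);
}

// Stand-in 64-bit hash (FNV-1a over bytes, with the seed XORed into the
// offset basis). This is NOT fasthash64; it is only here so the sketch
// builds without fasthash.h.
inline uint64_t standin_hash64(const void *buf, size_t len, uint64_t seed)
{
  const unsigned char *p = static_cast<const unsigned char *>(buf);
  uint64_t h = 0xcbf29ce484222325ULL ^ seed;
  for (size_t i = 0; i < len; i++) {
    h ^= p[i];
    h *= 0x100000001b3ULL;
  }
  return h;
}

// New scheme: same five-word buffer as the commit (ino, dev, xor, and, or),
// fed to a hash. The seed below is an assumption, not the commit's MAGIC.
inline uint64_t new_inode(uint64_t ino, uint64_t dev)
{
  const uint64_t HYPOTHETICAL_SEED = 0x1234567890abcdefULL;
  uint64_t buf[5];

  buf[0] = ino;
  buf[1] = dev;
  buf[2] = buf[0] ^ buf[1];
  buf[3] = buf[0] & buf[1];
  buf[4] = buf[0] | buf[1];

  return standin_hash64(&buf[0], sizeof(buf), HYPOTHETICAL_SEED);
}
```

With the old scheme, inode 5 on device 1 and inode 0x100000005 on device 0 both map to 0x100000005, which is the kind of collision the README change describes once `st_ino` uses more than 32 bits. With the hashed scheme the two pairs feed different buffers to the hash, so a collision is still possible but no longer follows directly from the bit layout.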
