-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std::hash::Hash
documentation should suggest that the hash data should be prefix-free
#89429
Comments
Should hand-implementors of struct User {
name: String,
age: u32,
cool: bool,
}
impl Hash for User {
fn hash<H: Hasher>(&self, state: &mut H) {
self.name.hash(state);
self.age.hash(state);
self.cool.hash(state);
}
} There are no magic bytes here, as in |
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
#[derive(Hash)]
struct User {
name: String,
age: u32,
cool: bool,
}
struct Boozer {
name: String,
age: u32,
cool: bool,
}
impl Hash for Boozer {
fn hash<H: Hasher>(&self, state: &mut H) {
self.name.hash(state);
self.age.hash(state);
self.cool.hash(state);
}
}
fn hash<H: Hash>(h: H) {
let mut hasher = DefaultHasher::new();
h.hash(&mut hasher);
println!("{}", hasher.finish());
}
fn main() {
let user = User {
name: "Jack".to_string(),
age: 8,
cool: true,
};
let boozer = Boozer {
name: "Jack".to_string(),
age: 8,
cool: true,
};
let tuple = ("Jack".to_string(), 8, true);
hash(user);
hash(boozer);
hash(tuple);
} The results are:
Is this expected? |
Different types hashing to the same value are not a problem. What is a problem is different (unequal) values of the same type hashing to the same value (for all |
@rustbot claim |
Collisions are inevitable, and it shouldn't matter between I do agree that types should try to avoid collisions, as long as we make this kind of thing a suggestion of best practice, not stated as a hard requirement of the trait. |
Collisions in One benefit of not creating collisions in |
OK, that's a compelling distinction between the responsibilities of Is it always possible to do this perfectly on the |
[Trying again to set labels since I typoed the first one:] |
I am not sure it's always possible, in theory you could imagine some contrived I think it would be fine to say that if can't come up with a proper |
For the docs, would it make sense to discuss the case of rust/library/core/src/hash/mod.rs Lines 246 to 249 in ed93759
The paragraph could start with something like:
|
Let's then note that |
I gave this a try! I might not have understood correctly, however... |
FWIW, being prefix-free can also be unfortunate sometimes. For example, I don't know that there's a great answer for this, though... |
See #89443. |
Is there a real reason In any case, slices are already prefix-free (as long as T is) (and I think that's good, otherwise you'd have a lot of collisions on different short slices), so this issue doesn't change that. |
Yes, |
Nominated this for @rust-lang/libs-api discussion. |
@tczajka I added a comment to the impl in #86140 after I forget why it was that way too :) |
One possibility would be to leave the choice of whether to add a prefix to the This could be done with an optional method on the trait Hasher {
// Option 1
fn should_add_prefix(&self) -> bool { false }
// Option 2
fn write_prefix(&mut self, prefix: usize) { self.write_usize(prefix); }
} |
I dispute the idea that making the Potential huge cost of resulting hash collisions (for certain kinds of data) is likely to greatly overwhelm any such (tiny) savings from lazy hashing. I think of this as a bug because it defeats the assumptions of mathematical analysis of the performance properties of hash tables with certain hashers. |
docs: `std::hash::Hash` should ensure prefix-free data Attempt to synthesize the discussion in rust-lang#89429 into a suggestion regarding `Hash` implementations (not a hard requirement). Closes rust-lang#89429.
As discussed at Why does
str.hash(…)
pass an extra byte to the Hasher?: the code ofimpl Hash for &str
specifically passes an extra 0xFF byte to the Hasher, so that values like("ab", "c")
and("a", "bc")
hash differently (added in 6066118).This is a subtle property of hashing and should probably be mentioned in the documentation for
Hash
— particularly as the documentation forHasher
already says “you cannot assume, for example, that awrite_u32
call is equivalent to four calls ofwrite_u8
” which could be sloppily interpreted as an expectation that a goodHasher
implementation will handle “quoting” of sequential calls itself.If an implementor of
Hash
fails to have this property, it could compromise the hash-DoS resistance that Rust tries to offer by default.@rustbot labels: +A-docs +T-libs-api +C-enhancement
The text was updated successfully, but these errors were encountered: