Change the default hasher to foldhash #563

Merged: 1 commit, Oct 1, 2024
6 changes: 3 additions & 3 deletions Cargo.toml
@@ -14,7 +14,7 @@ rust-version = "1.63.0"

 [dependencies]
 # For the default hasher
-ahash = { version = "0.8.7", default-features = false, optional = true }
+foldhash = { version = "0.1.2", default-features = false, optional = true }

 # For external trait impls
 rayon = { version = "1.0", optional = true }
@@ -66,10 +66,10 @@ rustc-dep-of-std = [
 # Enables the deprecated RawEntry API.
 raw-entry = []

-# Provides a default hasher. Currently this is AHash but this is subject to
+# Provides a default hasher. Currently this is foldhash but this is subject to
 # change in the future. Note that the default hasher does *not* provide HashDoS
 # resistance, unlike the one in the standard library.
-default-hasher = ["dep:ahash"]
+default-hasher = ["dep:foldhash"]

 # Enables usage of `#[inline]` on far more functions than by default in this
 # crate. This may lead to a performance increase but often comes at a compile
51 changes: 3 additions & 48 deletions README.md
@@ -26,59 +26,14 @@ in environments without `std`, such as embedded systems and kernels.
 ## Features

 - Drop-in replacement for the standard library `HashMap` and `HashSet` types.
-- Uses [AHash](https://github.com/tkaitchuck/aHash) as the default hasher, which is much faster than SipHash.
-  However, AHash does *not provide the same level of HashDoS resistance* as SipHash, so if that is important to you, you might want to consider using a different hasher.
+- Uses [foldhash](https://github.com/orlp/foldhash) as the default hasher, which is much faster than SipHash.
+  However, foldhash does *not provide the same level of HashDoS resistance* as SipHash, so if that is important to you, you might want to consider using a different hasher.
 - Around 2x faster than the previous standard library `HashMap`.
 - Lower memory usage: only 1 byte of overhead per entry instead of 8.
 - Compatible with `#[no_std]` (but requires a global allocator with the `alloc` crate).
 - Empty hash maps do not allocate any memory.
 - SIMD lookups to scan multiple hash entries in parallel.

-## Performance
-
-Compared to the previous implementation of `std::collections::HashMap` (Rust 1.35).
-
-With the hashbrown default AHash hasher:
-
-| name | oldstdhash ns/iter | hashbrown ns/iter | diff ns/iter | diff % | speedup |
-| :-------------------------- | :----------------: | ----------------: | :----------: | ------: | ------- |
-| insert_ahash_highbits | 18,865 | 8,020 | -10,845 | -57.49% | x 2.35 |
-| insert_ahash_random | 19,711 | 8,019 | -11,692 | -59.32% | x 2.46 |
-| insert_ahash_serial | 19,365 | 6,463 | -12,902 | -66.63% | x 3.00 |
-| insert_erase_ahash_highbits | 51,136 | 17,916 | -33,220 | -64.96% | x 2.85 |
-| insert_erase_ahash_random | 51,157 | 17,688 | -33,469 | -65.42% | x 2.89 |
-| insert_erase_ahash_serial | 45,479 | 14,895 | -30,584 | -67.25% | x 3.05 |
-| iter_ahash_highbits | 1,399 | 1,092 | -307 | -21.94% | x 1.28 |
-| iter_ahash_random | 1,586 | 1,059 | -527 | -33.23% | x 1.50 |
-| iter_ahash_serial | 3,168 | 1,079 | -2,089 | -65.94% | x 2.94 |
-| lookup_ahash_highbits | 32,351 | 4,792 | -27,559 | -85.19% | x 6.75 |
-| lookup_ahash_random | 17,419 | 4,817 | -12,602 | -72.35% | x 3.62 |
-| lookup_ahash_serial | 15,254 | 3,606 | -11,648 | -76.36% | x 4.23 |
-| lookup_fail_ahash_highbits | 21,187 | 4,369 | -16,818 | -79.38% | x 4.85 |
-| lookup_fail_ahash_random | 21,550 | 4,395 | -17,155 | -79.61% | x 4.90 |
-| lookup_fail_ahash_serial | 19,450 | 3,176 | -16,274 | -83.67% | x 6.12 |
-
-
-With the libstd default SipHash hasher:
-
-| name | oldstdhash ns/iter | hashbrown ns/iter | diff ns/iter | diff % | speedup |
-| :------------------------ | :----------------: | ----------------: | :----------: | ------: | ------- |
-| insert_std_highbits | 19,216 | 16,885 | -2,331 | -12.13% | x 1.14 |
-| insert_std_random | 19,179 | 17,034 | -2,145 | -11.18% | x 1.13 |
-| insert_std_serial | 19,462 | 17,493 | -1,969 | -10.12% | x 1.11 |
-| insert_erase_std_highbits | 50,825 | 35,847 | -14,978 | -29.47% | x 1.42 |
-| insert_erase_std_random | 51,448 | 35,392 | -16,056 | -31.21% | x 1.45 |
-| insert_erase_std_serial | 87,711 | 38,091 | -49,620 | -56.57% | x 2.30 |
-| iter_std_highbits | 1,378 | 1,159 | -219 | -15.89% | x 1.19 |
-| iter_std_random | 1,395 | 1,132 | -263 | -18.85% | x 1.23 |
-| iter_std_serial | 1,704 | 1,105 | -599 | -35.15% | x 1.54 |
-| lookup_std_highbits | 17,195 | 13,642 | -3,553 | -20.66% | x 1.26 |
-| lookup_std_random | 17,181 | 13,773 | -3,408 | -19.84% | x 1.25 |
-| lookup_std_serial | 15,483 | 13,651 | -1,832 | -11.83% | x 1.13 |
-| lookup_fail_std_highbits | 20,926 | 13,474 | -7,452 | -35.61% | x 1.55 |
-| lookup_fail_std_random | 21,766 | 13,505 | -8,261 | -37.95% | x 1.61 |
-| lookup_fail_std_serial | 19,336 | 13,519 | -5,817 | -30.08% | x 1.43 |
-
 ## Usage

 Add this to your `Cargo.toml`:
@@ -107,7 +62,7 @@ This crate has the following Cargo features:
 - `raw-entry`: Enables access to the deprecated `RawEntry` API.
 - `inline-more`: Adds inline hints to most functions, improving run-time performance at the cost
   of compilation time. (enabled by default)
-- `default-hasher`: Compiles with ahash as default hasher. (enabled by default)
+- `default-hasher`: Compiles with foldhash as default hasher. (enabled by default)
 - `allocator-api2`: Enables support for allocators that support `allocator-api2`. (enabled by default)

 ## License
54 changes: 27 additions & 27 deletions benches/bench.rs
@@ -1,5 +1,5 @@
 // This benchmark suite contains some benchmarks along a set of dimensions:
-// Hasher: std default (SipHash) and crate default (AHash).
+// Hasher: std default (SipHash) and crate default (foldhash).
 // Int key distribution: low bit heavy, top bit heavy, and random.
 // Task: basic functionality: insert, insert_erase, lookup, lookup_fail, iter
 #![feature(test)]
@@ -18,7 +18,7 @@ use std::{
 const SIZE: usize = 1000;

 // The default hashmap when using this crate directly.
-type AHashMap<K, V> = HashMap<K, V, DefaultHashBuilder>;
+type FoldHashMap<K, V> = HashMap<K, V, DefaultHashBuilder>;
 // This uses the hashmap from this crate with the default hasher of the stdlib.
 type StdHashMap<K, V> = HashMap<K, V, RandomState>;

@@ -58,22 +58,22 @@ impl Drop for DropType {
 }

 macro_rules! bench_suite {
-    ($bench_macro:ident, $bench_ahash_serial:ident, $bench_std_serial:ident,
-     $bench_ahash_highbits:ident, $bench_std_highbits:ident,
-     $bench_ahash_random:ident, $bench_std_random:ident) => {
-        $bench_macro!($bench_ahash_serial, AHashMap, 0..);
+    ($bench_macro:ident, $bench_foldhash_serial:ident, $bench_std_serial:ident,
+     $bench_foldhash_highbits:ident, $bench_std_highbits:ident,
+     $bench_foldhash_random:ident, $bench_std_random:ident) => {
+        $bench_macro!($bench_foldhash_serial, FoldHashMap, 0..);
         $bench_macro!($bench_std_serial, StdHashMap, 0..);
         $bench_macro!(
-            $bench_ahash_highbits,
-            AHashMap,
+            $bench_foldhash_highbits,
+            FoldHashMap,
             (0..).map(usize::swap_bytes)
         );
         $bench_macro!(
             $bench_std_highbits,
             StdHashMap,
             (0..).map(usize::swap_bytes)
         );
-        $bench_macro!($bench_ahash_random, AHashMap, RandomKeys::new());
+        $bench_macro!($bench_foldhash_random, FoldHashMap, RandomKeys::new());
         $bench_macro!($bench_std_random, StdHashMap, RandomKeys::new());
     };
 }
@@ -97,11 +97,11 @@ macro_rules! bench_insert {

 bench_suite!(
     bench_insert,
-    insert_ahash_serial,
+    insert_foldhash_serial,
     insert_std_serial,
-    insert_ahash_highbits,
+    insert_foldhash_highbits,
     insert_std_highbits,
-    insert_ahash_random,
+    insert_foldhash_random,
     insert_std_random
 );

@@ -122,11 +122,11 @@ macro_rules! bench_grow_insert {

 bench_suite!(
     bench_grow_insert,
-    grow_insert_ahash_serial,
+    grow_insert_foldhash_serial,
     grow_insert_std_serial,
-    grow_insert_ahash_highbits,
+    grow_insert_foldhash_highbits,
     grow_insert_std_highbits,
-    grow_insert_ahash_random,
+    grow_insert_foldhash_random,
     grow_insert_std_random
 );

@@ -158,11 +158,11 @@ macro_rules! bench_insert_erase {

 bench_suite!(
     bench_insert_erase,
-    insert_erase_ahash_serial,
+    insert_erase_foldhash_serial,
     insert_erase_std_serial,
-    insert_erase_ahash_highbits,
+    insert_erase_foldhash_highbits,
     insert_erase_std_highbits,
-    insert_erase_ahash_random,
+    insert_erase_foldhash_random,
     insert_erase_std_random
 );

@@ -187,11 +187,11 @@ macro_rules! bench_lookup {

 bench_suite!(
     bench_lookup,
-    lookup_ahash_serial,
+    lookup_foldhash_serial,
     lookup_std_serial,
-    lookup_ahash_highbits,
+    lookup_foldhash_highbits,
     lookup_std_highbits,
-    lookup_ahash_random,
+    lookup_foldhash_random,
     lookup_std_random
 );

@@ -216,11 +216,11 @@ macro_rules! bench_lookup_fail {

 bench_suite!(
     bench_lookup_fail,
-    lookup_fail_ahash_serial,
+    lookup_fail_foldhash_serial,
     lookup_fail_std_serial,
-    lookup_fail_ahash_highbits,
+    lookup_fail_foldhash_highbits,
     lookup_fail_std_highbits,
-    lookup_fail_ahash_random,
+    lookup_fail_foldhash_random,
     lookup_fail_std_random
 );

@@ -244,11 +244,11 @@ macro_rules! bench_iter {

 bench_suite!(
     bench_iter,
-    iter_ahash_serial,
+    iter_foldhash_serial,
     iter_std_serial,
-    iter_ahash_highbits,
+    iter_foldhash_highbits,
     iter_std_highbits,
-    iter_ahash_random,
+    iter_foldhash_random,
     iter_std_random
 );

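The `bench_suite!` invocations in `benches/bench.rs` run every task over three integer key streams. As a rough illustration of what the serial and high-bit streams look like, here is a minimal sketch. It uses `u64` rather than the benchmark's `usize` so that the values are the same on every platform, and the helper names `serial_keys` and `highbit_keys` are made up for this example. The random stream (`RandomKeys`) is not shown in this diff, so it is left out.

```rust
/// Low-bit-heavy keys: 0, 1, 2, ... (the `0..` stream in the suite).
fn serial_keys() -> impl Iterator<Item = u64> {
    0u64..
}

/// Top-bit-heavy keys: the same counter with its bytes reversed,
/// mirroring `(0..).map(usize::swap_bytes)` from the suite.
fn highbit_keys() -> impl Iterator<Item = u64> {
    (0u64..).map(u64::swap_bytes)
}

fn main() {
    let serial: Vec<u64> = serial_keys().take(3).collect();
    let high: Vec<u64> = highbit_keys().take(3).collect();
    println!("serial:   {:?}", serial);
    println!("highbits: {:x?}", high);
}
```

Reversing the bytes moves the changing low byte of the counter into the most significant byte, so consecutive keys differ only in their top bits. A hasher that left key bits largely unmixed would presumably see little variation in the low bits of its output for such keys, which is the case the `highbits` benchmarks appear designed to exercise.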
6 changes: 3 additions & 3 deletions src/lib.rs
@@ -39,11 +39,11 @@
 #![cfg_attr(feature = "nightly", warn(fuzzy_provenance_casts))]
 #![cfg_attr(feature = "nightly", allow(internal_features))]

-/// Default hasher for [`HashMap`], [`HashSet`] and [`HashTable`].
+/// Default hasher for [`HashMap`] and [`HashSet`].
 #[cfg(feature = "default-hasher")]
-pub type DefaultHashBuilder = core::hash::BuildHasherDefault<ahash::AHasher>;
+pub type DefaultHashBuilder = foldhash::fast::RandomState;

-/// Dummy default hasher for [`HashMap`], [`HashSet`] and [`HashTable`].
+/// Dummy default hasher for [`HashMap`] and [`HashSet`].
 #[cfg(not(feature = "default-hasher"))]
 pub enum DefaultHashBuilder {}

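The alias change in `src/lib.rs` swaps a `BuildHasherDefault<...>` builder for a `RandomState`-style builder. The sketch below uses only standard-library types as stand-ins (`DefaultHasher` in place of `ahash::AHasher`, and `std`'s `RandomState` in place of `foldhash::fast::RandomState`) to show the general difference between the two builder shapes. It makes no claim about how ahash or foldhash seed themselves internally, and `hash_with` is a helper invented for this example.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Hash one string with a hasher produced by the given builder.
fn hash_with<S: BuildHasher>(builder: &S, value: &str) -> u64 {
    let mut hasher = builder.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // `BuildHasherDefault<H>` carries no state of its own: every hasher is
    // created with `H::default()`, so separate builders agree with each other.
    let a = BuildHasherDefault::<DefaultHasher>::default();
    let b = BuildHasherDefault::<DefaultHasher>::default();
    assert_eq!(hash_with(&a, "key"), hash_with(&b, "key"));

    // A `RandomState`-style builder stores per-instance keys. Each instance
    // is consistent with itself, but two instances are not expected to agree.
    let r1 = RandomState::new();
    let r2 = RandomState::new();
    assert_eq!(hash_with(&r1, "key"), hash_with(&r1, "key"));
    println!(
        "two RandomState instances agree: {}",
        hash_with(&r1, "key") == hash_with(&r2, "key")
    );
}
```

One practical consequence of the stateless shape is that it can be created with `Default::default()` and always behaves identically, while a stateful builder has to be stored alongside the map that uses it.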
20 changes: 10 additions & 10 deletions src/map.rs
@@ -15,7 +15,7 @@ pub use crate::raw_entry::*;

 /// A hash map implemented with quadratic probing and SIMD lookup.
 ///
-/// The default hashing algorithm is currently [`AHash`], though this is
+/// The default hashing algorithm is currently [`foldhash`], though this is
 /// subject to change at any point in the future. This hash function is very
 /// fast for all types of keys, but this algorithm will typically *not* protect
 /// against attacks such as HashDoS.
@@ -142,7 +142,7 @@ pub use crate::raw_entry::*;
 /// [`with_hasher`]: #method.with_hasher
 /// [`with_capacity_and_hasher`]: #method.with_capacity_and_hasher
 /// [`fnv`]: https://crates.io/crates/fnv
-/// [`AHash`]: https://crates.io/crates/ahash
+/// [`foldhash`]: https://crates.io/crates/foldhash
 ///
 /// ```
 /// use hashbrown::HashMap;
@@ -270,7 +270,7 @@ impl<K, V> HashMap<K, V, DefaultHashBuilder> {
 /// The `hash_builder` normally use a fixed key by default and that does
 /// not allow the `HashMap` to be protected against attacks such as [`HashDoS`].
 /// Users who require HashDoS resistance should explicitly use
-/// [`ahash::RandomState`] or [`std::collections::hash_map::RandomState`]
+/// [`std::collections::hash_map::RandomState`]
 /// as the hasher when creating a [`HashMap`], for example with
 /// [`with_hasher`](HashMap::with_hasher) method.
 ///
@@ -300,7 +300,7 @@ impl<K, V> HashMap<K, V, DefaultHashBuilder> {
 /// The `hash_builder` normally use a fixed key by default and that does
 /// not allow the `HashMap` to be protected against attacks such as [`HashDoS`].
 /// Users who require HashDoS resistance should explicitly use
-/// [`ahash::RandomState`] or [`std::collections::hash_map::RandomState`]
+/// [`std::collections::hash_map::RandomState`]
 /// as the hasher when creating a [`HashMap`], for example with
 /// [`with_capacity_and_hasher`](HashMap::with_capacity_and_hasher) method.
 ///
@@ -333,7 +333,7 @@ impl<K, V, A: Allocator> HashMap<K, V, DefaultHashBuilder, A> {
 /// The `hash_builder` normally use a fixed key by default and that does
 /// not allow the `HashMap` to be protected against attacks such as [`HashDoS`].
 /// Users who require HashDoS resistance should explicitly use
-/// [`ahash::RandomState`] or [`std::collections::hash_map::RandomState`]
+/// [`std::collections::hash_map::RandomState`]
 /// as the hasher when creating a [`HashMap`], for example with
 /// [`with_hasher_in`](HashMap::with_hasher_in) method.
 ///
@@ -377,7 +377,7 @@ impl<K, V, A: Allocator> HashMap<K, V, DefaultHashBuilder, A> {
 /// The `hash_builder` normally use a fixed key by default and that does
 /// not allow the `HashMap` to be protected against attacks such as [`HashDoS`].
 /// Users who require HashDoS resistance should explicitly use
-/// [`ahash::RandomState`] or [`std::collections::hash_map::RandomState`]
+/// [`std::collections::hash_map::RandomState`]
 /// as the hasher when creating a [`HashMap`], for example with
 /// [`with_capacity_and_hasher_in`](HashMap::with_capacity_and_hasher_in) method.
 ///
@@ -429,7 +429,7 @@ impl<K, V, S> HashMap<K, V, S> {
 /// The `hash_builder` normally use a fixed key by default and that does
 /// not allow the `HashMap` to be protected against attacks such as [`HashDoS`].
 /// Users who require HashDoS resistance should explicitly use
-/// [`ahash::RandomState`] or [`std::collections::hash_map::RandomState`]
+/// [`std::collections::hash_map::RandomState`]
 /// as the hasher when creating a [`HashMap`].
 ///
 /// The `hash_builder` passed should implement the [`BuildHasher`] trait for
@@ -471,7 +471,7 @@ impl<K, V, S> HashMap<K, V, S> {
 /// The `hash_builder` normally use a fixed key by default and that does
 /// not allow the `HashMap` to be protected against attacks such as [`HashDoS`].
 /// Users who require HashDoS resistance should explicitly use
-/// [`ahash::RandomState`] or [`std::collections::hash_map::RandomState`]
+/// [`std::collections::hash_map::RandomState`]
 /// as the hasher when creating a [`HashMap`].
 ///
 /// The `hash_builder` passed should implement the [`BuildHasher`] trait for
@@ -521,7 +521,7 @@ impl<K, V, S, A: Allocator> HashMap<K, V, S, A> {
 /// The `hash_builder` normally use a fixed key by default and that does
 /// not allow the `HashMap` to be protected against attacks such as [`HashDoS`].
 /// Users who require HashDoS resistance should explicitly use
-/// [`ahash::RandomState`] or [`std::collections::hash_map::RandomState`]
+/// [`std::collections::hash_map::RandomState`]
 /// as the hasher when creating a [`HashMap`].
 ///
 /// [`HashDoS`]: https://en.wikipedia.org/wiki/Collision_attack
@@ -556,7 +556,7 @@ impl<K, V, S, A: Allocator> HashMap<K, V, S, A> {
 /// The `hash_builder` normally use a fixed key by default and that does
 /// not allow the `HashMap` to be protected against attacks such as [`HashDoS`].
 /// Users who require HashDoS resistance should explicitly use
-/// [`ahash::RandomState`] or [`std::collections::hash_map::RandomState`]
+/// [`std::collections::hash_map::RandomState`]
 /// as the hasher when creating a [`HashMap`].
 ///
 /// [`HashDoS`]: https://en.wikipedia.org/wiki/Collision_attack
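The doc comments in `src/map.rs` point users who need HashDoS resistance at `std::collections::hash_map::RandomState`, passed in through one of the `with_hasher` constructors. A minimal sketch of that pattern follows. It uses `std::collections::HashMap` so that it builds without extra dependencies; since the README describes hashbrown's types as a drop-in replacement, the assumption is that changing the import to `hashbrown::HashMap` is the only edit needed. `sip_hashed_map` is a made-up helper name.

```rust
use std::collections::hash_map::RandomState;
// Assumption: `hashbrown::HashMap` can be swapped in for this import.
use std::collections::HashMap;

/// Build a map whose hasher is the randomly keyed builder from `std`,
/// supplied explicitly instead of relying on the crate default.
fn sip_hashed_map() -> HashMap<&'static str, u32, RandomState> {
    let mut map = HashMap::with_capacity_and_hasher(4, RandomState::new());
    map.insert("one", 1);
    map.insert("two", 2);
    map
}

fn main() {
    let map = sip_hashed_map();
    println!("one -> {:?}", map.get("one"));
    println!("len = {}", map.len());
}
```

Naming the hasher in the map's type, as in the return type here, keeps the choice visible at every use site, which is similar in spirit to the `StdHashMap` alias in `benches/bench.rs`.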