pub struct BytellHashTable<TKey, TValue, TConfig>{ /* private fields */ }Expand description
§The Bytell Hash Table
The bytell (byte linked list) hash table is an algorithm by Malte Skarupke, which we implement here in Rust.
The overview of the algorithm can be found here: A new fast hash table in response to Google’s new fast hash table
And Matle Skarupke’s C++ implementation can be found in he’s GitHub repository.
§Properties
The bytell hash table is optimized for memory usage, it uses 1 byte overhead per entry. Insetion, query and removal are quite fast as well.
§Basic layout
Bytell hash table internally is built from a contiguous sequence of blocks. The number of blocks is always a power of two, at any time.
Each block is composed of N entries, where N is also a power of two, and it
is fixed, depending on key and value types only. Typically N is a low number like 8 or 16.
A block entry is a triple (control_byte, key, value), where control_byte is
a single byte (explained later), key is a comparable, hashable key and value
is any value.
The exact layout of block is not a contiguous sequence of (control_byte, key, value)
triples. On the contrary, we first store a sequence of N control bytes, and then
a sequence of N key-value pairs. In particular N has to be at least the alignment
of (key, value) pair (in reality it is at least 8 as well).
The capacity of bytell hash table is calculated as number_of_blocks * N. And so
it is a power of two as well.
Additionally a hash function hash(x) is attached to the bytell hash table. And also
a hash_to_index(x) function, which turns the hashed value into an index usable by
the bytell hash table internally. The hash_to_index(x) value is guaranteed to be
in 0..capacity range.
§control_byte
The control_byte is a byte (8-bit unsigned number) which is divided into two pieces:
- The first bit, if set, denotes that the entry is “direct”, meaning some hash maps
exactly to this entry. If it is zero, then it is a “storage” entry, meaning it is used
for storing a linked list of (key, value) pairs that share the hash. We will call
this field
control_byte.type. - The remaining 7 bits are used as jump distance indexes. Given such an entry with non-zero
jump distance index we can calculate the next item in the linked list by calculating an
appropriate offset based on the index. So this field is a 7-bit integer denoted by
control_byte.distance_index.
The distance_index is not an offset to the next entry. There’s an indirection: the
distance_index retrieves the offset from a global static JUMP_DISTANCES array, where
each distance is carefuly chosen for the highest quality.
Nevertheless, the next_entry that follows prev_entry through distance_index will
be denoted by next_entry := prev_entry + prev_entry.control_byte.distance_index. We
keep in mind that in reality there’s this JUMP_DISTANCES array indirection.
§Query
For a given query_key the retrieval of the value works through the following steps:
- Let
hash_value := hash(query_key) - Let
index := hash_to_index(hash_value) - Let
block := blocks[index % number_of_blocks] - Let
entry := block.entries[index % N] - If
entry.control_byte.type != DIRECT_HITthen returnNONE - If
entry.control_byte.type == DIRECT_HITthen:- loop:
- If
entry.key == query_keythen return(entry.key, entry.value) - If
entry.control_byte.distance_index == 0then returnNONE entry := entry + entry.control_byte.distance_index
- If
- loop:
§Delete
For a given delete_key the deletion of a key works like this:
- Let
hash_value := hash(delete_key) - Let
index := hash_to_index(hash_value) - Let
block := blocks[index % number_of_blocks] - Let
entry := block.entries[index % N] - If
entry.control_byte.type != DIRECT_HITthen returnNONE - loop:
- If
entry.key == delete_keythen break withfound_entry := entry - If
entry.control_byte.distance_index == 0then returnNONE entry := entry + entry.control_byte.distance_index
- If
- We found matching entry. But in order to delete it, we first swap it with the last in the linked list.
- Let
tail := last_linked_entry(found_entry) - If
tail != found_entrythenswap(tail, found_entry). Nowfound_entryis at the end of the list, it has no successor. - Let
parent_entry := parent(found_entry) parent_entry.control_byte.distance_index := 0found_entry.control_byte := EMPTY- Return
found_entry
§Insert
For a given (insert_key, insert_value) pair the insertion works like this:
- Let
hash_value := hash(insert_key) - Let
index := hash_to_index(hash_value) - Let
block := blocks[index % number_of_blocks] - Let
entry := block.entries[index % N] - If
entry.control_byte.type == DIRECT_HIT:- loop “direct”:
- If
entry.key == insert_keythenentry.value := insert_valueand returnold_value. We did an in-place update. - If
entry.control_byte.distance_index != 0then:entry := entry + entry.control_byte.distance_index- continue loop “direct”
- We reached the end of the linked list. We need to insert a new element.
- If
should_grow()then grow the table - loop “inner”:
- Let
(free_entry, free_entry_distance) := search_for_free_entry(entry) - If found then break “inner”, otherwise grow the table. Panic if that is not possible.
- Let
entry.control_byte.distance_index := free_entry_distancefree_entry.control_byte.type := STORAGEfree_entry.control_byte.distance_index := 0free_entry.key := insert_keyfree_entry.value := value- RETURN
- If
- loop “direct”:
- If
entry.control_byte.type != DIRECT_HIT:- We have direct hit on an entry that is actually a storage entry. What we need to do is to move around values to make place for this hit.
- Let
parent_entry := parent(entry) - Let
tmp_current := entryandentry := reserved. We reserve the entry, because we don’t know the operation of moving around stuff to potentially hit it again. - Let
original_entry := entry - loop “outer”:
- loop “inner”:
- Let
(free_entry, free_entry_distance) := search_for_free_entry(entry) - If found then break “inner”, otherwise grow the table. Panic if that is not possible
- Let
- copy
tmp_currenttofree_entry, set distance onparent_entry - If
tmp_current.control_byte.distance_index == 0then break “outer” parent_entry := tmp_current, andtmp_current := tmp_current + tmp_current.control_byte.distance_index
- loop “inner”:
- Now we have place for a direct hit
original_entry.control_byte.type := DIRECT_HIToriginal_entry.control_byte.distance_index := 0original_entry.key := insert_keyoriginal_entry.value := insert_value- RETURN
Implementations§
Source§impl<TKey, TValue, TConfig> BytellHashTable<TKey, TValue, TConfig>
impl<TKey, TValue, TConfig> BytellHashTable<TKey, TValue, TConfig>
Sourcepub fn new() -> Self
pub fn new() -> Self
Creates a new BytellHashTable with the default configuration.
Sourcepub fn with_config(config: TConfig) -> Self
pub fn with_config(config: TConfig) -> Self
Creates a new BytellHashTable with the specified configuration.
Sourcepub fn with_capacity(number_of_items: u32) -> Result<Self, BytellError>
pub fn with_capacity(number_of_items: u32) -> Result<Self, BytellError>
Creates a new BytellHashTable with the specified capacity and the default configuration.
§Errors
Returns BytellError::AllocationError if the allocation fails.
§Panics
If the expected_capacity exceeds u32::MAX
Sourcepub fn with_capacity_and_config(
number_of_items: u32,
config: TConfig,
) -> Result<Self, BytellError>
pub fn with_capacity_and_config( number_of_items: u32, config: TConfig, ) -> Result<Self, BytellError>
Creates a new BytellHashTable expected to support passed number_of_items.
§Notes
This method will almost surely overallocate, since it takes config.load_factor()
into account.
§Errors
Returns BytellError::AllocationError if the allocation fails.
§Panics
If the expected_capacity exceeds u32::MAX.
Sourcepub fn shrink_to_fit(&mut self) -> Result<(), BytellError>
pub fn shrink_to_fit(&mut self) -> Result<(), BytellError>
Reduces the memory of the current table, if possible.
§Notes
If the number of underlying blocks cannot be reduced, it does nothing.
Otherwise it tries to reduce the number of blocks to the
minimal possible, creates a new table with that layout,
and moves (rehases) items into it. Then overwrites self
with the new table.
§Errors
Returns BytellError::AllocationError if it cannot allocate
memory for the new hidden table.
Sourcepub fn capacity(&self) -> PowerOfTwo32
pub fn capacity(&self) -> PowerOfTwo32
Returns the capacity of the BytellHashTable.
Trait Implementations§
Source§impl<TKey, TValue, TConfig> Clone for BytellHashTable<TKey, TValue, TConfig>
impl<TKey, TValue, TConfig> Clone for BytellHashTable<TKey, TValue, TConfig>
Source§impl<TKey, TValue, TConfig> Default for BytellHashTable<TKey, TValue, TConfig>
impl<TKey, TValue, TConfig> Default for BytellHashTable<TKey, TValue, TConfig>
Source§impl<TKey, TValue, TConfig> Drop for BytellHashTable<TKey, TValue, TConfig>
impl<TKey, TValue, TConfig> Drop for BytellHashTable<TKey, TValue, TConfig>
Source§impl<TKey, TValue, TConfig> Hash for BytellHashTable<TKey, TValue, TConfig>
impl<TKey, TValue, TConfig> Hash for BytellHashTable<TKey, TValue, TConfig>
Source§impl<TKey, TValue, TConfig> ImmutableHashTable<TKey, TValue> for BytellHashTable<TKey, TValue, TConfig>
impl<TKey, TValue, TConfig> ImmutableHashTable<TKey, TValue> for BytellHashTable<TKey, TValue, TConfig>
Source§fn length(&self) -> Length
fn length(&self) -> Length
(TKey, TValue) pairs the table
contains. This typically does not correspond to the actual
size of the table in bytes.Source§fn contains<Q>(&self, key: &Q) -> bool
fn contains<Q>(&self, key: &Q) -> bool
key. Read moreSource§fn get<Q>(&self, key: &Q) -> Option<&TValue>
fn get<Q>(&self, key: &Q) -> Option<&TValue>
key, and if so then returns
the reference to the TValue, or None otherwise. Read moreSource§fn get_key_value<Q>(&self, key: &Q) -> Option<(&TKey, &TValue)>
fn get_key_value<Q>(&self, key: &Q) -> Option<(&TKey, &TValue)>
key, and if so then returns
the (&TKey, &TValue) pair or None otherwise. Read moreSource§impl<TKey, TValue, TConfig> MutableHashTable<TKey, TValue> for BytellHashTable<TKey, TValue, TConfig>
impl<TKey, TValue, TConfig> MutableHashTable<TKey, TValue> for BytellHashTable<TKey, TValue, TConfig>
Source§fn insert(&mut self, key: TKey, value: TValue) -> Option<TValue>
fn insert(&mut self, key: TKey, value: TValue) -> Option<TValue>
(TKey, TValue) pair into the table. Read moreSource§fn remove_entry<Q>(&mut self, key: &Q) -> Option<(TKey, TValue)>
fn remove_entry<Q>(&mut self, key: &Q) -> Option<(TKey, TValue)>
(TKey, TValue) pair
for the matching key or None if there is no match. Read moreSource§fn insert_or_update_with<FAdd, FUpdate>(
&mut self,
key: TKey,
adder: FAdd,
updater: FUpdate,
) -> &mut TValue
fn insert_or_update_with<FAdd, FUpdate>( &mut self, key: TKey, adder: FAdd, updater: FUpdate, ) -> &mut TValue
key. If the table contains it, then
it runs updater on the corresponding TValue. Otherwise runs adder
to add a new TValue to the table. Returns the mutable reference to the
final TValue. Read moreSource§fn iter_mut<'a>(
&'a mut self,
) -> impl Iterator<Item = (&'a TKey, &'a mut TValue)>where
TKey: 'a,
TValue: 'a,
Self: 'a,
fn iter_mut<'a>(
&'a mut self,
) -> impl Iterator<Item = (&'a TKey, &'a mut TValue)>where
TKey: 'a,
TValue: 'a,
Self: 'a,
Source§fn remove<Q>(&mut self, key: &Q) -> Option<TValue>
fn remove<Q>(&mut self, key: &Q) -> Option<TValue>
MutableHashTable::remove_entry,
but returns TValue only for the matching key or None if there is no match. Read moreSource§fn get_or_insert_default(&mut self, key: TKey) -> &mut TValuewhere
TValue: Default,
fn get_or_insert_default(&mut self, key: TKey) -> &mut TValuewhere
TValue: Default,
TValue, or inserts a default one. Read moreSource§fn get_or_insert(&mut self, key: TKey, value: TValue) -> &mut TValue
fn get_or_insert(&mut self, key: TKey, value: TValue) -> &mut TValue
TValue, or inserts the passed one. Read more