Skip to main content

BytellHashTable

Struct BytellHashTable 

Source
pub struct BytellHashTable<TKey, TValue, TConfig>
where TKey: Eq + Hash, TConfig: BytellConfig,
{ /* private fields */ }
Expand description

§The Bytell Hash Table

The bytell (byte linked list) hash table is an algorithm by Malte Skarupke, which we implement here in Rust.

The overview of the algorithm can be found here: A new fast hash table in response to Google’s new fast hash table

And Matle Skarupke’s C++ implementation can be found in he’s GitHub repository.

§Properties

The bytell hash table is optimized for memory usage, it uses 1 byte overhead per entry. Insetion, query and removal are quite fast as well.

§Basic layout

Bytell hash table internally is built from a contiguous sequence of blocks. The number of blocks is always a power of two, at any time.

Each block is composed of N entries, where N is also a power of two, and it is fixed, depending on key and value types only. Typically N is a low number like 8 or 16.

A block entry is a triple (control_byte, key, value), where control_byte is a single byte (explained later), key is a comparable, hashable key and value is any value.

The exact layout of block is not a contiguous sequence of (control_byte, key, value) triples. On the contrary, we first store a sequence of N control bytes, and then a sequence of N key-value pairs. In particular N has to be at least the alignment of (key, value) pair (in reality it is at least 8 as well).

The capacity of bytell hash table is calculated as number_of_blocks * N. And so it is a power of two as well.

Additionally a hash function hash(x) is attached to the bytell hash table. And also a hash_to_index(x) function, which turns the hashed value into an index usable by the bytell hash table internally. The hash_to_index(x) value is guaranteed to be in 0..capacity range.

§control_byte

The control_byte is a byte (8-bit unsigned number) which is divided into two pieces:

  1. The first bit, if set, denotes that the entry is “direct”, meaning some hash maps exactly to this entry. If it is zero, then it is a “storage” entry, meaning it is used for storing a linked list of (key, value) pairs that share the hash. We will call this field control_byte.type.
  2. The remaining 7 bits are used as jump distance indexes. Given such an entry with non-zero jump distance index we can calculate the next item in the linked list by calculating an appropriate offset based on the index. So this field is a 7-bit integer denoted by control_byte.distance_index.

The distance_index is not an offset to the next entry. There’s an indirection: the distance_index retrieves the offset from a global static JUMP_DISTANCES array, where each distance is carefuly chosen for the highest quality.

Nevertheless, the next_entry that follows prev_entry through distance_index will be denoted by next_entry := prev_entry + prev_entry.control_byte.distance_index. We keep in mind that in reality there’s this JUMP_DISTANCES array indirection.

§Query

For a given query_key the retrieval of the value works through the following steps:

  • Let hash_value := hash(query_key)
  • Let index := hash_to_index(hash_value)
  • Let block := blocks[index % number_of_blocks]
  • Let entry := block.entries[index % N]
  • If entry.control_byte.type != DIRECT_HIT then return NONE
  • If entry.control_byte.type == DIRECT_HIT then:
    • loop:
      • If entry.key == query_key then return (entry.key, entry.value)
      • If entry.control_byte.distance_index == 0 then return NONE
      • entry := entry + entry.control_byte.distance_index

§Delete

For a given delete_key the deletion of a key works like this:

  • Let hash_value := hash(delete_key)
  • Let index := hash_to_index(hash_value)
  • Let block := blocks[index % number_of_blocks]
  • Let entry := block.entries[index % N]
  • If entry.control_byte.type != DIRECT_HIT then return NONE
  • loop:
    • If entry.key == delete_key then break with found_entry := entry
    • If entry.control_byte.distance_index == 0 then return NONE
    • entry := entry + entry.control_byte.distance_index
  • We found matching entry. But in order to delete it, we first swap it with the last in the linked list.
  • Let tail := last_linked_entry(found_entry)
  • If tail != found_entry then swap(tail, found_entry). Now found_entry is at the end of the list, it has no successor.
  • Let parent_entry := parent(found_entry)
  • parent_entry.control_byte.distance_index := 0
  • found_entry.control_byte := EMPTY
  • Return found_entry

§Insert

For a given (insert_key, insert_value) pair the insertion works like this:

  • Let hash_value := hash(insert_key)
  • Let index := hash_to_index(hash_value)
  • Let block := blocks[index % number_of_blocks]
  • Let entry := block.entries[index % N]
  • If entry.control_byte.type == DIRECT_HIT:
    • loop “direct”:
      • If entry.key == insert_key then entry.value := insert_value and return old_value. We did an in-place update.
      • If entry.control_byte.distance_index != 0 then:
        • entry := entry + entry.control_byte.distance_index
        • continue loop “direct”
      • We reached the end of the linked list. We need to insert a new element.
      • If should_grow() then grow the table
      • loop “inner”:
        • Let (free_entry, free_entry_distance) := search_for_free_entry(entry)
        • If found then break “inner”, otherwise grow the table. Panic if that is not possible.
      • entry.control_byte.distance_index := free_entry_distance
      • free_entry.control_byte.type := STORAGE
      • free_entry.control_byte.distance_index := 0
      • free_entry.key := insert_key
      • free_entry.value := value
      • RETURN
  • If entry.control_byte.type != DIRECT_HIT:
    • We have direct hit on an entry that is actually a storage entry. What we need to do is to move around values to make place for this hit.
    • Let parent_entry := parent(entry)
    • Let tmp_current := entry and entry := reserved. We reserve the entry, because we don’t know the operation of moving around stuff to potentially hit it again.
    • Let original_entry := entry
    • loop “outer”:
      • loop “inner”:
        • Let (free_entry, free_entry_distance) := search_for_free_entry(entry)
        • If found then break “inner”, otherwise grow the table. Panic if that is not possible
      • copy tmp_current to free_entry, set distance on parent_entry
      • If tmp_current.control_byte.distance_index == 0 then break “outer”
      • parent_entry := tmp_current, and tmp_current := tmp_current + tmp_current.control_byte.distance_index
    • Now we have place for a direct hit
    • original_entry.control_byte.type := DIRECT_HIT
    • original_entry.control_byte.distance_index := 0
    • original_entry.key := insert_key
    • original_entry.value := insert_value
    • RETURN

Implementations§

Source§

impl<TKey, TValue, TConfig> BytellHashTable<TKey, TValue, TConfig>
where TKey: Eq + Hash, TConfig: BytellConfig,

Source

pub fn new() -> Self

Creates a new BytellHashTable with the default configuration.

Source

pub fn with_config(config: TConfig) -> Self

Creates a new BytellHashTable with the specified configuration.

Source

pub fn with_capacity(number_of_items: u32) -> Result<Self, BytellError>

Creates a new BytellHashTable with the specified capacity and the default configuration.

§Errors

Returns BytellError::AllocationError if the allocation fails.

§Panics

If the expected_capacity exceeds u32::MAX

Source

pub fn with_capacity_and_config( number_of_items: u32, config: TConfig, ) -> Result<Self, BytellError>

Creates a new BytellHashTable expected to support passed number_of_items.

§Notes

This method will almost surely overallocate, since it takes config.load_factor() into account.

§Errors

Returns BytellError::AllocationError if the allocation fails.

§Panics

If the expected_capacity exceeds u32::MAX.

Source

pub fn shrink_to_fit(&mut self) -> Result<(), BytellError>

Reduces the memory of the current table, if possible.

§Notes

If the number of underlying blocks cannot be reduced, it does nothing.

Otherwise it tries to reduce the number of blocks to the minimal possible, creates a new table with that layout, and moves (rehases) items into it. Then overwrites self with the new table.

§Errors

Returns BytellError::AllocationError if it cannot allocate memory for the new hidden table.

Source

pub fn capacity(&self) -> PowerOfTwo32

Returns the capacity of the BytellHashTable.

Trait Implementations§

Source§

impl<TKey, TValue, TConfig> Clone for BytellHashTable<TKey, TValue, TConfig>
where TKey: Eq + Hash + Clone, TValue: Clone, TConfig: BytellConfig,

Source§

fn clone(&self) -> Self

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl<TKey, TValue, TConfig> Default for BytellHashTable<TKey, TValue, TConfig>
where TKey: Eq + Hash, TConfig: BytellConfig,

Source§

fn default() -> Self

Returns the “default value” for a type. Read more
Source§

impl<TKey, TValue, TConfig> Drop for BytellHashTable<TKey, TValue, TConfig>
where TKey: Eq + Hash, TConfig: BytellConfig,

Source§

fn drop(&mut self)

Executes the destructor for this type. Read more
Source§

impl<TKey, TValue, TConfig> Hash for BytellHashTable<TKey, TValue, TConfig>
where TKey: Eq + Hash, TValue: Hash, TConfig: BytellConfig,

Source§

fn hash<H: Hasher>(&self, state: &mut H)

Feeds this value into the given Hasher. Read more
1.3.0 · Source§

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher. Read more
Source§

impl<TKey, TValue, TConfig> ImmutableHashTable<TKey, TValue> for BytellHashTable<TKey, TValue, TConfig>
where TKey: Eq + Hash, TConfig: BytellConfig,

Source§

fn length(&self) -> Length

Returns the number of (TKey, TValue) pairs the table contains. This typically does not correspond to the actual size of the table in bytes.
Source§

fn contains<Q>(&self, key: &Q) -> bool
where TKey: Borrow<Q>, Q: Eq + Hash + ?Sized,

Checks if the table contains the corresponding key. Read more
Source§

fn get<Q>(&self, key: &Q) -> Option<&TValue>
where TKey: Borrow<Q>, Q: Eq + Hash + ?Sized,

Checks if the table contains the corresponding key, and if so then returns the reference to the TValue, or None otherwise. Read more
Source§

fn get_key_value<Q>(&self, key: &Q) -> Option<(&TKey, &TValue)>
where TKey: Borrow<Q>, Q: Eq + Hash + ?Sized,

Checks if the table contains the corresponding key, and if so then returns the (&TKey, &TValue) pair or None otherwise. Read more
Source§

fn iter<'a>(&'a self) -> impl Iterator<Item = (&'a TKey, &'a TValue)>
where TKey: 'a, TValue: 'a, Self: 'a,

Returns an iterator over the key-value pairs in the table. Read more
Source§

fn is_empty(&self) -> bool

Checks if the table is empty. This does not mean that it doesn’t take space in memory. Equivalent to self.length() == 0 check.
Source§

impl<TKey, TValue, TConfig> MutableHashTable<TKey, TValue> for BytellHashTable<TKey, TValue, TConfig>
where TKey: Eq + Hash, TConfig: BytellConfig,

Source§

fn insert(&mut self, key: TKey, value: TValue) -> Option<TValue>

Inserts given (TKey, TValue) pair into the table. Read more
Source§

fn remove_entry<Q>(&mut self, key: &Q) -> Option<(TKey, TValue)>
where TKey: Borrow<Q>, Q: Eq + Hash + ?Sized,

Removes entire entry from the table. Returns (TKey, TValue) pair for the matching key or None if there is no match. Read more
Source§

fn insert_or_update_with<FAdd, FUpdate>( &mut self, key: TKey, adder: FAdd, updater: FUpdate, ) -> &mut TValue
where FAdd: FnOnce() -> TValue, FUpdate: FnOnce(&mut TValue),

Searches the table for a given key. If the table contains it, then it runs updater on the corresponding TValue. Otherwise runs adder to add a new TValue to the table. Returns the mutable reference to the final TValue. Read more
Source§

fn iter_mut<'a>( &'a mut self, ) -> impl Iterator<Item = (&'a TKey, &'a mut TValue)>
where TKey: 'a, TValue: 'a, Self: 'a,

Returns a mutable iterator over the key-value pairs in the table. Read more
Source§

fn remove<Q>(&mut self, key: &Q) -> Option<TValue>
where TKey: Borrow<Q>, Q: Eq + Hash + ?Sized,

Removes entire entry from the table. Similar to MutableHashTable::remove_entry, but returns TValue only for the matching key or None if there is no match. Read more
Source§

fn get_or_insert_default(&mut self, key: TKey) -> &mut TValue
where TValue: Default,

Retrieves an existing TValue, or inserts a default one. Read more
Source§

fn get_or_insert(&mut self, key: TKey, value: TValue) -> &mut TValue

Retrieves an existing TValue, or inserts the passed one. Read more
Source§

impl<TKey, TValue, TConfig> PartialEq for BytellHashTable<TKey, TValue, TConfig>
where TKey: Eq + Hash, TValue: PartialEq, TConfig: BytellConfig,

Source§

fn eq(&self, other: &Self) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl<TKey, TValue, TConfig> ReprC for BytellHashTable<TKey, TValue, TConfig>
where TKey: Eq + Hash + ReprC, TValue: ReprC, TConfig: BytellConfig + ReprC,

Source§

const CHECK: ()

This field is used for const checks only.
Source§

impl<TKey, TValue, TConfig> Eq for BytellHashTable<TKey, TValue, TConfig>
where TKey: Eq + Hash, TValue: Eq, TConfig: BytellConfig,

Source§

impl<TKey, TValue, TConfig> Send for BytellHashTable<TKey, TValue, TConfig>
where TKey: Send + Eq + Hash, TValue: Send, TConfig: BytellConfig,

Source§

impl<TKey, TValue, TConfig> Sync for BytellHashTable<TKey, TValue, TConfig>
where TKey: Sync + Eq + Hash, TValue: Sync, TConfig: BytellConfig,

Auto Trait Implementations§

§

impl<TKey, TValue, TConfig> Freeze for BytellHashTable<TKey, TValue, TConfig>
where TConfig: Freeze,

§

impl<TKey, TValue, TConfig> RefUnwindSafe for BytellHashTable<TKey, TValue, TConfig>
where TConfig: RefUnwindSafe, TKey: RefUnwindSafe, TValue: RefUnwindSafe,

§

impl<TKey, TValue, TConfig> Unpin for BytellHashTable<TKey, TValue, TConfig>
where TConfig: Unpin, TKey: Unpin, TValue: Unpin,

§

impl<TKey, TValue, TConfig> UnsafeUnpin for BytellHashTable<TKey, TValue, TConfig>
where TConfig: UnsafeUnpin,

§

impl<TKey, TValue, TConfig> UnwindSafe for BytellHashTable<TKey, TValue, TConfig>
where TConfig: UnwindSafe, TKey: UnwindSafe, TValue: UnwindSafe,

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.