面向持久内存的高效写优化哈希索引方案

摘要

NVM non-volatile memory 因为 data consistency and hardware limitations of NVM,传统为 DRAM 设计的索引技术不适用 inefficient,为了高效索引 NVM 上的数据,本论文提出一种写优化的高性能哈希索引技术 write-optimized and high-performance hashing index,称为 level hashing

  • write-optimized
  • high-performance hashing
  • low-overhead consistency guarantee
  • and cost-efficient resizing.
  • 为了保证 consistency with low overhead

    • level hashing leverages log-free consistency schemes for insertion, deletion, and resizing operations, and an opportunistic log-free scheme for update operation.
  • To cost-efficiently resize this hash table,

    • level hashing leverages an in- place resizing scheme that only needs to rehash 1/3 of buckets instead of the entire table, thus significantly reducing the number of rehashed buckets and improving the resizing performance.

2. Background and Motivation

2.1 Data Consistency in NVM

  • CPU cache lines
  • CPU and memory controller may reorder memory writes.
  • cache line flush instruction (CLFLUSH for short) / memory fence instruction (MFENCE for short), to ensure the ordering of memory writes
  • atomic memory write of NVM is 8 bytes, which is equal to the memory bus width for 64-bit CPUs
  • Existing techniques, such as logging and copy-on-write (CoW), are used to guarantee consistency of the data whose sizes are larger than an atomic-write size.

    • The logging technique first stores the old data (undo logging) or new data (redo logging) into a log and then updates the old data in place.
    • The CoW first creates a new copy of data and then performs updates on the copy. The pointers that point to the old data are finally modified.
    • Nevertheless, logging and CoW have to write twice for each updated data.

Code

RECIPE : high-performance, concurrent indexes for persistent memory (SOSP 2019)