Exploiting Intel Optane persistent memory for full text search

Abstract

In our information-driven societies, full-text search is ubiquitous. Search is memory-intensive: quickly searching massive corpora requires building indices, which consumes large volatile heaps. Search is also storage I/O-intensive: limited main memory necessitates writing large partial indices to non-volatile storage, where they finally live in merged form. These indices reside in memory, in full or in part, during query evaluation. This memory and I/O intensity makes it hard to index and search content rapidly and efficiently. On the hardware side, the recently introduced Intel Optane DC persistent memory (PM) offers byte-addressability, high capacity, and non-volatility. This paper evaluates and exploits Optane PM for text indexing and search on multicore platforms. We identify essential structures in inverted indices (hash table, merge tree, and key-value store), where they reside (memory or storage), and key operations over them (sort, flush, and merge). We allocate index structures in DRAM, Optane PM, and block storage by modifying an existing search engine. We then evaluate a wide range of hybrid memory and storage configurations. Our findings include: (1) careful placement of index structures across DRAM, Optane PM, and SSD speeds up indexing with a single core compared to a high-performance baseline, but does not scale to many cores; (2) crash-consistent indexing with Optane PM is feasible without incurring a high overhead; and (3) the tail latency of the longest multi-term conjunctive queries is lower with a PM-backed index than with an SSD-backed one. This paper opens up persistent memory to a practical role in full-text search.
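
To make the structures and operations named in the abstract concrete, the following is a minimal sketch of a toy inverted index: postings are accumulated in a hash table, sorted and flushed as an immutable segment, and the segments are then merged before a conjunctive query runs over the result. It is an illustration only, not the paper's implementation. All function names (`build_segment`, `merge_segments`, `conjunctive_query`) are hypothetical, segments are kept in ordinary Python lists rather than written to storage, and the placement of structures across DRAM, Optane PM, and SSD is not modeled.

```python
from collections import defaultdict
import heapq


def build_segment(docs):
    """Index a batch of (doc_id, text) pairs in a hash table, then sort and flush."""
    table = defaultdict(list)  # hash table: term -> postings (doc IDs)
    for doc_id, text in docs:
        for term in set(text.lower().split()):
            table[term].append(doc_id)
    # Sort: order terms and postings so segments can later be merged linearly.
    # Flush: hand back a sorted run (a real engine would write it to storage;
    # this sketch keeps it in a list, which is an assumption for brevity).
    return [(term, sorted(postings)) for term, postings in sorted(table.items())]


def merge_segments(segments):
    """Merge term-sorted segments into a single term -> postings mapping."""
    merged = {}
    for term, postings in heapq.merge(*segments, key=lambda entry: entry[0]):
        merged.setdefault(term, []).extend(postings)
    return {term: sorted(set(p)) for term, p in merged.items()}


def conjunctive_query(index, terms):
    """Return the doc IDs that contain every query term."""
    lists = [set(index.get(t, ())) for t in terms]
    return sorted(set.intersection(*lists)) if lists else []


# Two partial indices built from separate batches, then merged and queried.
seg_a = build_segment([(1, "persistent memory search"), (2, "text search")])
seg_b = build_segment([(3, "persistent text search"), (4, "memory")])
index = merge_segments([seg_a, seg_b])
result = conjunctive_query(index, ["persistent", "search"])
```

In this sketch the cost of a conjunctive query grows with the number and length of the postings lists it intersects, which gives some intuition for why the longest multi-term queries are the ones most sensitive to where the index resides.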
