Research

Driven by the growing demand for real-time data processing, building large-scale memory storage systems has become a pressing need. To meet it, data centers are adopting new storage and network technologies, such as byte-addressable non-volatile memory from Intel (e.g., Optane) and Samsung (CMM-H) and high-speed RDMA interconnects.
My research focuses on developing efficient and robust next-generation storage systems that exploit these new hardware devices. I follow three fundamental design principles: CPU-awareness, device-awareness, and workload-awareness, which are essential for achieving high device bandwidth utilization, low CPU overhead, and predictably high performance.

Networked Memory Architecture

2017 - 2020

While storage and network technologies evolve rapidly, CPU performance has remained comparatively stagnant as Moore's law has slowed in recent years. As a result, a CPU running heavyweight storage software can easily become the bottleneck. We tackle this problem from several angles.

CPU-efficient IO Engine

2020 - 2024

Purely reducing the overhead of storage software is still not enough; system designers must also be device-aware, since emerging hardware often exhibits counterintuitive performance behavior. For example, NVM has asymmetric read/write performance, device-level IO amplification, and performance variability, while RDMA shows limited scalability due to device-level cache thrashing. In this context, I have designed CPU-efficient IO engines that respect these device characteristics.
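
As a concrete illustration of device-awareness, the sketch below is a minimal C example, not code from the systems described here. It assumes an Optane-like device whose media is internally accessed in 256 B units; the helper names (stage_write, nvm_persist, flush_staged) and constants are hypothetical. Small updates are coalesced in a 256 B staging buffer and persisted in aligned bursts with CLWB and SFENCE, so sub-block writes do not trigger device-level write amplification.

    /* Device-aware NVM write path (illustrative sketch, C11, compile with -mclwb).
     * Assumption: an Optane-like device that internally accesses media in
     * 256 B units, so small writes are coalesced and persisted as aligned
     * 256 B bursts to avoid device-level write amplification. */
    #include <immintrin.h>
    #include <stdint.h>
    #include <string.h>

    #define XPLINE    256   /* internal media access unit of the device */
    #define CACHELINE  64   /* CLWB flush granularity                   */

    /* Write back and order the cache lines covering [dst, dst + len). */
    static void nvm_persist(const void *dst, size_t len) {
        uintptr_t p = (uintptr_t)dst & ~(uintptr_t)(CACHELINE - 1);
        for (; p < (uintptr_t)dst + len; p += CACHELINE)
            _mm_clwb((void *)p);
        _mm_sfence();
    }

    /* DRAM staging buffer matching the device access unit. */
    struct staging_buf {
        _Alignas(XPLINE) char data[XPLINE];
        size_t used;
    };

    /* Append a small update; returns 1 when the buffer is full. */
    static int stage_write(struct staging_buf *b, const void *src, size_t len) {
        if (b->used + len > XPLINE)
            return 1;                     /* caller should flush first */
        memcpy(b->data + b->used, src, len);
        b->used += len;
        return 0;
    }

    /* Emit one full, aligned 256 B burst to a 256 B-aligned NVM address
     * (any unused tail bytes are simply padding). */
    static void flush_staged(struct staging_buf *b, char *nvm_dst) {
        memcpy(nvm_dst, b->data, XPLINE);   /* single sequential write */
        nvm_persist(nvm_dst, XPLINE);
        b->used = 0;
    }

A device-oblivious engine would instead issue each small write directly, paying a full 256 B internal write per update.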

Low Tail Latency Concurrency Control

2020 - 2024

Apart from seeking higher throughput and lower latency, datacenter applications also require predictable performance, often measured by the 99th or 99.9th percentile latency. Latency variability has many sources, including shared resources (CPU cores, caches, memory bandwidth), background activities, and queuing. In recent years, an active line of research has improved performance predictability at different layers of the stack, but it overlooks the workload itself as another source of latency spikes: conflicting requests contend with one another and stall. Here, I take a deeper dive into concurrency protocol design with the workload-aware principle in mind.
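
For reference, the short C sketch below (a hypothetical helper named percentile, with made-up latency numbers used purely for demonstration) shows how such tail metrics are computed: sort the per-request latencies and read off the 99th and 99.9th percentile entries.

    /* Illustrative computation of tail-latency percentiles (hypothetical
     * helper; the sample values are fabricated for demonstration only). */
    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_ull(const void *a, const void *b) {
        unsigned long long x = *(const unsigned long long *)a;
        unsigned long long y = *(const unsigned long long *)b;
        return (x > y) - (x < y);
    }

    /* p-th percentile (0 < p <= 100) of n latency samples, in nanoseconds. */
    static unsigned long long percentile(unsigned long long *lat, size_t n, double p) {
        qsort(lat, n, sizeof *lat, cmp_ull);
        size_t idx = (size_t)((p / 100.0) * (double)n);
        if (idx >= n) idx = n - 1;
        return lat[idx];
    }

    int main(void) {
        /* Nine fast requests and one that stalled on a conflict. */
        unsigned long long lat[] = {95, 97, 98, 99, 101, 102, 105, 110, 120, 4000};
        size_t n = sizeof lat / sizeof lat[0];
        printf("p99   = %llu ns\n", percentile(lat, n, 99.0));
        printf("p99.9 = %llu ns\n", percentile(lat, n, 99.9));
        return 0;
    }

In this toy trace, the single stalled request alone determines both tail values, which is exactly why conflict-induced spikes matter even when the median latency looks healthy.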