A scalable lock on NUMA multicore
Author(s) - Yi ZhengMing, Yao YiPing
Publication year - 2020
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5964
Subject(s) - computer science, multi-core processor, cache coherence, thread (computing), scalability, parallel computing, lock (computer science), shared memory, non-uniform memory access, CPU cache, distributed computing, operating system
Summary - Modern NUMA multicore architectures exhibit complicated memory behavior, such as cache coherence invalidation and non-uniform memory access, where a core's access to its local memory is significantly faster than cross-node access to memory on a different NUMA node. This behavior has a large impact on the efficiency of lock synchronization, which in turn affects the performance of parallel applications. Prior work offers several efficient designs to improve locking performance, such as delegation schemes. However, existing delegation schemes either occupy dedicated computing cores, scale poorly, or offer limited portability. In this work, we present a NUMA-aware delegation lock that occupies no cores while offering scalable performance under high contention on NUMA multicore machines. The new lock is a variant of the efficient FFWD lock and inherits its performance features, such as buffering responses within a NUMA node to minimize cache coherence traffic. Unlike FFWD, the new lock employs hierarchical NUMA-aware memory allocation and a NUMA-aware dynamic server thread technique to reduce cross-node communication between client and server threads. Our evaluation shows that the new lock outperforms FFWD under high contention and achieves significant performance gains compared with other state-of-the-art locks.
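To make the delegation idea behind FFWD-style locks concrete, the following is a minimal sketch in C, not the paper's actual implementation: names such as req_slot, delegate, and server_loop are illustrative assumptions. A dedicated server thread executes critical sections on behalf of client threads, so clients never contend on a shared lock word; each client spins only on its own cache-line-sized request slot. The NUMA-aware refinements described in the abstract (hierarchical slot placement, per-node response buffering, dynamic server thread placement) are omitted here for brevity.

/* Build with: cc -O2 -pthread delegation_sketch.c */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <pthread.h>

#define NCLIENTS   4
#define CACHE_LINE 64

typedef uintptr_t (*crit_fn)(uintptr_t arg);

/* One request slot per client, padded to a cache line to avoid false sharing. */
typedef struct {
    _Atomic(crit_fn)  fn;    /* non-NULL => request pending                  */
    uintptr_t         arg;   /* argument for the critical section            */
    _Atomic uintptr_t resp;  /* response written by the server               */
    _Atomic int       done;  /* flipped by the server when resp is valid     */
    char pad[CACHE_LINE - sizeof(_Atomic(crit_fn)) - sizeof(uintptr_t)
             - sizeof(_Atomic uintptr_t) - sizeof(_Atomic int)];
} __attribute__((aligned(CACHE_LINE))) req_slot;

static req_slot     slots[NCLIENTS];
static _Atomic int  stop_server = 0;
static long         shared_counter = 0;  /* only the server touches this */

/* Client side: publish a request, then spin locally until the server answers. */
static uintptr_t delegate(int id, crit_fn fn, uintptr_t arg)
{
    req_slot *s = &slots[id];
    s->arg = arg;
    atomic_store_explicit(&s->done, 0, memory_order_relaxed);
    atomic_store_explicit(&s->fn, fn, memory_order_release);
    while (!atomic_load_explicit(&s->done, memory_order_acquire))
        ;  /* spinning on the client's own slot, not a shared lock word */
    return atomic_load_explicit(&s->resp, memory_order_relaxed);
}

/* Server side: poll slots round-robin and run pending critical sections. */
static void *server_loop(void *unused)
{
    (void)unused;
    while (!atomic_load_explicit(&stop_server, memory_order_relaxed)) {
        for (int i = 0; i < NCLIENTS; i++) {
            crit_fn fn = atomic_load_explicit(&slots[i].fn, memory_order_acquire);
            if (fn) {
                uintptr_t r = fn(slots[i].arg);  /* run the critical section */
                atomic_store_explicit(&slots[i].resp, r, memory_order_relaxed);
                atomic_store_explicit(&slots[i].fn, NULL, memory_order_relaxed);
                atomic_store_explicit(&slots[i].done, 1, memory_order_release);
            }
        }
    }
    return NULL;
}

/* Example critical section: increment the shared counter. */
static uintptr_t add_to_counter(uintptr_t arg)
{
    shared_counter += (long)arg;
    return (uintptr_t)shared_counter;
}

static void *client(void *idp)
{
    int id = (int)(intptr_t)idp;
    for (int i = 0; i < 1000; i++)
        delegate(id, add_to_counter, 1);
    return NULL;
}

int main(void)
{
    pthread_t srv, cl[NCLIENTS];
    pthread_create(&srv, NULL, server_loop, NULL);
    for (int i = 0; i < NCLIENTS; i++)
        pthread_create(&cl[i], NULL, client, (void *)(intptr_t)i);
    for (int i = 0; i < NCLIENTS; i++)
        pthread_join(cl[i], NULL);
    atomic_store(&stop_server, 1);
    pthread_join(srv, NULL);
    printf("counter = %ld (expected %d)\n", shared_counter, NCLIENTS * 1000);
    return 0;
}

The key property this sketch illustrates is that the protected state migrates to (and stays in) the server's cache, and each client communicates through a private cache line; a NUMA-aware variant would additionally allocate each client's slot on that client's local node and group servers/responses per node, which is where the cross-node traffic savings claimed in the abstract come from.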