Thanks to a Linux kernel patch that could be back-ported to the various stable series, highly threaded software running under CFS quotas that enforce CPU limits is about to get much faster. In at least one synthetic test case, the kernel patch improves performance by almost 30 times.
Spotted by the Kubernetes community, but also affecting others who run highly threaded workloads and use CFS quotas to restrict shared CPU resources, the problem is that highly threaded applications generally don't get their "fair share" of the CPU, resulting in lower-than-expected performance and higher latency.
This has been a known bug for over a year, and a kernel bug report on the unexpected CFS throttling has been open since late 2017. The issue was recently fixed for Linux 5.4 mainline and is pending back-ports, after the patch had been circulating on the kernel mailing list for a few months.
The fix is a few dozen lines of code that remove the expiration of CPU-local slices:
It has been observed that highly threaded, non-CPU-bound applications running under cpu.cfs_quota_us constraints can hit a high percentage of throttled periods while not consuming the allocated amount of quota. This use case is typical of user-interactive, non-CPU-bound applications, such as those running in Kubernetes or Mesos on multi-core machines.
This dramatically improves the performance of high-thread-count, non-CPU-bound applications with a low cfs_quota_us allocation on high-core-count machines. In an artificial test case (10ms/100ms of quota on an 80-CPU machine), this commit resulted in an almost 30-fold performance improvement while still maintaining correct CPU quota restrictions.
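To see whether a workload is affected, one rough approach is to compare the number of throttled periods against the total number of enforcement periods reported in the cgroup's cpu.stat file. The Python sketch below is only an illustration and is not part of the patch: it assumes the file contains the nr_periods and nr_throttled counters, the function names are hypothetical, and the sample values are invented rather than measured. It also shows how a 10ms/100ms quota translates into a fraction of one CPU.

```python
from typing import Dict


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a cgroup cpu.stat file into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_percent(stats: Dict[str, int]) -> float:
    """Percentage of enforcement periods in which the group was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return 100.0 * stats.get("nr_throttled", 0) / periods


def cpu_limit_cores(quota_us: int, period_us: int) -> float:
    """CPU limit implied by a CFS quota, expressed as a number of CPUs."""
    return quota_us / period_us


# Invented sample contents, for illustration only (not real measurements).
SAMPLE = """nr_periods 1000
nr_throttled 450
throttled_time 12345678900
"""

if __name__ == "__main__":
    stats = parse_cpu_stat(SAMPLE)
    print(f"throttled in {throttled_percent(stats):.1f}% of periods")
    # The 10ms/100ms quota from the test case, expressed in microseconds.
    print(f"CPU limit: {cpu_limit_cores(10_000, 100_000):.2f} CPUs")
```

On a real system the text would be read from the cgroup's own cpu.stat file; its location depends on the cgroup version and on how the container runtime lays out the hierarchy, so no path is hard-coded here. A high throttled percentage combined with CPU usage well below the quota matches the symptom described in the commit message.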
Thanks to Phoronix reader Mark for reporting this recent kernel change.