I’ve been comparing the performance of a 1TB sort on the Clear Linux and Ubuntu server distributions. Clear Linux comes out ahead, but I have a few questions about why.
One difference I noticed is that the sort generates 6,711 minor page faults under CL, versus 3,166,105 under Ubuntu. I’m not used to thinking of minor page faults as a performance issue, but any time I see a difference in the millions, that gets my attention.
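For reference, counts like these are easy to gather per run with /usr/bin/time or perf; something along these lines, where <sort command> stands in for my actual Nsort invocation:

    # report minor/major fault counts for one run
    /usr/bin/time -v <sort command>

    # or via perf software events
    perf stat -e minor-faults,major-faults <sort command>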
Is this difference because CL uses a larger page size for code, because its system libraries have a smaller memory footprint, or for some other reason?
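In case it helps narrow things down, the base page size and the transparent huge page policy on each distro should show up with something like:

    getconf PAGESIZE
    cat /sys/kernel/mm/transparent_hugepage/enabled
    cat /sys/kernel/mm/transparent_hugepage/defrag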
Another interesting thing I noticed is that the sort’s elapsed time under CL clearly improves the more often I run it. It’s as if something about the sort is being cached between consecutive runs. I don’t see the same improvement trend under Ubuntu.
The most obvious source of caching is the buffer cache, but before each sort I clear it with:

    sync; echo 3 | sudo tee /proc/sys/vm/drop_caches

So this is making me wonder what other caching might explain the improvement.
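For what it’s worth, the page cache counters before and after the drop can be compared to confirm it takes effect:

    grep -E '^(Buffers|Cached|Dirty):' /proc/meminfo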
The sort I’m running is the Nsort product from Ordinal Technologies. This is not an open source product, so I don’t know if it tries to use huge pages, but it does allocate quite a bit of memory. The CPU is an AMD Ryzen Embedded V3C14 with four cores and eight threads.
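One thing I can check empirically is whether Nsort’s allocations end up backed by (transparent) huge pages while it runs; <pid> below is a placeholder for the Nsort process id:

    # huge pages backing the running process's anonymous memory
    grep AnonHugePages /proc/<pid>/smaps_rollup

    # system-wide view, including any explicitly reserved huge pages
    grep -i hugepages /proc/meminfo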