Benchmarking questions

I’ve been benchmarking a 1TB sort on Clear Linux and Ubuntu Server. Clear Linux comes out ahead, but I have a few questions about why.

One difference I noticed is that the sort generates 6,711 minor page faults under CL, versus 3,166,105 under Ubuntu. I’m not used to thinking of minor page faults as a performance issue, but any time I see a difference in the millions, it gets my attention.
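One way to capture these counts the same way on both systems is to read the kernel's resource-usage counters after the sort exits; GNU `/usr/bin/time -v` reports the same number as "Minor (reclaiming a frame) page faults". The sketch below is a minimal Python version of that, assuming the sort is launched as a child process. `count_minor_faults` is a hypothetical helper name, not part of any tool mentioned here.

```python
import resource
import subprocess
import sys


def count_minor_faults(cmd: list[str]) -> int:
    """Run cmd to completion and return the minor page faults it incurred.

    RUSAGE_CHILDREN accumulates over every child this process has waited
    for, so we take the difference between before and after the run.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_minflt
    subprocess.run(cmd, check=True)  # waits for the child, so its usage is counted
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_minflt
    return after - before


if __name__ == "__main__":
    # Hypothetical usage: pass the real sort command line as arguments.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(count_minor_faults(cmd))
```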

Is this difference because CL uses a larger page size for code, because its system libraries have a smaller memory footprint, or for some other reason?

Another interesting thing I noticed: under CL, the sort’s elapsed time clearly improves the more often I run it, as if something about the sort is being cached between consecutive runs. I don’t see the same trend under Ubuntu.

The most obvious source of caching is the buffer cache, but I clear it before each sort with drop_caches: sync; echo 3 | sudo tee /proc/sys/vm/drop_caches. So I’m wondering what other caching could cause this improvement.
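To make the run-to-run trend easy to compare across the two systems, a small harness can repeat the same drop_caches step before each run and record the elapsed time. This is a sketch under assumptions: `time_runs` and `drop_page_cache` are hypothetical helper names, dropping caches requires root, and the sort command line is whatever you normally use.

```python
import os
import subprocess
import sys
import time


def drop_page_cache() -> None:
    """Flush dirty data, then drop page cache, dentries and inodes.

    Same effect as `sync; echo 3 | sudo tee /proc/sys/vm/drop_caches`; needs root.
    """
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def time_runs(cmd: list[str], runs: int, drop_caches: bool = True) -> list[float]:
    """Run cmd `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        if drop_caches:
            drop_page_cache()
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        timings.append(time.perf_counter() - start)
    return timings
```

For example, `time_runs(SORT_CMD, runs=5)` run as root, where `SORT_CMD` is your usual sort invocation as a list of arguments, returns one elapsed time per run; a steadily shrinking list on CL and a flat one on Ubuntu would reproduce the effect described above.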

The sort I’m running is the Nsort product from Ordinal Technologies. This is not an open source product, so I don’t know if it tries to use huge pages, but it does allocate quite a bit of memory. The CPU is an AMD Ryzen Embedded V3C14 with four cores and eight threads.

I’m starting to think the minor page faults are simply a side effect of my sort allocating about 12,366MB of memory, on both CL and Ubuntu. The numbers line up: on Ubuntu, 4KB × 3,166,105 minor page faults is roughly 12,366MB, and on CL, 2MB × 6,711 minor page faults is in the same range (13,422MB, to be exact).
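The arithmetic behind that estimate, as a quick check. It assumes every minor fault maps exactly one page (4 KiB base pages on Ubuntu, 2 MiB THPs on CL) and uses 1024-based units. That is a simplification: some faults on CL are likely still ordinary 4 KiB faults (for code, libraries, or when a huge page can't be allocated), which this calculation counts as 2 MiB. `footprint_mib` is a hypothetical helper name.

```python
KIB_PER_MIB = 1024


def footprint_mib(minor_faults: int, page_kib: int) -> float:
    """Memory touched if every minor fault mapped exactly one page of page_kib KiB."""
    return minor_faults * page_kib / KIB_PER_MIB


ubuntu = footprint_mib(3_166_105, 4)    # 4 KiB base pages
clear = footprint_mib(6_711, 2 * 1024)  # 2 MiB transparent huge pages
print(f"Ubuntu: {ubuntu:,.0f} MiB, Clear Linux: {clear:,.0f} MiB")
# prints: Ubuntu: 12,368 MiB, Clear Linux: 13,422 MiB
```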

The reason is that CL sets /sys/kernel/mm/transparent_hugepage/enabled to [always] for Transparent Huge Pages (THP), while Ubuntu sets it to [madvise].
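A quick way to confirm which policy is active on each machine: the sysfs file lists every choice and puts brackets around the active one. A minimal parsing sketch; `active_thp_mode` is a hypothetical helper, and the same bracket format is used by the neighboring `defrag` file in that directory.

```python
from pathlib import Path

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) choice, e.g. 'always [madvise] never' -> 'madvise'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active mode in {text!r}")


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print(active_thp_mode(THP_ENABLED.read_text()))
    else:
        print("THP sysfs file not present on this kernel")
```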

My 1TB sort seems to benefit from having THP enabled, which is good to know.
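With [madvise], the kernel only uses THP inside memory regions a program has explicitly marked with `madvise(MADV_HUGEPAGE)`; with [always], any private anonymous region is eligible. A minimal Python sketch of what opting in looks like, assuming Python 3.9+ on Linux, where `mmap.madvise` and `mmap.MADV_HUGEPAGE` are standard-library names. Whether Nsort does anything like this is unknown, since it's closed source; `alloc_with_thp_hint` is a hypothetical name.

```python
import mmap

PAGE = mmap.PAGESIZE
SIZE = 64 * 1024 * 1024  # 64 MiB: large enough to contain several 2 MiB-aligned ranges


def alloc_with_thp_hint(size: int) -> tuple[mmap.mmap, bool]:
    """Map private anonymous memory and request THP via MADV_HUGEPAGE.

    Returns the mapping and whether the advice was accepted.
    """
    # fileno=-1 gives an anonymous mapping. MAP_PRIVATE matters: shared
    # anonymous memory follows shmem_enabled rather than `enabled`.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    advised = False
    if hasattr(mmap, "MADV_HUGEPAGE"):
        try:
            buf.madvise(mmap.MADV_HUGEPAGE)
            advised = True
        except OSError:
            pass  # e.g. kernel built without THP support
    # Touch one byte per base page so the memory is actually faulted in.
    for offset in range(0, size, PAGE):
        buf[offset] = 1
    return buf, advised
```

Comparing the AnonHugePages line of /proc/self/smaps_rollup before and after the call is one way to see whether huge pages were actually used.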

Then I started to wonder: if 2MB THPs are good, would 1GB THPs be better? That led me down a rabbit hole of passing arcane parameters like default_hugepagesz=1G hugepagesz=1G hugepages=12 to my CL kernel, before sadly concluding that Linux doesn’t yet support 1GB transparent huge pages.
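Those boot parameters reserve explicit hugetlb pages, which are a separate mechanism from THP: a program gets them only by asking, for example through hugetlbfs or mmap with MAP_HUGETLB. To at least confirm the reservation took effect, /proc/meminfo reports the pool size and the default huge page size. A minimal sketch; `hugetlb_summary` is a hypothetical helper.

```python
from pathlib import Path

MEMINFO = Path("/proc/meminfo")


def hugetlb_summary(meminfo: str) -> dict[str, int]:
    """Extract the explicit (hugetlbfs) huge page counters from /proc/meminfo text."""
    wanted = {"HugePages_Total", "HugePages_Free", "Hugepagesize"}
    out = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Hugepagesize is in kB; the HugePages_* fields are page counts.
            out[key] = int(rest.split()[0])
    return out


if __name__ == "__main__":
    if MEMINFO.exists():
        print(hugetlb_summary(MEMINFO.read_text()))
```

If the reservation succeeded with the parameters above, one would expect HugePages_Total of 12 and Hugepagesize of 1048576 (kB).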

The fact that THP is limited to 2MB isn’t spelled out very well in the kernel documentation, so I may be missing something.

Is libhugetlbfs available with Clear Linux?

swupd search doesn’t find libhugetlbfs, which makes me think I’d have to download and compile it myself.

Is this because libhugetlbfs is deprecated in favor of THP?