Clear Linux OS Offering Performance Advantages Even With Low-Power IoT/Edge Hardware

Michael Larabel usually tests Clear Linux OS on high-end desktop and server hardware, but this time he tried it on the UP Squared, a board with a dual-core Celeron N3350, 2GB of RAM, 32GB of eMMC storage, and Intel HD Graphics 500. It’s quite a simple, low-end SBC by today’s standards, but surprisingly Clear Linux OS still squeezed more performance out of it than Ubuntu 19.04 or Fedora 30.

Clear Linux OS overall was about 20% faster than Fedora 30 and Ubuntu 19.04.

Continue reading at: https://www.phoronix.com/scan.php?page=article&item=up-squared-clear&num=1

Some of those tests are pretty bad and skew the average. Removing the lame test alone (which is not representative of distribution performance!) reduces the gap to 10%.

Geometric Mean > Higher Is Better
Ubuntu 19.04 .......... 1.344
Fedora Workstation 30 . 1.352
Clear Linux 30880 ..... 1.485
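To illustrate how a single test can move a geometric mean, here is a minimal Python sketch. The per-test ratios are hypothetical placeholders, not numbers from the article; the point is only the leave-one-out comparison.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def leave_one_out(ratios):
    """Return the full geometric mean and the mean with each test removed."""
    full = geometric_mean(ratios.values())
    without = {
        name: geometric_mean(v for k, v in ratios.items() if k != name)
        for name in ratios
    }
    return full, without


# Hypothetical speedup ratios (distro A / distro B); NOT real PTS data.
ratios = {"test_a": 1.05, "test_b": 1.10, "test_c": 1.08, "outlier": 3.00}

full, without = leave_one_out(ratios)
print(f"all tests: {full:.3f}")
for name, value in without.items():
    print(f"without {name}: {value:.3f}")
```

A test whose removal shifts the mean far more than any other is worth investigating before it is counted in an overall result.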

Is there any interest in doing proper distro testing to show the real performance differences? This would highlight much larger improvements for CL in some tests and be a lot more credible, as the PTS tests are not really designed for distribution testing.

Phoronix has been performing these tests for so long, and they are so widely accepted by the Linux community, that I’ve never thought about it. I would love to learn more about distro-specific tests, and if you have suggestions about how to do this differently, please share them with us. Thanks for the comment.

Proper benchmarking is extremely difficult and time-consuming. Unfortunately, the incentives favour mass-producing benchmarks for ad views without any focus on quality.

The PTS has three styles of benchmark and mixes them:

  1. Benchmarks that use distro provided packages.
  2. Benchmarks that use pre-compiled binaries.
  3. Benchmarks that download and compile a program (at a fixed version) locally to run, ignoring benefits of rolling release.

Each has its own challenges and is useful for different purposes. One would expect a distro comparison to use type 1 benchmarks, and possibly type 2 only for gaming (games are generally distributed as binaries via Steam). Many of the tests in the PTS are type 3.

Some simple issues with benchmarks can be:

  • Relying solely on a single test number without validation. Without validation it is also difficult to compare different versions of software.
  • Interpreting results requires understanding the underlying benchmark to draw accurate conclusions. Oddities in the results also need to be investigated further.
  • Bias and weighting in the overall takeaway from the results.

A few examples (not all were in this particular set, but they are ones I’m familiar with, and the issues aren’t uncommon):

The zstd benchmark actually uses the system-provided zstd. It does not validate differences in compressed size, only the time taken. So the best way to improve ‘performance’ on the test would be to patch zstd to make the default compression level the lowest and to use maximum threads (both reduce compression; CL patches for maximum threads by default). Neither of these is really an optimization in the normal sense; they are changes in default behaviour that users can easily control on the command line. The commentary usually puts the result down to CL’s higher compiler optimizations.
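A minimal sketch of what validating a compression benchmark could look like, recording output size next to the time taken. It uses Python’s standard zlib module purely as a stand-in for zstd, and the input data and function names are hypothetical.

```python
import time
import zlib


def sample_data():
    """Build a hypothetical, repetitive input buffer for the benchmark."""
    return b"".join(
        b"line %d of hypothetical benchmark input\n" % (i % 997)
        for i in range(20000)
    )


def bench_compress(data, level):
    """Time one compression run and record validation values alongside it."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Validation: the output must round-trip, and its size is part of the result.
    if zlib.decompress(compressed) != data:
        raise ValueError("round-trip mismatch")
    return {
        "level": level,
        "seconds": elapsed,
        "size": len(compressed),
        "ratio": len(data) / len(compressed),
    }


if __name__ == "__main__":
    data = sample_data()
    for level in (1, 6, 9):
        r = bench_compress(data, level)
        print(f"level {r['level']}: {r['seconds']:.4f}s, "
              f"{r['size']} bytes, ratio {r['ratio']:.2f}")
```

With the size reported next to the time, a faster run that merely compresses less is visible as a change in defaults rather than an optimization.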

The lame benchmark downloads and compiles lame, then encodes a file. The issue is that the build system provides no default flags. Not only does this compile the program with flags that don’t represent the distribution, it actually compiles it at the equivalent of -O0 (no optimization). CL avoids this by exporting CFLAGS to the environment (plus -fassociative-math, which I’ve not seen used in the distribution). So the benchmark compares an unoptimized build on other distros to an optimized one on CL. Yet it has been used in the comparison tests with CL for ages. It makes no sense as a benchmark because 1. lame isn’t even included in CL, and 2. the builds being compared do not reflect the performance of the distro. And as I said before, that one result adds 10% to the overall performance improvement.
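As a rough sketch of how a harness could flag this situation, the snippet below records the build flags visible in the environment and checks whether any optimization level is requested. The helper names are hypothetical and this is not part of the PTS; it only relies on the fact that GCC defaults to -O0 when no -O option is given.

```python
import os


def effective_flags(env=None, names=("CFLAGS", "CXXFLAGS", "LDFLAGS")):
    """Collect the build flags a compile-from-source test would inherit."""
    env = os.environ if env is None else env
    return {name: env.get(name, "") for name in names}


def has_optimization(cflags):
    """True if the flag string requests an optimization level above -O0."""
    return any(
        tok.startswith("-O") and tok != "-O0" for tok in cflags.split()
    )


if __name__ == "__main__":
    flags = effective_flags()
    for name, value in flags.items():
        print(f"{name}={value!r}")
    if not has_optimization(flags["CFLAGS"]):
        print("warning: no -O level in CFLAGS; a bare build is likely unoptimized")
```

Storing this output with each result would make it clear when two distros were compared with different build flags.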

The numpy benchmark includes the line shelllines = ['#!/bin/sh', 'export OMP_NUM_THREADS=1', 'cd `dirname $0`'] + shelllines, fixing the thread count when using OpenMP. Not using OpenMP in BLAS can therefore actually give better results in the benchmark, which penalizes OpenMP use. So working around the benchmark is a way to improve ‘performance’ in PTS results.

Most of the encoder/compression tests download and compile the programs locally. This means that on all distributions they are compiled with -march=native, and the defaults are usually -O2 or -O3. Under valgrind this can show less than 2% of the benchmark time being spent in libraries shipped with the distribution. These cases bypass much of the optimization work done in CL in providing AVX2/AVX512-optimized versions and PGO packages. Most of these tests fell out of favour because they didn’t show much difference between distros, despite there being one!

What is needed for improvement is better benchmarks that actually reflect distro performance, validation (something I don’t think the PTS can do), and actual analysis of the results. I have previously thrown together some really ugly scripts that ran the benchmarks, stored the result values plus one value for validation (say, file size for compression), and did a run under valgrind for later analysis. When doing a flac benchmark with GCC vs clang, the valgrind output made it obvious that clang couldn’t hit the SSE4 and AVX optimized functions that GCC would (due to the style in which they were written). I believe it was fixed in git, but it showed why GCC gave better results, beyond it simply having ‘optimized’ better.

@sunnyflunk your comments are all valid and I would definitely talk to the folks at phoronix.com about them. We do spend a lot of time ourselves to interpret and read all the numbers posted, and some of the things you’ve noted above are indeed an issue. Or maybe this is better addressed at openbenchmarking.org.
