How does Clear Linux use AutoFDO?

In short, it doesn't, at least not for packages as far as I'm aware. Traditional FDO (-fprofile-generate, then -fprofile-use) is significantly easier and saner for generating profile data and builds during packaging. In theory it is also more accurate, since instrumentation records exact counts rather than samples.
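For reference, the FDO cycle is three steps: an instrumented build, a training run, and a rebuild that uses the profile. The sketch below only assembles those command lines. The helper name, file names and the `--bench` workload argument are hypothetical, and a real package build would go through its own build system rather than a single gcc call.

```python
import shlex


def fdo_build_commands(sources, output, workload_args, opt="-O2"):
    """Return the three command lines of a classic GCC FDO build.

    Hypothetical helper for illustration; it does not run anything.
    """
    srcs = list(sources)
    # Stage 1: instrumented build; running it writes .gcda profile files.
    instrumented = ["gcc", opt, "-fprofile-generate", *srcs, "-o", output]
    # Stage 2: run a representative workload to produce the profile data.
    training_run = [f"./{output}", *workload_args]
    # Stage 3: rebuild with the same command shape so GCC finds the matching
    # profile files, this time letting it optimize using them.
    optimized = ["gcc", opt, "-fprofile-use", *srcs, "-o", output]
    return [instrumented, training_run, optimized]


if __name__ == "__main__":
    for cmd in fdo_build_commands(["main.c"], "app", ["--bench"]):
        print(shlex.join(cmd))
```

Keeping stages 1 and 3 identical apart from the profile flag matters, because GCC matches profile files to the objects being compiled.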

Profiling is generally done when there is a representative workload available to generate the profile, and when it actually produces a performance improvement. Here's a non-exhaustive list of packages I've seen built with full profiling:

bzip2
gcc
libjpeg-turbo
libjpeg-turbo-soname8
libxml2
lua
opencv
openssl
p7zip
php
pixman
python
python3
R
zlib
zstd

The more favored optimization is building an AVX2/AVX-512-optimized version of a package alongside a baseline (non-AVX2) version, with the right one used based on what your CPU supports. This is easy to do because it doesn't require a profiling workload!
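The selection logic amounts to "use the most capable variant whose required CPU features are all present, else fall back to the baseline." The sketch below models that decision in Python under stated assumptions. In practice the choice is made by the dynamic loader, not by a script; the directory paths are hypothetical, and the feature sets are a simplification of what such builds actually require.

```python
def cpu_flags(cpuinfo_text):
    """Collect feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Hypothetical variant directories, most capable first. The required flag
# sets are simplified; real builds target a fuller feature level.
VARIANTS = [
    ({"avx2", "avx512f", "avx512bw", "avx512vl"}, "/usr/lib64/haswell/avx512_1"),
    ({"avx2", "fma", "bmi2"}, "/usr/lib64/haswell"),
]
BASELINE = "/usr/lib64"


def pick_library_dir(flags):
    """Return the first variant whose required flags are all supported."""
    for required, path in VARIANTS:
        if required <= flags:
            return path
    return BASELINE


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(pick_library_dir(cpu_flags(f.read())))
    except OSError:
        print(BASELINE)  # no /proc/cpuinfo (e.g. not Linux): use baseline
```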

As for AutoFDO: if you can use FDO instead, do that. Otherwise, the GCC tutorial looks correct. The google/autofdo repository on GitHub is where you'll find create_gcov. Sampling with perf profiles every part of the program that runs, including libraries. Everything has to boil down to one profile file passed to GCC, so the profiling workload needs to exercise all the binaries. (I think separate profiles can be generated and merged, but that adds extra complexity.) Only code that GCC compiles with -fauto-profile, and that the workload actually exercised, will benefit from the profile.
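For comparison with FDO, the AutoFDO flow along the lines of the GCC documentation is: build normally with debug info, sample the workload with perf, convert the samples with create_gcov, then rebuild with -fauto-profile. This sketch only assembles those commands. The helper and file names are hypothetical. The br_inst_retired:near_taken event with -b (branch records) is Intel-specific, and create_gcov options can differ between autofdo versions, so check the tutorial for your setup.

```python
def autofdo_commands(program, sources, workload_args,
                     perf_data="perf.data", profile="profile.afdo"):
    """Return the command lines of a sampled (AutoFDO) GCC build.

    Hypothetical helper for illustration; it does not run anything.
    """
    srcs = list(sources)
    # 1. Normal optimized build; -g lets samples map back to source lines.
    build = ["gcc", "-O2", "-g", *srcs, "-o", program]
    # 2. Sample taken branches while running the workload (Intel event name).
    record = ["perf", "record", "-e", "br_inst_retired:near_taken", "-b",
              "-o", perf_data, "--", f"./{program}", *workload_args]
    # 3. Convert perf samples into a profile GCC can read (google/autofdo).
    convert = ["create_gcov", f"--binary=./{program}",
               f"--profile={perf_data}", f"--gcov={profile}"]
    # 4. Rebuild, feeding GCC the single converted profile file.
    rebuild = ["gcc", "-O2", "-g", f"-fauto-profile={profile}", *srcs,
               "-o", program]
    return [build, record, convert, rebuild]
```

Unlike FDO, the binary that gets sampled is an ordinary optimized build, which is why AutoFDO profiles can come from production-like runs.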

Usage differs a lot between PGO for a distribution and PGO for an individual use case. PGO for a distribution needs to be generic, with broad coverage (particularly with FDO). Code that isn't run in the profiling workload can be optimized for size rather than speed, which shrinks the program and reduces cache misses. So if your program has workloads A, B and C, profiling only A should make A faster at the expense of B and C. That's bad for a distribution, but if a user only cares about A (or runs A 90% of the time), the tradeoff is a big win.
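A quick worked example of that tradeoff: weight each workload's runtime ratio (new time over old time) by its share of the original total time. The numbers below are made up purely for illustration: A gets 20% faster, while B and C get 15% slower.

```python
def relative_runtime(mix, ratios):
    """New total runtime divided by old total runtime (below 1.0 is faster).

    mix:    share of the original total time spent in each workload (sums to 1)
    ratios: new_time / old_time for each workload after profiling only A
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("workload shares must sum to 1")
    return sum(share * ratios[name] for name, share in mix.items())


# Hypothetical, illustrative numbers only.
ratios = {"A": 0.80, "B": 1.15, "C": 1.15}
even_mix = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}  # a "generic" distribution user
a_heavy = {"A": 0.90, "B": 0.05, "C": 0.05}      # a user who mostly runs A

if __name__ == "__main__":
    print(f"{relative_runtime(even_mix, ratios):.3f}")  # 1.033: ~3% slower
    print(f"{relative_runtime(a_heavy, ratios):.3f}")   # 0.835: ~16.5% faster
```

The same build is a net loss for the even mix and a clear win for the A-heavy user, which is the distribution-versus-individual point above.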

For an individual use case, your workload is the very reason you're considering AutoFDO: you want to make that particular thing faster. From what I've seen, AutoFDO is used more for larger programs, where the profile is collected in a live (or mock-live) environment.
