"-mno-vzeroupper" in many packages targeting x86-64-v3

It seems a lot of packages are using the autospec option floop-unroll, which adds “-mno-vzeroupper”, even when it isn’t clear that those packages benefit from not emitting VZEROUPPER. Many of these packages are compiled for the x86-64-v3 target, which corresponds to Haswell and later CPUs with AVX2. Intel still seems to recommend emitting VZEROUPPER, even on Skylake and its more recent revisions (Solved: What is the status of VZEROUPPER use? - Intel Communities), to avoid transition penalties between AVX and non-VEX SSE code. Some of those packages probably mix a lot of AVX and legacy SSE code, which could hurt performance on the x86-64-v3 target. Is there any reason not to trust the compiler in some of those cases? Or maybe I’m misunderstanding something :frowning:


So, a few specific things:

  • Any binary compiled for v3 will not have any legacy SSE-encoded instructions in it (the assembler will upgrade them to VEX encoding).
  • Many, many of our core libraries are shipped as v3 today (and that number is growing).
  • The compiler assumes a v1 or v2 universe with a bit of v3, but our universe is a v3 universe with a bit of legacy v2… so for performance-sensitive packages we tell the compiler it’s in a v3 universe.
  • Quite a few still-v2 libraries aren’t actually using SSE vectorization, so they don’t matter.

So, by and large, we’re on the path to being a v3 universe (and growing).
If we find a place where it matters, rather than putting VZEROUPPER back, we’re more likely to switch the laggard to v3.


Have you lost your account, or is this someone impersonating our Arjan?

“quite a few still-v2 libraries aren’t actually using SSE vectorization so don’t matter”

Thanks for the reply @arjan. Indeed, what you’ve said here is true in the CL world.

I was curious, so I used Intel’s SDE (Intel® Software Development Emulator) to check for transitions between non-VEX SSE and AVX code at a few of the v2/v3 library interfaces… Unfortunately, I accidentally deleted the folder with all my scripts and test results before checking whether it had been synced to GitHub :angry:. Still, the check didn’t find any transition where VZEROUPPER was necessary.

TL;DR: These non-x86-64-v3 libraries aren’t using any SSE vectorization, and the CL universe is increasingly fully v3.

This is indeed our Arjan :slight_smile:
