It seems a lot of packages use the autospec option floop-unroll, which adds “-mno-vzeroupper”, even when it is not clear that those packages benefit from not emitting VZEROUPPER. Many packages are compiled for the x86-64-v3 target, which corresponds to Haswell and later CPUs with AVX2. Intel still seems to recommend emitting VZEROUPPER, even on Skylake and its more recent revisions (Solved: What is the status of VZEROUPPER use? - Intel Communities), to avoid transition penalties between AVX and non-VEX SSE code. Some of those packages probably mix a lot of AVX and legacy SSE code, and that could hurt performance on the x86-64-v3 target. Is there any reason not to trust the compiler in some of those cases? Or maybe I’m misunderstanding something.
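For readers unsure what the x86-64-v3 level covers in practice, here is a minimal Python sketch that compares a machine's `/proc/cpuinfo` flags against the features the x86-64 psABI adds at v3. The names `V3_ONLY_FLAGS` and `missing_v3_flags` are made up for this illustration, and the flag spellings assume Linux's cpuinfo naming (LZCNT is reported as `abm`). OSXSAVE is also part of v3 but is left out of this sketch.

```python
# Features the x86-64 psABI adds at the v3 level, spelled the way Linux
# prints them in /proc/cpuinfo (assumption: LZCNT appears there as "abm").
# OSXSAVE is part of v3 too, but is deliberately left out of this sketch.
V3_ONLY_FLAGS = {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe"}


def missing_v3_flags(cpuinfo_text):
    """Return the sorted v3 flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return sorted(V3_ONLY_FLAGS - set(value.split()))
    # No 'flags' line found: report every v3 flag as missing.
    return sorted(V3_ONLY_FLAGS)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            missing = missing_v3_flags(f.read())
        print("missing v3 flags:", missing or "none")
    except OSError:
        print("/proc/cpuinfo is not readable on this system")
```

An empty result only says the v3-specific instruction sets are advertised. It does not check the v2 baseline underneath.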
So, a few specific things:
- Any binary compiled for v3 will not have any legacy SSE-encoded code in it (the assembler will upgrade those instructions to VEX encoding)
- Many, many of our core libraries are shipped as v3 today (and the number is growing)
- The compiler assumes a v1 or v2 universe with a bit of v3, but our universe is a v3 universe with a bit of legacy v2… so for performance-sensitive packages we tell the compiler it’s in a v3 universe.
- Quite a few still-v2 libraries aren’t actually using SSE vectorization, so they don’t matter
So by and large we’re on the path to being a v3 universe.
If we find a place where it matters, rather than putting vzeroupper back, we’re more likely to switch the laggard to v3.
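One rough way to check the first point on a given binary is to scan its disassembly for vector instructions that lack the `v` mnemonic prefix. The Python sketch below is only a heuristic, and `count_encodings` is a name invented for this illustration. It assumes the tab-separated line layout of `objdump -d` and treats any instruction that touches an xmm/ymm/zmm register as VEX-style (or EVEX) when its mnemonic starts with `v`, and as legacy SSE otherwise.

```python
import re

# Vector register operand as printed by objdump (AT&T "%xmm0" or Intel "xmm0").
VECTOR_OPERAND = re.compile(r"%?[xyz]mm\d+")


def count_encodings(disassembly):
    """Count VEX-style vs legacy-SSE vector instructions in `objdump -d` text."""
    counts = {"vex": 0, "legacy_sse": 0}
    for line in disassembly.splitlines():
        # Instruction lines look like "address:<TAB>bytes<TAB>mnemonic operands".
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # labels, blank lines, continuation lines of raw bytes
        instruction = fields[2].strip()
        if not instruction or not VECTOR_OPERAND.search(instruction):
            continue  # not a vector instruction
        mnemonic = instruction.split()[0]
        if mnemonic.startswith("v"):
            counts["vex"] += 1  # also covers EVEX-encoded instructions
        else:
            counts["legacy_sse"] += 1
    return counts


if __name__ == "__main__":
    # Hand-written sample in objdump's layout, not taken from a real package.
    sample = "\n".join([
        "0000000000401120 <f>:",
        "  401120:\tc5 f8 28 c1          \tvmovaps %xmm1,%xmm0",
        "  401124:\t0f 58 c1             \taddps  %xmm1,%xmm0",
        "  401127:\tc5 fc 58 c1          \tvaddps %ymm1,%ymm0,%ymm0",
        "  40112b:\tc3                   \tret",
    ])
    print(count_encodings(sample))
```

To try it on a real library, pass in the text produced by `objdump -d` for that file. A nonzero `legacy_sse` count only flags candidates for a closer look. It does not show that a transition penalty is actually hit at run time.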
Have you lost your account or is this impersonating our Arjan?
> quite a few still-v2 libraries aren’t actually using SSE vectorization so don’t matter
Thanks for the reply @arjan. Indeed, what you’ve said here is true in the CL world.
I was curious, so I used Intel’s SDE (Intel® Software Development Emulator) to check for transitions between non-VEX SSE and AVX in a few of the v2-v3 library interfaces. It didn’t find any transition where VZEROUPPER was necessary. Unfortunately, I accidentally deleted my folder with all the scripts and test results before checking whether they were synced to GitHub.
TL;DR: These non-x86-64-v3 libraries aren’t using any SSE vectorization, and the CL universe is increasingly fully v3.
This is indeed our Arjan