Part 7. CPU SIMD Extension for X86

content by Xing Chen
If we rewind to 1987, you would find the following two chips on a single motherboard inside your PC: one is the 386 CPU; the other, on the right-hand side, is the 387 co-processor. The 387 is an FPU, short for Floating Point Unit, and it helps the CPU perform floating-point arithmetic. This co-processor is connected to the system bus just like the CPU. Note that this is, in effect, a first-generation “accelerator”! 😉
Well, that was the 80s! With advances in technology, every CPU today has its own built-in floating-point units, no exception.

https://upload.wikimedia.org/wikipedia/commons/2/27/80386with387.JPG
https://en.wikipedia.org/wiki/Intel_80387SX
But, you know, the HPC industry always wants more! As technology advanced, X86 kept pushing forward: AVX-512, a SIMD extension, was introduced by Intel and put on the Skylake CPU die around 2017.
SIMD is short for Single Instruction, Multiple Data. As the name suggests, with only one instruction from the CPU, the SIMD functional unit performs the same operation on multiple data elements. Think about two vectors, A and B, each with eight double-precision (64-bit) floating-point numbers as elements (512 bits in total): one floating-point add instruction from the CPU makes the SIMD unit perform the following eight operations at the same time:
[A1] + [B1] = [C1]
[A2] + [B2] = [C2]
[A3] + [B3] = [C3]
[A4] + [B4] = [C4]
[A5] + [B5] = [C5]
[A6] + [B6] = [C6]
[A7] + [B7] = [C7]
[A8] + [B8] = [C8]
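To make the eight-lane add above concrete, here is a minimal sketch in C. The function name add8 and the LANES constant are made up for illustration and are not from any library. It assumes a compiler that defines __AVX512F__ when AVX-512 is enabled (for example GCC or Clang with -mavx512f); in that case it uses the _mm512_loadu_pd, _mm512_add_pd and _mm512_storeu_pd intrinsics from immintrin.h, otherwise it falls back to a plain scalar loop that gives the same result one element at a time.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>   /* AVX-512 intrinsics, only when the compiler enables them */
#endif

#define LANES 8          /* 8 x 64-bit doubles = 512 bits (illustrative constant) */

/* add8 is a hypothetical helper: c[i] = a[i] + b[i] for i = 0..7. */
void add8(const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
#ifdef __AVX512F__
    /* One 512-bit load per input, one vector add, one 512-bit store. */
    __m512d va = _mm512_loadu_pd(a);
    __m512d vb = _mm512_loadu_pd(b);
    _mm512_storeu_pd(c, _mm512_add_pd(va, vb));
#else
    /* Scalar fallback: eight separate adds, same result. */
    for (size_t i = 0; i < LANES; ++i)
        c[i] = a[i] + b[i];
#endif
}
```

Calling add8(a, b, c) with three arrays of eight doubles fills c with the element-wise sums. On the AVX-512 path the eight additions are issued as a single vector add; on the fallback path the compiler may or may not vectorize the loop, depending on its flags.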
AVX-512 greatly increased CPU floating-point performance and was welcomed by the HPC industry (or was it?). There is one guy who doesn’t like it at all, though. He is one of the founding members of the just-formed X86 Ecosystem Advisory Group, and his name is Linus Torvalds.

“I hope AVX-512 dies a painful death, and that Intel starts fixing real problems instead of trying to create magic instructions to then create benchmarks that they can look good on,” wrote Torvalds, according to ZDNet.
https://www.zdnet.com/article/linus-torvalds-i-hope-intels-avx-512-dies-a-painful-death
As mentioned in the previous section, “CISC vs RISC”, the AVX-512 extension added ~300 more instructions to the X86 instruction set and created an inconsistency between Intel and AMD. Some people believe this is one of the areas the advisory group should look into, besides other housekeeping tasks such as the evolution to a purely 64-bit ISA (https://videocardz.com/newz/intel-proposes-x86s-a-64-bit-only-architecture).
If this is true, then that is good news and the advisory group can make AVX-512 great again, right?
https://www.theregister.com/2024/10/15/intel_amd_x86_future/
Sorry, Mr. Torvalds, the HPC industry needs AVX-512, and we can NOT let it die! It is going to be the HPC workhorse!