Exclusive insight: Apple M1 Single "Core" comparisons miss the mark (with reference points)

Pages (3): Previous 1 2 3
SniperGirlPro (Junior Member, 14 posts)
02-26-2026, 10:06 AM #21

iRaine (Posting Freak, 800 posts)
02-26-2026, 12:02 PM #22
This paper offers a clear overview of Tensor cores in Volta GPUs and explains NPUs in simpler terms. Key point to note: "multiply-accumulate" is another way to describe fused multiply–add (FMA).
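To make that terminology concrete, here is a minimal sketch (plain Python, function name mine) of why "multiply-accumulate" and "fused multiply-add" describe the same basic step:

```python
# A dot product is a chain of multiply-accumulate (MAC) steps:
# acc <- acc + a[i] * b[i]. A hardware FMA instruction performs one
# such step with a single rounding; a tensor core applies the same
# operation to whole matrix tiles at once.
def dot_mac(a, b):
    acc = 0.0
    for x, y in zip(a, b):
        acc = acc + x * y  # one multiply-accumulate per element pair
    return acc

print(dot_mac([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```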

Beder822 (Member, 80 posts)
02-26-2026, 02:02 PM #23
Moved to CPUs, motherboards and memory systems

saburo (Member, 192 posts)
02-26-2026, 04:01 PM #24
Stronger references would help support your argument. I included a wiki link since it’s easier to read than a formal paper, and sources are typically cited there. The diagram in Figure 3 shows how a tensor core executes FMA operations, which supports the idea that tensor cores boost performance in deep-learning tasks. If you’re unsure, you can reach out to NVIDIA and TensorFlow for clarification.

Electra_Games (Junior Member, 4 posts)
02-26-2026, 05:57 PM #25
Additional instruction sets need dedicated hardware to execute them: AVX-512 adds its own wide registers and ALUs for these operations, and the VNNI extension to AVX-512 performs the same multiply-accumulate calculations efficiently. A VNNI unit is tiny compared to a full GPU, even though GPUs pack more compute per unit area than CPUs do. GPUs are also constrained by VRAM capacity, which makes them less suitable when training needs large amounts of memory, so CPUs can be the more practical option unless you can afford a dedicated accelerator like a TPU. I’m not sure how this fits into the bigger picture, but do you have any experience with model training or CPU design? (you can safely ignore this question)
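As a concrete reference for what the VNNI part computes: a per-lane sketch (plain Python, function name mine) of the VPDPBUSD instruction, which fuses four 8-bit products into one 32-bit accumulate:

```python
# Per-lane sketch of AVX-512 VNNI's VPDPBUSD: four u8 x s8 products
# summed into a 32-bit accumulator in a single instruction, replacing
# a separate multiply, widen, and add sequence.
def vpdpbusd_lane(acc, a_u8, b_s8):
    assert len(a_u8) == len(b_s8) == 4  # four byte pairs per 32-bit lane
    return acc + sum(a * b for a, b in zip(a_u8, b_s8))

print(vpdpbusd_lane(0, [1, 2, 3, 4], [10, 10, 10, 10]))  # 100
```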
