Comparison of hyperthreading in optimized versus non-optimized programs
Hello. I've run into a conundrum and have heard widely varying opinions on it. Suppose I have a 6-core CPU that supports Hyper-Threading and an application that can only parallelize its work across 6 threads. Will it make any difference if I turn Hyper-Threading off, so that the application can use each core as its own entity? Or will performance drop if I leave Hyper-Threading on, because the application will then use only 6 of the 12 available threads? Some say it will hurt performance because the L1 and L2 caches will be split between the 2 threads on each core, but not otherwise. Others say it will cut performance in half because the application would use only half of each core in this scenario. Still others say there is no appreciable difference. Can anyone shed some more detailed light on the subject? Thank you.
With hyperthreading disabled, the CPU can often sustain a higher clock frequency, which can improve per-core performance, and each thread no longer has to share its core's caches with a sibling thread.
It depends on the software and the operating system. I run compute tasks that typically don't benefit much from SMT, and the best speed comes from running one thread per core. As long as the software and Windows between them keep one thread on each core, results are solid even with SMT enabled. Occasionally, though, Windows assigns two threads to the same physical core and leaves another core idle, which hurts performance. The fixes are to set affinity manually or to disable SMT. I also use a 12-core CPU for gaming, and I did something unusual there: because games don't come close to using 24 threads, I disabled SMT, giving a 12-core, 12-thread setup. Games now perform better since each thread gets a core's full resources without contention. Keep in mind that SMT raises total throughput when many threads are running but doesn't necessarily speed up one critical thread.
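For the manual affinity route, here is a minimal Python sketch of how a one-thread-per-core affinity mask could be computed. It assumes the two logical CPUs of each core are numbered next to each other (0 and 1 on the first core, 2 and 3 on the second, and so on), which is common on Windows but not guaranteed, so check your own topology first. The function name is made up for illustration.

```python
def one_thread_per_core_mask(physical_cores, threads_per_core=2):
    """Build an affinity bitmask that selects the first logical CPU of each core.

    Assumption: sibling logical CPUs are numbered consecutively, so core N
    owns logical CPUs N*threads_per_core .. N*threads_per_core + (threads_per_core - 1).
    """
    mask = 0
    for core in range(physical_cores):
        # Set the bit for the first logical CPU of this core only.
        mask |= 1 << (core * threads_per_core)
    return mask


# 6 cores / 12 threads: selects logical CPUs 0, 2, 4, 6, 8, 10.
print(format(one_thread_per_core_mask(6), "X"))  # prints 555
```

On Windows, a hex mask like this can be passed to `start /affinity 555 yourapp.exe` (where `yourapp.exe` is a placeholder), or the same CPUs can be ticked by hand under "Set affinity" in Task Manager.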
Core binding is essential. Pin each thread of your program to its own physical core if possible. If you can't, rely on the operating system's scheduler. Either way, check that the physical core is not already busy: it doesn't matter which of a core's logical CPUs holds a thread, as long as the other logical CPU on that core is free.
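To make the check concrete, here is a rough Python sketch that picks one logical CPU per physical core and flags any physical core hosting more than one thread. The topology table is a made-up 4-core, 8-thread example, and both function names are hypothetical; on a real machine you would fill the table in from whatever your OS reports.

```python
from collections import defaultdict

# Hypothetical topology: logical CPU -> (package id, core id).
# Here logical CPUs 0 and 4 share a core, as do 1 and 5, and so on.
EXAMPLE_TOPOLOGY = {
    0: (0, 0), 1: (0, 1), 2: (0, 2), 3: (0, 3),
    4: (0, 0), 5: (0, 1), 6: (0, 2), 7: (0, 3),
}


def one_logical_cpu_per_core(topology):
    """Return the lowest-numbered logical CPU of each physical core."""
    chosen = {}
    for cpu in sorted(topology):
        # Keep only the first logical CPU seen for each physical core.
        chosen.setdefault(topology[cpu], cpu)
    return sorted(chosen.values())


def shared_cores(thread_to_cpu, topology):
    """Return the physical cores that currently host more than one thread."""
    by_core = defaultdict(list)
    for thread, cpu in thread_to_cpu.items():
        by_core[topology[cpu]].append(thread)
    return {core: sorted(threads)
            for core, threads in by_core.items() if len(threads) > 1}


print(one_logical_cpu_per_core(EXAMPLE_TOPOLOGY))
# Threads "a" and "b" sit on sibling logical CPUs 0 and 4, i.e. the same core.
print(shared_cores({"a": 0, "b": 4, "c": 1}, EXAMPLE_TOPOLOGY))
```

On Linux, the table could be built from `/sys/devices/system/cpu/cpuN/topology/physical_package_id` and `core_id`, and a thread could then pin itself with `os.sched_setaffinity(0, {cpu})`. On other systems the equivalent calls differ.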