
Analyzing Per Thread CPU usage in games

DreamDragon
Member
201
04-23-2016, 04:15 PM
#1
Alternative Title: Why single core performance still matters
A question popped into my head recently: does Windows have some tool or utility to monitor the CPU time of an application's threads? The primary reason for asking is to find out whether there's a way to see a trend in how games utilize their threads. Some of us here claim that despite consumers having access to 8+ cores at a relatively affordable price, games still rely on a small number of threads, and hence all those cores don't really matter and what matters is single core performance. In other words, games haven't been "properly" multithreaded. Of course, rather than just make that claim, why not put my money where my mouth is?
One concern is this sort of deep dive tends to be limited to development environments, but Windows does keep tabs on threads so maybe there was hope. And there was!
Process Explorer can show CPU time per thread if you double click on the application and select the "Threads" tab. The metric of importance is "Cycles Delta", though it's not enabled by default. Why is this important? Because it tells you how many cycles the thread spent on the CPU since the last report. If this value floats around a specific range, it indicates how busy the thread tends to be over time. There's also another way through Performance Monitor, but setting that up is a pain.
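To make the "Cycles Delta" idea concrete, here is a minimal Python sketch of the arithmetic. It is not how Process Explorer is implemented. The sample values, the 4 GHz clock, and the one-second refresh interval are all made up for illustration, and it assumes the cycle counter ticks at the nominal clock rate.

```python
def cycles_delta(samples):
    """Turn cumulative per-thread cycle counts into per-interval deltas."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def core_share(delta_cycles, clock_hz, interval_s):
    """Fraction of one core's available cycles the thread used in the interval."""
    return delta_cycles / (clock_hz * interval_s)


# Hypothetical cumulative cycle counts for one thread, sampled once per second.
samples = [0, 3_000_000_000, 6_100_000_000, 9_000_000_000]
deltas = cycles_delta(samples)

for d in deltas:
    # Assumed 4 GHz clock and 1 s interval; both are illustrative only.
    print(d, f"{core_share(d, 4_000_000_000, 1.0):.0%} of one core")
```

If the deltas hover around the same value from sample to sample, the thread is consistently that busy. A delta that jumps around means the load is bursty.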
So I ran a few games to see what their per-thread usage looks like. For most of them I just let the game sit there, so this isn't indicative of, say, when things get busy, but I think it's useful to provide a baseline at least. The games I ran:
Black Mesa Source: I just wanted a Source Engine game and this was installed
Cities Skylines: A popular simulation game, so it'd be interesting to see its per-thread usage
Call of Duty: Modern Warfare 2 (2022): Spent a few minutes in one of the Single Player levels
Cyberpunk 2077: Just to throw an open world modern game in there
F1 2019: Another simulation game
Final Fantasy XIV: An MMORPG
Outer Worlds: I wanted an Unreal Engine 4 game, and this is one I happened to have installed
Quake 2 RTX: It's built off the original Quake 2 game, but otherwise this is to see what a really old game looks like
Resident Evil 4 Chainsaw Demo: Also just wanted to throw in some semblance of a modern game
Stray: Also another Unreal Engine 4 game
Spoiler:
Black Mesa Source
This is going to be a common thing: despite there being a ton of threads spawned, a good number of them are sitting there doing nothing. In any case, you can see that three threads dominate the CPU time. The most I can gather about "tier0.dll" is that it's tied to Source games. Given how Source games work, I'm guessing it's a server that's running the game logic. The other two represent the main game executable and the NVIDIA driver.
Spoiler:
Cities Skylines
Others have claimed that Cities Skylines relies heavily on a single thread to do everything, and this pretty much proves it. The map I used was https://steamcommunity.com/workshop/file...=785933283, which is a highly populated map. As a point of comparison, here's the usage from a map that barely has anyone in it.
Spoiler:
Call of Duty: Modern Warfare 2 (2022)
For this one, about 5 threads dominate the CPU time, with one of them showing a higher usage than the rest. Unfortunately the "Start Address" doesn't seem to point to anything useful, so it's hard to tell what's what.
Spoiler:
Cyberpunk 2077
For this one I wanted to try something different: run the game at RT Ultra settings at 1440p, then drop it down to Low quality settings at 720p and see the difference.
So here's the usage at 720p with Low quality settings
And here it is at 1440p RT Ultra quality
The one interesting thing to note is that 11 threads dropped in their CPU time. Given that Cyberpunk 2077 is a DX12 game, it's possible these are threads for rendering graphics.
Either way, this illustrates why lowering the resolution puts more strain on the CPU.
Spoiler:
F1 2019
I took this while running the benchmark, which simulates a race.
The interesting thing is there are 8 threads running fairly evenly. This makes me think these are for the AI racers (even though there are 20 or so racers in total). I also ran this in DX11, which would explain the busy driver thread.
Spoiler:
Final Fantasy XIV
I took readings from several scenarios this time:
720p High Desktop quality, non-busy area
1440p High Desktop quality, non-busy area
1440p High Desktop quality with a 120FPS cap, non-busy area
1440p High Desktop quality, Limsa Lominsa Aetheryte plaza (one of the busiest areas in game)
Strangely enough, the game's main thread increases in activity at lower resolution while the driver activity decreases. Either way, these two dominate the CPU usage. The third one is likely the network handler or something related to it since the main thread didn't jump up.
Spoiler:
Outer Worlds
So this is where things get interesting. Unreal Engine 4 games (the other game I tried also does this) seem to spawn a ton of threads that do a lot of work. However, it's also clear that there's still one thread that dominates the entire game.
Spoiler:
Quake 2 RTX
It's funny that the driver thread completely overtakes the main game thread. I figured this was going to happen, but I thought it'd be interesting to throw it in anyway.
Spoiler:
Resident Evil 4: Chainsaw Demo
Two threads still dominate this game, but there seem to be quite a handful that get a non-trivial amount of work.
Spoiler:
Stray
I ran this with the -dx12 option on. Similar to Outer Worlds, a ton of threads got spawned and they seem to be doing something. But in the end, two threads dominate the game.
Link to the album:
https://imgur.com/a/e5SszIt
Conclusion
So basically, all of these games typically have one or two threads that are busy all the time, far more than the other threads they spawn. And note that just because the game spawned other threads doesn't mean they're actually ready to run. You can't look at one of the Unreal Engine 4 games and go "look! it spawned 30+ threads, clearly this should run great on a Threadripper!", because those threads aren't all running at the same time. And depending on the order in which these threads run, it's conceivable that you could get similar performance on a CPU with fewer cores/threads, because the lighter threads could run back to back and still take no longer than the one thread that needs a lot more CPU time.
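As a rough illustration of the "run them back to back" argument, here's a small Python sketch. The per-thread times are invented, and it assumes the threads are independent chunks of work with no dependencies between them, which real games won't satisfy exactly. It packs the work onto N cores with a simple greedy longest-first schedule and reports how long the whole batch takes.

```python
import heapq


def makespan(thread_times_ms, cores):
    """Greedy longest-first schedule: time until every chunk of work is done."""
    loads = [0.0] * cores          # current total work assigned to each core
    heapq.heapify(loads)
    for t in sorted(thread_times_ms, reverse=True):
        lightest = heapq.heappop(loads)      # least-loaded core so far
        heapq.heappush(loads, lightest + t)  # run this thread there next
    return max(loads)


# Hypothetical per-frame CPU time per thread: one dominant thread, many light ones.
work = [10.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0]

for cores in (1, 2, 4, 8, 16):
    print(f"{cores:2d} cores -> {makespan(work, cores)} ms")
```

With these made-up numbers the total stays at 10 ms from 2 cores all the way up to 16, because nothing can finish sooner than the one heavy thread. Only a faster core shortens that.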
In any case, this is why single threaded performance in games is still important. Games are still designed in a way that there's not a whole lot of work to do at once, and things tend to be shoved into a single thread.

boom1shot
Member
127
04-23-2016, 04:44 PM
#2
Going by the Cycles Delta numbers, only Cities Skylines uses up a full 4GHz core on a single thread, while the other games leave capacity to spare.

_SIRENDER_
Member
146
04-23-2016, 08:51 PM
#3
That's not how it works.
If I did an estimated cycles count on Cities Skylines, there were about 15,561,014,000 cycles. If we looked at, say, Cyberpunk 2077 at 1440p, it had 22,994,165,000 cycles. If a thread had a "Cycles Delta" value, it ran during the last sampling period. So all of those threads are going to run anyway.

mjt2789
Senior Member
483
04-23-2016, 09:20 PM
#4
I believe he wants to emphasize not pushing your single thread speed too high, because it doesn’t really matter as you’re essentially wasting it.
Games are designed for consoles with much weaker processors than top-end CPUs.
Counting cycles alone is problematic since a single cycle can carry very little work depending on the task.
Even if many cycles occur at low workloads, they might not generate much output compared to fewer cycles with higher demands.
You can observe in this example that 12584 has an extremely high cycle count but doesn’t contribute much (just the main game loop?).
Intel offers a tool called pcm-core that displays instructions per cycle (IPC) for each core individually.
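To show why raw cycle counts and IPC can tell different stories, here's a tiny Python sketch. The counter values are invented for illustration and don't come from any of the screenshots or from pcm-core output.

```python
def ipc(instructions, cycles):
    """Instructions retired per cycle: a rough measure of work done per cycle."""
    return instructions / cycles


# Hypothetical counters: a spinning loop burns many cycles but retires few
# instructions, while a compute-heavy thread does more in fewer cycles.
spin_loop = ipc(instructions=500_000_000, cycles=2_000_000_000)
number_crunch = ipc(instructions=3_000_000_000, cycles=1_000_000_000)

print(f"spin loop:     {spin_loop} IPC")
print(f"number crunch: {number_crunch} IPC")
```

In this made-up case the first thread shows twice the cycle count but gets through a sixth of the instructions, so cycle count alone would overstate how much it accomplished.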

HaZe_Hella
Junior Member
18
04-23-2016, 09:35 PM
#5
This discussion focuses on understanding how CPU cycles are distributed across threads in games. It highlights the importance of analyzing thread activity rather than just overall performance numbers. The goal is to determine whether gameplay tends to concentrate work into a few threads instead of being evenly spread. The explanation emphasizes the need for per-thread data, as system-wide metrics can be misleading and process-level details don't fully address the user's concern.

Jake_TheDoge
Member
207
04-28-2016, 09:29 AM
#6
Yes, that's the main concern. You demonstrate that certain threads operate more frequently than others, yet you fail to present the critical threshold where performance begins to decline. To effectively argue for the importance of single-core performance, it's essential to illustrate the FPS loss when available cycles drop or clock speeds decrease.

This concept is highlighted by how HTT optimizes performance—showing a CPU capable of handling multiple tasks simultaneously on the same core without needing to execute four operations per cycle. Cycles alone aren't sufficient as a measure because they don't reflect the actual number of instructions processed each cycle.

In short, to emphasize why single-core matters, you must prove that insufficient cycles or lower clock speeds directly impact available performance, making it less significant compared to multi-threaded systems.

PERKSIE
Junior Member
48
05-06-2016, 02:41 AM
#7
And discovering that connection would only truly be relevant for the specific CPU I'm using. Cycle counting still offers a general insight into the workload of a thread, irrespective of microarchitecture, since a higher number suggests more tasks were performed.
Running instructions per cycle remains an unhelpful measure, as the actual operations executed on the processor at any moment are largely unpredictable. Benchmarking repeatedly would yield significantly different results for each cycle and run. Consistent data on instructions per cycle is only possible if the application operates on a single thread and no other processes are active on the system.
The total number of cycles a thread dedicates to a CPU still holds value, not because it indicates meaningful work, but since the aim is to identify which threads are consuming resources. It's irrelevant whether the thread is performing meaningful tasks or not; as long as it occupies the CPU during that cycle, other threads cannot run.
This approach doesn't depend on external inputs from users or other devices. The metric remains useful regardless of whether a user interacts with the system or not.
The purpose behind this analysis is to determine which threads are actively engaged. The goal isn't to measure actual productivity but to understand CPU utilization. Even if no significant work is being done, occupying the CPU still matters for overall performance evaluation.
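If the goal is just to see which threads occupy the CPU, one way to read the numbers is to rank threads by their share of the total cycles. A minimal Python sketch follows; the thread IDs and cycle deltas are hypothetical, not values from the screenshots.

```python
def top_threads(cycles_by_tid, n=3):
    """Return the n busiest threads as (thread id, share of total cycles)."""
    total = sum(cycles_by_tid.values())
    ranked = sorted(cycles_by_tid.items(), key=lambda kv: kv[1], reverse=True)
    return [(tid, cycles / total) for tid, cycles in ranked[:n]]


# Hypothetical "Cycles Delta" readings keyed by thread ID.
sample = {
    1001: 2_400_000_000,
    1002: 900_000_000,
    1003: 300_000_000,
    1004: 250_000_000,
    1005: 150_000_000,
}

for tid, share in top_threads(sample):
    print(f"thread {tid}: {share:.0%} of the process's cycles")
```

This says nothing about whether the cycles were useful work, only about who was holding the CPU, which is the point being made here.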

chase2694
Member
127
05-06-2016, 02:23 PM
#8
But what you're demonstrating is that individual cores are essential—and I believe no one has ever disputed that point.
Single-core efficiency is crucial, as it highlights the necessity of dedicated (individual) processing units.
Indeed, there are components that handle threads and some operate more intensely... that's the main focus here.
It's widely understood that no game runs flawlessly in a single-threaded mode, and the last perfectly single-threaded title dates back to the DOS era.
If you're aiming to prove the importance of single-core performance, look for a highly multithreaded application and test it under high clock speeds with all cores active versus low clock speeds with fewer cores—this will clearly demonstrate that single-core speed still boosts FPS.
However, it's also worth noting that a game running on more cores at lower clock rates will perform better than one with fewer cores under the same conditions.

Deadson17
Junior Member
1
05-06-2016, 09:00 PM
#9
A great conversation about this topic.
In any application or game with multiple threads, there is usually one main thread that oversees and sends tasks to others.
Refer to "Amdahl's law" for more details.
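For anyone who hasn't run into it, Amdahl's law says the speedup from adding cores is capped by the part of the work that can't be parallelized. A short Python sketch of the standard formula is below. The 60% parallel fraction is an arbitrary example, not a measurement from any game in this thread.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed example: 60% of the frame's work can be spread across cores.
for cores in (2, 4, 8, 16, 64):
    print(f"{cores:2d} cores -> {amdahl_speedup(0.6, cores):.2f}x")
```

With 40% of the work stuck on one thread, the speedup can never reach 2.5x no matter how many cores are added, which is why the speed of that one thread keeps mattering.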