My AMD PC crashes when idle and then fails to POST, but runs fine under stress.
Hello everyone,
I'm reaching out about a rather unusual problem. I've been building PCs for 15 years and have never run into anything like it.
What's been happening:
Every so often, after sitting idle for 30 minutes or longer, my computer restarts on its own. When it powers back on, it gets stuck in POST with the GPU debug LED lit on the motherboard, and it won't boot again until I hold down the power button to shut it off. Pressing the reset button doesn't help: it just attempts POST again and stops with the GPU light on. During gaming, AI tasks, or rendering, everything works perfectly, and I've never had a crash in those situations. It only happens when I leave the PC running and go for lunch or similar, usually with a browser, Steam, or VS Code open, nothing that puts heavy load on the CPU or GPU.
I'm wondering whether this could be memory-related, since all these applications use a lot of RAM. Still, the crashes are rare even when idle: roughly 96% of the time the PC sits idle without any problem.
Errors in Event Viewer around the crash:
These are the errors I mostly see around the crash, but they don't seem very useful:
- Event 41: The system rebooted without cleanly shutting down first. This could be caused by the system not responding, crashing, or losing power unexpectedly.
  - This one is logged after the crash, once I hold the power button and the system finally manages to POST.
- Event 6008: The previous shutdown on June 29, 2024, at 8:31:26 AM was unexpected.
  - Not helpful either.
- Event 7000: The AMDRyzenMasterDriverV20 service failed to start because it couldn't create a file that already exists.
  - However, this error shows up in the log almost every day, while the crash with the POST failure is rare.
This is an all-AMD build. Here are the specs:
- Ryzen 9 7900X3D (stock settings)
- ASRock B650M PG Riptide motherboard
- 2x 16 GB Patriot Viper Venom RGB DDR5-6200 CL40-40-40-76 dual kit
- 240 mm AIO liquid cooler
- Radeon RX 7900 XT (PowerColor base model)
- Corsair RM1000e 1000 W modular 80+ Gold PSU
- Plus 2 HDDs, a 2.5" SSD, and two NVMe SSDs, with around 4 case fans.
What I've tried so far:
- Switched from a 750 W Endorfy PSU to the current Corsair; no change.
- Before the 7900X3D, I had a Ryzen 7 7700 that had this issue much more often. It started out infrequent and became more frequent over time: after about 6 months it went away for a few months, then came back. Upgrading the CPU seemed to fix the problem temporarily, but now it's reappearing. AMD provided a replacement (RMA), but that's not relevant at the moment.
- Updated the BIOS. I'm not on the latest version right now, but with the 7700 I ran the newest BIOS and it made no difference.
- The only customization on my system is the RAM, which runs at 6200 MT/s with EXPO (the advertised speed). With the 7700, I ran it at 6000 MT/s because 6200 wasn't stable.
- I've disassembled and reassembled the PC multiple times, making sure everything was installed carefully; no change.
- This is getting frustrating, and I'm not sure what's wrong. The CPU is practically brand new (I bought it in February this year), and it has never been overclocked.
- My other suspicion was the motherboard, but then why did the problem disappear for months after I upgraded the CPU?
- My next guess is the GPU or possibly the RAM, but both seem to work fine under load. I haven't reinstalled Windows yet. The PC is about 16 months old, and before these issues started, only the RAM and GPU were changed (the GPU swap was from an RX 6800 to the RX 7900 XT, using DDU. I've run DDU three times already and reseated the GPU, with no effect).
Any suggestions or assistance would be greatly appreciated.
Are you sure the motherboard isn't pushing the settings too far? If they're set to auto, the voltages or currents at low load may end up too high (since the power draw is low) and cause the system to fail. Alternatively, EXPO could be gradually wearing down the memory controller, which is integrated into the CPU.
I'm not sure whether it's pushing things too far. Everything is at factory defaults apart from the RAM, but yes, maybe the RAM settings are slowly killing the memory controller.
With the 7700 it did look like a degradation issue: the first crash came after about 6 months of use, then nothing for months, and by the one-year mark I was getting one crash a week.
The problem is that I have no way of knowing whether this is degradation, so my only option is to wait for it to get worse and then RMA the CPU. With the replacement, I can run the memory at a lower speed.
It's also strange that after the crash the system won't POST. I'm not sure what's going on there.
I have an update on the issue.
Over the last two or three GPU driver updates, I've started getting driver timeout error messages. When do they happen? Exactly during idle time. Everything works fine under gaming load, but if I leave the PC running for a few hours, I come back to the friendly error box from the AMD driver.
Based on this, I'm more inclined to think the GPU is the problem, and it will likely need a replacement. Usually I'd wait for a driver fix, but with such an expensive GPU, I'd rather act now.