New build experiences random crashes under low power conditions, yet functions properly during stress tests?
PC Specs:
Mobo: MSI B550M PRO-VDH
CPU: Ryzen 5 5600X
CPU Cooler: Thermaltake Air Cooler
GPU: Sapphire Pulse RX 9060 XT 16GB
RAM: G.Skill Ripjaws F4-3600C19S-16GVRB (2x16=32GB)
PSU: Thermalright KG650 650W Gold Modular
SSD: Lexar NM620 256GB Gen 3
BIOS Version: 7C95v2N (latest)
BIOS Settings: Default
OS: Fedora Linux 43
I built a new system a week ago. Everything in the build is new and in warranty except for the RAM.
Fifteen minutes in, the screen went black. The system stopped responding, but the fans kept running fine. I did a hard shutdown and it started again, picking up where it left off. It worked like this for two days, crashing every 10-12 minutes. On the third day the problem evolved: after a crash the system no longer POSTs and instead gets stuck in a restart loop with the VGA EZ Debug LED coming on.
The system boots normally on a cold start. Another accidental discovery: when I test the system at night, when the temperature is cooler, it needs 5-7 minutes of cool-off time to become usable after a crash or a proper restart. When I test at noon, in relatively higher temperatures, it needs around 15 minutes of cool-off time to become usable again. If the system is started before this cool-off time has passed, it gets stuck in the restart loop and doesn't POST.
Usually systems crash under stress; mine does the opposite and crashes under normal use. I am using OCCT to stress test the system.
I have run a combined test of RAM and CPU for 30 minutes. The PC worked fine and there were no errors. I ran another RAM test for 30 minutes and the PC didn't crash. So that was more than an hour straight without the PC crashing.
I have run the CPU stress test for 30 minutes and it passed. I also ran the power test in OCCT for 30 minutes and it passed, but the system crashed within a minute after the test completed.
I ran 80% VRAM test and 3D adaptability test separately on GPU for 30 minutes each and both passed; another hour straight. However, when I ran both these tests combined, the system consistently crashed between 10-12 minutes.
I ran the Linpack stress test and the system crashed after 19 minutes. But it didn't seem to put too much stress on the system and I didn't touch the mouse for some time. The screen turned off and the system crashed with it.
I have tried another set of RAM, but the system crashed with it as well.
What do I do? Are there any other ways I can pinpoint or get closer to the problem? Thank you very much.
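One way to get closer to the problem on Fedora is to look at the kernel messages from the boot that ended in the crash. The following is only a minimal sketch, assuming the systemd journal is stored persistently and that the crash left anything behind at all (a hard hang or power loss often writes nothing). The helper names `journal_command` and `last_kernel_messages` are made up for this example.

```python
import subprocess


def journal_command(boot_offset=-1, priority="warning"):
    """Build a journalctl call for kernel messages of an earlier boot.

    boot_offset=-1 means the boot before the current one.
    """
    return [
        "journalctl",
        "--no-pager",
        "--dmesg",                    # kernel ring buffer messages only
        f"--boot={boot_offset}",      # which boot to read
        f"--priority={priority}",     # this severity and worse
    ]


def last_kernel_messages(boot_offset=-1, priority="warning", tail=40):
    """Return the last `tail` matching lines from the chosen boot."""
    result = subprocess.run(
        journal_command(boot_offset, priority),
        capture_output=True,
        text=True,
        check=False,  # a missing boot entry should not raise here
    )
    return result.stdout.splitlines()[-tail:]
```

Calling `print("\n".join(last_kernel_messages()))` right after a crash and reboot shows whatever the kernel managed to log last. An empty result is itself a hint that the machine stopped abruptly rather than through a software fault.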
Thank you for your patience. I understand the delay in responding and apologize for the confusion. Global C-State Control was turned off in the BIOS, which initially seemed to resolve the issue, but the same problem has since returned. The system crashes at idle and under normal use after about ten minutes. It appears a component may be failing. A restart needs a cooldown period; otherwise, the VGA LED stays on continuously and the system gets stuck in the restart loop. Please let me know if you need guidance on adjusting other settings to resolve this.
I'm running out of ideas, other than to suspect that the CPU, motherboard, or power supply might be the issue. I'm a bit doubtful about the power supply, but testing with a stronger unit could help. You might also consider lowering the graphics card's power limit to the minimum via the Adrenalin settings and then checking stability.
I adjusted various BIOS settings, turning off CPPC and CPPC Preferred Cores. I also configured Power Supply Idle Control to Low Current Idle.
The system failed to boot and would loop back on restart, especially when the CPU EZ Debug LED was on, though it worked with the VGA LED even during a cold start.
After resetting the BIOS, the computer started after a long warm-up period.
Disabling Global C-State Control in BIOS didn’t alter the issue; the system still crashes and doesn’t restart properly.
Temperatures were normal overall, but I haven’t checked the chipset temperatures. I’ll follow up once I have more information.
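For checking temperatures over time on Linux, one option is to log every sensor the kernel exposes under `/sys/class/hwmon` and look at the last entries after a crash. This is a rough sketch, not a definitive tool: the function names and the `temps.log` path are made up for illustration, and which sensors show up (for example `k10temp` for the CPU or `amdgpu` for the GPU) depends on the loaded drivers. A chipset sensor may not be exposed at all.

```python
import os
import time
from pathlib import Path

HWMON_ROOT = Path("/sys/class/hwmon")  # standard Linux hwmon sysfs location


def read_temps(root=HWMON_ROOT):
    """Return {"chip/label": degrees Celsius} for each temp*_input file."""
    readings = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for temp_input in sorted(chip.glob("temp*_input")):
            base = temp_input.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip / f"{base}_label"
            label = label_file.read_text().strip() if label_file.exists() else base
            try:
                millidegrees = int(temp_input.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors refuse reads or return junk
            readings[f"{chip_name}/{label}"] = millidegrees / 1000.0
    return readings


def log_forever(path="temps.log", interval_s=2.0):
    """Append one timestamped line per interval until interrupted."""
    with open(path, "a") as log:
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            line = ", ".join(f"{k}={v:.1f}" for k, v in read_temps().items())
            log.write(f"{stamp} {line}\n")
            log.flush()
            os.fsync(log.fileno())  # push to disk so the last line survives a hang
            time.sleep(interval_s)
```

Running `log_forever()` in a terminal before normal use leaves a file whose final lines show the temperatures just before the crash. The `fsync` call after every line is deliberate, since buffered output would otherwise be lost when the system freezes.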
If you have any suggestions, please let me know.
Thank you.
Hello!
I am reporting back for the sake of documentation and to thank you for your time and effort and giving me a very useful tip in the process.
It was a faulty motherboard. The motherboard has been returned under warranty and I await a replacement. The problem was getting progressively worse. The system refused to boot in the end even on a cold start. No BIOS settings or software level fix was sufficient.
Thank you very much.