Topic: Random resets on new workstation

2021-01-27, 22:17:29

Vuk

Hi my dear Corona colleagues.

So I bought the MSI X570 Tomahawk board for a new workstation build for one of my colleagues at work. I built the system myself and tested it, and everything was working fine. I was running XMP on the RAM (2x32GB 3200MHz Kingston sticks) and PBO. All was stable until a few days ago, when the PC started restarting randomly. In the last 3 days I have been getting 3-4 random restarts within 8-10 hours of work on this computer. It is not related to any specific action, program or process; it just happens randomly. It happened once during some 3D modelling work, then while browsing photos on the server, and then just before I launched interactive rendering.

In Event Viewer I see a Critical entry: Kernel-Power, Event ID 41, Task Category (63).

This is the PC:
CPU: 5950X
Mobo: MSI X570 Tomahawk, BIOS ver. V14 (latest stable)
RAM: 64GB (2x32GB 3200MHz)
GPU: RTX 2060 Super
PSU: 750W Asus Strix Gold
SSD: Kingston A2000 1TB
Cooler: Arctic Liquid Freezer II 280
Win: running the latest Windows update

Yesterday evening I ran several tests over several hours with no issues at all. Of course, before all this I turned off both XMP and PBO and reset the BIOS to defaults. I am running BIOS version V14 from November 2020, which is the latest stable version.
I ran Prime95, OCCT, Valley and FurMark. I even ran Prime95 and FurMark at the same time, with both the CPU and the GPU stressed to 100% for 1 hour, without any crashes. Temperatures are fine, below 73C under full Prime95 load. The PSU's 12V, 3.3V and 5V rails also look good in OCCT after the 1-hour test, which didn't report any errors. To be honest I don't suspect the PSU; I have started suspecting the RAM or the RAM slots on the mobo.
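
As a rough sanity check on how much a clean 1-hour stress run actually proves, here is a minimal sketch. It assumes the restarts arrive at a constant average rate regardless of load (a Poisson model, which may well not hold here), and it reads the figures above as about 3.5 restarts per 9 hours; both are assumptions, not measurements. Under them, a 1-hour run has roughly a two-in-three chance of passing cleanly even on a faulty system.

```python
import math


def crash_rate_per_hour(crashes: float, hours: float) -> float:
    """Average restarts per hour, from an observed count over a time span."""
    return crashes / hours


def prob_no_crash(rate_per_hour: float, test_hours: float) -> float:
    """Poisson probability of seeing zero restarts during a test of this length."""
    return math.exp(-rate_per_hour * test_hours)


def hours_for_confidence(rate_per_hour: float, confidence: float) -> float:
    """Crash-free hours needed before 'no restart' is this unlikely by chance."""
    return -math.log(1.0 - confidence) / rate_per_hour


if __name__ == "__main__":
    # Assumed midpoint of "3-4 restarts in 8-10 hours" from the post.
    rate = crash_rate_per_hour(3.5, 9.0)
    print(f"chance a 1 h test passes anyway: {prob_no_crash(rate, 1.0):.2f}")
    print(f"crash-free hours for 95% confidence: {hours_for_confidence(rate, 0.95):.1f}")
```

So under these assumptions a test would need to run crash-free for most of a working day before it says much about stability.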

Today I ran Memtest86 on my other workstation. Both sticks gave no errors after 2 straight hours of testing each (2 of 4 passes in total). I am currently testing the slots on the Tomahawk motherboard; so far there are no errors with 1 slot populated after 2 hours of testing. I will now run the test with both slots populated, but my guess is that this will be fine as well, since DIMM slot A2 gives no errors.

Does anyone have any idea what it could be? I could try another PSU since I have a spare one, and I could even try a different CPU since I also have a 5900X lying around in a box, waiting for another build. But before I do that I would like to hear some opinions from you guys. It would help a lot, since I am completely stuck :(


2021-02-03, 04:18:05
Reply #1

jalapeno

Hi, after reading your description I was betting on XMP, but after some research I think it will most likely be 1. or 2.
I would go through the steps below, in that order:

1. Disable fast startup. I've had many issues because of that feature. I don't know why, but after some Microsoft updates I found it enabled again, so it needs rechecking.

2. Change the Device Manager settings: disable Wake on LAN and wake by devices. I've seen that solve the issue a few times.
Start with points 1 and 2.1; I would hold off on 2.2
https://www.tenforums.com/bsod-crashes-debugging/38171-kernell-power-even-id-41-category-63-a-post707015.html#post707015

3. Remove any dongles, Wi-Fi cards, Bluetooth adapters, anything that is not necessary. Check that all cables are properly connected.

4. Nvidia drivers: try reinstalling, and try other versions.

5. Check whether there are errors in Event Viewer right before that Kernel-Power entry.

6. Try running only PBO without the XMP profile, then try the XMP profile without PBO.

7. I think XMP can be the issue here. Try a "manual" memory OC if you need it; it was somewhat automated with the 1usmus DRAM Calculator.

8. Run Memtest again, but on the same workstation that had the issues.

9. PSU swap.

10. Remove unnecessary drivers or add-ons like "Killer LAN" etc.; this one in particular caused me a lot of pain some time ago.
Try reinstalling the remaining ones.

11. Reassemble your PC.
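
For step 5, here is a minimal sketch of pulling out the last few log entries before each Kernel-Power 41. It assumes you have already exported the System log and loaded it into simple records; the `Event` class and the function names are made up for illustration and are not part of any Windows tool. Since event 41 is written on the boot that follows the reset, the entries sitting just before it in the log are the last things the system recorded before it went down.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """One exported log entry (hypothetical record layout)."""
    time: datetime
    provider: str
    event_id: int
    message: str = ""


def is_unexpected_reset(ev: Event) -> bool:
    """True for the Kernel-Power 41 entry logged after an unclean restart."""
    return ev.provider.endswith("Kernel-Power") and ev.event_id == 41


def last_events_before_resets(
    events: List[Event], count: int = 5
) -> List[Tuple[Event, List[Event]]]:
    """Pair each Kernel-Power 41 with up to `count` entries logged just before it."""
    ordered = sorted(events, key=lambda e: e.time)
    result = []
    for i, ev in enumerate(ordered):
        if is_unexpected_reset(ev):
            # Take the preceding slice, skipping earlier reset entries themselves.
            preceding = [
                e for e in ordered[max(0, i - count):i]
                if not is_unexpected_reset(e)
            ]
            result.append((ev, preceding))
    return result
```

If the same provider shows up right before every reset, that is the first thing I would look into; if the preceding entries are all unrelated noise, it points more towards power or hardware.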

Let us know if any of that worked for you.
Hope that helps ;)

2021-02-03, 09:12:50
Reply #2

Juraj

Well, here is my guess:

I doubt it's XMP related. In that case the PC would crash and mostly stay down, requiring a manual power-off. After such a crash, memory training would turn XMP off by itself.
750W is more than enough for a 5950X and a 2060 Super, and the Asus has "good" reviews, but there are two issues:

1) Based on https://linustechtips.com/topic/1116640-psucultists-psu-tier-list/ we can see that all generations of this PSU range have one issue noted: transient loads. That means that even when the load fits within the PSU's overall power envelope, spikes will crash it. And the 5950X, with its high turbo and particularly large voltage spikes, is the perfect candidate for this. PBO shouldn't really change this, since it mainly raises the total power limits but doesn't really lead to much higher voltages.

The 2060S should not be the culprit. But almost the whole Ampere (30xx) range also has transient issues. For example, the 3090 has a 320-350W limit at stock, but spikes can reach up to 500W! Those will crash most common PSUs.
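
To put numbers on the spike example, a small arithmetic sketch. The 350W and 500W figures for the 3090 and the 750W rating come from this thread; the 250W for the rest of the system is a purely hypothetical placeholder, and the function names are made up for illustration.

```python
from typing import Iterable


def spike_ratio(sustained_w: float, peak_w: float) -> float:
    """How many times larger a momentary spike is than the sustained limit."""
    return peak_w / sustained_w


def headroom_w(psu_rated_w: float, peak_draws_w: Iterable[float]) -> float:
    """Rated PSU wattage minus the sum of simultaneous peak draws."""
    return psu_rated_w - sum(peak_draws_w)


if __name__ == "__main__":
    # 3090 figures quoted above: 350 W stock limit, spikes up to 500 W.
    ratio = spike_ratio(350.0, 500.0)
    # Hypothetical peak draw for CPU, board, drives and fans (assumption).
    REST_OF_SYSTEM_PEAK_W = 250.0
    left = headroom_w(750.0, [500.0, REST_OF_SYSTEM_PEAK_W])
    print(f"spike ratio: {ratio:.2f}x, headroom at peak: {left:.0f} W")
```

Keep in mind this only compares rated wattage against peaks. It says nothing about how quickly a given PSU reacts to a spike, which is what the tier list note is about, so a unit can trip even when the sums look fine.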

I would try swapping the PSU first; it's almost easier than playing around with voltage curves, BIOS and software solutions, etc. That will just get on your nerves very quickly.

On top of this, I would strongly advise against PBO, esp. with Zen 3 chips. There is absolutely no point to it. You can play with so-called "PBO-2", but its benefits can be a risky trade-off against stability, as is always the case with undervolting or manual voltages in general. There will be plenty of people to tell you otherwise, though, so this is just my personal opinion on PBO2. The opinion on PBO (ver1) is almost universal by now though: don't touch.


The most reputable Tier 1 PSU units, which I will always stand behind: Seasonic Prime, Corsair AXi. There are tens of Tier 2 units which are more than capable (higher EVGA range, Super Flower Leadex series, etc.), or you can see how big the list is in the link above. But still, I consider stability in a workstation to be so paramount that I will always go for the highest-tier PSU.

I have had almost no PC crashes in a decade, though I don't count memory boot issues (XMP/DOCP related).




2021-02-03, 19:06:12
Reply #3

Vuk

Hi guys, thanks for all the replies. The issue went away at some point for 1-2 days; the only thing I did was move the mouse USB cable off the BIOS Flashback port.

The first thing that came to my mind was the PSU or the RAM. Then I ruled out the PSU, but it seems you may be right, Juraj. I usually buy Seasonic or EVGA, but for some reason I bought this one, since it was a good deal and I heard it had Seasonic components plus a 10-year warranty.

I also posted on the overclockers forum in the X570 Tomahawk thread, and quite a few people there are having the same issue as I am: random restarts or random bluescreens. Actually, quite a few people are having issues with the 5000 series right now due to the AGESA updates from AMD and bad BIOS versions. One of the guys there (overclockers forum) told me to play a bit with LLC, voltage control, C-states, and the current type. I made some changes in the BIOS, and so far it has been 2 days without any issues. I also disabled fast boot and did everything else jalapeno mentioned above.

If it crashes again I am going to do a PSU swap; the only problem is that I doubt the vendor will accept a PSU return :( . If the PSU swap doesn't help I'll just swap the CPU as well; luckily I have another one just in case :)