
The reason behind the fast 13th/14th gen i9 degradation and how to avoid it!

Nichronos

The actual cause of the extremely fast degradation of i9 chips is Intel's own turbo boost algorithm on the two "favored" cores. While monitoring with a multimeter at the back of the CPU socket, I noticed voltage spikes of up to 1.65v on my 14900KF, caused by the two 6.0GHz P-cores, when opening apps, alt-tabbing and browsing with Chrome. Leaving the CPU at that voltage for 1-2 months won't end well: it degrades by more than 100mV, and then the constant BSODs and crashes start to happen!

 

Currently the only way to actually preserve your new i9 is not to apply forced power limits, which cripple your CPU, but to set the turbo ratio to all-core boost instead of the default per-core and to manually limit the maximum voltage! It is also a good idea to disable Turbo Boost 3.0, which forces more load onto the two favored cores mentioned above and has nothing to do with the actual turbo.

 

This is an example of what you should aim for. Use it as a guideline, and if your silicon is stable with less than this, all the better! Needing more than these voltages is an indicator that your chip has already started degrading, since they are already +50mV above the minimum voltage possible for the "average" bin.

 

13900K/KF - 55p 43e at 1.27v
13900KS - 56p 43e at 1.30v
14900K/KF - 57p 44e at 1.30v
14900KS - 59p 45e at 1.42v

Turbo Boost 3.0 - DISABLE
Enhanced Turbo - DISABLE

Keean

I have done extensive testing of my 14900KS and found that problems occur when both hyper-threads on a p-core are loaded at high clock rates.


Power limits help pass all-core tests by throttling the cores back, but don't help when fewer p-cores are active (between 1 and 4 active p-cores seems to be problematic).
Single threaded tests don't load both hyper-threads in the p-cores, so don't fully stress them.


The normal tests people run, single-thread and all-core loads, don't really test the failure conditions, and hence setting power limits seems to resolve the problems whilst actually just making them harder to detect.


I found that setting the p-core ratio limit by cores used to x58 for all cores results in a stable CPU without changing any other settings from default.

 

In fact, with the p-core ratio limit set to x58, you can remove the power limits and the CPU is still stable under an extensive set of tests: both hyper-threads on 1, 2, 4 and 8 p-cores for 1 hour each. For 4 cores I am choosing sets of cores to maximise heat, for example cores 1, 3, 4 & 5, so that core 3 is surrounded on 3 sides by hot cores. Plus a longer test that alternately loads a single thread and then varying numbers of threads onto p-cores for about 6 hours. These tests all use a compiler, so they stress integer address arithmetic, which seems to be where the problems are.
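
As a rough illustration of the test matrix above, here is a minimal Python sketch that turns sets of p-cores into the logical CPU ids to pin a test to. It assumes p-core n shows up as logical CPUs 2n and 2n+1 and that core numbering is zero-based, which is a common enumeration but not guaranteed. Apart from the 1, 3, 4, 5 set, the entries in `TEST_PLAN` are hypothetical picks.

```python
from typing import Dict, List

THREADS_PER_P_CORE = 2  # assumption: hyper-threading is enabled


def logical_cpus(p_cores: List[int]) -> List[int]:
    """Return the logical CPU ids for both hyper-threads of each p-core.

    Assumes p-core n maps to logical CPUs 2n and 2n+1; check this against
    the real topology of the system before relying on it.
    """
    cpus: List[int] = []
    for core in p_cores:
        base = core * THREADS_PER_P_CORE
        cpus.extend(range(base, base + THREADS_PER_P_CORE))
    return cpus


# Hypothetical plan: which p-cores to load in each one-hour run.
TEST_PLAN: Dict[str, List[int]] = {
    "1 p-core": [4],
    "2 p-cores": [4, 5],
    "4 p-cores (clustered for heat)": [1, 3, 4, 5],
    "8 p-cores": list(range(8)),
}

for name, cores in TEST_PLAN.items():
    print(f"{name}: pin to logical CPUs {logical_cpus(cores)} for 60 min")
```

Each run would then load every listed logical CPU with a compile job, so that both hyper-threads of the chosen p-cores are busy while the rest of the chip stays idle.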

Nichronos

The fact is that we must use only an all-core ratio and a static voltage; power limits don't matter.

As long as you don't have single cores boosting, everything is fine, provided your CPU hasn't already degraded!

Keean
I am not sure the static voltage is necessary. I don't think it will use the top SVID voltages at x58, and you can save power by using the lower voltages at lower speeds.

There are some BIOS voltage limits that let you use the SVID voltage but cap it at some level. I wonder if that is better than setting a static voltage?
Nichronos

The option you describe does not exist! Also, as long as you have C-states set to "Enabled" instead of "Auto", you will have low idle power!

Keean

The ASUS Pro WS W680-ACE has this option (not sure about other motherboards) under "Auto Voltage Caps"; see "CPU Core Auto Voltage Cap" ...set a ceiling for core auto voltage...

 

There is also: "IA VR Voltage Limit" ... Maximum instantaneous voltage allowed at any given time...

 

Edit: these both seem to behave weirdly. The first seems to let the core voltage go above 1.42v, so I'm not quite sure what it's limiting. The second seems to scale all the voltages, so it really does make sure the cores don't go above 1.42v, but they end up getting a lower voltage, which has the effect of throttling p-cores to 5GHz and e-cores to 4GHz.

Nichronos

This appears to be an ASUS thing. I don't have any cap or limit options besides current (A) and power draw (W) on MSI Z790 boards, but we both have separate AC_LLC and DC_LLC settings which, when set correctly (e.g. 0.5/1.1 on ASUS or 50/110 on MSI), can reduce voltages by a large amount in heavy-load scenarios like Cinebench, Prime95, Blender, etc., and this works far better than simply using a negative offset.

Keean
Looks like the IA VR limit applies before vdroop, so it has to be set higher. Of course as vdroop varies with CPU current draw, the actual max core voltage will vary too...

So not sure what I would set this to, as if it's set to 1.42 then most of the time because of vdroop the cores will get less than this - however if set higher the cores might get that higher voltage when the CPU current is lower...
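
To make the trade-off concrete, here is a minimal sketch using a simplified linear load-line model, V_core = V_requested - I x R_LL. The 1.1 mOhm load line and the three current values are assumptions picked for illustration, not measurements from this board, and real VRM behaviour (LLC level, transients) is more complicated than this.

```python
def voltage_at_core(requested_v: float, current_a: float, loadline_mohm: float) -> float:
    """Simplified linear load-line model: V_core = V_requested - I * R_LL."""
    return requested_v - current_a * (loadline_mohm / 1000.0)


CAP_V = 1.42          # the pre-droop limit discussed above
LOADLINE_MOHM = 1.1   # assumed effective load line, illustration only

# Assumed light, medium and heavy current draw.
for current_a in (20.0, 100.0, 200.0):
    v = voltage_at_core(CAP_V, current_a, LOADLINE_MOHM)
    print(f"{current_a:5.0f} A -> {v:.3f} V at the core")
```

Under these assumptions the same 1.42v cap lands close to 1.40v at light load but around 1.20v at heavy load, which is the dilemma: a cap that is safe at low current is needlessly low at high current.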

Keean
@Nichronos my understanding of the AC/DC load lines is that Intel suggests keeping AC = DC, and that 1.1 is the mid-point for an average chip?

What is interesting is that with p-core ratio limited to x58 the chip is stable with:
AC_LL=DC_LL=1.0
But with the ratio limited to x59 it requires:
AC_LL=DC_LL=1.4

How can the chip be better than average at x58 and worse than average at x59?

Does this suggest that Intel are not setting the VID correctly, and that the chip actually needs a bigger step up in voltage at x59?
Nichronos

I wouldn't recommend setting AC_LL=DC_LL above 1.0. The fact that you require 1.4 to be stable at the higher frequency means your chip has already started to degrade. The default SVID that the chip requests from its fused V/F curve is based on its initial factory quality when it left the warehouse, and it's usually around 50 to 60mV higher than the absolute minimum needed to function properly over the planned 10-year degradation cycle.

I would strongly recommend RMAing the chip if you started experiencing stability issues at its stock SVID. Applying power limits and downclocking is not a solution! Intel is suggesting such pathetic measures only to minimise the already excessive number of RMA requests they are receiving.

What are the manual all-core voltages you need for the 59x and 58x ratios on your 14900KS, and what is your SP rating on the P-cores?

Usually, with AC_LL=DC_LL=1.0, the average KS bin should be stable at 59x between 1.35 and 1.37v, and at about 1.30v with Hyperthreading disabled, on a normal 360 aftermarket AiO. Obviously, with a delid and exotic cooling such as a chiller or MO-RA3, it will be much lower.

Keean

The 14900KS has only been available for about a month, and it was exactly like this from new, with no change in stability. The same tests fail now as did when I got it... And I have tested two brand new 14900KS chips with similar results.

 

Just finished testing limited to x58 at 0.9/0.9 and it's still stable for all tests...


This is a workstation motherboard not an overclocking one, so it doesn't give an SP score for the CPU.

The test I am running compiles code with several threads going, with the affinity set to one of the preferred cores, and it more or less fails immediately. Have you tried any tests like this? I would be interested to know if any K/KS is stable at Intel default settings with this kind of load.

 

I should point out that it is the load conditions that are the problem. The CPU is fine boosting to 6.2 on Cinebench, the XTU stress test, the Intel diagnostic tool, etc. It passes Prime95 torture tests, the works... But compiling code (or shaders in games) triggers the problem, though only rarely (say once every 6 hours of continuous compilation) if no set affinity is used. Of course, setting affinity does not cause the problem; it's there anyway. It just makes it happen more quickly for testing.
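
A minimal sketch of this kind of pinned test, assuming Linux (where `os.sched_setaffinity` is available). The helper names `run_pinned` and `first_failure` are made up for illustration, and the build command shown in the comment is a placeholder for whatever compile job you use.

```python
import os
import subprocess
import sys
from typing import Iterable, List


def run_pinned(cmd: List[str], cpus: Iterable[int]) -> int:
    """Run cmd restricted to the given logical CPUs and return its exit code."""
    cpu_set = set(cpus)

    def pin() -> None:
        # Runs in the child just before exec; pid 0 means "this process".
        # Processes spawned by the command inherit the same affinity.
        os.sched_setaffinity(0, cpu_set)

    return subprocess.run(cmd, preexec_fn=pin).returncode


def first_failure(cmd: List[str], cpus: Iterable[int], repeats: int) -> int:
    """Repeat the pinned run; return the index of the first failure, or -1."""
    cpu_list = list(cpus)
    for i in range(repeats):
        if run_pinned(cmd, cpu_list) != 0:
            print(f"run {i} failed on CPUs {cpu_list}", file=sys.stderr)
            return i
    return -1


# Example (placeholder build command, both hyper-threads of one p-core):
#   first_failure(["make", "-j4"], [8, 9], repeats=100)
```

A non-zero exit code from the compiler (an internal compiler error or a segfault) is treated as a failure here; a hard crash or BSOD would of course end the test by itself.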

 

I should run the compiler tests without hyper-threading and see what happens, because I think the particular issue is caused by compiler-like loads and hyper-threading.

Nichronos

I had no idea you were using the ASUS Pro WS W680-ACE IPMI workstation motherboard. The ASUS Pro and Prime series are their bottom-of-the-barrel low-end motherboards with absolutely trash VRMs, meant mainly for i5 K chips or for i7 to i9 65W non-K chips with the stock Intel cooler, like in the pictures I attached. With the 14900KS you are asking for more than double what it can actually handle. The only reason this board is still even working is that it has an excessive number of capacitors to ensure longevity, which is the actual difference between Pro and Prime. You would be best off using the ASUS Z790 Apex, with high-end digi VRMs and other exceptional features such as superior RAM compatibility and overclocking. It's impossible for the Pro WS W680-ACE IPMI to handle your chip; I am honestly amazed that it can sustain 5.8GHz at all. Until you buy another motherboard, I strongly suggest disabling Hyperthreading to reduce the stress on this motherboard and limiting the P-cores to something more reasonable, like 56x all-core at a lower voltage cap!

Keean

> Until you buy another motherboard, I strongly suggest disabling Hyperthreading to reduce the stress on this motherboard and limiting the P-cores to something more reasonable, like 56x all-core at a lower voltage cap!

What motherboard do you suggest which supports ECC RAM?

Actually the VRM on this motherboard is something like 12+2 at 60A per phase; that's 720A of current. Seeing as the limit of ICCMAX is 511.75A, the VRM has plenty of current for this CPU.

 

As for cooling, there's a custom water loop, with two 360 radiators on there, which I think is plenty.

 

It's not the motherboard that is the problem here. The CPU can sustain all-core loads fine; it tends to be thermally limited by the IHS first. The water temp is a steady 31°C, so there is plenty of radiator/fan power, and the cores are hitting 100°C, so it would require delidding to improve the cooling and get any more multi-core performance. The VRM is not the limiting factor here, it's the IHS.

 

In any case, the errors happen most easily running both hyper-threads on a single p-core, which is neither thermally limited, current limited, nor power limited.

 

Nichronos

Wrong, this motherboard has 8+1+1 phases at 60A, which can only handle lower-tier chips, as I described.

 


@Keean wrote:
What motherboard do you suggest which supports ECC RAM?


Did you go for the Pro WS W680-ACE IPMI to use ECC RAM? That is an extremely tough question though; I can't seem to find another board that supports consumer LGA 1700 chips with DDR5 ECC except yours... Everything I see operates it in non-ECC mode. I will keep looking into this. Cheers!

Keean
Well, there are 8+1 real and 4+1 virtual phases, which is 12+2... Even limited to just the 8 real phases, that's 480A, which is plenty considering Intel's ICCMAX of 280A, 307A or 400A depending on power profile.
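
The arithmetic behind these figures can be sketched as follows. It uses only the phase counts, the 60A per-phase rating and the ICCMAX values quoted in this thread, and it assumes ideal current sharing between phases with no thermal derating, which a real VRM will not fully achieve.

```python
from typing import Dict

PHASE_RATING_A = 60.0  # per-phase rating quoted in the thread


def vrm_capacity_a(phases: int, per_phase_a: float = PHASE_RATING_A) -> float:
    """Nominal total current, assuming ideal sharing and no derating."""
    return phases * per_phase_a


# ICCMAX figures mentioned in the thread.
ICCMAX_A: Dict[str, float] = {
    "280A profile": 280.0,
    "307A profile": 307.0,
    "400A profile": 400.0,
    "511.75A ceiling": 511.75,
}

for phases in (8, 12):
    capacity = vrm_capacity_a(phases)
    for name, limit in ICCMAX_A.items():
        margin = capacity - limit
        print(f"{phases} phases ({capacity:.0f} A) vs {name}: margin {margin:+.2f} A")
```

On these nominal numbers, 8 phases clear all three power-profile limits, and only the 511.75A ceiling needs the 12-phase figure.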

As I said, all core loads are thermally limited by the IHS.

The failures are occurring when the CPU is not thermally limited, so when only a few p-cores are active.
Nichronos

OK, can you do a quick experiment? Manually set the P-cores to 59x all-core, the E-cores to 45x and the cache to 45x, and disable HT. See how it goes for a day and report back on whether it is stable or you are still experiencing crashes and BSODs. Reset everything else to defaults, and don't change AC/DC LLC either; leave them at Auto. Let's start from a baseline and move on from there, OK?

Keean
I have already done these tests.

With hyper-threading disabled I don't need to limit the core speed; it will run up to 6.2GHz with no crashes for loads with a few threads.

With full multi-core load, it hits 100°C and stays there for hours with no errors.

It will do this with default LL settings.

However even with hyper threading enabled, it does not crash on all-core loads. It will run for 6+ hours all cores fully loaded, hyper threading enabled, no errors.
Nichronos

Don't let the cores boost to 6.2 at all; that's what is causing the chip degradation, as I described in the main post. Stay at 59x with no HT. This will also let you set a much lower Vcore; perhaps 1.30v would be stable, and it will reduce your temps by around 10-15°C depending on the load!

Keean

I don't see particularly high voltages at 6.2 with hyper-threading disabled. This runs completely fine at (AC/DC) 0.9/0.9, so at lower voltages than the Intel default. I will see if I can go lower with hyper-threading disabled.

I believe it's the combination of hyper-threading and boosting above x59 with certain instruction patterns that causes the crashes... And maybe causes degradation, although I have seen nothing like that on my 13900ks which has been running without limits since they were launched.

Personally I think newer games use more shader compilation, and that's why people are seeing more crashes. The problem only occurs with certain instruction patterns that are used in compilers (like shader compilers in games) and decompression. Maybe in other places too, but not all code has a problem.

Keean

So (AC/DC) 0.8/0.8 was fine on all-core loads, but failed on a single loaded preferred p-core. (This is with hyper-threading disabled, but no core ratio limits.)


0.9/0.9 seems stable on single-core loads so far, and it definitely reaches x62 occasionally, although it spends most of its time at x61 in my 1 hour compile test, because the core is between 60°C and 85°C. Max boost to x62 is only allowed when p-core Tj < 60°C, x61 when 60°C <= Tj < 85°C, and x60 when Tj >= 85°C.
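
Those temperature bins can be written down as a small lookup. This is only a sketch of the thresholds as stated above for this 14900KS, not Intel's actual boost algorithm, and the function name is made up for illustration.

```python
def max_boost_ratio(tj_c: float) -> int:
    """Highest p-core ratio allowed at junction temperature tj_c (in °C),
    using the thresholds observed in the post above."""
    if tj_c < 60.0:
        return 62
    if tj_c < 85.0:
        return 61
    return 60


for tj in (45.0, 60.0, 75.0, 85.0, 100.0):
    print(f"Tj = {tj:5.1f} °C -> up to x{max_boost_ratio(tj)}")
```

With the loaded core sitting between 60°C and 85°C for most of the compile test, x61 is the ceiling it would see most of the time.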

 

When loaded the max voltage I have seen on this p-core is 1.37v. The overall max voltage appears during transients on idle p-cores and seems to be about 1.47v. The max core temp is ~75°C on the loaded p-core.


The higher the current the greater the vdroop, so max voltage is going to be when the SVID is requesting a high voltage, and the core has stopped working so the current draw drops, but the VRM has not had time to respond, and you get a high requested voltage with a low vdroop. That appears to be 1.47v but of course you have no idea what the fast transients are.

 

I think that's why it's important to keep AC=DC as matching the impedance minimises the ringing (overshoot) of the voltage. I doubt the voltage sensor is sensitive enough to read any high-frequency ringing that occurs, you would need to hook up an oscilloscope to the power lines to get a good look.

Keean

So for my 14900KS this is what I have found so far:

- it's stable at a p-core ratio limit of x58, hyper-threading enabled, no power limits, with a slight dynamic undervolt (AC/DC=0.8/0.8). Because the CPU will be thermally limited for all-core loads, this setup gets full all-core performance, but because of the ratio limit will not get full single-threaded performance.

- it's stable with hyper-threading disabled, no frequency or power limits with a slight dynamic undervolt (AC/DC=0.9/0.9). This setup will get full single-threaded performance, but because hyper-threading is disabled it won't get full all-core performance.

- However, a core ratio limit of x59 with hyper-threading enabled requires an over-volt to be stable (AC/DC=1.4/1.4), and I have not tested at higher frequency limits for the preferred cores, as I don't want to over-volt the CPU.

It will actually boot and appear stable at default settings but might fail on all-core stress tests like Cinebench. Setting power limits does enable it to pass all-core stress tests (intel processor diagnostic tool, XTU, Cinebench, OCCT etc) - however it still fails the multi-thread compiler test when affinity is set to a few p-cores (particularly vCPU 8 and 9 together fails fast).

The problem is caused only by compiler-like loads, and only when more than one thread is loading the same p-core while the other p-cores and e-cores are idle, and that makes it hard to test for. I don't think most people are testing for this issue properly.

If this is actually an issue with all 14900k/ks, it seems limiting p-cores to x58 when both hyper-threads are active would solve the problem without sacrificing either single-threaded or all-core performance.
