Author Topic: CPU, cores, threads and cache (Read 8706 times)

spadestick · 2017-04-18, 19:26:40

Not sure if this is the right location to post this, but here goes....

It seems that from the Benchmark Figures, Corona thrives on these $10,000 (for the latest versions) Intel Xeon CPUs - which are really meant for Servers, and not graphics.

I am really backward and confused by all the numbering and iterations this crazy Intel corp releases, they couldn't be more ridiculous with their no. of versions.

Please can somebody help me understand regarding Corona's rendering speed in relation to these questions :

1) Are the number of cores more important than the number of threads?
2) No of Physical and the logical cores - does it make any difference?
3) What the best frequency in terms of stability and speed 2.0 Ghz to 4.6Ghz - I understand that the higher, the more chances of crashing /overheating. (from the tropics).
4) There's a huge price range between so many Xeons CPUs - What is the best compromise in terms of speed and price?
5) Is it worth getting an E7 instead of an E5, or the insane price is unjustifiable?
6) Is cache important to Corona?

What should I get if I only have $1000 USD to blow on the CPU?

Thanks!

Ryuu · 2017-04-18, 20:47:28

Please let me know if I'm providing the overview at the wrong level of detail :)

1+2)
Physical core - the actual hardware executing the code
Logical core - what OS sees as some (possibly virtualized) device capable of executing the code
Thread (of execution) - a piece of code executed with its own state
HyperThreading (and whatever the AMD name for this is) - technology where each physical core provides two logical cores to the OS (each physical core executes two independent pieces of code at once)
=> 1 physical core = 2 logical cores = 2 simultaneously executed threads

3) As long as you are within the specified frequency for the CPU you should not get any instability. When you start overclocking you're entering the danger zone. It kinda doesn't matter if you are overclocking from 2 GHz to 2.2 GHz or from 4 GHz to 4.4 GHz. You can melt your CPU in both cases.

The maximum frequency of CPU is determined by how fast the electrical signal can propagate through the actual CPU hardware. Each CPU has slight manufacturing deviations so the max achievable frequency is different for each CPU (even within the same model). When you get past this max frequency, you'll get system instability simply because some values within the CPU will start to get used before they are actually computed or transported to where they are needed.

The elecrtical signal propagation speed is influenced by voltage. The higher the voltage the higher the propagation speed => higher voltage also means higher max frequency. This is the reason why you often need to increase voltage when overclocking.

The consumed power = K * frequency * voltage². The K is some constant depending on the actual physical CPU. For example when you want to overclock from 4 GHz to 5 GHz you may perhaps also need to raise the voltage from 1 V to 1.2 V and suddenly the CPU consumes almost twice (1.8x) as much power. Since the power consumed is exactly equal to the heat output, this is the main problem with overclocking.

4) Our experience is that as long as you are comparing CPUs with the same architecture/generation, the #cores * frequency is a good metric to estimate their relative performance. As for "speed vs price" this is tradeoff that you must decide for yourself :)

5) Unless power consumption/cooling is your main concern, I'd guess that you're better off with multiple smaller/slower machines using DR that with one big monster. You'll have a bit more to manage when installing updates & stuff, but when a single machine dies, you'll be grateful that you have few other to continue your work.

6) Corona is heavily memory limited - this means that the CPU spends most of its time waiting for the memory to provide data to be operated on. This also means that the cache size and speed is one of the most important factors in Corona performance.

I'm not sure about the prices right now, but with 1k USD I'd probably buy one of the new AMD Ryzen CPUs.

spadestick · 2017-04-19, 15:34:19

Thanks Ryuu for the effort and time put into this super insightful post!
I'll look into DR and AMD, unfortunately I don't have old machines lying around. But perhaps I can DR in the office one day...
Actually, I have no intention nor desire to overclock at all. The worry itself would pretty tiring!

I hope to find just one answer from this investigation :

Out of these factors on a single unit : Number of Cores | Threads | Frequency Ghz | Cache
Which the most important for Corona's speed?

Thank you!

Juraj · 2017-04-19, 19:13:44

Quote from: Ryuu on 2017-04-18, 20:47:28

6) Corona is heavily memory limited - this means that the CPU spends most of its time waiting for the memory to provide data to be operated on. This also means that the cache size and speed is one of the most important factors in Corona performance.

When does it show in practicality ?

On my v4 Xeons, DDR4 memory provides identical result down to single point in Cinebench R15 whether it;s clocker to 1333 or 2400 MHz.
Now I think should try this on Corona to see if it will provide different result. With above statement, should the difference be dramatic ? Apparently it already does matter for architecture like Ryzen, but that is out of hardware reason.

What does it mean for future ? That Corona's performance will rise as DDR4 (and eventual DDR5) will mature into better speed ? It doesn't seem like the development on this front is any dramatic, so Corona will continue to be bottlenecked in this way ?

What's the underlying cause of this ? The memory speed itself, or the connection between memory and CPU ? Do GPU renderers suffer from this too given they have much higher memory speeds and tighter connections ?

Ondra · 2017-04-19, 20:50:46

Quote from: Juraj_Talcik on 2017-04-19, 19:13:44

What does it mean for future ? That Corona's performance will rise as DDR4 (and eventual DDR5) will mature into better speed ? It doesn't seem like the development on this front is any dramatic, so Corona will continue to be bottlenecked in this way ?

I think the situation is not as dramatic as ryuu's post made it to be - in fact, most of modern applications are memory-bound. CPU speed just historically advanced much faster than memory speed:

Note that the graph is logarithmic in y, so it shows appox 300times gap. There is probably some physics behind this, but I would guess simple reason is that CPUs just had to get faster, memory had to get faster AND bigger.

The most important fact is that Corona scales flawlessly with amount of cores, because that is the only direction we are making solid advance. Can you imagine CPUs getting twice as fast from one generation to the next? Such as from 4GHz to 8GHz? Well, Ryzens just doubled the standard number of cores (4->8) in our usual price range for a workstation. And we are waiting for naples ;).

Quote from: Juraj_Talcik on 2017-04-19, 19:13:44

What's the underlying cause of this ? The memory speed itself, or the connection between memory and CPU ? Do GPU renderers suffer from this too given they have much higher memory speeds and tighter connections ?

Problem is memory latency - CPUs while rendering are constantly making a lot of decisions that are very difficult to predict (such as "which of the millions of tiny polygons will this ray hit? Which material will the polygon have? Which of the millions of pixels from its texture do we need to fetch?"). The whole scene cannot be kept in cache. Every time a polygon not in cache, or texture region not in cache is hit, it has to be fetched from the main memory, which costs hundreds of cycles. CPU has to basically stall during this time.

GPU memory, even though running at higher clock, has basically the same latency (remember the timing values? Those 8-12-12-24 etc. numbers? Those usually get higher when the RAM clock gets highrer - so even fast running DDR4 has basically the same latency as slower memory). GPUs have different strategy - they are optimized for fast context switching, so you launch hundreds of thousands of threads and when one thread stalls on memory access, the core quickly switches to another. This works nicely, but not nearly removes the problem (which is why yeah, "100* faster" in rendering is not happening ;)). It also means that the executed context for each thread must be small to allow fast switches, which limits code complexity

mferster · 2017-04-19, 21:58:56

Interesting info. So that being the case what would a theoretical huge technological break through in regards to RAM look like?

Ryuu · 2017-04-20, 10:58:31

Quote from: spadestick on 2017-04-19, 15:34:19

Out of these factors on a single unit : Number of Cores | Threads | Frequency Ghz | Cache
Which the most important for Corona's speed?

#cores * frequency is pretty good estimate of the rendering performance. So 4 cores at 4 GHz should be roughly equivalent to 8 cores at 2 GHz. This is just for rendering and other easily parallelizable tasks though. Many tasks are single threaded (Word document editing, most of the non-rendering work in 3ds Max, ...) and for these you want the highest frequency possible since the core count does not matter. This is the reason why many people choose few cores + high frequency on their workstation while using many cores + low frequency on DR slaves.

Quote from: Juraj_Talcik on 2017-04-19, 19:13:44

On my v4 Xeons, DDR4 memory provides identical result down to single point in Cinebench R15 whether it;s clocker to 1333 or 2400 MHz.
Now I think should try this on Corona to see if it will provide different result. With above statement, should the difference be dramatic ?

While we haven't done any benchmarks with changing memory frequency, my guess is that you probably won't see any effect with Corona either. Higher frequencies increase the memory bandwidth while the latency stays pretty much the same (due to the same signal propagation speeds which limit CPU frequency). Higher bandwidth could help a lot in cases where the memory is accessed sequentially, which is unfortunately not the case when rendering (as Ondra already explained). This is pretty much the same thing as with HDD/SSD sequential vs random access speeds.

Quote from: Juraj_Talcik on 2017-04-19, 19:13:44

What does it mean for future ? That Corona's performance will rise as DDR4 (and eventual DDR5) will mature into better speed ? It doesn't seem like the development on this front is any dramatic, so Corona will continue to be bottlenecked in this way ?

Well, I'm pretty pessimistic about the future here :) The CPU vs memory speed situation is pretty much the same at least for the last 10 years so we can't expect much from the near future. There are even some developments in the other direction with even slower (and bigger) memories being used instead of (or together with) the traditional RAM.

The most promising thing was the L4 cache in some (mostly mobile) Broadwell CPUs. We have one of those in the office, but I don't remember if that actually helps the performance in any singificant way. Not that it matters since AFAIK Intel removed those in Skylake.

Quote from: Juraj_Talcik on 2017-04-19, 19:13:44

What's the underlying cause of this ? The memory speed itself, or the connection between memory and CPU ? Do GPU renderers suffer from this too given they have much higher memory speeds and tighter connections ?

Physics is the culprit here unfortunately. The memory latency is determined by how fast can the electrical signal propagate from one end of the chip to the other (+ necessary signal amplification & stuff). While memories benefit from newer technologies just as much as the CPUs do, we just keep making bigger memories negating any performance gains.

The greatest example are the CPU caches themselves. L1 & L3 caches are implemented pretty much the same way. L1 runs roughly at the same speed as the CPU cores do while it takes ~40 clock cycles to get data from L3 just because its bigger.

GPU suffers from this the same way CPU does. It's optimized as long as all the threads from a single thread group (called warp) access the same data, but it all goes to shit when the access pattern becomes random (up to 32x slower than the optimal case). The GPUs are just a little better at hiding these latencies since they run many threads on a single physical core and when one thread waits for the data other threads are executed instead. HyperThreading does the same thing but with just two threads per physical CPU (4 threads in case of Xeon Phi).

Quote from: mferster on 2017-04-19, 21:58:56

Interesting info. So that being the case what would a theoretical huge technological break through in regards to RAM look like?

Adding few more layers of smaller & faster memory acting as additional cache may help.

There have also been reports of "holographics memories being the game changer next year" for the past 20 years, so I'm keeping my fingers crossed ;) IIRC the electrical signal propagation is around 2/3 of the speed of light, so moving this way may actually help if it's ever feasible.

Juraj · 2017-04-20, 15:34:28

Quote

There are even some developments in the other direction with even slower (and bigger) memories being used instead of (or together with) the traditional RAM.

Like Optane ?

Anyway, lot's of interesting info. Are you guys getting the Naples samples ? Those had some solid cache amount.

Ryuu · 2017-04-20, 15:39:37

Yeah, like that.

The engineering samples usually come with NDAs written in a way that we would have to kill you if we told you ;)

spadestick · 2017-04-21, 03:54:49

Thanks Ryuu, especially helpful with the description of Cores x Ghz to get a rough picture of matching performance.

News:

Author Topic: CPU, cores, threads and cache (Read 8706 times)

spadestick

CPU, cores, threads and cache

Ryuu

Re: CPU, cores, threads and cache

spadestick

Re: CPU, cores, threads and cache

Juraj

Re: CPU, cores, threads and cache

Ondra

Re: CPU, cores, threads and cache

mferster

Re: CPU, cores, threads and cache

Ryuu

Re: CPU, cores, threads and cache

Juraj

Re: CPU, cores, threads and cache

Ryuu

Re: CPU, cores, threads and cache

spadestick

Re: CPU, cores, threads and cache