Author Topic: Better CPU multithreading support? (Read 11699 times)

SaY · 2013-06-18, 16:46:01

Seems like Corona is not optimized that well for a large number of CPU cores.
I'm using these 2 machines:
dual x5680 @ 4ghz (12 cores - 24 threads)
dual e5-2689 @3.4ghz (16 cores - 32 threads)
Corona benchmark results are 8.2M rays/s vs 7.5M rays/s in favor of x5680 system.
But both Vray and C4D results are exactly the opposite - e5-2689 is significantly faster, by about 15% (which makes sense considering the newer generation and the difference in number of cores).
Is this something you could resolve in future releases?
Thanks.

Juraj · 2013-06-18, 23:48:00

The E5 result looks pretty bad...something I was quite worried about. Judging by the 3930k unclocked score of roughly 5mil rays, the dual top E5 should easily scale to at least 10. They do scale like that in all synthetic tests, and in practical Vray benchmarks too.

Ondra · 2013-06-19, 00:12:33

Because the raw CPU computation in Corona is optimized pretty well, you will hit the memory bandwidth limit pretty hard with a huge number of threads. But it is possible you are hitting other limits:
Frame buffer access: try editing the export.conf file of the benchmark, change int pathtracingSamples = 16 to 128, what will the comparison look like?
HD cache: try setting gi.secondarySolver to 1

Juraj · 2013-06-19, 15:20:23

Interesting, the Sandy Bridge Xeon platform should have quite a lot of memory bandwidth. If someone lent you such a machine for some time, is this something you would be willing to test eventually, Keymaster :- ) ? I think being able to get the most out of server machines might be quite important soon, as Ivy Bridge generation Xeons are approaching by 2014, with configurations up to octo-socket (and up to 256 cores, I think). Also, I hope I am not alone with my investments into them (I dread the idea of 20 small boxes as a "personal renderfarm", so I opted for easily maintained multi-socket Xeons, which also make nice workstations).

SaY · 2013-06-19, 16:54:35

Quote from: Juraj_Talcik on 2013-06-18, 23:48:00

Judging by the 3930k unclocked score of roughly 5mil rays, the dual top E5 should easily scale to at least 10.

Well, in my Vray tests the dual e5 is about 100-120% faster than two mildly overclocked 3930k machines (distributed render). That's why I got dual CPU systems - I don't want to have 5-6 computers sitting under the desk in my home office.
Corona is a different story though.

SaY · 2013-06-19, 16:59:11

Quote from: Keymaster on 2013-06-19, 00:12:33

Because the raw CPU computation in Corona is optimized pretty well, you will hit the memory bandwidth limit pretty hard with a huge number of threads. But it is possible you are hitting other limits:
Frame buffer access: try editing the export.conf file of the benchmark, change int pathtracingSamples = 16 to 128, what will the comparison look like?

I tried that, speed dropped by about 20% on both computers.
I'm not sure if memory bandwidth is a problem, according to Intel the E5 has almost 2x more bandwidth than the x56xx platform.

http://www.intel.com/content/www/us/en/benchmarks/server/xeon-e5-hpc/xeon-e5-hpc-memory-bandwidth-stream.html

Juraj · 2013-06-19, 17:02:36

According to this post, all seems rather well with the E5 gen, look at those 20 mil rays/sec :- )

http://forum.corona-renderer.com/index.php/topic,357.0.html

Ondra · 2013-06-19, 17:11:15

Quote from: SaY on 2013-06-19, 16:59:11

Quote from: Keymaster on 2013-06-19, 00:12:33
Because the raw CPU computation in Corona is optimized pretty well, you will hit the memory bandwidth limit pretty hard with a huge number of threads. But it is possible you are hitting other limits:
Frame buffer access: try editing the export.conf file of the benchmark, change int pathtracingSamples = 16 to 128, what will the comparison look like?
I tried that, speed dropped by about 20% on both computers.
I'm not sure if memory bandwidth is a problem, according to Intel the E5 has almost 2x more bandwidth than the x56xx platform.

http://www.intel.com/content/www/us/en/benchmarks/server/xeon-e5-hpc/xeon-e5-hpc-memory-bandwidth-stream.html

The problem is not with sequential bandwidth, but with random access, which is sadly a completely different story.
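
To make the sequential vs. random access distinction concrete, here is a minimal sketch that reads the same array once in order and once in shuffled order. It is only an illustration, not how Corona or the Intel STREAM benchmark measures anything; the function names (`make_data`, `walk`, `compare`) are made up for this example, and in Python the interpreter overhead hides much of the gap that a native renderer would see.

```python
import random
import time
from array import array


def make_data(n, seed=0):
    """Build n 64-bit integers plus an in-order and a shuffled index list."""
    data = array("q", range(n))
    in_order = list(range(n))
    shuffled = list(range(n))
    random.Random(seed).shuffle(shuffled)
    return data, in_order, shuffled


def walk(data, indices):
    """Sum data[i] for every i in indices, in the order given."""
    total = 0
    for i in indices:
        total += data[i]
    return total


def compare(n):
    """Return (sequential_sum, random_sum, sequential_seconds, random_seconds)."""
    data, in_order, shuffled = make_data(n)
    t0 = time.perf_counter()
    seq_sum = walk(data, in_order)
    t1 = time.perf_counter()
    rnd_sum = walk(data, shuffled)
    t2 = time.perf_counter()
    return seq_sum, rnd_sum, t1 - t0, t2 - t1


if __name__ == "__main__":
    # The array size is an arbitrary choice, meant to exceed typical CPU caches.
    seq_sum, rnd_sum, seq_s, rnd_s = compare(2_000_000)
    print(f"sequential: {seq_s:.3f}s  random: {rnd_s:.3f}s  same sum: {seq_sum == rnd_sum}")
```

Both walks touch exactly the same elements and do the same amount of arithmetic; only the access order differs, which is why a sequential bandwidth figure says little about a workload dominated by scattered lookups.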

SaY · 2013-06-19, 20:49:53

Quote from: Keymaster on 2013-06-19, 17:11:15

The problem is not with sequential bandwidth, but with random access, which is sadly a completely different story.

But random access can't be lower than on the previous generation Xeons, right?
What I see is that the new generation CPU with 54GHz (combined) is 15% slower than the previous generation's 48GHz...

Ondra · 2013-06-19, 21:12:38

No, but it can kill the parallel scaling. Fewer, more powerful cores (higher frequency) can then be better. But this is speculation. I would have to do some profiling with those machines, and I don't have the hardware nor the skills for that right now. I am hoping eventually I'll be able to hire a specialist for this kind of optimization.
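
The "fewer faster cores can win" idea can be sketched with Amdahl's law. This is a textbook toy model swapped in for illustration, not something measured on these machines: it assumes throughput is proportional to clock speed times the Amdahl speedup, and treats contention as a fixed non-parallel fraction. The core counts and clocks are the ones from the first post; the function names are made up here.

```python
def amdahl_speedup(cores, parallel_fraction):
    """Amdahl's law: speedup over one core when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(cores, ghz, parallel_fraction):
    """Toy throughput estimate: clock speed times Amdahl speedup (arbitrary units)."""
    return ghz * amdahl_speedup(cores, parallel_fraction)


# Machines from the first post: dual x5680 (12 cores @ 4.0 GHz),
# dual e5-2689 (16 cores @ 3.4 GHz). The parallel fractions are arbitrary.
for p in (1.0, 0.95, 0.90):
    x5680 = relative_throughput(12, 4.0, p)
    e5 = relative_throughput(16, 3.4, p)
    print(f"parallel fraction {p:.2f}: x5680 {x5680:.2f}  e5-2689 {e5:.2f}")
```

With perfect scaling the 16-core machine leads, but in this model a non-parallel share of about 10% is already enough for the 12 faster cores to come out ahead.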

BTW: have you tried using PT+PT instead of PT+HD as I told you?

SaY · 2013-06-20, 03:50:10

gi.secondarySolver set to 1 helped:
E5 - 18.2M
x5680 - 16M
So I guess HD is what slows down the E5?
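
One way to read these numbers against the earlier 8.2M / 7.5M results is to normalize them by the "combined GHz" figure used earlier in the thread (cores × clock). The sketch below does just that arithmetic. It assumes the original run used PT+HD and that gi.secondarySolver = 1 means PT+PT, as the posts above suggest; the dictionary layout and function names are made up for the example.

```python
# Core counts and clocks from the first post.
MACHINES = {
    "x5680": {"cores": 12, "ghz": 4.0},
    "e5-2689": {"cores": 16, "ghz": 3.4},
}

# Benchmark results in millions of rays per second, as reported in the thread.
# Assumption: the first run used PT+HD, the second (gi.secondarySolver = 1) PT+PT.
RESULTS = {
    "PT+HD": {"x5680": 8.2, "e5-2689": 7.5},
    "PT+PT": {"x5680": 16.0, "e5-2689": 18.2},
}


def combined_ghz(name):
    """Cores times clock: the rough 'combined GHz' figure used in the thread."""
    machine = MACHINES[name]
    return machine["cores"] * machine["ghz"]


def rays_per_ghz(solver, name):
    """Millions of rays per second per combined GHz."""
    return RESULTS[solver][name] / combined_ghz(name)


def e5_relative_efficiency(solver):
    """E5 rays-per-GHz divided by x5680 rays-per-GHz (1.0 means equal scaling)."""
    return rays_per_ghz(solver, "e5-2689") / rays_per_ghz(solver, "x5680")


for solver in RESULTS:
    print(f"{solver}: E5 relative efficiency {e5_relative_efficiency(solver):.2f}")
```

Under these assumptions the E5 delivers roughly 0.81 of the x5680's rays per combined GHz with PT+HD and about 1.00 with PT+PT, which fits the guess that the HD cache is where the scaling is lost.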

Ondra · 2013-06-20, 10:07:01

ok, that makes sense. Good news is that I can optimize this myself

SaY · 2013-06-20, 15:49:00

That's great, thanks.
Vray actually had a similar problem initially, the light cache was acting weird with the E5 xeons. They got it fixed pretty fast.

Ondra · 2013-08-12, 16:13:21

ok, if the problem is localized to HD cache, then it is not a bug. I have put HD cache multithreading optimization into my TODO list
