Topic: Better CPU multithreading support?

2013-06-18, 16:46:01

SaY

  • Active Users
  • **
  • Posts: 72
    • View Profile
Seems like Corona is not optimized that well for the large number of cpu cores.
I'm using these 2 machines:
dual x5680 @ 4ghz  (12 cores - 24  threads)
dual e5-2689 @3.4ghz (16 cores - 32 threads)
Corona benchmark results are 8.2M rays/s vs 7.5M rays/s  in favor of x5680 system.
But both Vray and C4D results are exactly the opposite: the e5-2689 is significantly faster, by about 15% (which makes sense considering the newer generation and the difference in core count).
Is this something you could resolve in future releases?
Thanks.


2013-06-18, 23:48:00
Reply #1

Juraj

  • Active Users
  • **
  • Posts: 4761
    • View Profile
    • studio website
The E5 result looks pretty bad... something I was quite worried about. Judging by the non-overclocked 3930k score of roughly 5 mil rays, the top dual E5 should easily scale to at least 10. They do so in all synthetic tests, and in practical Vray benchmarks too.
Please follow my new Instagram for latest projects, tips&tricks, short video tutorials and free models
Behance  Probably best updated portfolio of my work
lysfaere.com Please check the new stuff!

2013-06-19, 00:12:33
Reply #2

Ondra

  • Administrator
  • Active Users
  • *****
  • Posts: 9048
  • Turning coffee to features since 2009
    • View Profile
Because the raw CPU computation in Corona is optimized pretty well, you will hit the memory bandwidth limit pretty hard with a huge number of threads. But it is possible you are hitting other limits:
Frame buffer access: try editing the export.conf file of the benchmark and change int pathtracingSamples = 16 to 128. What does the comparison look like then?
HD cache: try setting gi.secondarySolver to 1
Rendering is magic.How to get minidumps for crashed/frozen 3ds Max | Sorry for short replies, brief responses = more time to develop Corona ;)

2013-06-19, 15:20:23
Reply #3

Juraj

Interesting, the Sandy Bridge Xeon platform should have quite a lot of memory bandwidth. If someone lent you such a machine for a while, is this something you would be willing to test eventually, Keymaster :- ) ? I think being able to get the most out of server machines might become quite important soon, as Ivy Bridge generation Xeons are coming by 2014, in up to octo-socket configurations (and up to 256 cores, I think). Also, I hope I am not alone with my investment in them (I dread the idea of 20 small boxes as a "personal renderfarm"), so I opted for easily maintained multi-socket Xeons, which also make nice workstations.


2013-06-19, 16:54:35
Reply #4

SaY

Quote from: Juraj
Judging by 3930k unclocked score of roughly 5mil rays, the dual top E5 should easily scale to at least 10.

Well, in my Vray tests the dual E5 is about 100-120% faster than two mildly overclocked 3930k machines (distributed render). That's why I got dual-CPU systems: I don't want to have 5-6 computers sitting under the desk in my home office.
Corona is a different story though.

2013-06-19, 16:59:11
Reply #5

SaY

Quote from: Ondra
Because the raw CPU computation in corona is optimized pretty well, you will hit the memory bandwidth limit pretty hard with huge number of threads. But it might be possible you are hitting other limits:
Frame buffer access: try editing the export.conf file of the benchmark, change int pathtracingSamples = 16 to 128, what will the comparison look like?

I tried that; speed dropped by about 20% on both computers.
I'm not sure memory bandwidth is the problem. According to Intel, the E5 has almost 2x more bandwidth than the x56xx platform.

http://www.intel.com/content/www/us/en/benchmarks/server/xeon-e5-hpc/xeon-e5-hpc-memory-bandwidth-stream.html

2013-06-19, 17:02:36
Reply #6

Juraj

According to this post, all seems rather well with the E5 generation; look at those 20 mil rays/sec :- )

http://forum.corona-renderer.com/index.php/topic,357.0.html

2013-06-19, 17:11:15
Reply #7

Ondra

Quote from: SaY
I tried that, speed dropped by about 20% on both computers.
I'm not sure if memory bandwidth is a problem, according to intel it E5 has almost 2x more bandwidth than the x56xx platform.
http://www.intel.com/content/www/us/en/benchmarks/server/xeon-e5-hpc/xeon-e5-hpc-memory-bandwidth-stream.html

The problem is not with sequential bandwidth but with random access, which is sadly a completely different story.
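
To make the sequential vs. random access distinction concrete, here is a minimal sketch in Python. It has nothing to do with Corona's actual code: the function names and the array size are made up for illustration, and interpreter overhead hides much of the gap a native benchmark would show. It only demonstrates that the same data, summed in a shuffled order, gives the same result but usually takes longer once the array no longer fits in the CPU caches.

```python
import random
import time
from array import array


def sum_by_index(data, order):
    """Sum data[i] for each i in `order`; the memory access pattern follows `order`."""
    total = 0.0
    for i in order:
        total += data[i]
    return total


def time_access(data, order):
    """Return (elapsed seconds, sum) for one pass over `data` in the given order."""
    start = time.perf_counter()
    total = sum_by_index(data, order)
    return time.perf_counter() - start, total


if __name__ == "__main__":
    # Hypothetical size; raise it until the array clearly exceeds your CPU cache.
    n = 1_000_000
    data = array("d", (float(i % 7) for i in range(n)))

    sequential = list(range(n))
    shuffled = sequential[:]
    random.Random(0).shuffle(shuffled)  # fixed seed so runs are repeatable

    t_seq, s_seq = time_access(data, sequential)
    t_rnd, s_rnd = time_access(data, shuffled)

    # Same elements, same sum; only the visiting order (and usually the time) differs.
    print(f"sequential: {t_seq:.3f} s, random: {t_rnd:.3f} s, sums equal: {s_seq == s_rnd}")
```

A STREAM-style figure such as the one in the Intel link measures only the first access pattern, which is why it says little about the second.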

2013-06-19, 20:49:53
Reply #8

SaY

Quote from: Ondra
The problem is not with sequential bandwidth, but random access, which is sadly completely different story

But random access can't be slower than on the previous generation Xeons, right?
What I see is that the new generation CPUs at 54GHz (combined) are 15% slower than the previous generation's 48GHz...
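
For reference, the "combined GHz" figures can be reproduced from the specs in the first post with a few lines of Python. This is only back-of-the-envelope arithmetic: it assumes "combined" means physical cores times clock speed, and the function names are made up for the example.

```python
def aggregate_ghz(cores, clock_ghz):
    """Naive 'combined GHz' figure: physical cores times clock speed."""
    return cores * clock_ghz


def percent_change(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0


# Figures quoted in this thread (default benchmark settings, M rays/s).
x5680 = {"cores": 12, "clock_ghz": 4.0, "mrays": 8.2}
e5_2689 = {"cores": 16, "clock_ghz": 3.4, "mrays": 7.5}

ghz_old = aggregate_ghz(x5680["cores"], x5680["clock_ghz"])
ghz_new = aggregate_ghz(e5_2689["cores"], e5_2689["clock_ghz"])

print(f"combined clock, E5 vs x5680: {percent_change(ghz_new, ghz_old):+.1f}%")
print(f"benchmark rays/s, E5 vs x5680: {percent_change(e5_2689['mrays'], x5680['mrays']):+.1f}%")
print(f"M rays/s per combined GHz, x5680: {x5680['mrays'] / ghz_old:.3f}")
print(f"M rays/s per combined GHz, E5: {e5_2689['mrays'] / ghz_new:.3f}")
```

With the numbers posted here, the E5 has roughly 13% more combined clock but scores roughly 8.5% lower in the default benchmark (8.2M vs 7.5M rays/s).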

2013-06-19, 21:12:38
Reply #9

Ondra

No, but it can kill the parallel scaling. Fewer, more powerful cores (higher frequency) can then be better. But this is speculation. I would have to do some profiling on those machines, and I don't have the hardware or the skills for that right now. I am hoping I'll eventually be able to hire a specialist for this kind of optimization.

BTW: have you tried using PT+PT instead of PT+HD as I told you?

2013-06-20, 03:50:10
Reply #10

SaY

gi.secondarySolver set to 1 helped:
E5   - 18.2M
x5680 - 16M
So I guess HD is what slows down the E5?
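
Putting the two sets of results from this thread side by side, as a quick Python sketch. The dictionary keys and the helper name are made up for the example; the numbers are the M rays/s figures posted above.

```python
# Benchmark results quoted in this thread, in M rays/s.
results = {
    "x5680": {"default": 8.2, "secondary_solver_1": 16.0},
    "e5-2689": {"default": 7.5, "secondary_solver_1": 18.2},
}


def speedup(after, before):
    """How many times faster `after` is compared to `before`."""
    return after / before


for machine, r in results.items():
    gain = speedup(r["secondary_solver_1"], r["default"])
    print(f"{machine}: {gain:.2f}x faster with gi.secondarySolver = 1")

# Relative standing of the two machines once the setting is changed.
e5_lead = speedup(
    results["e5-2689"]["secondary_solver_1"],
    results["x5680"]["secondary_solver_1"],
)
print(f"E5 vs x5680 with gi.secondarySolver = 1: {e5_lead:.2f}x")
```

The E5 gains about 2.4x from the change and the x5680 a bit under 2x, which puts the E5 roughly 14% ahead, close to the ~15% lead seen in Vray and C4D.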

2013-06-20, 10:07:01
Reply #11

Ondra

ok, that makes sense. Good news is that I can optimize this myself

2013-06-20, 15:49:00
Reply #12

SaY

That's great, thanks.
Vray actually had a similar problem initially, the light cache was acting weird with the E5 xeons. They got it fixed pretty fast.
« Last Edit: 2013-06-20, 16:08:40 by SaY »

2013-08-12, 16:13:21
Reply #13

Ondra

ok, if the problem is localized to HD cache, then it is not a bug. I have put HD cache multithreading optimization into my TODO list