Topic: 10GbE setup

2022-01-20, 16:12:43

hrvojezg00

  • Active Users
  • **
  • Posts: 259
    • View Profile
    • www.as-soba.com
Hi all, we've been running a 10GbE setup for quite some time now, and I would like to be sure we have it set up right, as sometimes it doesn't perform as (I think) it should. We have a Qnap TS-2888 connected to a Cisco 350x 10GbE switch via 8 ports in a port trunk showing 80 Gbps. All workstations and render nodes have 10GbE NICs and link at the full 10 Gbps. The thing is that when rendering in DR, when 7-8 nodes are downloading the Max file, they download at around 300-500 MBps, even though theoretically it should be close to 1000 MBps. When we purchased the NAS, Qnap support recommended this setup. What do you think?

2022-01-22, 12:53:55
Reply #1

1equals2

  • Active Users
  • **
  • Posts: 75
    • View Profile
The 10 GbE network setup is a bit more complex than many of us expected. :)

What type of disks do you have the Qnap populated with? Are all of them SSDs or HDDs? Do you use an SSD cache in the case of HDDs? Are they in RAID, or is each bay used separately?
What brand are the NICs: Intel or Asus/Aquantia? Do you use the Qnap's embedded 10GBase-T ports, or do you have a dedicated card for the 10 Gbit connection?
Personally, I have a few Asus NICs and experience inconsistent speeds to the server, which uses the onboard, CPU-embedded 2 x 10GBASE-T ports. I would say that is not the best combo: the speeds are achievable but inconsistent, probably due to overheating, which forces the NICs and the embedded 10GBASE-T to throttle down, and then the speed drops from 300-350 MBps to 100-120 MBps.


Other places to find the work of Angel Kostov
https://www.instagram.com/1equals2/
https://www.behance.net/ARCHO

2022-01-24, 08:01:11
Reply #2

hrvojezg00

Hi,

Yes, it's a very complex subject, I agree. We use SSDs for everything except backup, so there shouldn't be any bottleneck there. Also, all nodes and the NAS run Intel X540-T2 NICs, not embedded ones. So my question is: should we expect 10GbE speeds on at least 5-6 nodes at a time, since we have 80 Gbps, or does it just not work that way?

2022-01-24, 09:53:44
Reply #3

1equals2

Maybe I am wrong, but isn't the Intel X540-T2 only capable of 10 GbE, i.e. a max speed of 1-1.2 GBps, and that only under specific circumstances?
As far as I can understand, although the switch is using a port trunk and shows 80 Gbps, the speed to the Qnap is limited by the Intel NIC, which supports only 10 Gbit, maybe 20 if both ports are used with link aggregation, which is a story of its own.
Another colleague, living in a different country, had to switch from RJ45 copper cable to SFP. Regardless of the patch cables he tried (CAT6a, CAT7, even CAT8), he never managed to get sustained speed with his small 10 Gbit network setup. He eventually swapped the copper NICs for SFP ones and reports a different experience.

Since you mentioned that rendering in DR does not actually use the network at full capacity, I think the issue lies in the file formats being transferred over the network and in the software environment. Several hundred JPEG files of up to 10 MB each, read from random locations, is a heavy load for any system. Have you tested the setup by copying a single 1-5 GB file (PSD, EXR) from the Qnap to all computers simultaneously? Do you get the very same network speed?
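
One way to run the single-large-file test is to time the copy on each node and convert the result to MBps. A minimal Python sketch, assuming the share is reachable as an ordinary path on the node; the function name is made up for illustration:

```python
import shutil
import time
from pathlib import Path


def copy_throughput_mbps(src: str, dst: str) -> float:
    """Copy src to dst and return the average speed in MBps (10^6 bytes per second)."""
    size_bytes = Path(src).stat().st_size
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size_bytes / 1_000_000 / elapsed
```

A call would look something like copy_throughput_mbps(r"Z:\test\big_file.exr", r"C:\temp\big_file.exr"), where both paths are hypothetical placeholders. Running it on all nodes at the same moment gives the simultaneous figure; note that a repeated run may be inflated by the operating system's file cache.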


2022-01-24, 13:32:00
Reply #4

hrvojezg00

You are correct, but shouldn't all nodes be at the max of 1.1-1.2 GBps, since the network isn't the bottleneck? The issue occurs when uploading the 3ds Max file in DR and sending passes to the master: it doesn't go below 300 MBps, but rarely goes higher, when 7-8 nodes upload at the same time. I'm not benchmarking by copying multiple files, only in DR. It could be a 3ds Max issue, I don't know.


2022-01-25, 10:04:22
Reply #5

1equals2

Shouldn't the speed of the nodes be limited by the potential network/drive speed? Meaning, if you have 8 nodes, each with a potential speed of 300 MBps (megabytes, not megabits) while rendering, this gives an approximate network load of 2500 MBps, which is roughly a 20 Gbit network (2 x 10 Gbit used in link aggregation). Even though your NVMe network storage could potentially reach, let's say, 6000 MBps, there is no way to reach its full capacity, due to the network limitation.
If your disk storage/array for some reason is not that capable in terms of speed and reaches only 1500 MBps, then that is what is limiting the speeds to the nodes.
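
The arithmetic behind those rough numbers can be checked in a few lines. A small Python sketch, using the 8 nodes and 300 MBps from above and ignoring protocol overhead (the helper name is made up for illustration):

```python
import math


def mbps_to_gbit(megabytes_per_second: float) -> float:
    """Convert MBps (megabytes/s) to Gbit/s: 1 byte = 8 bits, 1 Gbit = 1000 Mbit."""
    return megabytes_per_second * 8 / 1000


nodes = 8
per_node_mbps = 300  # per-node figure quoted in the thread

aggregate_mbps = nodes * per_node_mbps          # 2400 MBps in total
aggregate_gbit = mbps_to_gbit(aggregate_mbps)   # 19.2 Gbit/s in total
links_needed = math.ceil(aggregate_gbit / 10)   # 2 links of 10 Gbit each

print(aggregate_mbps, aggregate_gbit, links_needed)
```

So eight nodes at 300 MBps each already add up to about what two aggregated 10 Gbit links can carry.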

What speeds do you get if you use fewer nodes in distributed rendering, like 2-3? Is it the same? I would personally test each node's speed to the network mapped drive with a simple tool like CrystalDiskMark.
Check all nodes and see if they behave the same way and get similar speeds to the network location. Hopefully you reach almost equal speeds from each node to the server. My experience is a bit different, despite the same settings and the same brand of NICs installed.

These rough numbers leave the software out of the equation. Maybe someone else could help, but I am pretty sure that network mapped drives under Windows, in conjunction with Microsoft Defender, further limit the already crippled 3ds Max network reading speeds. There are some claims of better speeds with 3ds Max 2022, but the story is the same with big scenes with many FP and RC objects. Speaking from user experience.

2022-01-26, 09:22:28
Reply #6

hrvojezg00

Thanks for getting into it. I will do some more testing with 2-3 vs 8-10 nodes while no other work besides DR is being done. CrystalDiskMark shows 1150 R/W on each node, so all good with that.

Regarding the hardware setup with 8 adapters port trunked, is that the way to go or should we alter it?

2022-01-28, 15:55:04
Reply #7

1equals2

Port trunking is another name for link aggregation.
If your switch has enough ports, sure, why not have several ports trunked.
But what is the point of using 8 ports, for a potential 80 Gbit, when your NICs can reach at most 20 Gbit (with link aggregation), which is basically a limit of 2.5 GBps?

Maybe I am oversimplifying the equation :)
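
One detail that may help here: a link aggregation group normally does not split a single transfer across its member ports. Each flow is pinned to one member link, typically by hashing addresses, so one node-to-NAS connection tops out at a single link's speed while several nodes can spread over several links. A toy Python sketch of that idea; the hash function and the IP addresses are made up for illustration and are not the actual algorithm of the Cisco switch or the Qnap:

```python
import zlib
from collections import Counter


def pick_link(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Map a flow to one member link; the same flow always lands on the same link."""
    key = f"{src_ip}->{dst_ip}".encode()
    return zlib.crc32(key) % n_links


nas_ip = "192.168.1.10"                                 # hypothetical NAS address
node_ips = [f"192.168.1.{100 + i}" for i in range(8)]   # hypothetical render nodes

# Count how many node flows end up on each of the 8 trunked links.
links_used = Counter(pick_link(ip, nas_ip, 8) for ip in node_ips)
for link, flows in sorted(links_used.items()):
    print(f"link {link}: {flows} flow(s)")
```

Because the mapping is a hash, two nodes can land on the same link and share its 10 Gbit while other links sit idle, which is one possible reason for uneven per-node speeds.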

2022-02-02, 11:28:25
Reply #8

hrvojezg00

That is what I want to find out. Does the 80 Gbit only help if a single node can reach that speed, or does it allow eight nodes on 10 Gbit connections to each run at pretty much max speed?

2022-02-02, 19:08:25
Reply #9

1equals2

The switch is the only device which works at 80 Gbit, as far as I can understand from the setup.
If all PCs/servers use NICs with 1 port x 10 Gbit, there is no way for any device to work at a higher speed than those 10 Gbit.

Maybe I am wrong, but here is how I understand your setup: 10 Gbit server NIC -> 80 Gbit switch (8 x 10 Gbit trunked ports) -> 10 Gbit workstation/node NIC. So the speeds are capped at 10 Gbit, i.e. 1250 MBps per node. If 3 or 4 nodes connect directly and simultaneously, then it is normal to get 300-400 MBps for each node.

If your server configuration (SSD or NVMe storage setup) is capable of speeds above 1250 MBps, just try using the server NICs (if they have more than 1 port) with link aggregation activated. That way you will get at least 20 Gbit from the server, and it will theoretically increase speeds to the nodes when they are used simultaneously. Still, there is no way to have all nodes transferring at 1250 MBps simultaneously unless the server RAID setup is capable of at least 5000-7000 MBps and the server has a NIC with more than 4 x 10 Gbit ports used in link aggregation.
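
The reasoning above boils down to taking the tightest of three limits: the node's own NIC, the server's links shared between nodes, and the storage shared between nodes. A rough Python sketch under those assumptions (even sharing, no protocol overhead, ideal spreading over aggregated links); the function name and parameters are invented for illustration:

```python
def per_node_mbps(nodes: int, node_nic_gbit: float, server_links: int,
                  link_gbit: float, storage_mbps: float) -> float:
    """Estimate per-node speed as the tightest of three limits, shared evenly."""
    gbit_to_mbps = 1000 / 8  # 10 Gbit -> 1250 MBps, overhead ignored
    node_limit = node_nic_gbit * gbit_to_mbps
    server_limit = server_links * link_gbit * gbit_to_mbps / nodes
    storage_limit = storage_mbps / nodes
    return min(node_limit, server_limit, storage_limit)


# 4 nodes, one 10 Gbit server link, storage assumed at 6000 MBps
print(per_node_mbps(4, 10, 1, 10, 6000))  # 312.5
# same, but two aggregated server links
print(per_node_mbps(4, 10, 2, 10, 6000))  # 625.0
# 8 nodes, 8 aggregated links: storage becomes the limit
print(per_node_mbps(8, 10, 8, 10, 6000))  # 750.0
```

The first case matches the 300-400 MBps per node mentioned above for 3 or 4 nodes behind a single 10 Gbit server link.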

Once again, these are just general assumptions and pretty rough numbers.
