[C4D] Bug Reporting / Teamrendering - Clients do not start next job in queue after some time
« on: 2025-02-20, 11:46:57 »
Hey there,
First of all, here are our render farm specs:
Teamrender Server Machine:
CPU: 32 x 2.4 GHz (I think it's an Intel, but I'm not sure about that)
RAM: 512 GB
System: Windows 10
Teamrender Server and Corona are on the latest version. No 3rd-party plugins besides Corona are installed.
Teamrender Clients (3x):
CPU: AMD Threadripper 24 x 4.0 GHz
RAM: 64 GB
System: Windows 11
Teamrender Client and Corona are on the latest version. No 3rd-party plugins besides Corona are installed.
The machines are all connected via a dedicated 10 Gigabit switch.
For quite some time now, we have been having problems with our render farm.
For example, I put 10 jobs (still images) in the server's queue and start them all at once.
The first jobs run fine and every client participates. Then, at some undefined point, the clients don't start the next job in the queue. Not all clients stop at once: one machine might switch to idle after completing a job and not start the next one. Machines that turn idle stay idle until you add and start a new render job. The jobs pending in the active queue seem to be ignored. So after a few jobs, no client machine is rendering anymore; their status is idle, and there are still jobs in the active queue with status pending. There is no error or warning in any log whatsoever. It almost seems that the clients don't see any more jobs in the queue after turning idle. The only "workaround" is to restart the client software (e.g. via the Teamrender web interface). Once restarted, the clients immediately start the next job in the queue.
We will contact Maxon support in parallel to this post. But I have some theories about what could cause this issue:
While monitoring the rendering process on the farm, I saw that one machine renders a bit slower than the other two. When the "faster" machines are done with their part, they directly start the next job in the queue while the slower machine is still rendering the old job. When this machine has finished, it also joins the job the other machines are already rendering. Maybe at some point the gap between the faster machines and the slower one becomes too big and the machine doesn't know where to continue. On the other hand, the two faster machines also switch to idle eventually.
Another theory is that the Teamrender software counts a job as finished before denoising starts, so there is a growing gap between when the machine thinks a job is finished and when it actually is. This might also explain why the web interface's progress bar goes over 100% (mostly to around 120%).
Another factor might be the different Windows versions. But AFAIK the server machine cannot be upgraded to Windows 11. Maybe there is a general problem with Teamrender on Windows 11?
I'm not sure whether the Teamrender settings in the Corona render settings could help solve this problem. With our old farm (same server machine, two older client machines) I worked out the values for fastest rendering (60s / 150MB). On our old farm we didn't have the client "idle" problem.
I hope I explained our issue in a reasonably understandable way ;)
If you have further questions, let me know!
Have a nice day!
Moritz