NUMA was a dominant issue on the Opteron architecture, but it shouldn't need special handling on 2P Xeons. "Distributed rendering" on a single machine is also a very stupid workaround that completely defeats the point of the machine.
There must be something else. If 40- and 48-thread machines still work fine, it's strange that 72 threads is the point where suddenly only half of them get used. Maybe just disable HT until this is sorted out; that's better than using only half the threads.
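
To narrow it down, it may help to check how many logical CPUs a process on that box can actually see. Here is a minimal Python sketch; the helper name `visible_cpus` is made up for illustration, and `os.sched_getaffinity` only exists on some platforms (Linux, for example), so elsewhere it just falls back to the total count. It doesn't prove what the cause is, it only shows whether the "half" is already visible at the OS level or happens inside the renderer.

```python
import os


def visible_cpus():
    """Return (logical CPUs reported by the OS, CPUs this process may run on)."""
    # os.cpu_count() can return None if the count is undetermined.
    total = os.cpu_count() or 1
    # The affinity query is platform-specific; fall back to the total
    # where it is not available.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = total
    return total, usable


if __name__ == "__main__":
    total, usable = visible_cpus()
    print(f"logical CPUs reported: {total}, usable by this process: {usable}")
```

If the two numbers already differ on the 72-thread machine, the limit is coming from the OS or the process setup rather than from the renderer itself.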