I think the mistakes in this test are:
1) Using a pass limit instead of a time limit. This does not tell us anything: maybe the rendering was super fast in one case and slow in the other, or maybe not. So I would suggest using a time limit.
2) Using an unrealistic number of passes and an unrealistic resolution. There were simply not enough passes to render details in reasonable quality, denoised or not. The resolution is also so low that we cannot really see the details, and this is not something you would do in real work.
So it would be best to re-render at 1280x720 or higher, with a more realistic time limit, say at least 10 minutes?