I know I should not talk about this, because it will almost surely get misinterpreted into some horrible advice, but whatever ;))
If you need to render 100 000 000 grass blades (for simplicity, 1 blade = 1 polygon), you obviously don't want zero instancing (1 instance * 100 000 000 polygons) - it would not fit into RAM.
You also don't want to scatter an individual blade 100M times (100 000 000 instances * 1 polygon) - the scatter would compute for ages (and then it would crash on RAM anyway).
Ideally you want to do 10 000 blades 10 000 times. That gives you the lowest RAM consumption, which is roughly calculated as instances PLUS polygons: 10 000 instances + 10 000 polygons = 20 000 units of RAM. In the other examples it would be 100 000 000 instances + 1 polygon = 100 000 001 units of memory, or 1 instance + 100 000 000 polygons = again 100 000 001 units. You also get the lowest precomputation time (again roughly the number of polygons PLUS (+) the number of instances, NOT TIMES (*)). And you still get reasonable rendering speed (the highest speed usually comes from no instancing at all, but we already established that is not practical).
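A tiny sketch of the memory model above (instances + polygons-per-instance, with the total fixed at 100M). The function name and the "unit" bookkeeping are made up for illustration; the point is just that the sum n + TOTAL/n bottoms out at n = sqrt(TOTAL) = 10 000:

```python
import math

TOTAL = 100_000_000  # total grass blades, 1 polygon each

def memory_units(instances: int) -> int:
    """Memory model from the text: instance count + polygons per instance."""
    polys_per_instance = TOTAL // instances
    return instances + polys_per_instance

# A few ways to split 100M blades into (instance count x polygons per instance):
for n in (1, 10_000, 100_000_000):
    print(f"{n:>11} instances x {TOTAL // n:>11} polys -> {memory_units(n):>11} units")
#           1 instances x   100000000 polys ->   100000001 units
#       10000 instances x       10000 polys ->       20000 units
#   100000000 instances x           1 polys ->   100000001 units

# n + TOTAL/n is minimised when n == sqrt(TOTAL):
print(math.isqrt(TOTAL))  # 10000
```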
Obviously this is not applicable in many situations (such as scattering cars/trees - you cannot replace 5 highpoly cars with 5 000 lowpoly ones ;)). For scenarios around 1 000 polygons/instances it does not matter at all. Also, the cost per instance might be bigger than the cost per triangle, so maybe it is better to aim for instance polycount = 3 * instance count. Or maybe 10? Hard to say ;)
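The "3 * instance count" hunch falls out of the same math if you weight the two costs. Assuming a hypothetical cost model where one instance costs as much as 3 triangles (the weights here are invented, not measured), minimising c_i*n + c_p*(TOTAL/n) lands exactly at polys-per-instance / instances = c_i / c_p:

```python
import math

TOTAL = 100_000_000       # total polygons to render
COST_PER_INSTANCE = 3.0   # hypothetical: one instance costs as much as 3 triangles
COST_PER_POLY = 1.0

def total_cost(instances: float) -> float:
    """Weighted cost: instances are COST_PER_INSTANCE times pricier than polys."""
    return COST_PER_INSTANCE * instances + COST_PER_POLY * (TOTAL / instances)

# Calculus minimum of c_i*n + c_p*T/n is at n = sqrt(c_p*T/c_i):
n_opt = math.sqrt(COST_PER_POLY * TOTAL / COST_PER_INSTANCE)
polys_opt = TOTAL / n_opt
print(polys_opt / n_opt)  # 3.0 -> polycount = 3 * instance count
```

With equal weights this collapses back to the 10 000 x 10 000 split; the ratio of the two per-unit costs is the only thing that moves the optimum.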