The aim of optimization is to:
Important note: increasing infrastructure reliability should be the main objective. Once the infrastructure is fully stable and able to withstand failures both external and internal to the datacenter, there will be time to optimize, experiment with new optimizations, etc.
Power usage effectiveness (PUE) is an indicator of data center energy performance. PUE should take into account all energy losses. There are, however, many variations of the formula in use across datacenters. In theory, PUE should be:
PUE=(Total facility energy used)/(Useful energy used)
Note that this formula can lead to a PUE below 1 if heat is reused.
This definition also leaves room for interpretation (sales people really like this). Most of the time, it results in:
PUE=(Total facility energy used)/(IT energy used at PSU)
However, this is not really accurate. For example, the transformers often feed loads other than the cluster (lights, offices, etc.). Also, if the measurement is taken at the PSU, on an air-cooled server for example, it counts the fans inside the server as IT energy although they are part of the cooling. Finally, this PUE does not include IT PSU efficiency, as IT energy is measured at the IT plug, before the PSU. Measuring the energy used by the whole IT system after the PSU would be difficult.
In general, the more probes on electrical, cooling, and IT equipment, the more accurate the PUE. You need assumptions for the remaining variables.
Typical PUE values are 1.4-1.6 for air-cooled datacenters and 1.1-1.2 for water-cooled datacenters.
The best ways to really reduce PUE are:
However, PUE does not reflect computing performance, i.e. flops/watt, which is also an important performance indicator.
With new green computers, it is important to take into account flops per watt, considering the energy measured before the IT PSU (local):
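To make the gap between the plug-level PUE and a stricter one concrete, here is a minimal Python sketch. All figures (energy readings, PSU efficiency, fan share) are hypothetical values chosen for illustration, and the names `pue`, `psu_efficiency` and `fan_share` are assumptions, not part of any standard tool.

```python
def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Ratio of total facility energy to the energy counted as IT energy."""
    if it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return total_facility_kwh / it_kwh


# Hypothetical readings over the same period (kWh).
total_facility_kwh = 1500.0
it_at_plug_kwh = 1000.0

# Assumed values, usually unknown without extra probes.
psu_efficiency = 0.94  # fraction of plug energy delivered after the PSU
fan_share = 0.08       # fraction of post-PSU energy used by server fans

# PUE as commonly reported: IT energy measured at the plug.
pue_at_plug = pue(total_facility_kwh, it_at_plug_kwh)

# Energy that actually reaches the computing components.
useful_it_kwh = it_at_plug_kwh * psu_efficiency * (1.0 - fan_share)
pue_corrected = pue(total_facility_kwh, useful_it_kwh)

print(f"PUE measured at plug: {pue_at_plug:.2f}")
print(f"PUE after PSU and fan correction: {pue_corrected:.2f}")
```

With these assumed figures the corrected value is noticeably higher than the plug-level one, which is why two sites quoting "PUE" may not be comparable.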
Flops/W_L=(flops delivered)/(IT energy used)
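As a rough illustration of this indicator, the sketch below divides the number of floating-point operations of a run by the energy measured at the IT plug. Operations per joule is the same quantity as flops per watt. The workload size, duration, and mean power are hypothetical, and `flops_per_watt` is an illustrative name.

```python
def flops_per_watt(operations: float, energy_joules: float) -> float:
    """Floating-point operations per joule, equivalent to FLOPS per watt."""
    if energy_joules <= 0:
        raise ValueError("energy must be positive")
    return operations / energy_joules


# Hypothetical benchmark run, power measured before the IT PSU.
mean_power_w = 2000.0   # mean power at the plug during the run
duration_s = 100.0      # run duration
operations = 1.0e15     # floating-point operations performed

energy_j = mean_power_w * duration_s
efficiency = flops_per_watt(operations, energy_j)

print(f"{efficiency / 1e9:.1f} GFLOPS/W")
```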
Still, this indicator does not take into account network performance or real application performance per delivered watt.
First and most important: think globally. If optimizing somewhere generates a major loss somewhere else, then it’s not globally efficient.
The second important thing: information is key. To optimize a datacenter, you need probes everywhere, and you need to monitor these probes before a modification, right after it, and then over the long term. If you don't have money, use Arduinos combined with cheap sensors like the DHT11.
To optimize cooling, the main objective is to maximize the temperature delta, taking into account:
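The before/after comparison can be sketched in a few lines of Python. This assumes the probe readings have already been collected into plain lists; the sample values and the `compare_periods` helper are hypothetical.

```python
from statistics import mean


def compare_periods(before: list[float], after: list[float]) -> dict[str, float]:
    """Compare the mean of probe readings before and after a modification."""
    if not before or not after:
        raise ValueError("both periods need at least one reading")
    mean_before = mean(before)
    mean_after = mean(after)
    return {
        "before": mean_before,
        "after": mean_after,
        "change": mean_after - mean_before,
    }


# Hypothetical cooling power readings (kW) from the same probe.
before_kw = [120.0, 118.0, 122.0, 120.0]
after_kw = [112.0, 110.0, 114.0, 112.0]

result = compare_periods(before_kw, after_kw)
print(f"Mean change: {result['change']:+.1f} kW")
```

Running the same comparison again weeks later, on a fresh `after` window, covers the long-term check.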
A good strategy would be:
Of course, to increase Delta T in the IT room, use containment. There is no need to buy very expensive containment; just ensure it is fire resistant and doesn't interfere with the fire detection/extinguishing strategy. This will massively increase Delta T at the air handling unit.
The same strategy applies to water-cooled IT equipment (meaning the CPU is directly cooled by water). Try to maximize Delta T. However, seriously consider the time needed to shut everything down in an emergency. The main water loop of water-cooled systems often has a small volume, and the temperature increases very quickly if cooling fails.
There are not many ways to optimize power consumption. It is only a matter of good calibration:
In general, optimize the equipment's operating range: overloaded equipment is at risk, and underloaded equipment runs at low efficiency.
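One way to see why a larger Delta T helps: for a given heat load, the airflow to move scales inversely with Delta T (heat = density x specific heat x flow x Delta T). The sketch below uses approximate air properties near room temperature and a hypothetical 100 kW load; it ignores leakage and mixing, which containment is meant to reduce.

```python
AIR_DENSITY = 1.2   # kg/m^3, approximate value near room temperature
AIR_CP = 1005.0     # J/(kg*K), approximate specific heat of air


def required_airflow_m3s(heat_w: float, delta_t_k: float) -> float:
    """Airflow needed to carry heat_w watts with a given air Delta T."""
    if delta_t_k <= 0:
        raise ValueError("Delta T must be positive")
    return heat_w / (AIR_DENSITY * AIR_CP * delta_t_k)


heat_w = 100_000.0  # hypothetical IT room heat load (W)

flow_10 = required_airflow_m3s(heat_w, 10.0)
flow_20 = required_airflow_m3s(heat_w, 20.0)

print(f"Delta T 10 K: {flow_10:.2f} m^3/s")
print(f"Delta T 20 K: {flow_20:.2f} m^3/s")
```

Doubling Delta T halves the airflow, and therefore the fan work, needed for the same heat load.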
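The shutdown delay can be estimated with a simple energy balance: time = mass x specific heat x temperature margin / heat load. The sketch below is a worst-case approximation that assumes no heat loss to the surroundings and a perfectly mixed loop; the volume, load, and margin are hypothetical.

```python
WATER_CP = 4186.0       # J/(kg*K), approximate specific heat of water
WATER_KG_PER_L = 1.0    # approximate mass of one litre of water


def seconds_until_limit(volume_l: float, heat_w: float, margin_k: float) -> float:
    """Time for the loop to heat up by margin_k once cooling has stopped."""
    if heat_w <= 0:
        raise ValueError("heat load must be positive")
    return volume_l * WATER_KG_PER_L * WATER_CP * margin_k / heat_w


# Hypothetical loop: 200 L of water, 100 kW of IT heat, 15 K of margin.
delay_s = seconds_until_limit(200.0, 100_000.0, 15.0)

print(f"About {delay_s:.0f} s before the temperature limit is reached")
```

With these figures the margin is roughly two minutes, so a wider Delta T target has to be weighed against how fast an emergency shutdown can really run.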
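A minimal sketch of such a check follows. The 30% and 80% thresholds are placeholders only; the real efficient range depends on each piece of equipment and should come from its datasheet. `classify_load` is a hypothetical helper.

```python
def classify_load(load_w: float, rated_w: float,
                  low: float = 0.3, high: float = 0.8) -> str:
    """Classify a load against placeholder low/high thresholds."""
    if rated_w <= 0:
        raise ValueError("rated power must be positive")
    ratio = load_w / rated_w
    if ratio > high:
        return "overloaded"    # equipment at risk
    if ratio < low:
        return "underloaded"   # likely running at low efficiency
    return "ok"


# Hypothetical 10 kW UPS module at three different loads.
for load_w in (2000.0, 5000.0, 9000.0):
    print(load_w, classify_load(load_w, 10_000.0))
```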