|Title||To distribute or not to distribute: the question of load balancing for performance or energy|
|Publication Type||Conference Proceedings|
|Year of Publication||2017|
|Authors||Stafford, E, Pérez, B, Bosque, JLuis, Beivide, R, Valero, M|
|Place Published||Euro-Par 2017: Parallel Processing, Santiago de Compostela|
There is an ever-growing interest in heterogeneous systems in the HPC community, particularly those integrating GPUs, as they increase the computing power and improve the energy efficiency of these large systems. These systems are programmed mainly with frameworks or APIs such as CUDA and OpenCL, which are designed around the Host-Device programming model. This model relies on offloading data-parallel sections to the accelerator while the CPU remains idle. While idle, devices contribute no computational effort to the system, yet they still draw a significant amount of power, known as static power consumption. This suggests that load-balanced co-execution might be necessary to improve the efficiency of the system. However, with the above frameworks, co-execution is possible but far from trivial, as is determining the optimal load balance.