4.5.2. Memory Bandwidth

In [124], Jouppi et al. compare the inference performance of several processors used in Google's cloud-based systems when running various types of NNs. The analysis uses a roofline model, in which each algorithm is plotted by its computational performance (operations per second) against its operational intensity (number of operations per byte of data accessed). At low operational intensity, performance is limited by the memory bandwidth; as operational intensity increases, performance becomes limited instead by the computational capacity of the system architecture. Typically, cloud-based NN workloads fall in the bandwidth-limited region. Recent hardware architectures, notably SoC architectures, therefore focus on increasing memory bandwidth to address the continuously growing demands of AI [98].
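The roofline bound described above can be sketched as a simple formula: attainable performance is the minimum of the peak compute throughput and the product of memory bandwidth and operational intensity. The following minimal sketch, using hypothetical accelerator figures (90 TFLOP/s peak, 600 GB/s bandwidth, which are illustrative only and not taken from [124]), shows where the bound switches from memory to compute:

```python
def roofline(peak_flops, bandwidth, intensity):
    # Attainable performance is the lesser of the peak compute throughput
    # and the rate at which the memory system can deliver operands.
    return min(peak_flops, bandwidth * intensity)

# Hypothetical accelerator parameters (illustrative, not from [124]).
peak = 90e12   # peak throughput: 90 TFLOP/s
bw = 600e9     # memory bandwidth: 600 GB/s

# Ridge point: the operational intensity at which the limiting factor
# switches from memory bandwidth to computational capacity.
ridge = peak / bw  # 150 ops/byte

for oi in (10, 150, 1000):
    perf = roofline(peak, bw, oi)
    bound = "memory-bound" if oi < ridge else "compute-bound"
    print(f"intensity={oi:5d} ops/byte -> {perf/1e12:5.1f} TFLOP/s ({bound})")
```

A kernel at 10 ops/byte reaches only 6 TFLOP/s on this hypothetical machine despite the 90 TFLOP/s peak, illustrating why low-intensity NN inference workloads are bandwidth-limited.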
