
Which is faster for Machine Learning and AI: CPU or GPU?


“If you were plowing a field, which would you rather use: [eight] strong oxen or [32,768] chickens?”* – Seymour Cray (1925–1996)

The short answer is: it all depends.


You have probably come across countless articles praising the power of GPUs (Graphics Processing Units) and claiming that they are far more capable than CPUs (Central Processing Units). While this is true in many use cases, the opposite holds in many others.

To determine where GPUs should be preferred over CPUs (or the other way around), it helps to have a basic understanding of the architecture that underpins each platform.

The original architecture of CPUs assumed that software would be executed sequentially, line after line, and the Organisation, Instruction Set and Memory Management were designed accordingly. The emphasis was on executing every instruction as fast as possible. In practice, however, there is a limit to how fast a CPU can operate: the fastest CPUs from Intel and AMD today are “officially” clocked well below 5 GHz. This limitation is felt very clearly in Multimedia, Gaming and Big Data applications, where even the fastest CPUs perform poorly.

To overcome this performance bottleneck, a new architecture was proposed. In sharp contrast to the traditional CPU architecture, the emphasis was put on executing as many instructions as possible in parallel, at the cost of slower sequential execution. This had implications for Memory Management: GPU memory delivers much higher bandwidth (more gigabytes per second) but at significantly higher latency (the delay before a data transfer completes). Hence, the GPU architecture is commonly described as “Gather/Scatter”. GPUs are also clocked at a considerably lower frequency, typically around 1 GHz, to keep energy consumption and heat under control.
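As a rough illustration of this bandwidth/latency trade-off, the hedged sketch below (it assumes PyTorch and a CUDA-capable GPU are available; the tensor sizes are arbitrary) times moving the same few megabytes of data to the GPU once versus row by row. The single large copy is limited mainly by bandwidth, while the many small copies each pay the latency cost.

```python
import time
import torch

# Sketch only: assumes PyTorch is installed and a CUDA GPU is present.
assert torch.cuda.is_available(), "This sketch needs a CUDA GPU"
device = torch.device("cuda")

data = torch.randn(1024, 1024)            # ~4 MB of float32 on the host
torch.randn(1).to(device)                 # warm-up: initialise the CUDA context

# One large transfer: cost dominated by raw memory bandwidth.
torch.cuda.synchronize()
t0 = time.perf_counter()
data.to(device)
torch.cuda.synchronize()
print(f"single transfer:      {time.perf_counter() - t0:.4f} s")

# The same data moved row by row: every tiny copy pays the latency cost.
torch.cuda.synchronize()
t0 = time.perf_counter()
for row in data:
    row.to(device)
torch.cuda.synchronize()
print(f"1024 small transfers: {time.perf_counter() - t0:.4f} s")
```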

Today, a typical high-end CPU has 24 cores with Hyper-Threading and 85 GB/s of memory bandwidth (Intel Xeon E7-8894 v4), whereas a high-end GPU has 3584 cores and over 700 GB/s of memory bandwidth (Nvidia Tesla P100).

To summarise, these architectural trade-offs have put CPUs and GPUs in opposing positions. To determine which one is more appropriate for a given task, Amdahl’s Law can be of great help: not every task can fully benefit from parallel execution, and the achievable speed-up is limited by the fraction of the work that must remain sequential. Ideally, the sequential part should be assigned to the CPU and the parallelisable part to the GPU.


The key question here is how much of a program is sequential, and how much is parallel?
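One way to make that question concrete is Amdahl’s Law: if a fraction p of the work can be parallelised across n workers, the end-to-end speed-up is bounded by 1 / ((1 − p) + p / n). The plain-Python sketch below is only illustrative; the core count is borrowed from the Tesla P100 mentioned above.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speed-up when `parallel_fraction` of the work can be
    spread across `workers` units and the rest must stay sequential."""
    sequential = 1.0 - parallel_fraction
    return 1.0 / (sequential + parallel_fraction / workers)

# Even with 3584 GPU cores, a task that is only 90% parallel
# cannot run more than ~10x faster end to end.
for p in (0.50, 0.90, 0.99):
    print(f"{p:.0%} parallel -> {amdahl_speedup(p, 3584):.1f}x max speed-up")
```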

From a Machine Learning perspective, algorithms such as Artificial Neural Networks and Random Forests exhibit a very high degree of parallelism, making them an excellent fit for GPUs. On the other hand, algorithms such as GLMs (Generalised Linear Models) tend to be largely sequential and would perform very poorly on GPUs.
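As a hedged illustration (a sketch, not any library’s internal implementation), the snippet below assumes PyTorch is installed and shows why a neural network’s forward pass suits a GPU: it is dominated by large matrix multiplications whose output elements can all be computed independently. The layer sizes are made up for the example, and the code falls back to the CPU if no GPU is found.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

batch = torch.randn(4096, 1024, device=device)   # 4096 samples, 1024 features each
w1 = torch.randn(1024, 512, device=device)       # first layer weights
w2 = torch.randn(512, 10, device=device)         # second layer weights

hidden = torch.relu(batch @ w1)   # every output element is computed independently
logits = hidden @ w2              # ideal work for a "Gather/Scatter" device
print(logits.shape)               # torch.Size([4096, 10])
```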


After all, 32,768 chickens could beat eight strong oxen, but only if all of them were put to work simultaneously.

 

* This is a slightly modified version of the original quote.