Invest 1k to Save 100k
Data scientists are a scarce resource, and the situation is about to get much worse. Their time is precious and should be put to good use. They are highly paid, but more importantly, the outcome of their work is critical to the survival of a business in highly competitive markets where time is of the essence.
It’s surprising how many data scientists are still using low-spec laptops or desktop computers. To put this into perspective (and for fun, too), I have written a simple benchmarking program in C++ that estimates the value of Pi (π). The benchmark assigns the computation to a single processor thread, then scales up by doubling the thread count on each pass, all the way to 1024 threads!
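For reference, this is the classic midpoint-rule estimate you can read off the code at the end of the article: integrating 4/(1+x²) over [0, 1] gives exactly π, and the benchmark approximates that integral with a large number of rectangles:

\pi = \int_0^1 \frac{4}{1+x^2}\,dx \;\approx\; \sum_{i=0}^{n-1} \frac{4}{1 + x_i^2}\,\Delta x, \qquad x_i = (i + 0.5)\,\Delta x, \quad \Delta x = \frac{1}{n}

where n corresponds to num_steps and Δx to step in the code.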
Here, I used OpenMP pragmas to help parallelise the computation, though there are other important considerations to make this work (code snippet at the end of this article).
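To make the main consideration concrete, here is a minimal sketch (not the benchmark itself) of the reduction idiom it relies on: without the reduction clause, every thread would race on the shared accumulator and the result would be wrong.

#include <omp.h>

// Sum a per-iteration term across threads. reduction(+:total) gives
// each thread a private copy of `total` and combines the copies when
// the loop ends, so no locking or atomics are needed in the hot path.
double parallel_sum(int n) {
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += 1.0 / (i + 1.0);  // stand-in for the real per-step work
    }
    return total;
}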
The benchmark was run on a range of hardware platforms, from a 2-core Intel i3 laptop to a beefy 10-core Xeon workstation. For convenience, the results were normalized to the legendary Intel i7-920. Higher values indicate better performance.
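The article doesn’t spell out the normalisation, but a score like this is typically the ratio of the baseline’s execution time to the machine’s:

\text{score}(m) = \frac{t_{\text{i7-920}}}{t_{m}}

so the i7-920 itself scores 1.0, and a machine that finishes the same run in half the time scores 2.0.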
Technically speaking, the results should make two points clear. First, the most expensive computer in this round (by far) is not necessarily the best. In my opinion, a high-end desktop processor such as the Intel i7-6800K is a good compromise between cost and performance. Second, software optimisation, such as parallelising single-threaded code, has far more potential to increase performance. You should aim to get the best of both worlds.
The diagram below shows execution time in seconds. Lower values indicate better performance.
Okay, so what does this mean for your business?
Well, if your data scientists are still using slow computers, then it’s time to open up your wallet for a little investment.
As promised, here is a code snippet from the benchmark.
omp_set_num_threads(threads);  // current team size; doubles on every pass, up to 1024
time = omp_get_wtime();

#pragma omp parallel for       // evaluate one midpoint-rule term per step
for (i = 0; i < num_steps; i++) {
    double x = (i + 0.5) * step;
    sum[i] = 4.0 / (1.0 + x * x);
}

#pragma omp parallel for reduction(+:grand_sum, loop)
for (i = 0; i < num_steps; i++) {
    grand_sum += sum[i];
    loop++;                    // iteration counter; in the reduction clause to avoid a data race
}

pi = step * grand_sum;
time = omp_get_wtime() - time;
// (the full benchmark presumably reports pi and the elapsed time here)

#pragma omp parallel for       // reset state for the next, wider run
for (i = 0; i < num_steps; i++)
    sum[i] = 0.0;
pi = 0.0;
grand_sum = 0.0;
threads = threads * 2;         // double the thread count
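If you want to build something similar yourself, remember that OpenMP has to be enabled explicitly at compile time. Assuming GCC or Clang (the file name here is hypothetical):

g++ -O2 -fopenmp pi_benchmark.cpp -o pi_benchmark
./pi_benchmark

MSVC users would pass /openmp to the compiler instead.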