Lead Image © Lucy Baldwin, 123RF.com
Thrashing the data cache for fun and profit
Ducks in a Row
In the past [1], I discussed how instruction cache misses can completely ruin your CPU's day when it comes to execution speed by stalling processing for thousands or even millions of cycles while program text is being retrieved from slower caches, or even from (gosh!) RAM [2] [3]. Now the time has come to turn your attention to the data cache: Where large enough datasets are involved, spatial locality and cache-friendly algorithms are among the first optimizations you should employ.
Three Times Slower
Compare the code presented in Listings 1 and 2. You might use diff [4] to verify that the only difference is the order of the i and j variables in the array lookup. Listing 1 increments 100 million integers, scanning the array row by row, whereas Listing 2 increments the 10,000 elements of a column before moving to the next column. Because C stores two-dimensional arrays in row-major order, consecutive values of j in Listing 1 touch adjacent memory, so each cache line fetched serves several accesses; Listing 2's column order jumps SIZE*sizeof(int) bytes between accesses, defeating spatial locality and missing the cache almost every time.
Listing 1
row.c
#include <stdio.h>

#define SIZE 10000

int array[SIZE][SIZE] = {0};

int main() {
  for (int i = 0; i < SIZE; i++)
    for (int j = 0; j < SIZE; j++)
      array[i][j]++;
  return 0;
}
Listing 2
column.c
#include <stdio.h>

#define SIZE 10000

int array[SIZE][SIZE] = {0};

int main() {
  for (int i = 0; i < SIZE; i++)
    for (int j = 0; j < SIZE; j++)
      array[j][i]++;
  return 0;
}