1 year ago

#199964

Elliot Gorokhovsky

Is prefetch useless if it doesn't complete before load?

Let's say we have this pseudocode, where ptr is not in any CPU cache:

prefetch_to_L1 ptr
/* 20 cycles */
load ptr

Since ptr is in main memory, the latency of the prefetch operation (from prefetch instruction decoding to ptr being available in L1 cache) is much greater than 20 cycles. Will the latency of the load be reduced at all by the in-progress prefetch? Or is the prefetch useless unless it completes before the load?
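
For concreteness, the pseudocode above might be written in C roughly as follows. This is only a sketch under some assumptions: it relies on the GCC/Clang builtin `__builtin_prefetch`, the names `prefetch_then_load` and `filler_iters` are hypothetical, and the filler loop merely stands in for the "20 cycles" of unrelated work without guaranteeing any particular cycle count. It also does nothing to evict ptr from the caches beforehand, so it only illustrates the instruction sequence and does not measure anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the pseudocode:
 * prefetch ptr, do a little unrelated work, then load ptr. */
static uint64_t prefetch_then_load(const uint64_t *ptr, unsigned filler_iters)
{
    assert(ptr != NULL);

    /* GCC/Clang builtin: rw = 0 (prefetch for read),
     * locality = 3 (ask to keep the line in all cache levels). */
    __builtin_prefetch(ptr, 0, 3);

    /* Stand-in for the short gap between prefetch and load.
     * volatile keeps the compiler from deleting the loop;
     * the real cycle count depends on the compiler and CPU. */
    volatile uint64_t filler = 0;
    for (unsigned i = 0; i < filler_iters; i++)
        filler += i;
    (void)filler;

    /* The demand load whose latency the question is about. */
    return *ptr;
}
```

Calling `prefetch_then_load(&x, 20)` returns the value of `x` regardless of whether the prefetch has finished, since a prefetch is only a hint; the open question is purely how long that final load takes.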

Naively (without much understanding of how the memory system works), I could see it working in one of two ways:

  • When the CPU executes the load, it somehow identifies that a prefetch is in progress for the same address, and waits for the prefetch to complete before loading from L1.
  • The CPU sees that the address is not currently in cache and goes to main memory, ignoring the prefetch operation executing in parallel.

Is one of these correct? Is there some third option I haven't thought of? I'm interested in Skylake in particular, but also just trying to build some general intuition.

performance

optimization

intel

cpu-architecture

micro-architecture

0 Answers
