Write buffers can have different purposes or different uses in different processors. This answer may not apply to processors not specifically mentioned. I'd like to emphasize that the term "write buffer" may mean different things in different contexts. This answer is about Intel and AMD processors only.

Write-Combining Buffers on Intel Processors

Each cache might be accompanied by zero or more line fill buffers (also called fill buffers). The collection of fill buffers at L2 is called the super queue or superqueue (each entry in the super queue is a fill buffer). If the cache is shared between logical cores or physical cores, then the associated fill buffers are shared between the cores as well. Each fill buffer can hold a single cache line plus additional information that describes the cache line (if it's occupied), including the address of the cache line, the memory type, and a set of validity bits, where the number of bits depends on the granularity at which the individual bytes of the cache line are tracked. In early processors (such as the Pentium II), only one of the fill buffers is capable of write-combining (and write-collapsing). The total number of line fill buffers, and of those capable of write-combining, has increased steadily with newer processors.

Core and Core2 have 8 LFBs per physical core. Nehalem up to Broadwell include 10 fill buffers at each L1 data cache. According to one source, there are 12 LFBs on Skylake, and 20 LFBs have been observed on Cannon Lake. I could not find a clear statement in the manual that says LFBs are the same as WCBs on all of these microarchitectures. However, an article written by a person from Intel says:

"Consult the Intel® 64 and IA-32 Architectures Optimization Reference Manual for the number of fill buffers in a particular processor [...] referred to as "Write Combining Buffers", since on some older processors only streaming stores were supported."

I think the term LFB was first introduced by Intel with the Intel Core microarchitecture, on which all of the 8 LFBs are WCBs as well. Basically, Intel quietly renamed WCBs to LFBs at that time, but has not clarified this in its manuals since then.

That same quote also says that the term WCB was used on older processors because streaming loads were not supported on them. This could be interpreted as meaning that the LFBs are also used by streaming load requests (MOVNTDQA). However, Section 12.10.3 says that streaming loads fetch the target line into buffers called streaming load buffers, which are apparently physically different from the LFBs/WCBs.

A line fill buffer is used in the following cases:

(1) A fill buffer is allocated on a load miss (demand or prefetch) in the cache. If there is no fill buffer available, load requests keep piling up in the load buffers, which may eventually stall the issue stage. For a load request, the allocated fill buffer temporarily holds the requested line from lower levels of the memory hierarchy until it can be written to the cache data array. However, the requested part of the cache line can still be provided to the destination register even if the line has not yet been written to the cache data array. As one explanation from Intel puts it:

"If you search for 'fill buffer' in the PDF you can see that the Line fill buffer (LFB) is allocated after an L1D miss. The LFB holds the data as it comes in to satisfy the L1D miss but before all the data is ready to be written to the L1D cache."

(2) A fill buffer is allocated on a cacheable store to the L1 cache when the target line is not in a coherence state that allows modifications. My understanding is that for cacheable stores, only the RFO request is held in the LFB, while the data to be stored waits in the store buffer until the target line is fetched into the LFB entry allocated for it. This is supported by the following statement from Section 2.4.5.2 of the Intel optimization manual:

"The L1 DCache can maintain up to 64 load micro-ops from allocation until retirement. It can maintain up to 36 store operations from allocation until the store value is committed to the cache, or written to the line fill buffers (LFB) in the case of non-temporal stores."

This suggests that cacheable stores are not committed to the LFB if the target line is not in the L1D. In other words, the store has to wait in the store buffer until either the target line is written into the LFB and then modified in the LFB, or the target line is written into the L1D and then modified in the L1D.
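To make the description of a fill buffer entry more concrete (a line address, a memory type, and per-byte validity bits), here is a toy software model. It is not how the hardware is built: it assumes 64-byte lines, byte-granular validity bits, and 10 entries (the Nehalem-to-Broadwell L1D count mentioned above), and the names `FillBuffer` and `FillBufferPool` are hypothetical. A write to a line that already has an entry merges into it (write-combining, with later bytes overwriting earlier ones, i.e. write-collapsing), and a miss that finds the pool full gets `None`, standing in for the stall described in case (1).

```python
from dataclasses import dataclass, field
from typing import Optional

LINE_SIZE = 64  # assumption: 64-byte cache lines


@dataclass
class FillBuffer:
    line_addr: int   # line-aligned address of the cache line
    mem_type: str    # e.g. "WB" (write-back) or "WC" (write-combining)
    valid: int = 0   # bit i set => byte i of the line holds valid data
    data: bytearray = field(default_factory=lambda: bytearray(LINE_SIZE))

    def write(self, offset: int, payload: bytes) -> None:
        """Merge bytes into the entry and mark them valid (write-combining).
        Writing the same byte again simply overwrites it (write-collapsing)."""
        assert 0 <= offset and offset + len(payload) <= LINE_SIZE
        for i, b in enumerate(payload):
            self.data[offset + i] = b
            self.valid |= 1 << (offset + i)

    def full(self) -> bool:
        """True once every byte of the line has been written."""
        return self.valid == (1 << LINE_SIZE) - 1


class FillBufferPool:
    """Hypothetical fixed-size pool of fill buffers attached to one cache."""

    def __init__(self, n_entries: int = 10):  # assumption: 10 entries
        self.n_entries = n_entries
        self.entries: dict[int, FillBuffer] = {}

    def lookup_or_allocate(self, addr: int, mem_type: str) -> Optional[FillBuffer]:
        line = addr - addr % LINE_SIZE
        if line in self.entries:
            return self.entries[line]      # reuse the entry already tracking this line
        if len(self.entries) == self.n_entries:
            return None                    # no free entry: the request has to stall
        fb = FillBuffer(line, mem_type)
        self.entries[line] = fb
        return fb

    def release(self, line_addr: int) -> FillBuffer:
        """Free the entry, e.g. after the line is written to the cache data array."""
        return self.entries.pop(line_addr)


if __name__ == "__main__":
    pool = FillBufferPool()
    fb = pool.lookup_or_allocate(0x1008, "WC")  # falls in line 0x1000
    fb.write(8, b"\x01\x02")                    # bytes 8 and 9 become valid
    print(hex(fb.line_addr), bin(fb.valid))
```

Tracking validity per byte is what lets a partially written write-combining line be told apart from a complete one; a coarser granularity would need fewer bits but could not represent arbitrary byte-sized stores.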
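The hedged understanding in case (2) can also be sketched as a toy model: a cacheable store whose line is not present stays at the head of the store buffer while a fill buffer tracks only the RFO, and it commits once the line has been filled. This sketch picks the second of the two possibilities described for case (2) (the line is written into the L1D and then modified there). The class `ToyL1D` and its methods are hypothetical, lines are assumed to be 64 bytes, and stores commit in program order, as on x86.

```python
from collections import deque

LINE_SIZE = 64  # assumption: 64-byte cache lines


class ToyL1D:
    """Toy model: a cacheable store waits in the store buffer until its
    target line is present in the L1D in a writable state."""

    def __init__(self):
        self.lines: dict[int, bytearray] = {}  # lines present and writable
        self.pending_rfo: set[int] = set()     # lines with an outstanding RFO (LFB holds only the request)
        self.store_buffer = deque()            # (addr, byte value) in program order

    def store(self, addr: int, value: int) -> None:
        self.store_buffer.append((addr, value))

    def try_commit(self) -> bool:
        """Try to commit the oldest store; younger stores wait behind it."""
        if not self.store_buffer:
            return False
        addr, value = self.store_buffer[0]
        line = addr - addr % LINE_SIZE
        if line not in self.lines:
            # Allocate a fill buffer for the RFO only; the data stays in the store buffer.
            self.pending_rfo.add(line)
            return False
        self.lines[line][addr - line] = value
        self.store_buffer.popleft()
        return True

    def fill_complete(self, line: int) -> None:
        """The RFO response arrives and the line is written into the L1D."""
        self.pending_rfo.discard(line)
        self.lines[line] = bytearray(LINE_SIZE)


if __name__ == "__main__":
    c = ToyL1D()
    c.store(0x1004, 7)
    print(c.try_commit())   # line 0x1000 not present: the store must wait
    c.fill_complete(0x1000)
    print(c.try_commit())   # now the store can commit to the L1D
```

Because commit is in order, a store stuck waiting for its RFO also holds back every younger store in the buffer, which is one way the store buffer can fill up on a run of store misses.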