Branch target address alignment on Golden Cove

RakeshD · 03-26-2024

Intel cores used to fetch one aligned 16-byte block per cycle from the instruction cache; hence, Intel recommended aligning branch targets to 16-byte boundaries. However, Golden Cove increased the fetch bandwidth to 32 bytes per cycle. I was wondering about the implications for branch target alignment. Do branch targets now need to be aligned to 32-byte boundaries, or does Golden Cove fetch “unaligned” 32 bytes per cycle?

Best,

Rakesh

RamyerM_Intel · 03-29-2024

Hello RakeshD,

Thank you for posting in the communities. To explain this to you in detail, may I please know the specific model of your CPU? I will be waiting for your reply. Thank you.

Ramyer M.

Intel Customer Support Technician

RakeshD · 03-29-2024

Hello Ramyer,

The question is not about a particular product but about the microarchitecture in general.

Intel® 64 and IA-32 Architectures Optimization Reference Manual Volume 1 (https://www.intel.com/content/www/us/en/content-details/671488/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html) mentions in Section 2.3.1 that the Golden Cove fetch bandwidth is increased from 16 to 32 bytes/cycle. Further, Section 3.4.1.4 recommends aligning branch targets to 16-byte boundaries. So, I assume that the 32-byte fetch (16-byte in earlier microarchitectures) has to be aligned. Is that correct?

I was also wondering why a fetch from the instruction cache must be aligned to a 16-byte boundary, especially when the data cache does not have this constraint. A downside of this constraint is that a 16-byte fetch has to be split into two fetch requests if it crosses a 16-byte boundary, even within the same 64-byte cache block. For example, to fetch byte_8 to byte_23 (16 bytes) from an instruction cache block, we need two cache accesses: one fetching byte_0 to byte_15 in the first cycle and another fetching byte_16 to byte_31 in the next cycle. If the instruction cache allowed fetches to cross 16-byte boundaries, just like the data cache, we would need only one cycle to fetch these bytes.
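
To make the arithmetic concrete, here is a minimal Python sketch of the model I have in mind. It assumes the front end reads exactly one naturally aligned window per cycle (16 or 32 bytes); the function names are my own, chosen for illustration, and this is not a description of how the actual hardware behaves.

```python
def aligned_fetches(start: int, length: int, window: int) -> int:
    """Count the aligned `window`-byte blocks touched by bytes
    [start, start + length). Under the assumed model, this is the
    number of fetch cycles needed to read that byte range."""
    if length <= 0:
        return 0
    first = start // window                # block holding the first byte
    last = (start + length - 1) // window  # block holding the last byte
    return last - first + 1


def padding_to_align(addr: int, window: int) -> int:
    """Bytes of padding needed to move `addr` up to the next
    `window`-byte boundary (0 if it is already aligned)."""
    return (-addr) % window


# The example above: byte_8 to byte_23 with a 16-byte aligned fetch.
print(aligned_fetches(8, 16, 16))   # 2 (bytes 0-15, then bytes 16-31)
# The same range if the aligned window were 32 bytes wide.
print(aligned_fetches(8, 16, 32))   # 1 (bytes 0-31)
# Padding required to align a target at offset 8.
print(padding_to_align(8, 16))      # 8
print(padding_to_align(8, 32))      # 24
```

Under this model, a branch target at offset 8 delivers only 8 useful bytes in its first cycle with a 16-byte window, which is the cost the alignment recommendation is meant to avoid.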

Thanks,

Rakesh

RamyerM_Intel · 04-02-2024

Hello RakeshD,

Thank you for sharing this information. I will coordinate internally with our team so we can answer your inquiry, and I will update this thread once the information is available. Thank you for your patience and cooperation.

Ramyer M.

Intel Customer Support Technician