
Branch target address alignment on Golden Cove

RakeshD
Novice

Intel cores used to fetch an aligned 16 bytes per cycle from the instruction cache; hence, Intel recommended aligning branch targets to 16-byte boundaries. Golden Cove, however, increased the fetch bandwidth to 32 bytes per cycle, and I was wondering about the implications for branch target alignment. Do branch targets now need to be aligned to 32-byte boundaries, or does Golden Cove fetch an “unaligned” 32 bytes per cycle?
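To make the concern concrete, here is a minimal sketch of what an aligned fetch would imply for a branch target. It assumes a simplified model in which each cycle delivers one naturally aligned window of 16 or 32 bytes; the function name and the example addresses are made up for illustration, and this is not a description of how Golden Cove actually behaves.

```python
def useful_bytes_at_target(target_addr: int, window: int) -> int:
    """Bytes from target_addr to the end of its aligned fetch window.

    Assumed model: the first fetch after a taken branch reads the aligned
    window containing the target, so bytes before the target are wasted.
    """
    return window - (target_addr % window)


if __name__ == "__main__":
    # Hypothetical branch target addresses, chosen only for illustration.
    for window in (16, 32):
        for addr in (0x1000, 0x1010, 0x101C):
            useful = useful_bytes_at_target(addr, window)
            print(f"window={window:2d} target={addr:#x} useful bytes={useful}")
```

Under this assumed model, a target that is 16-byte aligned but not 32-byte aligned would get only half of a 32-byte window in its first fetch cycle, which is why the alignment question matters.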

 

Best,

Rakesh

3 Replies
RamyerM_Intel
Moderator

Hello RakeshD, 


Thank you for posting in the communities. To explain this to you in detail, may I please know the specific model of your CPU? I will be waiting for your reply. Thank you. 


Ramyer M.

Intel Customer Support Technician 



RakeshD
Novice

Hello Ramyer,

 

The question is not about a particular product but about the microarchitecture in general.

The Intel® 64 and IA-32 Architectures Optimization Reference Manual Volume 1 (https://www.intel.com/content/www/us/en/content-details/671488/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html) mentions in Section 2.3.1 that the Golden Cove fetch bandwidth was increased from 16 to 32 bytes/cycle. Further, Section 3.4.1.4 recommends aligning branch targets to 16-byte boundaries. So I assume that the 32-byte fetch (16-byte in earlier microarchitectures) has to be aligned. Is that correct?

 

I was also wondering why fetches from the instruction cache must be aligned to 16-byte boundaries, especially when the data cache does not have this constraint. A downside of this constraint is that a 16-byte fetch has to be split into two fetch requests if it crosses a 16-byte boundary, even within the same 64-byte cache block. For example, to fetch byte_8 to byte_23 (16 bytes) from an instruction cache block, we need two cache accesses: byte_0 to byte_15 in one cycle and then byte_16 to byte_31 in the next. If the instruction cache allowed fetches to cross 16-byte boundaries, just like the data cache, we would need only one cycle to fetch these bytes.
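The arithmetic behind this example can be sketched as follows. This is only a toy model under the assumptions stated above (byte offsets within a single 64-byte cache block, one access per cycle); the function names are hypothetical and it does not claim to reflect the real fetch hardware.

```python
def aligned_accesses(first_byte: int, last_byte: int, window: int = 16) -> int:
    """Accesses needed when every access reads one naturally aligned window."""
    return last_byte // window - first_byte // window + 1


def unaligned_accesses(first_byte: int, last_byte: int, width: int = 16) -> int:
    """Accesses needed when an access may start at any byte offset."""
    length = last_byte - first_byte + 1
    return -(-length // width)  # ceiling division


if __name__ == "__main__":
    # byte_8 to byte_23, the example from the post
    print(aligned_accesses(8, 23, window=16))   # aligned 16-byte windows
    print(unaligned_accesses(8, 23, width=16))  # boundary crossing allowed
    print(aligned_accesses(8, 23, window=32))   # aligned 32-byte windows
```

In this model the byte_8 to byte_23 range costs two accesses with aligned 16-byte windows and one access if the fetch may cross the 16-byte boundary. With aligned 32-byte windows the same range also fits in a single access, since both bytes fall in the first 32 bytes of the block.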

 

Thanks,

Rakesh

RamyerM_Intel
Moderator

Hello RakeshD, 


Thank you for sharing this information. I will coordinate internally with our team so that we can answer your inquiry. Rest assured that I will update this thread as soon as the information is available. Thank you for your patience and cooperation.


Ramyer M.

Intel Customer Support Technician 


