important question for anyone good at x86. can microcode cache the top of the stack in processor registers for sufficiently nearby pushes and pops, or do stack accesses always require a cache access no matter what?
@mothcompute Apparently so. I was thinking that 'store forwarding' would be the thing that lets this happen, but when I was hunting for a reference (e.g. see 24.17 below) I came across 'Mirroring memory operands' (24.17, page 236), which says 'It also works with PUSH and POP instructions.' Note, I doubt microcode gets involved; I think microcode is only used for big complex stuff, not anything fast.
@penguin42 that's *exactly* what i was looking for. thank you
@penguin42 it's very interesting that it mentions that it's present in zen 2 but not zen 3, because those are the two machines i usually write for. maybe i can try comparing performance per clock between them in these sections
@mothcompute Yeah, I guess all these forwarding mechanisms are really complex and interact with the rest of the out-of-order pipeline. It's possible they hit a bug in zen 3 and decided to take it out or turn it off rather than fix it before release; difficult to know. I'm guessing some of the other parts of the forwarding from the store buffer might get you some of the performance anyway. Shrug.