I am writing my first interpreter and am interested in using tail calls to improve branch prediction.
Suppose I have something like this:

    template <void (*Handler)(Cpu&)>
    void wrapper(Cpu& cpu) {
        Handler(cpu);
        cpu.post_instruction();
        [[clang::musttail]] return dispatch_table[cpu.get_next_instr_token()](cpu);
    }

Can the compiler fold or merge the tail-call sites of the different template instantiations? If so, the whole point of having a separate indirect branch per handler, each of which the branch predictor can learn independently, is defeated.
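For anyone who wants to reproduce this, here is a minimal self-contained sketch of the setup. The `Cpu` fields, the three opcodes, the `MUSTTAIL` macro, and `run` are all hypothetical stand-ins invented for illustration, not my real interpreter. The macro falls back to a plain `return` on compilers without `[[clang::musttail]]`, in which case the tail call is not guaranteed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Use [[clang::musttail]] where available; otherwise fall back to a plain
// return (no tail-call guarantee in that case).
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define MUSTTAIL [[clang::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

// Hypothetical CPU state, just enough to drive the dispatch loop.
struct Cpu {
    std::vector<std::uint8_t> program;  // one opcode token per instruction
    std::size_t pc = 0;
    long acc = 0;
    std::size_t retired = 0;

    void post_instruction() { ++retired; }
    std::uint8_t get_next_instr_token() { return program[pc++]; }
};

using Op = void (*)(Cpu&);
extern const std::array<Op, 3> dispatch_table;

// One instantiation per handler, so each gets its own indirect tail call.
template <void (*Handler)(Cpu&)>
void wrapper(Cpu& cpu) {
    Handler(cpu);
    cpu.post_instruction();
    MUSTTAIL return dispatch_table[cpu.get_next_instr_token()](cpu);
}

// Hypothetical handlers.
void op_inc(Cpu& cpu) { cpu.acc += 1; }
void op_double(Cpu& cpu) { cpu.acc *= 2; }
void op_halt(Cpu&) {}  // not wrapped: returning here ends the chain

// Token 0 = halt, 1 = inc, 2 = double.
const std::array<Op, 3> dispatch_table = {
    op_halt, wrapper<op_inc>, wrapper<op_double>};

// Runs a program that must end with a halt token (0).
Cpu run(std::vector<std::uint8_t> program) {
    Cpu cpu;
    cpu.program = std::move(program);
    dispatch_table[cpu.get_next_instr_token()](cpu);
    return cpu;
}
```

Calling `run({1, 1, 2, 0})` executes inc, inc, double, halt. One way to check for merging would be to compile with something like `clang++ -O2 -S` and see whether `wrapper<op_inc>` and `wrapper<op_double>` each keep their own indirect jump.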
