The research continues in 2023, and the fastest multiply got faster! Thanks to a 6502 simulator written in C, and to an analysis of the statistics on branches and page-boundary crossings, the exact speed of each routine is now known. This analysis has inspired new optimizations! The new routine executes in a blazing 188.1 cycles on average, a 6% speedup (10.5 cycles faster than my original code, which took 198.6). Inputs are in zero page and outputs are in zero page plus one register; the cycle count does not include caller setup or the RTS.
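
To show why branch and page-crossing statistics matter for an exact average, here is a minimal sketch in C of how such numbers can feed a cycle estimate. It is not the simulator's actual code: the function names and parameters (`branch_cycles`, `indexed_read_cycles`, `average_cycles`) are hypothetical, chosen only for illustration. The per-instruction costs are the standard 6502 timings: a conditional branch takes 2 cycles when not taken, 3 when taken, and 4 when taken across a page boundary, and an indexed read such as `LDA abs,X` takes one extra cycle when indexing crosses a page.

```c
#include <stddef.h>

/* Hypothetical helper: expected cycles for one 6502 conditional branch.
 * p_taken              = fraction of executions in which the branch is taken
 * p_cross_given_taken  = fraction of taken branches whose target lies on
 *                        a different page
 * Cost: 2 (not taken), 3 (taken), 4 (taken, page crossed). */
double branch_cycles(double p_taken, double p_cross_given_taken)
{
    return 2.0 + p_taken * (1.0 + p_cross_given_taken);
}

/* Hypothetical helper: expected cycles for an indexed read such as
 * LDA abs,X (base cost 4), which pays +1 cycle when the index carries
 * the address into the next page. */
double indexed_read_cycles(double base_cycles, double p_cross)
{
    return base_cycles + p_cross;
}

/* Hypothetical helper: average cycles per call, given totals that a
 * simulator might accumulate over many input pairs. */
double average_cycles(unsigned long long total_cycles,
                      unsigned long long runs)
{
    return runs ? (double)total_cycles / (double)runs : 0.0;
}
```

For example, a branch taken half the time that never crosses a page costs `branch_cycles(0.5, 0.0)`, or 2.5 cycles on average, which is how a routine can end up with a fractional average such as 188.1.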

