pick cards, set your kernel's arithmetic intensity (FLOPs per byte of memory traffic), and read the roofline. every number is derived from physical parameters and checked against vendor spec sheets. the terminal version does far more: github.com/nuemaan/kernelmeter
rough guide, in FLOP/byte: elementwise ops ~0.2, softmax ~1, conv ~10 to 100, big matmul ~300+
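a minimal sketch of the roofline arithmetic, assuming the standard model: attainable throughput = min(peak compute, arithmetic intensity × memory bandwidth). this is not kernelmeter's code. the card numbers (100 TFLOP/s, 2 TB/s) are hypothetical and not taken from any spec sheet. the intensity helpers assume ideal memory traffic (every operand read once, the output written once), so real kernels usually land lower.

```python
# roofline sketch: attainable TFLOP/s = min(peak, AI * bandwidth)
# units: AI in FLOP/byte, bandwidth in TB/s, so AI * bandwidth is TFLOP/s


def elementwise_add_ai(dtype_bytes: int = 4) -> float:
    """c[i] = a[i] + b[i]: 1 FLOP per element, 2 reads + 1 write."""
    return 1 / (3 * dtype_bytes)


def matmul_ai(m: int, n: int, k: int, dtype_bytes: int = 2) -> float:
    """C (m x n) = A (m x k) @ B (k x n), assuming ideal traffic:
    A and B read once, C written once (an upper bound on real kernels)."""
    flops = 2 * m * n * k  # one multiply + one add per inner-product term
    bytes_moved = (m * k + k * n + m * n) * dtype_bytes
    return flops / bytes_moved


def attainable_tflops(ai: float, peak_tflops: float, bandwidth_tbps: float) -> float:
    """Roofline: limited by memory below the ridge point, by compute above it."""
    return min(peak_tflops, ai * bandwidth_tbps)


def ridge_point(peak_tflops: float, bandwidth_tbps: float) -> float:
    """Intensity (FLOP/byte) at which the bandwidth slope meets the compute roof."""
    return peak_tflops / bandwidth_tbps


if __name__ == "__main__":
    peak, bw = 100.0, 2.0  # hypothetical card: 100 TFLOP/s, 2 TB/s
    ridge = ridge_point(peak, bw)
    kernels = [
        ("fp16 elementwise add", elementwise_add_ai(2)),
        ("fp16 1024^3 matmul", matmul_ai(1024, 1024, 1024, 2)),
    ]
    for name, ai in kernels:
        bound = "memory" if ai < ridge else "compute"
        print(f"{name}: AI={ai:.2f} FLOP/byte, "
              f"{attainable_tflops(ai, peak, bw):.1f} TFLOP/s ({bound}-bound)")
```

with fp16 operands the elementwise add works out to 1/6 ≈ 0.17 FLOP/byte and the 1024-cube matmul to about 341, which line up with the rough guide's ~0.2 and ~300+. on this hypothetical card the ridge point sits at 50 FLOP/byte, so the add is pinned to the bandwidth slope while the matmul hits the compute roof.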