Just an idea: could you use a 1 ms timer ISR and 16 counter bytes, one per pin? On each tick, if a pin's byte is not 0, drive the corresponding pin high and decrement the counter. When the counter reaches zero, drive the pin low and ignore it until it is reloaded. Your program then just feeds the 16 bytes: loading 20 gives a 20 ms high pulse, after which the pin goes back low. The main program could keep track of the delay times?
Or would this cause a power issue, and is that why the delay routine is better?
Or maybe you need some blocking to let them get where they are going?