Seems pretty self-explanatory to me. Judging by the delay being 0.2 us and the frequency being 20 MHz, you're using a PIC (the clock period at 20 MHz is actually 0.05 us, but the PIC instruction cycle is four clocks). So just calculate it that way: (1 / clock frequency) * 4 = single-cycle delay time. I don't believe such a routine can delay for just a couple of cycles, as the branching alone takes a certain number of instructions. Delays of only a few instruction cycles are best achieved with inline assembly NOPs; there may even be a C function for it, but if you need timing control that tight you should be programming in assembly anyway.
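
To put numbers on that, here is a rough sketch of the arithmetic in plain C. The function names are made up for illustration and don't come from any PIC compiler library. It assumes four oscillator clocks per instruction cycle, as above, and rounds the cycle count up so the delay is never shorter than requested.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one instruction cycle = 4 oscillator clocks (as described above). */
#define CLOCKS_PER_INSTR 4ULL
#define NS_PER_SECOND    1000000000ULL

/* Hypothetical helper: length of one instruction cycle in nanoseconds.
   Integer division truncates, so the result is exact only when
   4e9 is divisible by the oscillator frequency. */
uint32_t instr_cycle_ns(uint32_t fosc_hz)
{
    assert(fosc_hz != 0);
    return (uint32_t)((CLOCKS_PER_INSTR * NS_PER_SECOND) / fosc_hz);
}

/* Hypothetical helper: number of instruction cycles needed to cover
   delay_ns, rounded up. Call/return and loop overhead are NOT included. */
uint32_t cycles_for_delay_ns(uint32_t delay_ns, uint32_t fosc_hz)
{
    assert(fosc_hz != 0);
    uint64_t denom = CLOCKS_PER_INSTR * NS_PER_SECOND;
    uint64_t numer = (uint64_t)delay_ns * fosc_hz;
    return (uint32_t)((numer + denom - 1) / denom);
}
```

With a 20 MHz clock, instr_cycle_ns(20000000) gives 200 ns (the 0.2 us above), and cycles_for_delay_ns(1000, 20000000) gives 5, so a 1 us delay is only five instruction cycles. That's why the overhead of calling a delay routine matters at this scale.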