Nigel, just to help you out, I ran it on my simulator.
Here are the results:
The counts include 2 states for the initial JMP past the standard 8051 interrupt vectors (which is good programming practice).
If you don't want this jump considered in the calculation, subtract two instruction cycles from everything that follows.
start:
        ; (2 instruction states to get to this point)
        MOV R1,A
loop1:
        MOV R2, #250
loop2:
        MOV R3, #250
loop3:
        NOP
        NOP
        ; (7 states to get to this point on the first pass; the setup before the loops is never counted again)
        DJNZ R3,loop3
        ; (1005 states to get to this point)
        DJNZ R2,loop2
        ; (250754 states to get to this point)
        DJNZ R1,loop1
        ; (64192771 states to get to this point)
        end
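For anyone who wants to check the simulator figures by hand, here is a minimal Python sketch that rebuilds the same cumulative counts. It assumes the usual classic 8051 machine-cycle costs (JMP = 2, MOV = 1, NOP = 1, DJNZ = 2); the function and variable names are mine, not from the simulator. Note that the final figure only works out if the outer loop makes 256 passes, which is what DJNZ does when R1 (loaded from A) starts at 0, so the run above appears to have been made with A = 0.

```python
# Assumed machine-cycle ("state") costs on a classic 8051 core.
JMP, MOV, NOP, DJNZ = 2, 1, 1, 2

INNER = 250 * (2 * NOP + DJNZ)        # loop3: 250 passes of NOP, NOP, DJNZ R3
MIDDLE = 250 * (MOV + INNER + DJNZ)   # loop2: MOV R3, loop3, DJNZ R2


def states_to_end(r1_passes):
    """Cumulative states when execution falls out of DJNZ R1 (hypothetical helper)."""
    outer = r1_passes * (MOV + MIDDLE + DJNZ)   # loop1: MOV R2, loop2, DJNZ R1
    return JMP + MOV + outer                    # initial JMP plus MOV R1,A


# Cumulative counts at the annotated points in the listing.
at_first_djnz_r3 = JMP + 3 * MOV + 2 * NOP      # first arrival at DJNZ R3
at_first_djnz_r2 = JMP + 3 * MOV + INNER        # first arrival at DJNZ R2
at_first_djnz_r1 = JMP + 2 * MOV + MIDDLE       # first arrival at DJNZ R1
total = states_to_end(256)                      # R1 = 0 wraps, giving 256 passes

print(at_first_djnz_r3, at_first_djnz_r2, at_first_djnz_r1, total)
```

For a different starting value of A, call states_to_end() with that value as the number of passes (or 256 when A is 0).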
Now take all those states and multiply them by your instruction cycle time to get the actual run time (in minutes, seconds, etc.).
For example, at a 1 microsecond instruction cycle time, that would be: 64192771 * 0.000001 s = 64.192771 seconds.
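If you would rather plug in your own clock, the conversion can be sketched in a few lines of Python. The 12 clocks per machine cycle used here is an assumption that holds for the classic 8051 core; many derivatives use fewer clocks per cycle, so check your datasheet. The helper names are hypothetical.

```python
TOTAL_STATES = 64192771  # total from the simulator run


def cycle_time_from_clock(f_osc_hz, clocks_per_cycle=12):
    """Seconds per machine cycle; 12 clocks per cycle is an assumption (classic 8051)."""
    return clocks_per_cycle / f_osc_hz


def run_time_seconds(states, cycle_time_s=1e-6):
    """Run time in seconds for a given number of states."""
    return states * cycle_time_s


def as_minutes_seconds(seconds):
    """Split a run time into whole minutes and leftover seconds."""
    minutes, rest = divmod(seconds, 60)
    return int(minutes), rest


seconds = run_time_seconds(TOTAL_STATES, cycle_time_from_clock(12_000_000))
minutes, rest = as_minutes_seconds(seconds)
print(f"{seconds:.6f} s = {minutes} min {rest:.3f} s")
```

With a 12 MHz oscillator and 12 clocks per cycle this reproduces the 1 microsecond case, a little over one minute.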
Hope this helps!
BTW, this is my first post.
I joined in the hopes of maybe getting an answer to an ugly problem I'm encountering with an 8051 derivative UART operating on the internal 8 MHz clock. I'll post that separately, shortly.