Hi Ron,
I have been following this thread with interest, and I applaud your persistence.
I think the OP is adding the propagation delay to both the start and the end of each square-wave pulse, so he always ends up with an overlapping clock pulse train.
IMO he should treat the propagation delay as acting on the leading and trailing edges [the low-to-high and high-to-low transitions] at the gate inputs.
Example:
Say the propagation delay through the gate is 10 µs for the leading edge. The leading edge then changes the state of the gate output 10 µs later, so the pulse has 'shrunk' in width by 10 µs.
The same applies to the trailing (falling) edge: it takes 10 µs to propagate through the gate, so the pulse has 'shrunk' by a further 10 µs, 20 µs in total.
As this happens in every gate, the pulse gets shorter at each stage, and at the outputs the two pulse streams will not overlap.
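To put numbers on the example, here is a minimal Python sketch of the arithmetic. It assumes the simplified model described above, in which every gate trims one propagation delay off the leading edge and another off the trailing edge (2 x tpd per gate). The function name pulse_width_after_gates and the 100 µs starting width are hypothetical, chosen only for illustration.

```python
def pulse_width_after_gates(width_us: float, tpd_us: float, gates: int) -> float:
    """Return the pulse width left after `gates` stages.

    Assumes the model from the post: each gate trims tpd_us off the
    leading edge and tpd_us off the trailing edge, i.e. 2 * tpd_us per gate.
    """
    trimmed = 2 * tpd_us * gates
    # A pulse cannot be narrower than zero; at that point it has vanished.
    return max(width_us - trimmed, 0.0)


if __name__ == "__main__":
    # Hypothetical 100 us input pulse with the 10 us delay from the example.
    for n in range(4):
        print(f"after {n} gate(s): {pulse_width_after_gates(100.0, 10.0, n)} us")
```

Run as a script, it prints widths of 100.0, 80.0, 60.0 and 40.0 µs for 0 to 3 gates, showing how quickly the pulse narrows under this model.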
Regards.