The routine reads the incoming bit (either HIGH or LOW) and sets the Carry bit accordingly. The Carry bit is then rotated into the Rcv_Byte GPR. The 'f' at the end of the instruction means the result is written back to Rcv_Byte (a 'W' would mean it is written to W instead).
This is repeated 8 times in a simple loop, to build up the complete 8 bits for the full byte.
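If it helps to see the rotate-through-carry idea outside of assembler, here is a rough C model of it. It's only a sketch: I'm assuming a rotate right (RRF style) with the bits arriving LSB first, which is the usual case for async serial, and the names `rrf` and `receive_byte` are made up for illustration, not taken from the actual routine.

```c
#include <stdint.h>

/* Model of a rotate-right-through-carry on an 8-bit register:
 * the old Carry moves into bit 7, and the old bit 0 becomes the new Carry. */
static uint8_t rrf(uint8_t reg, uint8_t *carry)
{
    uint8_t new_carry = (uint8_t)(reg & 1u);
    reg = (uint8_t)((reg >> 1) | (*carry ? 0x80u : 0x00u));
    *carry = new_carry;
    return reg;
}

/* bits[] holds the 8 sampled line levels in arrival order
 * (assumed LSB first). Returns the assembled byte. */
uint8_t receive_byte(const uint8_t bits[8])
{
    uint8_t rcv_byte = 0;
    uint8_t carry = 0;

    for (int i = 0; i < 8; i++) {
        carry = bits[i] ? 1u : 0u;         /* set Carry from the input level */
        rcv_byte = rrf(rcv_byte, &carry);  /* rotate Carry into the byte     */
    }
    return rcv_byte;
}
```

After eight passes the first bit received has been pushed all the way down to bit 0 and the last one sits in bit 7, so nothing needs reordering afterwards.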
As for software delays, they are far more versatile and often more accurate than timers: you can adjust a software delay to within one instruction cycle, which you can't do with timers.
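To put a number on the delay you need, you can work out how many instruction cycles fit in one bit time. A small C sketch of that arithmetic follows. It assumes a PIC where one instruction cycle is four oscillator clocks (true for the mid-range parts); the function name `cycles_per_bit` and the clock and baud figures mentioned afterwards are just examples, not values from the routine above.

```c
#include <stdint.h>

/* Number of instruction cycles in one serial bit time, rounded to the
 * nearest cycle. Assumes one instruction cycle = 4 oscillator clocks. */
uint32_t cycles_per_bit(uint32_t fosc_hz, uint32_t baud)
{
    uint32_t fcy = fosc_hz / 4u;       /* instruction cycles per second */
    return (fcy + baud / 2u) / baud;   /* round to nearest whole cycle  */
}
```

With a 4 MHz clock and 9600 baud this gives 104 cycles per bit (the exact figure is about 104.17), so a delay loop trimmed to the cycle gets you within a fraction of a percent of the ideal bit time.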