Hi Mike, I read that code you did, nice work.
The code offloads the heavy lifting to your built-in peripherals.
RX is edge-driven by INT0 with no oversampling: each bit is sampled as close to halfway through the bit period as possible, using a timer IRQ that is synced to the INT0 interrupt and delayed by half a bit time. I'm considering making this a high-priority IRQ if my other routines take too much time.
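To make the RX scheme concrete, here is a rough sketch of the state machine in plain C. It is only an illustration under assumptions, not the actual code: the names `soft_rx_t`, `soft_rx_on_start_edge` and `soft_rx_on_timer_tick` are made up, the frame is assumed to be 8N1 sent LSB first, and the hardware-specific parts (reloading the timer for the half-bit delay, masking INT0 during a frame) are left as comments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receiver state, for illustration only. */
typedef struct {
    bool    busy;      /* true while a frame is being received      */
    uint8_t bit_count; /* 0 = waiting for the mid-start-bit sample  */
    uint8_t shift;     /* data bits collected so far                */
    uint8_t data;      /* last completed byte                       */
    bool    ready;     /* set when data holds a freshly received byte */
} soft_rx_t;

/* Call from the falling-edge (INT0) handler, i.e. the start of the start bit.
   On real hardware this is where the bit timer would be reloaded so that its
   first tick lands half a bit later, and where INT0 would be masked. */
void soft_rx_on_start_edge(soft_rx_t *rx)
{
    if (rx->busy) {
        return;            /* ignore edges inside a frame */
    }
    rx->busy = true;
    rx->bit_count = 0;
    rx->shift = 0;
}

/* Call from the bit-timer handler with the current level of the RX pin.
   First tick: middle of the start bit. Next 8 ticks: data bits, LSB first.
   Last tick: stop bit. */
void soft_rx_on_timer_tick(soft_rx_t *rx, bool pin_level)
{
    if (!rx->busy) {
        return;
    }
    if (rx->bit_count == 0) {
        if (pin_level) {
            rx->busy = false;      /* line is high: glitch, not a start bit */
        } else {
            rx->bit_count = 1;     /* valid start bit, data comes next */
        }
        return;
    }
    if (rx->bit_count <= 8) {
        rx->shift >>= 1;           /* LSB arrives first, so shift right */
        if (pin_level) {
            rx->shift |= 0x80;
        }
        rx->bit_count++;
        return;
    }
    /* Stop bit: accept the byte only if the line is back high. */
    if (pin_level) {
        rx->data = rx->shift;
        rx->ready = true;
    }
    rx->busy = false;              /* real code would re-enable INT0 here */
}
```

Because there is a single sample per bit, the only thing that keeps the samples centred is how accurately the timer is re-synced on the start edge, which is why interrupt latency (and possibly a high-priority IRQ) matters here.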
TX has its own clock and just chugs out one bit every 104 µs. I'll have to run it through the simulator to count the clock cycles.
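The TX side can be sketched the same way. Again this is a hedged illustration rather than the real routine: `soft_tx_t`, `soft_tx_load`, `soft_tx_on_timer_tick` and `cycles_per_bit` are hypothetical names, the frame is assumed to be 8N1, and 104 µs is taken to mean 9600 baud (1/9600 s is about 104.2 µs). The 8 MHz instruction clock in the comment is only an example figure, not something stated above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical transmitter state, for illustration only. */
typedef struct {
    uint16_t frame;     /* bits still to send, next bit in bit 0 */
    uint8_t  bits_left; /* 0 means idle                          */
} soft_tx_t;

/* Build a 10-bit 8N1 frame: start bit (0), 8 data bits LSB first, stop bit (1). */
void soft_tx_load(soft_tx_t *tx, uint8_t byte)
{
    tx->frame = (uint16_t)(((uint16_t)byte << 1) | 0x200u);
    tx->bits_left = 10;
}

/* Call once per bit time from the TX timer handler.
   Returns the level to drive on the TX pin for the coming bit period. */
bool soft_tx_on_timer_tick(soft_tx_t *tx)
{
    if (tx->bits_left == 0) {
        return true;               /* idle line stays high */
    }
    bool level = (tx->frame & 1u) != 0;
    tx->frame >>= 1;
    tx->bits_left--;
    return level;
}

/* Instruction cycles per bit, rounded to nearest.
   Example (assumed figures): an 8 MHz instruction clock at 9600 baud
   gives 833 cycles, which is the budget the timer reload must hit. */
uint32_t cycles_per_bit(uint32_t instr_clock_hz, uint32_t baud)
{
    return (instr_clock_hz + baud / 2u) / baud;
}
```

At 104 µs per bit, a 10-bit frame takes about 1.04 ms, and the number returned by `cycles_per_bit` is what the simulator cycle count would be compared against.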
I've had it working in hardware using Swordfish BASIC; I just haven't tested it in XC8 yet.