Receiving four 250kbit streams simultaneously without hardware assistance is a bit tricky. Even a single 250kbit async receive means you need to find the start of the start bit to within, say, 1/10 of a bit time (4usec / 10 = 400nsec), which is barely doable in software.
If you use a clock monster like the Scenix, it becomes reasonable, except that the SX family doesn't have anywhere near enough RAM.
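To put rough numbers on that software timing budget, here is a small sketch. The 250kbit rate and the 1/10-bit tolerance are from above; the 5 and 50 million instructions per second figures are only illustrative assumptions for a "slow" and a "fast" micro, and the function names are made up for this example.

```python
# Rough timing budget for bit-banged async reception.
# The 250 kbit/s rate and the 1/10-bit tolerance come from the post;
# the instruction rates used below are illustrative assumptions.

NS_PER_SEC = 1_000_000_000


def bit_time_ns(bit_rate):
    """Duration of one bit in nanoseconds."""
    return NS_PER_SEC // bit_rate


def start_window_ns(bit_rate, divisor=10):
    """Allowed uncertainty in locating the start bit (1/divisor of a bit)."""
    return bit_time_ns(bit_rate) // divisor


def instructions_per_window(instr_per_sec, bit_rate, channels=1, divisor=10):
    """Whole instructions available per channel inside one start-bit window."""
    window = start_window_ns(bit_rate, divisor)
    return instr_per_sec * window // NS_PER_SEC // channels


if __name__ == "__main__":
    for mips in (5, 50):          # assumed instruction rates, in MIPS
        for channels in (1, 4):   # one receiver vs. four at once
            n = instructions_per_window(mips * 1_000_000, 250_000, channels)
            print(f"{mips} MIPS, {channels} channel(s): {n} instructions per window")
```

Under those assumptions a 5 MIPS part gets 2 instructions per 400nsec window for one channel and none to spare for four, while a 50 MIPS part gets 20 for one channel and 5 per channel for four.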
I would recommend either a system with multiple hardware UARTs on a parallel bus (discrete UART chips, or multiple processors), or the more practical CPLD/FPGA based approach. You can dump the entire thing into a $15 FPGA, add a hundred or so lines of code, and be done with it.
BTW, multiport memory isn't required; just run the memory at channels * max data rate and interleave the accesses. The trickiest part is probably the signal timing: how is the output synchronized to the inputs, what delays are allowed, and how far out of sync can the inputs get? The main specification that needs to be written down is the behaviour of the output signal when the input signals are out of sync with each other.
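The interleaving idea can be sketched as a fixed time-slot scheme: an ordinary single-port memory is clocked at channels times the per-channel rate, and each memory cycle belongs to one port in strict rotation. This is only a behavioural model under my own assumptions (the class, method names, and the one-outstanding-request-per-port rule are hypothetical, not a description of any particular FPGA design).

```python
# Behavioural model of time-slot interleaved access to a single-port memory.
# Names and the one-request-per-port rule are assumptions for illustration.


def required_memory_rate(channels, per_channel_rate):
    """Memory accesses per second needed so every channel gets its own slot."""
    return channels * per_channel_rate


class InterleavedMemory:
    def __init__(self, size, ports):
        self.cells = [0] * size
        self.ports = ports
        self.cycle = 0
        self.pending = [None] * ports  # at most one outstanding request per port

    def request(self, port, op, addr, value=None):
        """Queue a read ('r') or write ('w') for the port's next time slot."""
        self.pending[port] = (op, addr, value)

    def tick(self):
        """Run one memory cycle; returns (port served, read data or None)."""
        port = self.cycle % self.ports  # fixed round-robin slot ownership
        self.cycle += 1
        req = self.pending[port]
        self.pending[port] = None
        if req is None:
            return port, None           # slot unused this time around
        op, addr, value = req
        if op == "w":
            self.cells[addr] = value
            return port, None
        return port, self.cells[addr]
```

Because slot ownership is fixed, no arbitration is needed and each port sees a guaranteed worst-case wait of one full rotation. If the output side also reads from the same memory, it would count as one more port in the rotation.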