According to the datasheet of the E 510 chip, one velocity count is 250 µs.
The maximum length of the whole velocity measurement is 125 x 250 µs = 31.25 ms.
Whether these values allow proper playing with the keybed in use has to be tested.
I guess the timing should use shorter periods, because the rubber contacts will close in quicker succession than the old rail ones.
If it has to go faster than the chip can manage, you can count down 2 steps per scan; that doubles the scan time.
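To make the timing concrete, here is a minimal sketch of such a countdown. The 125 counts and the 250 µs period are the datasheet figures quoted above; the struct, the function names and the `step` parameter are only my own illustration, not anything taken from the chip.

```c
#include <stdint.h>

#define VELOCITY_START 125u  /* number of counts, datasheet figure quoted above */

/* Hypothetical per-key state, for illustration only. */
typedef struct {
    uint8_t counting;  /* 1 while the key travels between its two contacts */
    uint8_t count;     /* remaining velocity counts */
} key_state_t;

/* Call once per scan while the key is between the two contacts.
   step = 1 mimics the chip timing, step = 2 counts down two steps per scan. */
void velocity_tick(key_state_t *k, uint8_t step)
{
    if (!k->counting) {
        return;
    }
    k->count = (k->count > step) ? (uint8_t)(k->count - step) : 0u;
}

/* Time in microseconds until the counter has run down to zero. */
uint32_t velocity_window_us(uint32_t scan_period_us, uint8_t step)
{
    uint32_t scans = (VELOCITY_START + step - 1u) / step;  /* round up */
    return scans * scan_period_us;
}
```

With a 250 µs scan, `velocity_window_us(250, 1)` gives 31250 µs, the 31.25 ms from above. With `step = 2` the counter is empty after 63 scans, so the same range is covered either in roughly half the time or with a scan period that is about twice as long.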
The next bank could be pulled down before the values just read are processed, so the voltage has a little time to stabilize.
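The order of operations I mean would look roughly like this. It is only a sketch with simulated hardware so that it runs on a PC: the bank count of 4, the arrays and all function names are assumptions of mine. On the real controller `select_bank()` and `read_contacts()` would access the port pins.

```c
#include <stdint.h>

#define NUM_BANKS 4u           /* assumed bank count for this sketch */

/* Simulated hardware, hypothetical names. */
uint8_t sim_inputs[NUM_BANKS] = { 0x01u, 0x00u, 0x80u, 0x00u };
uint8_t sim_selected = 0u;     /* bank that is currently pulled down */
uint8_t key_image[NUM_BANKS];  /* last contact pattern seen per bank */

void select_bank(uint8_t bank) { sim_selected = bank; }
uint8_t read_contacts(void)    { return sim_inputs[sim_selected]; }

void process_bank(uint8_t bank, uint8_t contacts)
{
    key_image[bank] = contacts;  /* velocity handling would go here */
}

/* One full scan. Bank 0 must already be selected on entry; it is
   selected again on exit, ready for the next call. */
void scan_keybed(void)
{
    for (uint8_t bank = 0u; bank < NUM_BANKS; bank++) {
        uint8_t contacts = read_contacts();               /* sample the current bank */
        select_bank((uint8_t)((bank + 1u) % NUM_BANKS));  /* pull down the next one early */
        process_bank(bank, contacts);                     /* lines settle during this work */
    }
}
```

Call `select_bank(0)` once at startup and then `scan_keybed()` from the timer tick. The processing time of each bank then doubles as settling time for the next one.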
I think optimized C code will be fast enough for the keybed scan, but assembler could run faster.
I'm thinking something like this.
I guess the microcontroller in use does not have enough memory to drive this display properly.
Every pixel needs 3 bytes of data, so there are 128 x 128 x 3 = 49152 bytes to transmit.
I think it's better to use a monochrome one; there every pixel takes one bit, which gives 128 x 128 / 8 = 2048 bytes.
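As a quick check of these numbers, here is a minimal sketch of a 1-bit frame buffer in C. The row-major, MSB-first bit layout and all names are assumptions for illustration; many monochrome controllers organize their memory in pages of 8 vertical pixels instead, so the index calculation has to follow the datasheet of the display actually used.

```c
#include <stddef.h>
#include <stdint.h>

#define DISP_W 128u
#define DISP_H 128u

#define RGB888_BYTES (DISP_W * DISP_H * 3u)  /* 49152 bytes at 3 bytes per pixel */
#define MONO_BYTES   (DISP_W * DISP_H / 8u)  /* 2048 bytes at 1 bit per pixel */

uint8_t mono_fb[MONO_BYTES];  /* whole monochrome screen in RAM */

/* Set or clear one pixel. Assumed layout: row-major, MSB is the leftmost pixel. */
void mono_set_pixel(uint8_t x, uint8_t y, uint8_t on)
{
    if (x >= DISP_W || y >= DISP_H) {
        return;  /* outside the panel, ignore */
    }
    size_t  idx  = ((size_t)y * DISP_W + x) / 8u;
    uint8_t mask = (uint8_t)(0x80u >> (x % 8u));
    if (on) {
        mono_fb[idx] |= mask;
    } else {
        mono_fb[idx] &= (uint8_t)~mask;
    }
}
```

A 2048 byte buffer fits into many small controllers, while a 49152 byte one usually does not.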
Additionally, I would use SPI to drive the display, because it will be faster than I²C and easier to handle.
And that could be done in C without problems.
On most displays the protocol to be used can be selected via solder points.
For communication between the main controller and the display controller you can use a hardware USART, as suggested at the start of this thread.
A possible protocol could be: x-position, y-position, font, a few bytes of ASCII, CR, LF.
The overhead is then 5 bytes.
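A frame of that kind could be put together as in the sketch below. The function name, the one-byte width of each header field and the buffer handling are my assumptions; only the field order comes from the proposal above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_OVERHEAD 5u  /* x, y, font, CR, LF */

/* Build one display frame: x, y, font, ASCII text, CR, LF.
   Returns the number of bytes written, or 0 if the frame does not fit.
   The text itself must not contain CR or LF. */
size_t build_frame(uint8_t *out, size_t out_size,
                   uint8_t x, uint8_t y, uint8_t font, const char *text)
{
    size_t len = strlen(text);
    if (len + FRAME_OVERHEAD > out_size) {
        return 0u;
    }
    out[0] = x;
    out[1] = y;
    out[2] = font;
    memcpy(&out[3], text, len);
    out[3 + len] = '\r';
    out[4 + len] = '\n';
    return len + FRAME_OVERHEAD;
}
```

For example, `build_frame(buf, sizeof buf, 10, 2, 1, "Vol")` produces 8 bytes: 3 for the text plus the 5 bytes of overhead. One thing to keep in mind: a position or font value of 13 or 10 looks like CR or LF, so the receiver should always take the first three bytes of a frame as header and only look for CR, LF after them.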
The display controller would then set up the display.
But then you need a main controller with at least 2 USARTs.
One for MIDI, one for the display.