If you look at the signals connecting to the "screen" block, they are the same as those for the RAM block, except that the block responds to a different address range (SelScr rather than SelRAM from the address decoder block).
The CPU stores the pixel data in RAM, but in a separate block of RAM from the normal working memory.
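As an illustration of how the decoder picks between the two blocks, here is a minimal Python sketch. It assumes the Hack memory map from nand2tetris (working RAM at 0x0000-0x3FFF, screen at 0x4000-0x5FFF, keyboard at 0x6000); the function name decode and those constants are assumptions for illustration, not taken from the circuit above.

```python
# Hypothetical address decoder, ASSUMING the Hack memory map:
# 0x0000-0x3FFF working RAM, 0x4000-0x5FFF screen RAM, 0x6000 keyboard.
RAM_END = 0x4000
SCREEN_END = 0x6000

def decode(address: int) -> dict:
    """Return the select lines and the local address seen by the selected block."""
    sel_ram = address < RAM_END
    sel_scr = RAM_END <= address < SCREEN_END
    # Both blocks see the low address bits; the decoder only chooses
    # which block responds. RAM uses 14 bits, the screen 13 bits.
    local = address & 0x3FFF if sel_ram else address & 0x1FFF
    return {"SelRAM": sel_ram, "SelScr": sel_scr, "local": local}
```

Only one select line is active at a time, so the RAM and screen blocks can share the same data and address buses.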
Note that the address bus is connected to the screen to control what word of the screen RAM is written or read.
In the simulator, the screen RAM is mapped to visual pixels by software.
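A rough sketch of what that software mapping does, written in Python. It assumes 8192 sixteen-bit words for a 512 x 256 screen and that bit 0 of each word is the leftmost pixel of its 16-pixel group; the name render is hypothetical.

```python
WORDS_PER_ROW = 32  # 512 pixels / 16 bits per word
ROWS = 256

def render(screen_ram):
    """Expand screen RAM (8192 16-bit words) into 256 rows of 512 pixels (0/1)."""
    pixels = []
    for row in range(ROWS):
        line = []
        for word_index in range(WORDS_PER_ROW):
            word = screen_ram[row * WORDS_PER_ROW + word_index]
            # Assumed bit order: bit 0 is the leftmost pixel of the word.
            for bit in range(16):
                line.append((word >> bit) & 1)
        pixels.append(line)
    return pixels
```

A simulator does this on every screen refresh; the hardware version has to do the same walk through memory, one pixel at a time, in step with the video timing.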
To do it in hardware, you need dual-port RAM: the CPU reads and writes through one port, while the display logic reads the other port independently, with timing synchronised to the required video sync signals.
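To make the "independent ports" idea concrete, here is a small behavioural model in Python. The class name DualPortRAM and the read-before-write behaviour within a cycle are assumptions for illustration; real dual-port RAM parts differ in how they handle a read and write to the same address in the same cycle.

```python
class DualPortRAM:
    """Behavioural model: port A for the CPU (read/write), port B read-only for video."""

    def __init__(self, size=8192):
        self.mem = [0] * size

    def tick(self, cpu_addr, cpu_write, cpu_data, video_addr):
        """One clock cycle in which both ports are serviced.

        Returns (cpu_read_data, video_read_data). Both reads see the value
        stored before this cycle's write (assumed read-before-write).
        """
        cpu_out = self.mem[cpu_addr]
        video_out = self.mem[video_addr]
        if cpu_write:
            self.mem[cpu_addr] = cpu_data & 0xFFFF  # keep words 16 bits wide
        return cpu_out, video_out
```

The point is that the video side never has to wait for the CPU: every cycle it gets the word it asked for, so the pixel stream stays locked to the sync timing.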
E.g., the column counter (something like the one in my testcard circuit) addresses pixels across the screen. For your display, you could use the lowest four bits to drive a 16-way multiplexer that selects one bit from the addressed word as output; the remaining bits of the column count then become the low address bits for the RAM.
512 pixels / 16 bits = 32 words, so 5 address bits give the column address of a word within one pixel row.
The next 8 RAM address bits are the row address, to give the full 512 x 256 pixel screen.
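The bit split described above can be sketched in Python as follows. The names video_address and pixel are hypothetical; the sketch assumes a 9-bit column counter (0-511) and an 8-bit row counter (0-255), and that multiplexer input n is bit n of the word.

```python
def video_address(col, row):
    """Split the column/row counters into (RAM word address, mux select)."""
    mux_select = col & 0xF              # lowest 4 bits: 1 of 16 bits in the word
    word_col = col >> 4                 # remaining 5 bits: word within the row
    ram_addr = (row << 5) | word_col    # 8 row bits above the 5 word-column bits
    return ram_addr, mux_select

def pixel(ram, col, row):
    """Model of the 16-way multiplexer picking one bit from the addressed word."""
    addr, sel = video_address(col, row)
    return (ram[addr] >> sel) & 1
```

The RAM address is therefore 5 + 8 = 13 bits, i.e. 8192 words, which is exactly 512 x 256 pixels / 16 bits per word.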