Well, you could try the example code you posted above, though it looks like you'd have to modify your hardware a bit to match it. I noticed that the author sets up the RW, RS, CS and data bits first and then just takes CS high again to finish the write cycle. It might be worth giving it a try: the code has already been tested, so if it doesn't work on your setup, the problem is most likely on the hardware side. Either way, it should help you narrow things down.
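Roughly, the write sequence described above looks like the sketch below. It's only an illustration, not the author's actual code. The pin variables and the simulated "display" that latches the bus are made up so it can run off-target. I'm also assuming CS is active-low (idle high) and that the display latches the data on the CS rising edge. Check your controller's datasheet, because some parts latch on a different strobe or edge.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real port pins. On hardware these would
 * be GPIO register writes, not plain variables. */
static bool pin_rw, pin_rs;
static bool pin_cs = true;      /* assumed active-low, so idle is high */
static uint8_t data_bus;

/* Simulated display side, only here so the sketch can be tested. */
static uint8_t last_latched;
static bool last_was_data;
static int latch_count;

/* Drive CS. On a rising edge during a write (RW low), the simulated
 * display latches whatever is on the data bus. */
static void set_cs(bool level)
{
    bool rising = !pin_cs && level;
    pin_cs = level;
    if (rising && !pin_rw) {
        last_latched = data_bus;
        last_was_data = pin_rs;
        latch_count++;
    }
}

/* One write cycle in the order described: set RW, RS, CS and the data
 * bits first, then take CS high again to finish the cycle. */
static void lcd_write(bool is_data, uint8_t value)
{
    pin_rw = false;     /* RW low selects a write */
    pin_rs = is_data;   /* RS: 0 = command, 1 = data */
    set_cs(false);      /* assert CS (active-low assumption) */
    data_bus = value;   /* put the byte on the data pins */
    set_cs(true);       /* CS rising edge completes the write */
}
```

For example, `lcd_write(false, 0x38)` would send a command byte and `lcd_write(true, 'A')` a data byte. If your wiring can't produce that CS edge after the data is stable, that's the part of the hardware you'd need to change.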