You'd have to send out a very short pulse at its resonant frequency, then immediately listen for the return and measure the time difference between its arrival at the two microphones. Pretty precise timing is needed, and avoiding confusion from echoes is difficult (which is why you have to use a short pulse).
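Here's a rough sketch of the timing part in Python. It isn't tied to any particular hardware. It assumes you already have both microphone signals sampled at the same rate. It finds the delay between them by brute-force cross-correlation, then turns that delay into a bearing with the far-field approximation sin(θ) = c·Δt / d, where d is the microphone spacing. The function names, the 192 kHz sample rate, the pulse shape, the 343 m/s speed of sound and the 2 cm spacing are all made-up example values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed value)


def estimate_delay_samples(a, b, max_lag):
    """Hypothetical helper: return the lag (in samples) that best aligns b with a.

    Brute-force cross-correlation: for each candidate lag, sum a[i] * b[i + lag]
    and keep the lag with the largest sum. A positive result means the pulse
    reached microphone b later than microphone a.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(a)):
            j = i + lag
            if 0 <= j < len(b):
                score += a[i] * b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_from_delay(delay_s, mic_spacing_m, c=SPEED_OF_SOUND):
    """Hypothetical helper: angle (degrees) of the source off the mic pair's broadside.

    Uses the far-field approximation sin(theta) = c * delay / spacing.
    """
    ratio = c * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp overshoot caused by noise/timing error
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    fs = 192_000  # sample rate in Hz (example value)
    pulse = [0.0] * 20 + [1.0, -1.0, 1.0, -1.0] + [0.0] * 20
    delayed = [0.0] * 3 + pulse[:-3]  # same pulse arriving 3 samples later
    lag = estimate_delay_samples(pulse, delayed, max_lag=10)
    print(lag)  # prints 3
    print(bearing_from_delay(lag / fs, mic_spacing_m=0.02))
```

The sample rate matters a lot here. At 192 kHz, one sample is about 1.8 mm of sound travel. With microphones only a couple of centimeters apart, the delay spans just a handful of samples, which is why the timing has to be so precise.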
It depends on exactly what you have to work with.
IR sensing can be much simpler but not as precise. Basically you design the optics so the sensor has a very narrow angle of view and then spin the IR sensor, preferably with IR illumination. Anything that reflects the IR light will give a strong return signal on the sensor, but that depends on ambient light conditions and on the IR reflectivity of the object you're looking at and of the rest of the environment. You would spin the IR transmitter/receiver through a full 360-degree circle and look for sharp peaks in the IR sensor output, which would correspond to the edges of an object that reflected IR light. Pretty simple in theory, but coding a routine that actually does it effectively is a different story.
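Here's a minimal Python sketch of the "look for sharp changes" step. It assumes you've already collected one reading per evenly spaced angle over a full turn. The find_edges name and the threshold value are made up for illustration. I've treated an edge as a big jump between neighboring readings: "rising" where something starts reflecting and "falling" where it stops. That's just one simple way to pick out the sharp features. Real data will need some smoothing and a threshold tuned to your ambient light.

```python
def find_edges(readings, threshold):
    """Hypothetical helper: locate sharp changes in a 360-degree IR sweep.

    readings: IR intensity samples taken at evenly spaced angles over one full turn.
    threshold: minimum jump between neighboring samples to count as an edge
               (an assumed tuning parameter).
    Returns a list of (angle_deg, kind) tuples, kind being "rising" or "falling".
    """
    n = len(readings)
    step = 360.0 / n  # degrees between consecutive readings
    edges = []
    for i in range(n):
        # readings[i - 1] wraps to the last sample when i == 0, since the sweep is circular
        diff = readings[i] - readings[i - 1]
        if diff >= threshold:
            edges.append((i * step, "rising"))  # object enters the sensor's view
        elif diff <= -threshold:
            edges.append((i * step, "falling"))  # object leaves the sensor's view
    return edges


if __name__ == "__main__":
    sweep = [10] * 36          # 36 readings, one every 10 degrees (example data)
    for i in range(9, 13):     # a reflective object between 90 and 120 degrees
        sweep[i] = 80
    print(find_edges(sweep, threshold=30))  # prints [(90.0, 'rising'), (130.0, 'falling')]
```

The wraparound check matters because an object that straddles the 0/360-degree point would otherwise show up with only one edge.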
I hope that helps a bit.