Unfortunately, they are not sharing the cross correlation done in the 16F628.
Do you have any resources on doing that with integer calcs?
It wouldn't be difficult to do it in integers and make it very efficient.
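To give an idea of what an integer-only version might look like, here is a minimal sketch in C. It is not the code from the 16F628 project (which isn't shared); the function name `best_lag`, the signed 8-bit sample format and the batch-style buffers are all my assumptions for illustration. It multiplies and accumulates with plain integers and returns the lag with the largest sum.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper (not from the original project): find the lag, in
 * samples, that maximizes the cross-correlation of two channels using
 * integer math only.
 *
 * a, b    : n samples each, already centered on zero
 *           (e.g. 8-bit ADC reading minus 128) -- an assumed format.
 * max_lag : lags from -max_lag to +max_lag are searched.
 *
 * A positive result means channel b is delayed relative to channel a.
 * An int8 * int8 product is at most 16384 in magnitude, so a 32-bit
 * accumulator is safe for any realistic buffer length. */
int best_lag(const int8_t *a, const int8_t *b, size_t n, int max_lag)
{
    int32_t best_sum = INT32_MIN;
    int best = 0;

    for (int lag = -max_lag; lag <= max_lag; lag++) {
        int32_t sum = 0;
        for (size_t i = 0; i < n; i++) {
            long j = (long)i + lag;          /* index into b for this lag */
            if (j < 0 || j >= (long)n)
                continue;                    /* skip samples outside the buffer */
            sum += (int32_t)a[i] * b[j];     /* one multiply, one add per step */
        }
        if (sum > best_sum) {                /* one comparison per lag */
            best_sum = sum;
            best = lag;
        }
    }
    return best;
}
```

Note the nested loop: every lag costs a multiply-accumulate per sample, and that per-lag cost is where the cycles go.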
The problems you're facing with correlations are:
1. You need a correlation for every lag. If you analyze 50 possible lags, you need 50 calculations per sample. Each involves one multiplication and one addition at every step, plus comparisons. Even if you fit each one into 10 cycles, you'll need 500 cycles per sample, or 1500 cycles if you have 3 pairs of microphones. Assuming 200ns/cycle, that gives you 300us worth of calculations per sample, which is enough time for sound to travel 10 cm. You need a much better CPU. For comparison, you don't need any significant computing power to analyze zero crossings.
2. Sampling rate. You need to sample 3 channels (one for each microphone). To get 1mm resolution, you need a sample from each channel every 3us (the time sound takes to travel 1mm), which requires a combined sampling rate of 1MHz, which you probably can't do. With zero crossings you get 0.07mm resolution (assuming a 5MIPS CPU) without breaking a sweat.
3. You need to scale your input so that your signal covers substantially all of your ADC range. With zero crossings you don't, because the crossings are right in the middle.
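The numbers in points 1 and 2 can be checked with a few lines of integer arithmetic. This is only a sketch: the function names are made up for illustration, and the speed of sound is my assumption (about 343 m/s, roughly room-temperature air).

```c
#include <stdint.h>

/* Assumed speed of sound in air at about 20 C, in millimetres per second. */
#define SPEED_OF_SOUND_MM_PER_S 343000ULL

/* Cycles spent per sample on correlations:
 * one calculation per lag, repeated for every microphone pair. */
uint32_t corr_cycles_per_sample(uint32_t lags, uint32_t cycles_per_lag,
                                uint32_t pairs)
{
    return lags * cycles_per_lag * pairs;
}

/* Distance sound travels during the given number of instruction cycles,
 * in micrometres (truncated). ns * (mm/s) / 1e6 gives micrometres. */
uint32_t sound_travel_um(uint32_t cycles, uint32_t ns_per_cycle)
{
    uint64_t ns = (uint64_t)cycles * ns_per_cycle;
    return (uint32_t)(ns * SPEED_OF_SOUND_MM_PER_S / 1000000ULL);
}
```

With 50 lags, 10 cycles per lag and 3 pairs, `corr_cycles_per_sample` gives 1500 cycles; at 200ns/cycle `sound_travel_um(1500, 200)` gives 102900 um, i.e. about 10 cm. A single 200ns cycle, `sound_travel_um(1, 200)`, gives 68 um, which is the roughly 0.07mm resolution quoted for zero crossings.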
Therefore, calculating correlations is not a good approach to detecting phase shift compared to zero crossings. However, I still think that regardless of how you do it, phase shift detection is inferior to amplitude-based approaches (as with this robot, for example).
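For comparison, here is a rough sketch of the zero-crossing idea in C. Everything in it is hypothetical (the `zc_channel` struct, the function names, the free-running 16-bit timer); on real hardware you would more likely let a comparator and a timer capture do this rather than polling samples. The point is only that the per-sample work is a sign check and a timestamp, and the delay between two microphones is a single subtraction.

```c
#include <stdint.h>

/* Hypothetical per-microphone state. */
typedef struct {
    int8_t   prev;   /* previous reading, centered on zero */
    uint16_t stamp;  /* timer value at the last rising zero crossing */
    uint8_t  seen;   /* nonzero once at least one crossing was recorded */
} zc_channel;

/* Feed one reading together with the current timer value.
 * Returns 1 if this reading is a rising zero crossing, else 0. */
int zc_update(zc_channel *ch, int8_t sample, uint16_t now)
{
    int crossed = (ch->prev < 0 && sample >= 0);
    ch->prev = sample;
    if (crossed) {
        ch->stamp = now;
        ch->seen = 1;
    }
    return crossed;
}

/* Signed delay of channel b relative to channel a, in timer ticks.
 * The unsigned 16-bit subtraction makes this safe across timer wraparound
 * as long as the true delay is under 32768 ticks (the final conversion to
 * int16_t relies on the usual two's-complement behaviour). */
int16_t zc_delay_ticks(const zc_channel *a, const zc_channel *b)
{
    return (int16_t)(uint16_t)(b->stamp - a->stamp);
}
```

At 200ns per tick, each tick of delay corresponds to about 0.07mm of path difference, and the signal amplitude doesn't matter as long as it crosses the midpoint.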