I think you would pass the incoming sound through some filters and then digitize it with an ADC. Then you would run a Fourier transform on the samples (you would probably need a DSP rather than an MCU for that), normalize the resulting spectra, and compare them against a library of known sounds, making sure the comparison tolerates some fuzziness and variation in the readings.
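To make the matching step a bit more concrete, here is a rough sketch in plain Python of what I have in mind. Everything in it is my own assumption for illustration: the function names (`magnitude_spectrum`, `normalize`, `best_match`, `tone`), the cosine-similarity score, the 0.9 threshold, and the two-entry library are all made up, and the naive DFT stands in for whatever FFT routine the real hardware would provide.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2); real hardware would use an FFT)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def normalize(spectrum):
    """Scale a spectrum to unit length so overall loudness does not affect matching."""
    norm = math.sqrt(sum(m * m for m in spectrum))
    return [m / norm for m in spectrum] if norm > 0 else list(spectrum)

def best_match(spectrum, library, threshold=0.9):
    """Return the library entry with the highest cosine similarity, or None if
    nothing reaches the (assumed) threshold. The threshold is the 'fuzziness' knob."""
    query = normalize(spectrum)
    best_name, best_score = None, threshold
    for name, reference in library.items():
        score = sum(q * r for q, r in zip(query, normalize(reference)))
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

def tone(freq_bin, n=64, amplitude=1.0):
    """Hypothetical test signal: a sine wave that lands exactly on one DFT bin."""
    return [amplitude * math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]

# Hypothetical library of reference spectra.
library = {
    "low_beep": magnitude_spectrum(tone(4)),
    "high_beep": magnitude_spectrum(tone(12)),
}

# A much quieter version of the high beep should still match after normalization.
result = best_match(magnitude_spectrum(tone(12, amplitude=0.3)), library)
unknown = best_match(magnitude_spectrum(tone(20)), library)
```

Here `result` picks out the high beep even though the input is quieter than the reference, and `unknown` comes back as `None` because a tone in a different bin does not resemble either entry. Real sounds are far messier than pure tones, so the threshold and the similarity measure would need tuning.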
I guess you would probably also average the spectra over time so that you can ignore sounds that are constantly there, but that comes later on.
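One way the averaging idea might look, again only as a sketch: keep a slow exponential moving average of each frequency bin and subtract it from every new spectrum, so a constant hum fades out while a new sound stands out. The class name `BackgroundSubtractor`, the smoothing factor `alpha=0.05`, and the toy four-bin spectra are assumptions of mine, not something from a particular library.

```python
class BackgroundSubtractor:
    """Tracks a per-bin running average and removes it from each incoming spectrum."""

    def __init__(self, num_bins, alpha=0.05):
        self.alpha = alpha                    # assumed smoothing factor; smaller = slower adaptation
        self.background = [0.0] * num_bins    # running estimate of the steady background

    def process(self, spectrum):
        # Subtract the current background estimate, clamping at zero.
        foreground = [max(m - b, 0.0) for m, b in zip(spectrum, self.background)]
        # Update the estimate afterwards so a new sound is not suppressed immediately.
        self.background = [
            (1 - self.alpha) * b + self.alpha * m
            for m, b in zip(spectrum, self.background)
        ]
        return foreground

# Toy example: a steady hum in bin 1, then a short event appears in bin 3.
subtractor = BackgroundSubtractor(num_bins=4)
hum = [0.0, 5.0, 0.0, 0.0]
for _ in range(200):
    subtractor.process(hum)

out = subtractor.process([0.0, 5.0, 0.0, 2.0])
```

After the 200 hum-only frames, the hum in bin 1 is almost entirely removed from `out`, while the new energy in bin 3 passes through untouched. The catch is that any sound that lasts long enough will eventually be absorbed into the background too, so `alpha` has to be chosen with the expected sound durations in mind.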