A simple FastICA example

Wikipedia describes independent component analysis as “a computational method for separating a multivariate signal into additive subcomponents supposing the mutual statistical independence of the non-Gaussian source signals”. (Clearly, this was written as part of their campaign to make technical articles accessible.)

In normal people words, ICA is a form of blind source separation — a method of unmixing signals after they have been mixed together, without knowing exactly how they were mixed. It’s not as bad as Wikipedia makes it sound. It’s just the signal processing equivalent of this:

One of the problems I always have with learning stuff like this is the lack of clear examples. They exist, but they’re not generally very good. (And why do researchers always work with awful noisy 3-second 8 kHz recordings?) So, upon getting working results, I wrote up this little example. It is in Python and requires the MDP package (python-mdp in Ubuntu) and the Audiolab package (sudo easy_install scikits.audiolab).

ICA needs at least as many different recordings as there are signals you want to unmix. So if you have two musical instruments playing together in a room, and want to unmix them to get separate recordings of each individual instrument, you’ll need two different recordings of the mixture to work with (like a stereo microphone). If you have three instruments playing together, you’ll need three microphones to separate out all three original signals, etc. So, first, create the mix:

  1. Find or make two mono sound files. I just used clips of music.
  2. Mix them together to a stereo track, with both sounds mixed into both channels, but with each panned a little differently, so the two channels are not identical. They should sound all jumbled together, but the left channel should sound slightly different from the right.
  3. Save in a format that libsndfile can read, like FLAC or WAV (not mp3):
    • Mixed music
    • [audio:http://www.endolith.com/wordpress/wp-content/uploads/2009/11/Mixed-NIN-and-Mazzy-Star.mp3]

Alternatively, just mix them in Python:

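from scikits.audiolab import wavread

# Both files should be mono, the same length, and at the same sample rate
sig1, fs1, enc1 = wavread('file1.wav')
sig2, fs2, enc2 = wavread('file2.wav')
mixed1 = sig1 + 0.5 * sig2
mixed2 = sig2 + 0.6 * sig1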
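The two weighted sums above are the same thing as one matrix product, which is how the mixing is usually written (recording = sources × mixing matrix). Here is a minimal sketch of that, assuming NumPy only; the sine and square-wave test tones stand in for the two sound files and are made up for illustration, as are the names `sources`, `mixing`, and `recording`.

```python
import numpy as np

# Stand-in test tones (assumptions, in place of file1.wav and file2.wav)
fs = 8000
t = np.arange(fs) / fs
sig1 = np.sin(2 * np.pi * 440 * t)
sig2 = np.sign(np.sin(2 * np.pi * 3 * t))

# Trim to a common length, then put one source in each column
n = min(len(sig1), len(sig2))
sources = np.column_stack([sig1[:n], sig2[:n]])

# Each row holds the weights for one output channel:
#   channel 0 = 1.0 * sig1 + 0.5 * sig2
#   channel 1 = 0.6 * sig1 + 1.0 * sig2
mixing = np.array([[1.0, 0.5],
                   [0.6, 1.0]])

# Shape (n_samples, 2): one column per channel of the stereo mix
recording = sources @ mixing.T
```

The result has one column per channel, which is the same layout that the transposed array in the three-source example further down ends up with.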

So now you have the mixed signals, and you can pretend you don’t know how they were mixed. To unmix them automatically, run something like this in Python:

from mdp import fastica
from scikits.audiolab import flacread, flacwrite
from numpy import abs, max

# Load in the stereo file
recording, fs, enc = flacread('mix.flac')

# Perform FastICA algorithm on the two channels
sources = fastica(recording)

# The output levels of this algorithm are arbitrary, so normalize them to 1.0.
sources /= max(abs(sources), axis = 0)

# Write back to a file
flacwrite(sources, 'sources.flac', fs, enc)

The output has each signal in its own channel:

You can hear some crosstalk, but it’s pretty good:

[audio:http://www.endolith.com/wordpress/wp-content/uploads/2009/11/Unmixed-Mazzy.mp3]
[audio:http://www.endolith.com/wordpress/wp-content/uploads/2009/11/Unmixed-NIN.mp3]
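To get a feel for what happens inside the fastica() call, here is a bare-bones sketch of the general FastICA idea in plain NumPy: whiten the channels, then repeat a fixed-point update with a tanh nonlinearity and symmetric decorrelation. This is not MDP's implementation, and `whiten`, `sym_decorrelate`, and `fastica_sketch` are hypothetical names. The sine and sawtooth sources are synthetic stand-ins for the music clips, mixed with the same weights as the Python mixing example above.

```python
import numpy as np

def whiten(x):
    """Center x (n_samples, n_channels) and decorrelate it to unit variance."""
    x = x - x.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))
    return x @ eigvecs / np.sqrt(eigvals)

def sym_decorrelate(w):
    """Make the rows of w orthonormal: w <- (w w^T)^(-1/2) w."""
    eigvals, eigvecs = np.linalg.eigh(w @ w.T)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T @ w

def fastica_sketch(x, n_iter=200, tol=1e-8, seed=0):
    """Estimate independent sources from mixed channels (illustration only)."""
    z = whiten(x)
    n = z.shape[1]
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        y = z @ w.T                       # current source estimates
        g = np.tanh(y)                    # nonlinearity
        g_prime = 1.0 - g ** 2            # its derivative
        # Fixed-point update for every row of w at once
        w_new = (g.T @ z) / len(z) - np.diag(g_prime.mean(axis=0)) @ w
        w_new = sym_decorrelate(w_new)
        # Converged when each row only changes by a sign flip
        change = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1.0))
        w = w_new
        if change < tol:
            break
    return z @ w.T

# Synthetic sources (assumptions): a 5 Hz sine and a 7 Hz sawtooth
t = np.linspace(0, 1, 8000, endpoint=False)
sources = np.column_stack([np.sin(2 * np.pi * 5 * t),
                           2 * ((7 * t) % 1) - 1])
mixed = sources @ np.array([[1.0, 0.5], [0.6, 1.0]]).T

estimated = fastica_sketch(mixed)

# Order, sign, and level of the outputs are arbitrary, so compare by |correlation|
match = np.abs(np.corrcoef(estimated.T, sources.T)[:2, 2:])
```

Each column of `match` belongs to one true source, and its largest entry shows how well the best output channel lines up with that source. Because the order, sign, and scale come out arbitrary, normalizing afterwards (as in the script above) is still needed before writing a file.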

For more than two sources, I just read them in separately and combined them in Python:

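from numpy import array

rec1, fs, enc = flacread('Mixdown (1).flac') # Mono file
rec2, fs, enc = flacread('Mixdown (2).flac')
rec3, fs, enc = flacread('Mixdown (3).flac')

# One column per recording: shape (n_samples, 3). The files must be the same length.
sources = fastica(array([rec1, rec2, rec3]).transpose())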

flacwrite() has no problem writing multi-channel files.
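Combining separate mono files only works directly if they are all exactly the same length. A small sketch of a helper that trims to the shortest recording before stacking, assuming NumPy only; `stack_channels` is a hypothetical name, and the dummy arrays stand in for the loaded files (which should also share one sample rate).

```python
import numpy as np

def stack_channels(*recordings):
    """Stack mono recordings as columns, trimming all to the shortest one."""
    n = min(len(r) for r in recordings)
    return np.column_stack([np.asarray(r)[:n] for r in recordings])

# Dummy mono "recordings" of slightly different lengths (assumptions)
rec1 = np.zeros(1000)
rec2 = np.ones(1002)
rec3 = np.full(1001, 0.5)

recording = stack_channels(rec1, rec2, rec3)
print(recording.shape)  # (1000, 3)
```

For equal-length inputs this gives the same array as array([rec1, rec2, rec3]).transpose().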

Mixed speech:

[audio:http://www.endolith.com/wordpress/wp-content/uploads/2009/11/Mix.mp3]

After demixing, there’s very little crosstalk, though the noise floor increases considerably. This seems to happen when the mixes are very similar to each other:

[audio:http://www.endolith.com/wordpress/wp-content/uploads/2009/11/Source-1.mp3] [audio:http://www.endolith.com/wordpress/wp-content/uploads/2009/11/Source-2.mp3] [audio:http://www.endolith.com/wordpress/wp-content/uploads/2009/11/Source-3.mp3]
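One plausible explanation for the raised noise floor: when the mixes are nearly identical, the mixing matrix is close to singular, and undoing it means multiplying by large numbers, which also multiplies any noise in the recordings. The sketch below illustrates this with an ideal unmixing (the exact matrix inverse, not ICA itself) on two channels. The random sources, the noise level, and both mixing matrices are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000
sources = rng.uniform(-1, 1, size=(n, 2))
noise = 0.001 * rng.standard_normal((n, 2))   # small noise added to each channel

def residual_noise_rms(mixing):
    """Mix, add noise, unmix with the exact inverse, and measure what is left."""
    recording = sources @ mixing.T + noise
    unmixed = recording @ np.linalg.inv(mixing).T
    return np.sqrt(np.mean((unmixed - sources) ** 2))

distinct = np.array([[1.0, 0.5],
                     [0.6, 1.0]])     # channels clearly different
similar = np.array([[1.0, 0.95],
                    [0.95, 1.0]])     # channels almost identical

for name, mixing in [("distinct", distinct), ("similar", similar)]:
    print(name,
          "condition number:", np.linalg.cond(mixing),
          "residual noise RMS:", residual_noise_rms(mixing))
```

The nearly identical mix has a much larger condition number and leaves several times more noise in the unmixed output, even though the noise added to the recordings is the same in both cases.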

This method was recommended to me for real-life audio signals and microphones, as I’ve described above. It turns out, though, that ICA doesn’t work well when the signals arrive at different delays in the different sensor channels: it assumes instantaneous mixing (that the signals are in perfect sync with each other in all the different recordings). Delay would happen in a real-life situation with performers and microphones, since each source is a different distance from each microphone. That is exactly the application I had in mind, though, so I don’t really have any further interest in ICA…
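A quick way to see why delay breaks the instantaneous-mixing assumption: with no delay, a channel is exactly a weighted sum of the sources at the same instant, so a least-squares fit explains it completely. Delay one source by a few samples and no pair of weights fits any more. This is a minimal sketch assuming NumPy only. The white-noise sources are a deliberately harsh stand-in (real audio is smoother, so the effect of a short delay would be less extreme), and the 20-sample delay is an arbitrary choice, about 0.45 ms or roughly 15 cm of extra path at 44.1 kHz.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
s1 = rng.uniform(-1, 1, n)
s2 = rng.laplace(size=n)
sources = np.column_stack([s1, s2])

def unexplained_fraction(channel):
    """Fraction of the channel's energy that no instantaneous mix of the sources can explain."""
    coef, *_ = np.linalg.lstsq(sources, channel, rcond=None)
    residual = channel - sources @ coef
    return np.sum(residual ** 2) / np.sum(channel ** 2)

instant = s1 + 0.5 * s2                 # both sources arrive in sync
delayed = s1 + 0.5 * np.roll(s2, 20)    # s2 arrives 20 samples late (circular shift)

print("instantaneous mix:", unexplained_fraction(instant))
print("delayed mix:      ", unexplained_fraction(delayed))
```

The first fraction is zero up to rounding error, while the second is a large part of the channel's energy, and that leftover is what a single unmixing matrix cannot remove.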