We are all aware that a modern studio recording is built from numerous audio sources – vocals, guitars, piano, bass, drums, other percussion, synths. These sources are recorded separately, edited digitally and mixed together into a single recording. The final production has multiple channels, but those channels only provide spatial separation for playback (Left, Right, Front, Rear, etc.). Once the track is “flattened” like this, it becomes very difficult to separate the original sources back out of it. Such separation has many applications – the simplest ones that come to mind are generating a karaoke track for practicing vocals, or isolating the piano part to make it easier to learn.
This is where this new piece of software comes into play! At the outset it looks like a simple Python-based tool, but it is powered by Google’s TensorFlow machine learning framework and leverages pre-trained models to separate a track into 2, 4 or 5 stems (sources). You just give it the original track and a few command-line arguments (basically, how many sources you want it to identify) and it gets busy and produces its output.
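As a rough illustration, the invocation looks something like the sketch below. This is hypothetical – the command name, the flag names and the model identifiers are placeholders I made up to show the shape of the workflow, not the tool’s actual interface; check its documentation for the real syntax.

```shell
# Hypothetical sketch only – "separate", "-p", "-o" and the model names
# are illustrative placeholders, not the tool's documented CLI.

# Split a track into 2 stems (vocals + accompaniment):
separate -p 2stems -o output/ song.mp3

# Or into 4 stems (vocals, drums, bass, other):
separate -p 4stems -o output/ song.mp3

# The output directory then holds one audio file per stem, e.g.:
#   output/song/vocals.wav
#   output/song/accompaniment.wav
```

The pre-trained models do all the heavy lifting here; you are only choosing which model (2-, 4- or 5-stem) to run your track through.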
I just tried this on a random audio track (well, semi-random, because it had to be an A. R. Rahman track 🙂 ). I expected the software not to work well on such a heavily engineered and processed soundtrack, but I was pleasantly surprised by the results. Check out how the output sounds with 2-stem, 4-stem and 5-stem separation!
(If the embedded players below do not work, click here to go directly to the demo)