The idea behind this project was to see if I could use a neural network, specifically, an autoencoder, to embed the parameter space of a physical modeling audio synth.
It has its strengths and weaknesses, but in general it worked. The result is a small neural network (a 2- or 3-unit-wide input layer, 100 hidden units, and a 200-unit-wide output layer) that transforms a set of parameters into a sound very similar to the one generated by a digital waveguide model from the STK.
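To make the shape of that network concrete, here is a minimal NumPy sketch of the parameter-to-waveform mapping. The layer sizes come from the description above; everything else is an assumption for illustration: the weights are random stand-ins for the trained ones, the names (`W1`, `W2`, `decode`) are hypothetical, and I am assuming bias terms and a tanh on both layers, which may not match the actual trained model.

```python
import numpy as np

# Layer sizes from the text: 3 synth parameters in, 100 hidden units, 200 samples out.
N_PARAMS, N_HIDDEN, N_SAMPLES = 3, 100, 200

# Hypothetical stand-ins for the trained weights (the real ones come out of training).
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(N_HIDDEN, N_PARAMS))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.1, size=(N_SAMPLES, N_HIDDEN))
b2 = np.zeros(N_SAMPLES)

def decode(params):
    """Map a vector of synth parameters to one 200-sample waveform frame."""
    params = np.asarray(params, dtype=float)
    hidden = np.tanh(W1 @ params + b1)  # assumption: tanh on the hidden layer
    return np.tanh(W2 @ hidden + b2)    # assumption: tanh on the output, so samples stay in [-1, 1]

frame = decode([0.2, 0.5, 0.8])
print(frame.shape)
```

With random weights the output is of course not a meaningful sound; the point is only that each knob position maps to one short frame of audio through two small matrix products.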
Below you can watch a video describing the project. Don’t be put off that it’s basically just a “buzzing” sound: there is no modulation, so it’s a constant tone that changes timbre as I move the knobs. Notably, it lacks the nice dynamics of the physical model, which I actually find very interesting and would like to work on in the future.
By the way, on the technical side this was kind of nice to work on. The training code is basically a messy TensorFlow script, but it produces a set of weights, and I made an interface using Python, Qt, and RtAudio in just a few days. It makes for a reasonably efficient implementation: the synthesizer runs in real time in C++, while the interface is entirely in Python, and I used Boost.Python to exchange arrays between the two languages.
The neural network code itself is just a dead simple matrix multiplication passed through a tanh() function. This generates two cycles of the audio, which are then overlap-added to produce the