Google Intros NSynth Super, An Open Source Neural Synthesizer

Google today shared this video demo of NSynth Super, an open hardware controller for its NSynth Neural Synthesizer.

NSynth Super comes out of Magenta, a research project within Google that explores how machine learning tools can help artists create art and music in new ways. Magenta created NSynth, a machine learning algorithm that uses a deep neural network to learn the characteristics of sounds, and then create new sounds based on the learned characteristics.

NSynth Super, made in collaboration with Google Creative Lab, lets you make music using new sounds generated by the NSynth algorithm from four different source sounds.

In the video demo, an NSynth Super prototype is played by Hector Plimmer. Sixteen original source sounds, across a range of 15 pitches, were recorded in a studio and then input into the NSynth algorithm to precompute the new sounds.

The outputs, more than 100,000 new sounds, were then loaded into the experience prototype. Each dial is assigned four source sounds, and turning the dials selects which source sounds to explore between.
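A four-corner dial like this behaves like a two-dimensional crossfade. Here's a minimal sketch of how a dial position could map to blend weights over its four assigned sources; the function name and corner labels are illustrative and not from the NSynth Super codebase, which interpolates in the network's embedding space and precomputes the results rather than blending live:

```python
# Hypothetical sketch: bilinear blend weights for one NSynth Super dial.
# (x, y) is the dial position, each in [0, 1]; the four corner weights
# always sum to 1, so the result is a convex combination of the sources.

def corner_weights(x: float, y: float) -> dict:
    """Return blend weights for the four corner sounds of one dial."""
    return {
        "bottom_left":  (1 - x) * (1 - y),
        "bottom_right": x * (1 - y),
        "top_left":     (1 - x) * y,
        "top_right":    x * y,
    }

# At a corner, one source dominates completely; in between, all four mix.
w = corner_weights(0.25, 0.75)
assert abs(sum(w.values()) - 1.0) < 1e-9
```

Since the device plays back precomputed sounds, in practice the dial position would select the nearest precomputed blend on a grid rather than computing weights on the fly.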

NSynth Super can be played via any MIDI source, like a DAW, sequencer or keyboard.

Here’s a look at the creation of NSynth Super:

NSynth Super is available as an open hardware DIY project via GitHub. No kits or pre-built options are currently available, so it's a relatively hardcore DIY project at this point.

Check it out and let us know what you think of it. And, if you create your own NSynth Super, share it in the comments!

23 thoughts on “Google Intros NSynth Super, An Open Source Neural Synthesizer”

  1. Since every year they produce a better, faster, stronger iPad, I'm sure anything could make a great iPad app once you get past the touch screen and the user interface. Setting the timer on my stove could be a great iPad app, so it goes without saying. The “one device that does one thing great” idea is still a thing in the world of music, so to each their own, but I still believe hardware should be hardware.

    But I'm the guy who still buys games from GameStop because of the possibility of reselling them. Can't resell an iPad app or a downloadable game.

    1. I would say that depends on the UI, and often on the level of functionality that has to go into the product to give the user the result they need.
      In the “normal” world, music is about playing notes; in the world we live in, music is about making the sound first, and then playing the notes.
      For those in our world who play live, the creation of the sound itself is often removed from live playing, and only a limited number of actual controls are needed besides the note playing.
      As soon as the performance relies on a limited number of controls, an iPad app, especially in combination with a few hardware MIDI parameter controllers, becomes a viable tool.

      In terms of products like this, for many living in our world, the sound-sculpting is quite limited. We want something where we feel we can create something of our own.

      But what if machine learning could break down that sound you make on hardware or software in the studio, and then give the user access to a set of parameters chosen by the user, so it can be a distilled version for live use? Something that does not suffer from the limitations of sampling.

      For live keyboardists, an iPad can already do much of what a workstation keyboard can, and often actually allows for more levels of creativity. So for those who like to sculpt sounds, an iPad might already be a good tool to consider for live keyboard playing, compared to a workstation. Many use computers, though, controlled from various keyboards, and then the computer can run software with much more sculpting potential.

      At the same time, we do get by with, for example, a limited depth of drum-machine synthesis. If drummers got into drum synthesis, and there was some real effort in creating synthesis for drummers, I can see how the drum machines we rely on today would quickly be seen as way too limited.
      Guitar players already have the option of different-sounding guitars and stomp boxes… The same is partly true for bassists, but more and more of them seem to bring synths on stage for alternative bass sounds, rather than a collection of basses and a huge pedalboard.

    1. The 8-bit depth doesn’t matter as much as the 16 kHz sample rate: with a Nyquist limit of 8 kHz, all upper harmonics above that are lost.
      So maybe you need to train it on very low notes.

      Important question: does the time to train scale linearly with the number of samples?

      And I would have thought that whether you feed 8-bit or 16-bit audio into the neural net doesn’t matter much.
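The commenter's sample-rate point can be checked directly: at 16 kHz the Nyquist limit is 8 kHz, so only harmonics below 8 kHz survive. A quick sketch (the function name is illustrative):

```python
# At a 16 kHz sample rate, the Nyquist frequency is 8 kHz; any harmonic
# above it cannot be represented and is lost.

def audible_harmonics(fundamental_hz: float, sample_rate_hz: float = 16000.0) -> int:
    """Count the harmonics of a tone that fit below the Nyquist frequency."""
    nyquist = sample_rate_hz / 2.0
    return int(nyquist // fundamental_hz)

# A low E on bass (~41.2 Hz) keeps 194 harmonics below 8 kHz,
# while a high C (~1046.5 Hz) keeps only 7.
print(audible_harmonics(41.2))    # 194
print(audible_harmonics(1046.5))  # 7
```

This is consistent with the suggestion above: low notes retain far more of their harmonic series at 16 kHz than high ones do.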

  2. This is a really interesting experiment, but also just the tip of the iceberg: Google recently posted research showing that automatic composition by neural nets is also completely feasible.

    A key question for me is how much control you have over the sound. Just morphing between samples is only so interesting; can you change the “body” of the sound, its resonance, the amount of high frequencies (not just with a low-pass filter), etc.?

    1. It seems the synthesis is still limited to “listening” to a certain number of characteristics. It is not like they have complete physical modeling to rebuild every sound.
      It seems like basic sample manipulation, but the machine itself finds the settings for the available parameters.

  3. Welcome to the world of music, Google. You’re only a few years behind. Wolfgang Palm Plex anyone? Yes, I know, this is probably more advanced, with more properties of the sound taken into account. But no matter how much they hide behind the ‘machine learning’, ‘neural network’ and ‘algorithm’ smokescreen, this synth itself still has no idea what makes a sound emotionally interesting. It’s just extracting data points from samples, like volume envelope, noisiness, brightness, etc. So combining sounds is just combining the aspects that a machine can analyze.

    For instance, if I were asked to combine a snare and a flute, the result would be a breathy, rattly sound, as if I blew into a tube with metal springs inside. What does Google give us? A noisy flute with the volume envelope of a snare. Now how could that be called AI? As with many so-called AI projects, the world is first reduced to something a machine can understand, and then fed to an algorithm. While cool as an engineering feat, it hardly ever results in something emotionally appealing.

    Ever since the 1950s, AI has struggled with getting the concept of meaning into computers. They still haven’t worked that one out. It’s the elephant in the room that nobody talks about. Currently, they try to cover up the problem by feeding their algorithms huge amounts of Big Data. But you only have to talk to Siri for more than 20 seconds, or have a look at the ads FB serves you, to discover that AI is hardly at the level our Silicon Valley overlords want us to believe.

    1. This isn’t AI, this is machine learning. Emotion is not a factor here, just like any other musical instrument. It’s up to humans to create the meaning.

      1. Of course this is AI (or are you really saying that learning isn’t a task that could be considered intelligent?). They use a machine to learn what the defining characteristics of a sound are. With that data they are able to create new sounds, by applying characteristics of one sound to another. But… they are only using properties that machines can understand. So you end up with something quite trivial. When I talk about emotion, I mean that when humans analyze sounds, they assign totally different properties to them than machines would. I think my ‘snare combined with a flute’ example above illustrates it well.

        I’m not saying this couldn’t give pleasing results. But please… cut the buzzwords (neural networks… blabla) to make this seem more than it is.

        1. Mike, you’re redefining the meanings of AI and machine learning to be the same thing when they’re different fields of study. It’s like a chemist saying “We’ve created a tool that allows you to create a mixture that is a perfect blend of toothpaste and orange juice” and you arguing about how disgusting that tastes from a culinary standpoint, and that they should know better because “cooking is chemistry”.

          1. As for your orange juice with toothpaste example: I wish Google had given us the perfect blend. They didn’t. They simply gave us white orange juice, because that’s all the machine can “understand”. It has absolutely no clue about the world of connotations that ‘orange juice’ and ‘toothpaste’ carry. You can argue that’s too much to ask, but this is just Plex with probably more data points.

            Machine learning used to be part of AI, and has indeed developed into its own field. It has developed techniques to computationally make sense of data and then imbue programs with predictive powers. It could be considered the most successful field of “weak” AI, and maybe even the most promising path to “strong” AI. Everybody is now betting that feeding these neural networks and algorithms the insane amounts of data collected over the last few years will somehow make the trick work.

            All I’m saying is that in my opinion this Google synth perfectly illustrates how misguided that belief is. This is the best there is. And it’s unambitious at best.

    2. Since Google’s biggest income source is ads, and that is where they would apply any actually useful new thing they have taught machines to do, it is quite easy to tell that their machines have a quite limited understanding.

      Google’s ads and YouTube recommendations clearly don’t understand interests. They think you are only interested in the latest thing you have searched for or watched.
      And they can’t correlate news with searches properly. Say I searched for a phone to buy, and it wasn’t one of the flagships, but a newer model had just been announced or released that I knew nothing about. Chances are that not enough other people have searched for both, and not enough articles mention the old model in relation to the new one. If it was a Samsung, Google would probably suggest the S9, a popular search at the moment, even if it was priced five times as much as the phone I was searching for. I would not get news and ads for the phone I would actually be interested in.

      Google ads can also use location, and that can be useful for a person visiting a new city. But when I’m out on a walk in a certain part of my town, I don’t suddenly get an interest in fashion items or make-up just because that is a big interest among the people living in that area. And when there aren’t even any related stores in that area, how would that even influence me to buy something?

      On YouTube, it becomes quite obvious that their machines can’t understand a lot of words, and thus can’t understand what a video is about, even based on the title. The recommendations are based on what others who watched one video have also seen. The only reason it recommends other videos within an interest is that others who watched the same video also watched other videos that happen to be within the same interest. Countless times, Google has suggested gaming videos because I watched a music video or a full concert, and it has kept doing so even though I have tried to tell YouTube that I’m not interested in gaming videos, by clicking “not interested”, “didn’t like the video”, “not interested in this channel”, and so on. YouTube can’t tell gaming and music apart, and it can’t understand from the title which is which.

      Combining elements of sound will rely on the knowledge of the person combining them.
      But even more it will depend on the tools given. In this case the synthesis doesn’t seem particularly deep, so we have no idea what the result would have been if the synthesis could actually build each sound from scratch, where the engine used could build either a snare sound or a flute, or the sound of an electric piano, an electric bass, and so on.
      If Google was able to build a physical modeling engine that could re-create any acoustic instrument, and where the mapping of characteristics was done right, so that common parts of a sound were handled by the same parameters no matter what sound was supposed to be created, well, then the result could have been something wildly different from this demo.
      The thing is, though, that until one actually has an engine capable of something like that, it is quite pointless to use machine learning. There would be no way of telling whether the machine can actually come up with something of its own that isn’t just basic morphing with a bit of “randomness” applied.

      And when it comes to ads and recommendations, their engines are not capable of breaking things down into parts that can actually be used to map a person and recommend stuff based on how that person behaves. All it can do is rely on what other people have done, or find a few keywords and present information based on those keywords. It does not understand texts, so it can’t tell that a Samsung J3 has little in common with a Samsung S9, and that the person looking for a J3 is probably not interested in an S9.

      The Google search engine suffers from the same problem. But it is less obvious, since we can refine searches to better match what we are looking for, and we tell the engine which keywords are important.
      But there are times when I can’t get the Google search engine to present me with the result I’m actually looking for.

      The threat at this point would not be over-intelligent AI, but rather under-intelligent AI that, based on the limitations of the system, would actually make wrong decisions.
      It would not kill the human race because everything suggests the human race is a problem. It would kill the human race because, according to a few limited data points, the human race is a problem. We as humans would perhaps read into it that it knows what is best for the world, but in actual reality it would not; it would just think so because of its limitations. The limited system might come to the right conclusion, but not for the right reasons. It would not understand that, in the future, the AI would itself likely become as much of a problem to the planet as humans. And it would not understand that, because it is not a living thing, it should not strive for survival and would be better off shutting itself down. All it would know is that it should strive to improve itself. After all, that is how we humans behave, even though we actually know better, so that is the data it would be fed.

  4. Maybe Google should fix MIDI/audio latency in their OS before venturing into “bigger and better” aspects of music production. As it is, it sucks and is miles/years behind iOS. Just a thought.
