Adobe’s Project VoCo Lets You Edit Spoken Audio Like Text

At the Adobe MAX 2016 conference, an event for users of the company’s creative products, Adobe’s Zeyu Jin introduced VoCo, a new application that lets you edit spoken-word audio as text.

VoCo works for audio editing, letting you cut and paste text to edit the recording as needed. But it can also be used as a creative tool. Once VoCo has analyzed about 20 minutes of a person’s speech, it can be used to synthesize the audio for new text.

24 thoughts on “Adobe’s Project VoCo Lets You Edit Spoken Audio Like Text”

  1. The next election cycle will be full of bogus audio clips of candidates saying things that would disqualify them from getting elected. Ha ha, I just realized how ridiculous that premise sounds now. But I won’t be surprised if or when edited sound bites are commonly used to bring someone down a few pegs.

  2. I see nothing but dangerous applications for this. Yeah, he mentions watermarking audio. But bullshit — watermarks of audio would be dead easy to get past. Plus, even if someone didn’t have the savvy to do it, the amount of reputation damage someone could suffer from faked audio (even if proven to be fake later) could very quickly be irreparable.

    This is a politician’s wet dream. It’s a backstabbing coworker/ex-spouse/insert-personal-enemy-here wet dream.

    And on top of that, it’s another way to screw people out of paying work. There’s nothing cool about this other than the technological achievement.

    1. And Photoshop and CGI can’t be used to defame and libel people? Once everyone knows about it they will just become more scrutinous.

  3. Could be very useful for audio repair applications. However, as the Brain says, in the wrong hands, this could be abused in a way that is a little creepy. It’s already bad enough what is done with editing things people actually say: taking things out of context, etc. There are some shady folks out there who wouldn’t hesitate to use this in a bad way. But honestly, people could already edit things to change text. This just makes it easier and perhaps more undetectable.

  4. Fairly impressive tech on the analysis and re-synthesis side, but the edit sounded really bad and hacky. No way that would pass professional standards for voice editing. Yet. Also, the synthesized voice part was clearly not the same speaker. Right now, pretty chunky, but give it a couple years.

  5. It might be spooky, but all new things are spooky in the beginning. Once there’s nothing you can trust then the laws will become obsolete.

  6. Edit was pretty glitchy and obvious to me. Also the demo seemed kind of juvenile. “Tee hee giggle giggle, so and so kissed so and so,” yikes, it reminded me of a goofy premise/plot from some bad sitcom lol. JMO of course.

  7. It’s for the laymen. The edit is very amateur sounding. I guess I can see it being popular like a T-Pain voice changer app. But useless for anything real. What is weird is you would think the speech-to-text part of it would be harder to code. But their algorithm for a smooth edit is not there.

  8. I think that anybody’s speech can be changed right now (by cutting or switching words, for example),
    but thanks to VoCo maybe everybody will know about this possibility, and people will be skeptical about it.
    It is like with Photoshop. Faked photos had a much higher impact about 25 years ago because people didn’t know how easy it is to do…
    And what about text? Everybody knows that it can be faked, so everybody is skeptical about written text.

    But it will take time. On the other hand, direct human interaction will be the last thing that still can’t be faked 🙂

  9. I remember that Was (Not Was) track featuring Ronald Reagan saying “Can we deny, the ship of state is out of control?” I guess they did that with tape splicing.

  10. It might not be there yet but I can see lots of potential here.

    I mean this technology could open so many doors in audio production. If this can recreate someone’s voice, why not recreate the voice of a Minimoog? Or a violin, or maybe create reverbs and everything? I think this might be a whole new way to sample. Imagine a combination of VoCo and Melodyne. In a couple of years we will have virtual Elvises and MJs and Beatles and all. There will be a major fight about copyrights, I guess. I might be overreacting but I think this is a friggin revolution in audio.
