Multivocal AI Voice 3: The Fluctuating Voice

This is the last of the three voice designs in our series of multivocal AI voice cloning explorations.

This voice design approach builds on a dataset with two different speakers, similar to The Pooled Voice. However, whereas The Pooled Voice keeps the speakers in separate audio files, The Fluctuating Voice puts both speakers into the same audio files. For our experiments with this approach, we used two completely different speakers reading different scripts. Individual audio files are not split exactly 50/50 between speaker 1 and speaker 2, but the total amount of audio from each speaker is roughly equal.
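The dataset layout described above can be sketched roughly as follows. This is a minimal illustration and not the code we used: the `Segment` class, the `mix_speakers` and `speaker_share` functions, the pairing of one segment per speaker, and the placeholder samples and texts are all hypothetical. It only shows how clips that are individually unbalanced can still add up to an even split across the whole dataset.

```python
import random
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of speech from a single speaker (hypothetical structure)."""
    speaker: str
    samples: list  # mono audio samples; placeholder values here
    text: str      # transcript of this segment


def mix_speakers(a_segments, b_segments, seed=0):
    """Join one segment from each speaker into a single training clip.

    Assumption: the order of the two speakers inside a clip is random, and the
    clip transcript is the two transcripts joined in the same order.
    """
    rng = random.Random(seed)
    clips = []
    for seg_a, seg_b in zip(a_segments, b_segments):
        pair = [seg_a, seg_b]
        rng.shuffle(pair)
        clips.append({
            "samples": pair[0].samples + pair[1].samples,
            "text": f"{pair[0].text} {pair[1].text}",
            "counts": {seg.speaker: len(seg.samples) for seg in pair},
        })
    return clips


def speaker_share(clips, speaker):
    """Fraction of all samples in the dataset that come from one speaker."""
    total = sum(sum(clip["counts"].values()) for clip in clips)
    return sum(clip["counts"].get(speaker, 0) for clip in clips) / total


# Placeholder data: segment lengths differ, so no single clip is split 50/50.
speaker_1 = [Segment("speaker_1", [0] * 3, "first line"),
             Segment("speaker_1", [0] * 7, "third line")]
speaker_2 = [Segment("speaker_2", [1] * 6, "second line"),
             Segment("speaker_2", [1] * 4, "fourth line")]

clips = mix_speakers(speaker_1, speaker_2)
share_1 = speaker_share(clips, "speaker_1")
```

In this toy data the first clip holds 3 samples from one speaker and 6 from the other, yet each speaker contributes 10 of the 20 samples overall, mirroring the "uneven per file, even in total" balance described above.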

The voices in The Fluctuating Voice dataset come from Kimberly Krause’s reading of Eight Girls and a Dog and Piotr Nater’s reading of The Mysterious Island, both found on the public domain audiobook site Librivox.

The voice was trained with Tacotron2 using justinjohn036’s Google Colab notebook.

An example from the dataset behind The Fluctuating Voice can be heard here, saying “Yes, said brilliant Nan. The wind veers to the northwest”:

The end result of The Fluctuating Voice is a synthetic voice that switches between the two speakers in the middle of an utterance. The speaker usually changes between words, but in some cases the shift occurs inside a word. When the switch happens mid-word, it can be heard as a kind of modulation between the two voices.

You can hear the shifting and bending nature of The Fluctuating Voice in the following two examples, reading two different paragraphs from this article on Vox.com:

The Fluctuating Voice seems to have a lot of aesthetic potential. The way the voice switches in the middle of an utterance is unique to synthetic voices and hard to reproduce in traditional audio software. The artist has no real control over when and how the voice shifts from one speaker to the other, but this loss of control can itself be interesting as an artistic tool. Leaving the voice change up to statistical probability opens up opportunities for surprising and serendipitous vocal experiences.
