Research assistant Ada Ada Ada presented her work on post-anthropocene AI voice cloning at the POM conference 2024. The slides from the presentation can be seen here. In this post, we’ll share a bit more about what those experiments entailed.
The experiments emerge from the following research question:
How can we use AI voice cloning to engage with the more-than-human world through vocal aesthetics?
Inspired by Donna Haraway’s writing in Staying with the Trouble: Making Kin in the Chthulucene, we looked at how we can perform AI voice-to-voice (a.k.a. speech-to-speech) conversion on pigeon vocalities.
First, we tried using the hugely popular voice cloning platform, ElevenLabs. We conducted two experiments here:
1. We converted pigeon sounds to human voices.
2. We converted human voices to pigeon sounds.
Pigeon to human on ElevenLabs
In the first experiment, we used the following clip as input:
This was then converted to a voice clone of Ada:
We also tried converting it to the default ElevenLabs voice known as Bill:
Generally, we consider these experiments quite a success. The sounds still feel pigeony in their rhythm and cadence, while definitely taking on a sense of humanity. The end result is something that feels like multispecies vocality.
Human to pigeon on ElevenLabs
Secondly, we attempted to convert human speech into pigeon sounds on ElevenLabs.
We used the following as the input:
The input was fed into a pigeon voice clone converter, which resulted in this:
In this case, we did not get a sense of any multispecies collaboration. The output is almost exclusively human in its aesthetics.
It seems that ElevenLabs is built from an anthropocentric mindset. This approach restricts the potential for multispecies voice cloning.
ElevenLabs’ reliance on pretrained models might cause some of these issues, so we decided to try training our own voice cloning model from scratch instead.
Human to pigeon on SoftVC VITS
We used the SoftVC VITS framework in Google Colab to train our own pigeon voice cloning model. The notebook used for this has been prepared by justinjohn0306, and can be found here.
We used the same input as on ElevenLabs, and the output ended up being much more pigeony.
Since it can be a bit difficult to tell whether this audio clip is just random pigeon sounds, we also overlaid the two sounds on top of each other, so we can more easily tell that the rhythm and cadence of the input human voice has indeed been cloned onto the pigeon sounds.
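For readers who want to try a similar overlay themselves, here is a minimal sketch of one way to mix two clips. It is not the exact tool we used for the clip above. It assumes both clips are mono, share a sample rate, and have already been decoded into lists of floats between -1 and 1; the function names `peak_normalize` and `overlay` and the gain parameters are hypothetical and only there for illustration.

```python
def peak_normalize(samples, peak=1.0):
    """Scale a clip so its loudest sample has absolute value `peak`."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        # Silent or empty clip: nothing to scale.
        return list(samples)
    return [s * peak / loudest for s in samples]


def overlay(human, pigeon, human_gain=0.5, pigeon_gain=0.5):
    """Mix two mono clips sample by sample.

    Assumption: both clips use the same sample rate. The shorter clip is
    padded with silence so that both start at the same moment.
    """
    length = max(len(human), len(pigeon))
    a = peak_normalize(human) + [0.0] * (length - len(human))
    b = peak_normalize(pigeon) + [0.0] * (length - len(pigeon))
    # With both gains at 0.5, the mix cannot exceed the -1..1 range.
    return [human_gain * x + pigeon_gain * y for x, y in zip(a, b)]


# Tiny made-up clips, just to show the call.
mixed = overlay([0.0, 0.5, -0.5], [0.2, -0.2])
```

Normalizing each clip before mixing keeps a quiet pigeon recording from being drowned out by a louder human one, which makes it easier to hear whether the two rhythms line up.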
Conclusion
With these experiments, we have shown that multispecies voice cloning expands what voice cloning is capable of.
The sounds being made are neither human nor non-human. They are simultaneously both and more than.