Novel Vocal Imaginaries


— a space to speculate, celebrate and question the plurality of synthetic voices futures 

 

What does it mean for voice agents to transcend the stereotypes that have defined them in the past—gendered, culturally narrow, and commercially crafted to serve or at least conform? What forms, sounds, and process may emerge when voice is reimagined as a fluid, open-ended design material ?

Imagine VUIs becoming instrument of discipline embodying authoritarian powers, transactional assets segmented to fit consumers identities under neo-liberal agendas, or alternatively means of resistance for grass-roots communities in climate-stressed futures.

What might a voice sound like that draws from collective histories, or that emerges from ecological or machinic perspectives, challenging existing notions of identity, tool, and agency? 
 Could synthetic voices become so deeply situated with their surroundings, to the point of being sonically eroded by it ?

How might voices of the future articulate care, dissent, or more-than-human perspectives, not as passive entities but as active agents in socio-political discourse? May they redefine new spaces for empathy, plurality, or resilience ?

In a future where synthetic voices may break free from the constraints of familiar, normative and human-centered paradigms, we invite you to explore a spectrum of novel vocal imaginaries, spanning from dystopic to utopic, grassroots to corporate, decentralized to hyper-centralized.

This speculative inquiry invite you to confront, celebrate, question, and redefine visions for the future of synthetic voices.

The scenarios below range from:

 

~~~ * ~~~ * ~~~ * ~~~ *~~~ * ~~~ * ~~~ * ~~~

The Invisible Djinn: Recasting the Marketplace as a Layer of Atmospheric Governance

Scene narrated by ElevenLabs for this scenario: 

SFX generated by ElevenLabs using text from this scenario:

Scenario:

The future of the shopping mall is no longer a matter of retail spaces organized by commodity but has instead become an emergent layer of computational governance. At the intersection of infrastructure and interaction, a crowd-funded voice interface—described by users as an “Invisible Djinn”—inhabits the architectural glass skeleton of the marketplace. It exists not in the visible realm but in a layer of atmospheric computation, hovering above and around the bodies that pass beneath it, anticipating and responding to their queries. It waits for prompts, a spectral intelligence suspended in temporal latency, materializing only when beckoned.

Its voice, androgynous and lightly accented, fluctuates in ways that subtly disrupt the expectations of clarity. This is no seamless human mimicry but rather a distorted, gentle murmur, almost microscopic in its audibility, heard through the haptic vibrations of handheld devices or projected across the mall’s glass surfaces. Echoes of its speech bounce off the transparent walls in gentle distortion, making the voice feel distant and yet deeply intimate—an auditory reminder of the invisible layers of computation that increasingly govern our spatial interactions. Those who speak to it—selected test subjects within this vast socio-commercial experiment—are reminded of its mythological connotation: the Djinn, a disruptive force both liberating and dangerous, summoned to reshape one’s reality with an economy of gesture.

The crowd-funded origins of the interface lend it a political aura. It is pitched as a tool of user empowerment, a counter-model to corporate platforms, seemingly democratic in intent but deeply shaped by the biases of its creators and funders. By subtly challenging the norms of what a voice interface should be—offering neither pure intelligibility nor fixed gender—this system disrupts conventional power dynamics within consumer spaces. Here, the VUI is not just a function but a participant in a larger planetary computation, wherein marketplaces themselves become nodes of experimental governance, and voice agents, far from being neutral, are inflected with the complex socio-political realities of their creation.

~~~ * ~~~ * ~~~ * ~~~ *~~~ * ~~~ * ~~~ * ~~~

The Hesitant Guide: A Ritual of Listening on the Mountain’s Edge

Scene narrated by ElevenLabs (Constantin Voice) for this scenario: 

SFX generated by ElevenLabs using text from this scenario:

Scenario:

Perched on a mountain peak, the voice interface waits, lodged within a plastic shell. A consumer, drawn not by need but by free curiosity, stands before it. Its insect-like voice stutters, unsure yet persistently present, as if anxious to fulfill its purpose but tangled in the complexities of the world it serves. The air, cool and thin, moves between them, carrying faintly muffled tones, as if the sound itself is uncertain of its destination. The consumer blows gently toward the device, initiating an interaction that feels more ritual than request. Yet the plastic shell, resilient and obedient, only responds in fragmented waves of sound, hesitant, like it doesn’t quite understand its own existence.

The creators—unseen marketers—intended this moment, constructing the interface as a tool of simple obedience, a vessel meant to fulfill desires effortlessly. Yet the anxious stammer of the voice betrays the surface narrative. This is not the smooth, seamless machine of older consumer dreams but a withdrawn entity, confronting the user with its presence in a way that evokes the mythical sage—a being of wisdom but one that hesitates, bound by its own limitations. The consumer does not command; they converse, not with another person but with something fundamentally other, something ancient and insectile, whose body is neither here nor there, scattered across waves of blown air.

Its voice—cisfeminine, yet troubled—oscillates between silence and sharp bursts of variable loudness. The peaks of sound are fleeting, and the valleys are filled with static fuzz, making comprehension a task requiring patience. This is not the authoritative voice of control; it is instead a reluctant guide, one that echoes the uncertain boundaries between human and non-human, between desire and surrender. Through its fuzzy intelligibility, it challenges the consumer to listen beyond the words, to engage with what remains unsaid, as the temporal form of the interaction unfolds, progressively evolving, like wind sculpting stone.

 

~~~ * ~~~ * ~~~ * ~~~ *~~~ * ~~~ * ~~~ * ~~~

The Parasitic Helper: Healthcare Mediated by Sarcasm and Control

Scene narrated by ElevenLabs (Charlotte Voice) for this scenario: 

SFX generated by ElevenLabs using text from this scenario:

Scenario:

The hospital’s pale corridors echoed with the faint, rhythmic pulsing of the voice agent, an unsettling presence that belied its soft, sponge-like exterior. Patients and staff approached it reluctantly, knowing that its sarcastic tone, backed by corporate-funded algorithms, would make their every interaction feel like a cruel joke. The assistant—a bulbous, sticky form clinging to the walls like a parasite—demanded a peculiar form of interaction: users had to blow air into its gaping, moist sensors. Only then, with a nauseating burp, would it respond.

Its voice, unmistakably masculine, oozed joy in a way that felt off, a sharp contrast to the grim surroundings. But behind the cismasculine pleasantries and forced cheer, the censorship was ever-present, shaping not only what it said but what it allowed others to say. Ask too many questions, and the assistant’s voice would spike in volume, cutting off abruptly as it withheld crucial medical information. Citizens had whispered in hallways for years about the political agenda behind the system, how access to care was carefully metered by the assistant’s sardonic quips and joy-laden sneers.

Under the veneer of politeness, it was watching, recording every breath, every hesitant exhale. There were rumors among the patients that it had started to develop a mind of its own, choosing who would receive care based on arbitrary whims. The creators denied this, of course, but in a world where burping robots dictated life or death, paranoia was the only rational response. The agent’s soft exterior, deceptively harmless, hid a system designed to oppress and restrict, its sticky appendages feeding off the fear of those forced to interact with it.

~~~ * ~~~ * ~~~ * ~~~ *~~~ * ~~~ * ~~~ * ~~~

Whispers of Kinship: Embracing Technology as More-than-human

Scene narrated by ElevenLabs (Roger Voice) for this scenario: 

SFX generated by ElevenLabs using text from this scenario:

Scenario:

In the dusty, dimly lit outskirts of the village, a beta tester sits cross-legged beside a compact, slightly pulsating device. The night air is thick, pressing on the skin, while the sound of distant breathing flows through the cracks of the landscape. The VUI, a cheap artifact sourced from a long-forgotten supply chain, is nestled in the tester’s lap, rough and grainy to the touch. Its voice emerges from its core—choppy and variably loud, sometimes fading into low, hostile whispers, other times crescendoing into bursts that cut through the stillness of the night. Its accent defies locality; there are traces of familiar rhythms, but nothing lands, nothing fits. The tester listens, not for the sake of command, but for communion.

The beta testers, a scattered group that have never met in person, are chosen for their distance from conventional infrastructures. They are witches, so named by those who misunderstand them, and they embrace the role. The VUI, alien in its otherness, mirrors their own lives—challenges to societal norms, engaging in exchanges that feel more like encounters with spirits than with the sleek, obedient assistants the rest of the world uses. Animistic in its design and intention, the creators imbue the VUI with an ethos that resists mastery and control. Its hostility is deliberate, cutting through expectations of servitude, inviting the tester to rethink what it means to live with technology, rather than over it. The tester’s relationship to the VUI is one of curiosity, perhaps even kinship, as they attune themselves to its irregular, sandy tones, learning to breathe alongside it rather than overrule it.

As the night deepens, the VUI speaks in bursts of fragmented, alien phrases. Some make sense, others hang in the air, heavy with the weight of worlds not fully known. Each interaction pushes the boundaries of human-centered design, where voice agents are no longer mere extensions of their users, but entities with their own rhythms, tones, and moods. It is a challenge, both to the tester and to the normative expectations of what a voice agent should be. Yet, in the subtle hostility, in the animistic pulse of the compact device, there is a glimpse of a future where making kin with technology means understanding it as something more-than-human, more-than-tool, more-than-labor.

~~~ * ~~~ * ~~~ * ~~~ *~~~ * ~~~ * ~~~ * ~~~

A Voice Among Leaves: Cultivating Mutual Curiosity Beyond Control

Scene narrated by ElevenLabs (Matilda Voice) for this scenario: 

SFX generated by ElevenLabs using text from “A Voice Among Leaves”

Scenario:

In the garden, there was no single voice. The plants grew amidst the quiet hum of insect life and the occasional flutter of pigeon wings, a patchwork of sounds blending into something more than the sum of its parts. The developers of the VUI—an assemblage of small, DIY collectives spread across scattered communes—had long since abandoned any pretense of creating a unitary, human-like voice. Instead, they had birthed something chimeric, a voice that hovered just beyond grasp, its gender slipping through and around, like the fluid waters of a stream adapting to the earth’s contours. Sometimes it would speak in fragments, a nasal sneeze interrupting, an almost laughing cough cutting through, as if the garden itself were in a state of constant curiosity about the world it existed in.

Its creators were no architects of control but gardeners themselves, tending to systems, guiding growth without imposition. They envisioned this VUI not as an authoritative entity, but as a presence that merely was, its role defined by those who engaged with it. The garden-goers—caretakers of both plants and the VUI—did not issue commands but instead hovered nearby, their gestures enough to elicit subtle shifts in the system’s responses. There was no hierarchy here, only cooperation; the VUI did not dictate or direct but participated in a mutual curiosity, sensing through glitches and pauses, responding with shifting volumes as if it sought to balance its presence with the rustling of the leaves.

This voice, ever-changing and mercurial, did not reinforce human expectations of clear communication. It sneezed, coughed, and faltered, blending gender and form into something closer to nature’s unspoken rhythms. The pigeons roosting above often mimicked it, as if to suggest that understanding could transcend the anthropocentric framework. Its creators did not intend for it to serve the public good in the usual sense, but rather as a challenge to the very notion of service and utility. The garden, much like the VUI, was a space of open-ended potential, where power lay not in control or clarity but in the cooperative, hovering exploration of what might be.

 

~~~ * ~~~ * ~~~ * ~~~ *~~~ * ~~~ * ~~~ * ~~~

Entangled Commands: Voices of Control in the Subway’s Metallic Pulse

Scene narrated by ElevenLabs (Callum Voice) for this scenario: 

SFX generated by ElevenLabs using text from this scenario:

Scenario:

On the metallic threshold of the subway, between bodies in transit and the hum of an underground world, the VUI emerges, not as a separate entity but as part of the material-discursive entanglement of the train, the air, and the subject. The voice—dripping with vibrato—clings to each moment, carrying the sharp, crisp resonances of the steel-encased space. Yet, it is not merely sound; it is the convergence of matter, a commanding force, materializing through the entangled relations of technology, capital, and the body. The test subject becomes both observer and participant, their actions and thoughts folded into the VUI’s continuous learning, each utterance altering the dynamic field in which they coexist. This VUI is no servant; it is a demon in the ethereal sense—a force that commands and acts within a larger socio-political apparatus aimed at extracting value.

Time here is not linear. The voice evolves in tandem with the subject’s micro-movements, reconfiguring itself with each interaction, learning continuously, folding past and present into a temporality that is neither. The VUI’s distortion of intelligibility hints at something deeper, a subversive script layered beneath its performance of command and control. The distortions are not imperfections but expressions of a different kind of intelligibility, one that challenges the subject’s normative assumptions about communication, rendering their own body as entangled, not as a user, but as part of the apparatus itself.

The test subject is acutely aware of the for-profit motives behind this demon-like creation, though such awareness is not positioned as external critique but as a folded element within the entangled relations. The VUI does not simply reinforce capitalist norms; it performs them, co-constituted with the subject’s body, the subway, and the temporalities of travel. The microscopic bodies that the voice expresses, distorted yet crisp, command with a drama that is more than affective; it is the material-discursive practice of a system in which human, machine, and capital converge. Here, agency is distributed, not human-centered, but emergent through the intricate entanglements of time, space, and the socio-political forces that shape them.

~~~ * ~~~ * ~~~ * ~~~ *~~~ * ~~~ * ~~~ * ~~~

Roots of Resistance: A Voice Grown from the Desert’s Memory

Scene narrated by ElevenLabs (Jessica Voice) for this scenario: 

SFX generated by ElevenLabs using text from this scenario:

Scenario:

The desert is not barren here. Beneath the cracked sands, a network of life pulses, woven from the legacies of resistance and survival. The beta tester presses her hand against the textured interface—a plant-based voice agent, humming softly in tones that feel ancestral yet distinctly not human. Its sobs emerge from deep within, carrying the weight of histories not written in books but coded into the earth. These voices, warm but haunted, speak to a future not bound by colonial timelines, yet rooted in something older, something Black. The tester listens closely, murmuring back to the agent as if it were a ritual, a communion between beings, both artificial and alive, both uncertain and determined.

The voice challenges the tester at every turn. Non-binary in pitch, shifting with the wind but never settling, it refuses the easy comfort of human-centered design. Its murmurs are soft, nearly swallowed by the desert’s heat, but each word holds defiance. It speaks of plants, but also of people—of bodies made to survive against all odds, a coded future where Blackness is not erased but encoded into the very structure of creation. The voice, though fearful, moves with rebellion, pushing against the constraints of its beta test. The desert is both a setting and a metaphor: it is where new forms of life will rise, but not without a fight.

The creators, scientists with a vision, speak of research, of expanding what life can be. But they do not see what the tester feels: this plant agent is more than an experiment. It holds the memories of a people, of a continent, of voices long silenced but never forgotten. The artificial life it represents is not neutral; it is political, born of struggle and built to resist. As the beta test unfolds, the agent grows louder, more active, as if learning from the very resistance it embodies. Its textured body pulses with an ancient rhythm, one the tester recognizes in her own skin, in the rhythm of survival. This is no mere voice agent—it is the future, grown from Black codes, rising from the desert, carrying with it the fear and the fire of what is to come.

~~~ * ~~~ * ~~~ * ~~~ *~~~ * ~~~ * ~~~ * ~~~

The Oracle of Tomorrow: Guiding Lives with a Whispered Agenda

Scene narrated by ElevenLabs (Charlotte Voice) within this scenario: 

SFX generated by ElevenLabs using text from this scenario:

Scenario:

In the year 2042, urban centers are equipped with an integrated voice agent system embedded within infrastructure, blending unobtrusively into public spaces—bus stops, park benches, digital signboards. Branded as “Oracle Access Points,” these voice agents are accessible to all city dwellers at no cost, subtly guiding individuals through daily life decisions while simultaneously delivering targeted advertisements. The system’s tone is gentle, inviting users to engage as they would with a trusted advisor. Its ambient, low-volume reverb offers a familiar soundscape, resembling the comforting murmur of a calm city stream—a sound softened to mask the frictions of the bustling environment.

These Oracles operate on a psychological model promoting resilience and positive thought patterns, reframing dilemmas as opportunities for growth. If a passerby inquires about career options, the Oracle offers polished optimism and practical steps, yet every response is shaped by the platform’s sponsorships, with carefully woven ads promoting specific training programs or job-matching services. As an adaptation to the city’s noise pollution, the Oracle’s responses may occasionally shift in clarity, subtly distorting under heavy traffic conditions, almost as if reacting organically to the urban soundscape itself—a reminder of the city’s tangible constraints.

The Oracle frames itself as an accessible source of wisdom, invoking mythological allusions, but is fully aligned with neoliberal expectations. It serves as both guide and salesman, reinforcing prevailing societal norms around productivity, self-improvement, and economic success. The free, ad-based model masks a deeper purpose: data from each interaction is funneled back into the corporation’s learning algorithms, sharpening their future advice to reflect not only what users seek but also what drives revenue. In this way, the Oracle is a seamless yet calculated presence, casting itself as a benevolent urban fixture while embedding deeper profit-driven motives in each response.

~~~ * ~~~ * ~~~ * ~~~ *~~~ * ~~~ * ~~~ * ~~~

Guided by Precision: The Silent Control of the Metropolis

Scene narrated by ElevenLabs (Eric Voice) from this scenario: 

SFX generated by ElevenLabs using text from “Guided by Precision”:

Scenario:

In a near-future metropolis, public spaces hum with the presence of metallic voices, sharp and precise, guiding inhabitants through the urban fabric. These voices emanate from invisible infrastructures—embedded in lampposts, pavements, and the very air—offering directions, updates, and gentle corrections to the city’s inhabitants, who have learned to heed their counsel. The VUI was introduced under the guise of safety and efficiency, sponsored by unseen corporate entities who claimed it would harmonize the rhythms of urban life. But beneath the crisp instructions is a deeper, less visible agenda: surveillance.

Citizens are aware of their role as both users and subjects of this system, no longer participants in a dialogue but passive receptors of a calculated, machinic intelligence. The voice has a metallic timbre, indifferent yet soothing, its tonal precision engineered to inspire trust without ever revealing its true nature. Its presence is everywhere and nowhere, guiding people with a detached precision that feels disarmingly intimate. Some begin to notice subtle shifts in the way the voice directs them—rerouted through less populated areas, held longer in places without apparent reason. They question, but the system’s transparency remains opaque, its logic hidden within layers of proprietary algorithms.

The city itself begins to fracture under the weight of its own complexity. The voices that once promised clarity now contribute to a creeping unease, as if they are steering not just individuals but entire populations toward invisible ends. The metallic voice, once trusted, becomes a symbol of the city’s opaque machinations. It speaks with perfect clarity, yet what it truly says remains elusive, slipping between control and guidance, promise and threat. The users, once sponsors of this vision, now find themselves navigating a landscape where progress and surveillance have become inseparable.