Introducing Music & Speech Generation on getimg.ai
Picture, motion, and now sound. As of today, getimg.ai generates audio: two new Actions in the sidebar, one for music and one for speech, sitting on the same plan as everything else you make here. The song, the voiceover, and the visuals they belong to finally come out of one app.
Videos generated on getimg.ai already come with sound: models like Google Veo 3.1 and Kling 2.6 produce audio natively, right with the picture. But a lot of professional work needs audio on its own. Sometimes that's about length: a full song, narration for a ten-minute edit, an ambience bed under an entire scene, none of which fits in a clip that tops out around 15 seconds. And sometimes the job is simply audio: a podcast segment, a voiceover for footage you already have, a theme for a campaign. The new Actions cover both.
Two new Actions
Like every Action on getimg.ai, each one is a dedicated page with a simple shape: describe what you need, set a couple of controls, generate.
Create music
The Create music Action is powered by Google Lyria 3 Pro, currently the strongest music generation model available. Two things set it apart.
First, cohesion. Lyria 3 Pro holds a song together from first note to last: the voice stays the same singer, the genre doesn't drift, and the arrangement builds the way a produced track should. That's the area where AI music generation has historically fallen apart, and it's the reason the output sounds finished rather than generated.
Folk (Polish)
Rap (English)
Instrumental
Second, control on your terms. You can give it almost nothing, just a genre and a general theme, and the model writes the lyrics and structure itself. Or you can be exact: paste a finished lyric sheet, describe the voice that should sing it, and Lyria 3 Pro performs it word for word.
It works across genres and languages. You pick the length (30 seconds, or one, two, or three minutes) and how many variations to hear back. Generation is priced per song, so the cost of a track is predictable before you make it.
Pop (Spanish)
Rock (French)
Create speech
Create speech turns a written script into natural spoken audio. Paste a script, pick a voice, and get a finished AI voiceover for explainers, e-learning modules, product demos, podcast segments, and ads. When the script changes, edit the text and run the voice generator again instead of re-recording.
How-To
Nature Documentary
Video Game Character
There are plenty of voices to choose from, each with its own character, from soft and warm to bright and upbeat. And you direct the delivery inside the script itself: drop a cue like `[gentle, emotional]` before a line and the read shifts to match, as many times as the script needs. Pricing is per 1,000 characters, so a read costs what its length costs.
How to prompt for audio
Simple prompts already work. Describe what you need in plain language and the model fills in the rest, so detail is a form of control, not a requirement for quality. When you do want to steer the result, here's where the steering happens.
Music
Start as small as a genre and a theme: "an upbeat indie-pop song about leaving town" is a complete prompt, and Lyria 3 Pro writes the lyrics and structure itself. Add detail only where you want ownership. Paste exact lyrics and they're sung as written.
Describe the voice ("a low, smoky vocal with an unhurried delivery") and that's the voice that performs the track. You have up to 4,056 characters to work with, and prompt enhancement can be turned on or off.
a piano power ballad about missing home
cathedral choir fused with euphoric drum and bass energy
industrial techno cowboy rave with distorted harmonicas and galloping percussion
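For anyone drafting prompts outside the app before pasting them in, here is a minimal Python sketch that assembles the pieces described above and checks the result against the 4,056-character limit. The function name and the "Voice:" and "Lyrics:" labels are illustrative assumptions, not a format getimg.ai prescribes; only the character limit comes from this article.

```python
MAX_PROMPT_CHARS = 4056  # limit stated in this article


def build_music_prompt(idea: str, voice: str = "", lyrics: str = "") -> str:
    """Combine an idea, an optional voice description, and optional lyrics.

    The section labels are a hypothetical layout chosen for readability.
    """
    sections = [idea.strip()]
    if voice.strip():
        sections.append(f"Voice: {voice.strip()}")
    if lyrics.strip():
        sections.append(f"Lyrics:\n{lyrics.strip()}")
    prompt = "\n\n".join(sections)
    # Fail early instead of discovering the limit after pasting.
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"Prompt is {len(prompt)} characters; the limit is {MAX_PROMPT_CHARS}."
        )
    return prompt


prompt = build_music_prompt(
    "an upbeat indie-pop song about leaving town",
    voice="a low, smoky vocal with an unhurried delivery",
)
print(prompt)
print(len(prompt), "of", MAX_PROMPT_CHARS, "characters used")
```

A genre and a theme alone pass through unchanged, which matches the point that detail is optional.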
Check out our guide to prompting for music to learn more!
Speech
The script is the prompt, so write the text exactly as it should be read. Treat revisions the way you'd treat a copy edit: change the line, generate again, compare the takes.
Video Essay
Luxury Ad
News Reporter
Choose a voice for the overall character, then steer tone and pace line by line with inline cues like `[warm]` or `[brisk, matter-of-fact]`. It reads in more than just English, too.
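If you keep scripts in a file or spreadsheet, a small helper can place the inline cues and report the script's length in the 1,000-character units that pricing is based on. This is a rough Python sketch: the function names are hypothetical, and it assumes that cues count toward the character total and that length is not rounded, neither of which this article states.

```python
from typing import Optional


def build_script(lines: list[tuple[Optional[str], str]]) -> str:
    """Join (cue, text) pairs into one script, prefixing cued lines with [cue]."""
    parts = []
    for cue, text in lines:
        parts.append(f"[{cue}] {text}" if cue else text)
    return "\n".join(parts)


def length_in_units(script: str, unit_size: int = 1000) -> float:
    """Script length in units of 1,000 characters (assumption: cues are counted)."""
    return len(script) / unit_size


script = build_script([
    ("warm", "Welcome back to the workshop."),
    (None, "Today we are restoring an old oak table."),
    ("brisk, matter-of-fact", "First, strip the varnish."),
])
print(script)
print(length_in_units(script), "units of 1,000 characters")
```

Keeping cues separate from the text makes it easy to change the delivery of one line and regenerate without touching the wording.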
One subscription, the whole production
A campaign rarely needs just one kind of asset. The song, the spot, the stills, and the voiceover that ties them together can now all come out of the same app: audio generation sits alongside 16 image models and 17 video models, including FLUX.2, Seedream 5.0 Lite, Nano Banana 2, Google Veo 3.1, HappyHorse 1, Seedance 2.0, and Kling 3.0 Pro. No exporting between tools, no juggling logins, and no separate audio subscription to justify.
The Bottom Line
The picture was never the whole production. getimg.ai now makes the sound too: Lyria 3 Pro for music that holds together from first note to last, and speech you direct line by line.
Sign in, look at the sidebar, and pick an Action: Create music or Create speech. Bring a lyric sheet, a script, or just two words about the mood you're after, and see what comes back.

