Voice Inflection in AI Music

I’m not sure if this is the right place for this, but I wanted to share my experience. I added AI-generated music to a travel video from Utah, and I’m really impressed with the AI vocals. I’ve used SSML to add voice inflection before, but these AI vocals are on a whole new level. How is this achieved? I assume it’s a result of LLMs being trained on high-quality content. Any references to technical papers on this topic would be appreciated.
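For comparison, SSML inflection is hand-written markup: the author states pitch and rate for each phrase, and nothing is learned from audio. Below is a minimal sketch of that approach. The helper names `prosody` and `speak` are hypothetical, the sample sentences are made up, and which `pitch` and `rate` values a given TTS engine honors varies by vendor.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, pitch="+10%", rate="medium"):
    """Wrap text in an SSML <prosody> element (hypothetical helper)."""
    # quoteattr adds the surrounding quotes; escape protects &, <, > in the text.
    return (
        f"<prosody pitch={quoteattr(pitch)} rate={quoteattr(rate)}>"
        f"{escape(text)}</prosody>"
    )


def speak(*fragments):
    """Join already-built SSML fragments under a single <speak> root."""
    return "<speak>" + "".join(fragments) + "</speak>"


# Illustrative narration only; every inflection is spelled out by hand.
ssml = speak(
    prosody("Welcome to Utah.", pitch="+15%", rate="slow"),
    '<break time="300ms"/>',
    prosody("Rocks & canyons everywhere.", pitch="-5%"),
)
print(ssml)
```

The resulting string would be sent to whatever TTS service is in use. Each change in tone has to be written out this way, which is why sung AI vocals sound so different from SSML-driven speech.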

The AI Language Model song at 2:42 is a hoot!

So where exactly were the vocals generated? How do you know they are AI-generated? It sounds more like a tongue-in-cheek song about someone unfortunate enough to date Replika.

One approach I have seen is to have a person sing nonsense lyrics to an existing song, then generate an AI voice that follows the original singer's tonal shifts while substituting different words. I have no idea whether that is what is happening here.