Mimic 3 TTS Preview

Using mimic3-server on a 64-bit Pi 4.

m-ailabs_low:

  • On short sentences or single words, there is usually a strange noise at the end of the voice output. Not sure why this happens. If the text gets longer, everything is usually OK.
  • There is a nice and realistic speaking break after a comma, but no break at the end of a sentence. That makes the text hard to understand.

Should the API of mimic3-server already be functional?

I tested German and English voices on a smartphone running Mobian (Pocophone F1). Impressive speed and quality!

I think Siwis_low is the best one, but this voice is very close to the voice used on public transport in Toulouse, so I don’t like it much. In fact it sounds like an actress reading a book. So it’s a good voice, but not a natural one.

Otherwise I would say zeckout; I feel like I’m listening to an old science teacher.

Edit: I re-listened to the voices with a good headset; Siwis_low is better than the others in terms of quality.

I don’t think there is anything special about those words, it’s just that they happen to be in my voice outputs and Larynx has issues with them. There are probably many more words with issues, I’m just not using them in any voice prompts.

I’d agree in terms of audio quality. Many of the voices I trained from the M-AILabs dataset don’t have great audio quality since they were recorded by volunteers for Librivox using whatever hardware they had.

Thank you! Was it difficult to get working on the phone at all?

This seems to be a general problem with the TTS model I’m using. If the dataset doesn’t contain the speaker saying single words or very short phrases, the model has a hard time producing them. For now, I think I’ll have to consider the M-AILabs voices as intended for reading long-form text only :confused:

I think I can at least fix the pausing issues after a period for now :+1:

Yes, if you’re running it locally you can check out http://localhost:59125/openapi/ to see what’s available. It should also be compatible with anything that’s meant to talk to MaryTTS. You just have to make sure your “MaryTTS voice” is something like “en_UK/apope_low”.
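
If you want to poke at it from a script, something like the sketch below should work. I’m assuming the MaryTTS-compatible route is /process with INPUT_TEXT and VOICE query parameters, as in stock MaryTTS; check /openapi/ for the exact routes your build exposes.

    # Minimal sketch: fetch speech from the MaryTTS-compatible endpoint.
    # Assumption: the /process route and the INPUT_TEXT/VOICE parameters
    # follow the usual MaryTTS convention; verify against /openapi/.
    from urllib.parse import urlencode
    from urllib.request import urlopen

    params = urlencode({
        "INPUT_TEXT": "Hello from Mimic 3.",
        "VOICE": "en_UK/apope_low",  # same "MaryTTS voice" format as above
    })

    with urlopen("http://localhost:59125/process?" + params) as response:
        wav_bytes = response.read()

    with open("hello.wav", "wb") as f:
        f.write(wav_bytes)  # should be a RIFF/WAVE file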

Was it difficult to get working on the phone at all?

No. It was as easy as for a Debian or Ubuntu computer.

Hi there, got the server up and running (manually for now).

Settings are saved (with respect to MaryTTS) as follows:

    "text_to_speech": {
        "marytts": {
            "voice": "thorsten_low"
        },
        "system": "marytts"
    },

The other keys mentioned in the docs (text-to-speech/#marytts) are not explicitly stored in the JSON, but they are visible in the Rhasspy UI (de_DE, thorsten_low).
Putting that combined value into the “Voice” field doesn’t help; changing it back leads to the locale also being stored in the JSON, but it still results in

TtsException: file does not start with RIFF id

What did I miss, or what could I do better?

Tests with “http://external-ip:59125/” work quite well; calling it with the “openapi” suffix results in a 404 error…
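
For anyone debugging the same thing: the RIFF error means Rhasspy received something that isn’t a WAV file (often an error page instead of audio). Here is a quick way to see what the server actually returns for that voice, assuming the MaryTTS-style /process route with INPUT_TEXT and VOICE parameters:

    # Debug sketch: inspect what the server returns for the configured voice.
    # Assumption: the MaryTTS-style /process route with INPUT_TEXT/VOICE
    # parameters; adjust the URL if your routes differ.
    from urllib.parse import urlencode
    from urllib.request import urlopen

    params = urlencode({
        "INPUT_TEXT": "Das ist ein Test.",
        "VOICE": "de_DE/thorsten_low",  # locale/voice combined, as described above
    })

    with urlopen("http://localhost:59125/process?" + params) as response:
        print(response.status, response.headers.get("Content-Type"))
        data = response.read()

    print(data[:4])  # a valid WAV starts with b'RIFF'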

@synesthesiam
I like the announcement! :slightly_smiling_face:
Is there already a date for when the mimic3 repository will go online on GitHub? I would like to test the Debian packages.

Greetings, Jens

Wow, how could Portuguese (Brazilian) not be on the list?

I’ll have to check this myself. The Mimic 3 server should also work with Rhasspy’s “remote TTS” option, but I need to double check I haven’t broken anything with that either!

Hopefully next month, but I sent you a link with the beta packages :slight_smile:

It was, but people told me that the voice I trained wasn’t understandable. I used this dataset: https://github.com/Edresson/TTS-Portuguese-Corpus

Do you know of any other TTS Portuguese datasets?

Sorry, no. I’m clueless about language models, data, etc.

Did you look on Hugging Face?

Nice! :+1:

    "text_to_speech": {
        "command": {
            "say_arguments": " --ssml --voice 'de_DE/m-ailabs_low#rebecca_braunert_plunkett' ",
            "say_program": "mimic3"
        },
        "satellite_site_ids": "default",
        "system": "command"
    },
    Das ist ein Test in deutsch <voice name="en_US/vctk_low#p236">and this is an test in english.</voice>

… and Rhasspy speaks two languages in one sentence - cool. :sunglasses:
It runs a bit slow on my old machine without a GPU. With enough power and caching it will definitely get better.

Greetings, Jens

I didn’t, but I don’t see any useful data there :frowning:

Awesome! The way to speed this up is to run mimic3-server as a service (check the source code for a systemd unit example), and then use mimic3 --remote ... so it will use the web server instead.
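
For reference, a bare-bones unit along these lines should do the job; this is only a sketch, and the unit shipped in the Mimic 3 source may use different paths and options:

    # /etc/systemd/system/mimic3-server.service -- sketch only; the ExecStart
    # path is an assumption, and the unit in the Mimic 3 source may differ.
    [Unit]
    Description=Mimic 3 TTS web server
    After=network.target

    [Service]
    ExecStart=/usr/bin/mimic3-server
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Then enable it with sudo systemctl enable --now mimic3-server and use mimic3 --remote as described above.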

Calling it via the web interface wasn’t any faster either. Now I have to beef up my base station a bit first…

Btw, I think Mycroft should link to some demos in their Mimic 3 blog post announcement.
If people could hear how good the TTS sounds, they’d presumably be more likely to sign up and get involved. My 2 cents.

Will this be a drop-in replacement?

Hello,

unfortunately I cannot send PMs as a new user and therefore can’t test RTFs for different architectures. Can someone give hints about RTFs, maybe for ARM?

Thanks

Hi @The1And0, on 64-bit ARM you can get an RTF (real-time factor: synthesis time divided by audio duration, so lower is better) of around 0.5. 32-bit ARM is slower, around 1.2 or 1.3. If you’re on a 64-bit x86_64 machine, though, it can be 10x faster than ARM :slight_smile:

Try it out for yourself: https://github.com/mycroftAI/mimic3
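
If you want to measure the RTF on your own hardware, it’s just synthesis time divided by the duration of the generated audio. A small sketch, assuming the mimic3 CLI reads text on stdin and writes a WAV to stdout (the same behaviour the “command” config above relies on); the voice name is only an example:

    # Rough RTF measurement: synthesis time / duration of the generated audio.
    # Assumption: the mimic3 CLI reads text on stdin and writes WAV to stdout.
    # Note: this includes model-loading time, so repeat runs give a better number.
    import io
    import subprocess
    import time
    import wave

    text = "The quick brown fox jumps over the lazy dog."
    voice = "en_UK/apope_low"  # example voice from earlier in the thread

    start = time.perf_counter()
    result = subprocess.run(
        ["mimic3", "--voice", voice],
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    elapsed = time.perf_counter() - start

    with wave.open(io.BytesIO(result.stdout)) as wav:
        audio_seconds = wav.getnframes() / wav.getframerate()

    print(f"RTF = {elapsed / audio_seconds:.2f}")  # below 1.0 means faster than real time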

Oh, there’s a Docker image now. :+1: (Although apparently without harvard-glow_tts yet?)

Is it compatible with the “Remote HTTP” TTS option of Rhasspy?