On short sentences or single words, there is often a strange noise at the end of the voice output. I'm not sure why this happens. With longer text, everything is usually fine.
There is a nice, realistic speaking break after a comma, but no break after the end of a sentence. That makes the text hard to understand.
Should the API of mimic3-server already be functional?
I think Siwis_low is the best one, but this voice is very close to the voice used on public transport in Toulouse, so I don’t like it much. In fact it sounds like an actress reading a book: a good voice, but not natural.
Otherwise, I would say zeckout; I feel like I’m listening to an old science teacher.
Edit: I re-listened to the voices with a good headset; Siwis_low is better than the others in terms of quality.
I don’t think there is anything special about those words, it’s just that they happen to be in my voice outputs and Larynx has issues with them. There are probably many more words with issues, I’m just not using them in any voice prompts.
I’d agree in terms of audio quality. Many of the voices I trained from the M-AILabs dataset don’t have great audio quality since they were recorded by volunteers for Librivox using whatever hardware they had.
Thank you! Was it difficult to get working on the phone at all?
This seems to be a general problem with the TTS model I’m using. If the dataset doesn’t contain the speaker saying single words or very short phrases, the model has a hard time producing them. For now, I think I’ll have to consider the M-AILabs voices as intended for reading long-form text only.
I think I can at least fix the pausing issue after a period for now.
Yes, if you’re running it locally you can check out http://localhost:59125/openapi/ to see what’s available. It should also be compatible with anything that’s meant to talk to MaryTTS. You just have to make sure your “MaryTTS voice” is something like “en_UK/apope_low”.
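To make the MaryTTS compatibility concrete, here is a minimal sketch of building a request against the classic MaryTTS `/process` endpoint. The parameter names (`INPUT_TEXT`, `VOICE`, etc.) follow the MaryTTS HTTP API convention; whether mimic3-server honors every one of them is an assumption here, so check the OpenAPI page for the authoritative list.

```python
from urllib.parse import urlencode

def marytts_process_url(base: str, text: str, voice: str) -> str:
    """Build a MaryTTS-style /process request URL.

    Parameter names follow the classic MaryTTS HTTP API; treat them
    as a sketch, not a guaranteed mimic3-server contract.
    """
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "VOICE": voice,  # slash-form voice id, e.g. "en_UK/apope_low"
    }
    return f"{base}/process?{urlencode(params)}"

url = marytts_process_url("http://localhost:59125", "Hello world", "en_UK/apope_low")

# Fetching the WAV requires a running server, e.g.:
# from urllib.request import urlopen
# wav_bytes = urlopen(url).read()
```

Note the voice id goes in slash form ("en_UK/apope_low"), matching the “MaryTTS voice” field mentioned above.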
The other keys mentioned in docu / text-to-speech/#marytts are not explicitly stored in the JSON, but are visible in the Rhasspy UI (de_DE, thorsten_low).
Putting that combined in the “Voice” field doesn’t help; changing back leads to the locale also being stored in the JSON, but this still results in
TtsException: file does not start with RIFF id
What did I miss or could do better?
Tests with “http://external-ip:59125/” work quite well; calling it with the “openapi” postfix results in a 404 error…
@synesthesiam
I like the announcement!
Is there already a date for when the mimic3 repository will be online on GitHub? I would like to test the Debian packages.
I’ll have to check this myself. The Mimic 3 server should also work with Rhasspy’s “remote TTS” option, but I need to double check I haven’t broken anything with that either!
Hopefully next month, but I’ve sent you a link with the beta packages.
Das ist ein Test in deutsch <voice name="en_US/vctk_low#p236">and this is a test in English.</voice>
… and Rhasspy speaks two languages in one sentence - cool.
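A mixed-language SSML snippet like the one above can be sent to the server programmatically. The sketch below assumes the server accepts a POST body with `ssml` and `voice` query parameters on `/api/tts`; the exact endpoint and parameter names are assumptions drawn from the server’s OpenAPI page, so verify them there.

```python
from urllib.parse import urlencode

def tts_request(base: str, ssml_text: str, voice: str):
    """Build (url, body) for a hypothetical SSML synthesis request.

    The /api/tts path and the ssml/voice parameters are assumptions,
    not a documented contract; check the server's OpenAPI page.
    """
    query = urlencode({"voice": voice, "ssml": "true"})
    return f"{base}/api/tts?{query}", ssml_text.encode("utf-8")

ssml = ('Das ist ein Test in deutsch '
        '<voice name="en_US/vctk_low#p236">and this is a test in English.</voice>')
url, body = tts_request("http://localhost:59125", ssml, "de_DE/thorsten_low")

# With a running server, POST the body and save the returned WAV:
# from urllib.request import urlopen, Request
# wav_bytes = urlopen(Request(url, data=body)).read()
```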
It runs a bit slow on my old machine without GPU. With enough power and cache it will definitely get better.
Awesome! The way to speed this up is to run mimic3-server as a service (check the source code for a systemd unit example), and then use mimic3 --remote ... so it will use the web server instead.
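For readers unfamiliar with systemd, a service along those lines might look like the sketch below. This is a hypothetical unit, not the official example shipped with mimic3 (the binary path and lack of extra flags are assumptions); use the unit from the source code for real deployments.

```ini
# Hypothetical sketch of a mimic3-server systemd unit.
# Path and options are assumptions; see the mimic3 repository
# for the official example.
[Unit]
Description=Mimic 3 TTS web server
After=network.target

[Service]
ExecStart=/usr/bin/mimic3-server
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With the server running this way, `mimic3 --remote ...` avoids reloading the model on every invocation, which is where the speedup comes from.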
Btw, I think Mycroft should link to some demos in their Mimic 3 blog post announcement.
If people could hear for themselves how good the TTS sounds, they’d be more likely to sign up and get involved. My 2 cents.
Unfortunately, I cannot send PMs as a new user and therefore can’t test RTFs for different architectures. Can someone give hints about RTFs, maybe for ARM?
Hi @The1And0, on 64-bit ARM you can get an RTF of around 0.5. 32-bit ARM is slower, around 1.2 or 1.3. If you’re on a 64-bit x86_64 machine, though, it can be 10x faster than ARM.