For things like numbers and dates you should probably preprocess your text with something like Lingua Franca to convert them to words that are pronounceable by the TTS.
There are some undocumented features I’m still experimenting with, but I agree that in general a separate library should be used. Some of the features that are in there but disabled for now:
- Currency recognition
  - “$100.12” (sort of works now)
- Number types
  - “1_ordinal” becomes “first” in English
  - “1902_year” becomes “nineteen oh two” in English
- Alternative pronunciations
  - “read_1” and “read_2” are pronounced like “red” and “reed” respectively
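As a rough illustration of the tagged-token idea above, a preprocessing pass might look something like this. This is a minimal sketch, not the actual Larynx code; the lookup tables only cover the examples mentioned.

```python
# Sketch of a tagged-token preprocessor (NOT the real implementation).
# Token names like "1_ordinal" and "1902_year" follow the examples above;
# the tables cover just enough English to illustrate the idea.

ORDINALS = {1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth"}
DIGITS = {"0": "oh", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
PAIRS = {"18": "eighteen", "19": "nineteen", "20": "twenty"}

def expand_token(token: str) -> str:
    """Expand a tagged token like '1902_year' into pronounceable words."""
    if "_" not in token:
        return token
    value, tag = token.split("_", 1)
    if tag == "ordinal" and value.isdigit() and int(value) in ORDINALS:
        return ORDINALS[int(value)]
    if tag == "year" and len(value) == 4 and value.isdigit():
        # Read a year as two pairs of digits: "1902" -> "nineteen oh two"
        century, rest = value[:2], value[2:]
        first = PAIRS.get(century, century)
        if rest == "00":
            return f"{first} hundred"
        if rest[0] == "0":
            return f"{first} oh {DIGITS[rest[1]]}"
        return f"{first} {PAIRS.get(rest, rest)}"
    return token  # unknown tag: leave the token untouched

def preprocess(text: str) -> str:
    """Expand every tagged token in a sentence."""
    return " ".join(expand_token(t) for t in text.split())
```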
I also have the ability to list abbreviations for a language that are automatically expanded. I’ve got a list for English, like mr -> mister, but I don’t know any for Dutch.
Happy to help and expand on those lists.
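For a collaborative list, the per-language tables could be as simple as dictionaries. A sketch of what that might look like; the `expand_abbreviations` helper and the Dutch entries are hypothetical, only mr -> mister comes from the list above.

```python
import re

# Hypothetical per-language abbreviation tables in the style described above.
# Only "mr" -> "mister" is from the thread; the rest are illustrative guesses.
ABBREVIATIONS = {
    "en": {"mr": "mister", "mrs": "missus", "dr": "doctor"},
    "nl": {"dhr": "de heer", "mevr": "mevrouw", "bijv": "bijvoorbeeld"},
}

def expand_abbreviations(text: str, lang: str = "en") -> str:
    """Replace whole-word abbreviations, with or without a trailing dot."""
    table = ABBREVIATIONS.get(lang, {})

    def repl(match: re.Match) -> str:
        word = match.group(1)
        if word.lower() in table:
            return table[word.lower()]  # drop the abbreviation's dot
        return match.group(0)           # keep ordinary words (and their dots)

    return re.sub(r"\b(\w+)\.?", repl, text)
```

Keeping one dictionary per language in its own file (e.g. a GitHub directory per language) would make community contributions straightforward.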
If I remember correctly, Mycroft has something like a collaborative system on their website (translate.mycroft.ai).
I suppose we could do something similar with just a GitHub directory per language and documentation on what is needed for completing a language.
How should I cope with the following issue?
In the trimming phase of the program, I sometimes hear a phrase pronounced in a way that I don’t find 100% OK, but since I don’t know whether I recorded it twice, I accept it anyway. Then I notice there is indeed a better version. How can I find/delete the first version (without going through all phrases once more)?
The WAV files are all named <id>_<timestamp>.wav, where id is from the prompts file. So just open that up and search for the text. Then, take the id and find all WAV files that start with it.
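That lookup could be scripted. A small sketch, assuming a tab-separated prompts file with `id<TAB>text` lines and the `<id>_<timestamp>.wav` naming described above; the `find_takes` helper is hypothetical.

```python
from pathlib import Path

def find_takes(prompts_file: Path, wav_dir: Path, phrase: str) -> list[Path]:
    """Find all recorded takes of a phrase.

    Assumes one "id<TAB>text" line per prompt in the prompts file and
    WAV files named <id>_<timestamp>.wav, as described in the thread.
    """
    matches: list[Path] = []
    for line in prompts_file.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        prompt_id, _, text = line.partition("\t")
        if phrase.lower() in text.lower():
            # All takes of this prompt share the "<id>_" filename prefix
            matches.extend(sorted(wav_dir.glob(f"{prompt_id}_*.wav")))
    return matches
```

To keep only the newest take, you could delete everything in the result except `matches[-1]` (the timestamps sort chronologically when they have equal width).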
@synesthesiam No words, thank you once again for all the relentless effort.
I tried to set it up on an Intel NUC that acts as a Rhasspy server running Ubuntu 18.10. The web interface loads perfectly, but when I try any word, e.g. oog (= eye), I get the following message:
Error: Failed to fetch
Different browsers give a similar error message.
After that, the Docker container just crashes and nothing specific is shown in the output. It seems to go wrong when it tries to invoke the API at http://<ip>:59125/api/tts?text=oog&phonemes=false
Trying to invoke it directly using Postman gives the same behavior.
Let me know if I can help you figure this out and I’m happy to assist.
You’re welcome
Hmmm… this seems to work for me. Maybe I’m doing something different? This is on an x86_64 laptop:
$ docker run -it -p 59125:5002 rhasspy/larynx:nl-rdh-1
and then in a separate tab:
$ curl -X GET --output /tmp/test.wav 'http://localhost:59125/api/tts?text=oog'
When I aplay /tmp/test.wav, it plays just fine.
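For completeness, the same request can be made from Python’s standard library. The URL, port, and `text` parameter come from the thread; the helper names are made up.

```python
import urllib.parse
import urllib.request

def tts_url(text: str, host: str = "localhost", port: int = 59125) -> str:
    """Build the /api/tts URL used in the curl example above."""
    query = urllib.parse.urlencode({"text": text})  # URL-encodes the text
    return f"http://{host}:{port}/api/tts?{query}"

def fetch_wav(text: str, out_path: str = "/tmp/test.wav") -> None:
    """Equivalent of: curl --output /tmp/test.wav '<tts_url>'."""
    with urllib.request.urlopen(tts_url(text)) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```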
In case it makes a difference:
$ docker --version
Docker version 18.09.3, build 774a1f4
Thoughts?
For those who want to use this in Hass.io, there is an add-on available now! It even works on the Raspberry Pi (super slow, but has a cache).
I wish
Doing the exact same thing as you gives the same result: the larynx container stops without a message. This is the output from curl:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (52) Empty reply from server
Docker for me is one version higher: Docker version 19.03.6, build 369ce74a3c. I also tested it on a brand new Alpine VM, and that has the same behavior.
Not sure what to test, but if somebody has a clue, I’m happy to test things along.
I know it shouldn’t matter, but is there any chance Alpine could be causing an issue? Everything is compiled against an Ubuntu base. Docker should hide the differences, but it’s the only thing I can think of.
Another option might be to start the container with --entrypoint bash to get a shell prompt, and then run the ENTRYPOINT command from the Dockerfile manually. Maybe you’ll get an error message when it crashes?
I also encountered the “empty reply from server” problem while trying to set up the rhasspy/larynx:nl-rdh-1 Docker image. I tested the image on my laptop and it worked fine. The moment I moved it to my server, it stopped working.
I followed the above advice and changed the entrypoint, then ran the ENTRYPOINT command. I got the following error:
Illegal instruction (core dumped)
Googling this message mostly turns up TensorFlow-related problems about a CPU not supporting a specific instruction set extension. I think the same thing is happening with this Docker image.
The supported extensions on my server:
model name : Intel® Atom™ CPU D510 @ 1.66GHz
flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts nopl cpuid aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm
The supported extensions on my laptop:
model name : Intel® Core™ i7-8550U CPU @ 1.80GHz
flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
This is all about a lack of AVX instructions on many of Intel’s processors (Cores and Xeons have them).
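A quick way to confirm the diagnosis is to parse the flags line from /proc/cpuinfo and look for avx. The `cpu_supports` helper below is just an illustration.

```python
def cpu_supports(flags_line: str, *needed: str) -> bool:
    """Check whether a /proc/cpuinfo 'flags' line lists the given extensions."""
    # Drop the "flags:" label (if present) and split into individual flags
    flags = set(flags_line.split(":", 1)[-1].split())
    return all(f in flags for f in needed)

# On Linux you could read the real flags line with something like:
# flags_line = next(l for l in open("/proc/cpuinfo") if l.startswith("flags"))
# print(cpu_supports(flags_line, "avx"))
```

Against the two flag dumps above, this returns False for the Atom D510 (no avx in its list) and True for the i7-8550U.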
I’ve spent weeks trying to get a non-AVX build of PyTorch to work, with no success. It compiles fine, but AVX instructions sneak in no matter what I do. There are half a dozen cmake variables that are supposed to disable them, but some of the submodules in the projects still use them!
And here’s the kicker: this happens even when compiling on a non-AVX system, so the compiler will happily emit CPU instructions that the build machine itself can’t execute.
If anyone knows of a way to completely block AVX instructions from ending up in a compiled binary, I’d love to know.
I don’t compile a lot, so I’m afraid I can’t help that much.
The closest thing I can find is GCC’s -march parameter, which allows choosing the target platform of the binary:
-march=cpu-type
    Generate instructions for the machine type cpu-type. In contrast to -mtune=cpu-type, which merely tunes the generated code for the specified cpu-type, -march=cpu-type allows GCC to generate code that may not run at all on processors other than the one indicated. Specifying -march=cpu-type implies -mtune=cpu-type.
    The choices for cpu-type are:
    native
        This selects the CPU to generate code for at compilation time by determining the processor type of the compiling machine. Using -march=native enables all instruction subsets supported by the local machine (hence the result might not run on different machines). Using -mtune=native produces code optimized for the local machine under the constraints of the selected instruction set.
    x86-64
        A generic CPU with 64-bit extensions.
    i386
        Original Intel i386 CPU.
    i486
        Intel i486 CPU. (No scheduling is implemented for this chip.)
    i586, pentium
        Intel Pentium CPU with no MMX support.
    etc.
For the submodules, maybe you need to recompile the whole toolchain without AVX?
I’m using the VoiceRSS API, which is a cloud service, but Dutch sounds really nice on it. With this custom component for Home Assistant installed, it has even more options than the standard Home Assistant integration. For a live demo: http://www.voicerss.org/api/demo.aspx
Nice, maybe we can add it to the TextToSpeech engines in Rhasspy.