I am trying to adapt a custom TTS script written for Snips: jarvis_says
And I get this error:
[DEBUG:2020-06-23 14:08:01,062] rhasspytts_cli_hermes: Got 554 byte(s) of WAV data
[DEBUG:2020-06-23 14:08:01,063] rhasspytts_cli_hermes: -> AudioPlayBytes(554 byte(s))
[ERROR:2020-06-23 14:08:01,064] rhasspytts_cli_hermes: handle_say
Traceback (most recent call last):
  File "rhasspy-tts-cli-hermes/rhasspytts_cli_hermes/__init__.py", line 159, in handle_say
  File "rhasspy-tts-cli-hermes/rhasspytts_cli_hermes/utils.py", line 9, in get_wav_duration
  File "wave.py", line 510, in open
  File "wave.py", line 164, in __init__
  File "wave.py", line 131, in initfp
wave.Error: file does not start with RIFF id
[DEBUG:2020-06-23 14:08:01,069] rhasspytts_cli_hermes: -> TtsError(error='file does not start with RIFF id', site_id='default', context=None, session_id=None)
I found a workaround by adding aplay at the end of the script to play the output directly, but I would like to know what the underlying issue is. The script converts the mp3 to wav with mpg123 -w and returns the wav on standard output.
[DEBUG:2020-06-25 06:31:32,386] rhasspytts_cli_hermes: Got 69718 byte(s) of WAV data
[DEBUG:2020-06-25 06:31:32,388] rhasspytts_cli_hermes: -> AudioPlayBytes(69718 byte(s))
[ERROR:2020-06-25 06:31:32,391] rhasspytts_cli_hermes: handle_say
Traceback (most recent call last):
  File "rhasspy-tts-cli-hermes/rhasspytts_cli_hermes/__init__.py", line 159, in handle_say
  File "rhasspy-tts-cli-hermes/rhasspytts_cli_hermes/utils.py", line 9, in get_wav_duration
  File "wave.py", line 510, in open
  File "wave.py", line 164, in __init__
  File "wave.py", line 131, in initfp
wave.Error: file does not start with RIFF id
[DEBUG:2020-06-25 06:31:32,396] rhasspytts_cli_hermes: -> TtsError(error='file does not start with RIFF id', site_id='default', context=None, session_id=None)
Can you save the output file somewhere and then run file your.wav from the command line?
The error says that the headers of the wav are malformed or missing.
For a valid file, the file command should report RIFF (little-endian) data, WAVE audio, followed by the encoding, bit depth, channel count and sample rate.
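Independently of file, you can look at the first four bytes yourself, since that is the first thing Python's wave module checks. This is only a rough sketch: starts_with_riff is a made-up helper name and your.wav is a placeholder path, neither is part of Rhasspy.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file begins with the 4-byte
# "RIFF" magic that wave.py expects at the start of a WAV file.
starts_with_riff() {
    # od -c prints the bytes as characters; strip the padding it adds.
    [ "$(head -c 4 "$1" 2>/dev/null | od -An -c | tr -d ' \n')" = "RIFF" ]
}

# Usage (your.wav is a placeholder):
#   starts_with_riff your.wav && echo "RIFF header found" || echo "no RIFF header"
```

If this fails on the bytes your script writes to standard output but succeeds on the file saved on disk, something other than the wav is ending up in the output stream.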
Instead of the cat, you could use sox your.wav -L -e signed-integer -c 1 -r 16000 -b 16 -t wav -, which should write the wav to standard output in the right format. I'm not sure whether you will get a header length warning that way, but it should work.
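The sox options above ask for mono, 16000 Hz, 16-bit output. To see what an existing file declares without extra tools, something like the following may help. It is a sketch under two assumptions: the file has the plain 44-byte PCM header with the fmt chunk right after RIFF/WAVE, and the machine is little-endian (od reads multi-byte integers in host byte order). wav_format is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: print "channels sample_rate bits_per_sample" as
# declared in a canonical 44-byte WAV header.
# Assumes the fmt chunk starts at byte 12 and a little-endian host.
wav_format() {
    wf_channels=$(od -An -tu2 -j22 -N2 "$1" | tr -d ' \n')  # bytes 22-23
    wf_rate=$(od -An -tu4 -j24 -N4 "$1" | tr -d ' \n')      # bytes 24-27
    wf_bits=$(od -An -tu2 -j34 -N2 "$1" | tr -d ' \n')      # bytes 34-35
    echo "$wf_channels $wf_rate $wf_bits"
}

# Usage (your.wav is a placeholder):
#   wav_format your.wav
```

A file in the format requested by the sox command would be expected to print 1 16000 16. Files with extra chunks before fmt will give meaningless numbers here, so treat the result as a hint only.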
Are there any other commands in your script that print to standard output? That would explain the behavior: your WAV file is fine, but Rhasspy is getting text plus WAV data back from the script.
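To make that concrete, here is a toy sketch (not the script from this thread). noisy_tts, clean_tts, fake_wav and first_four are made-up names, and the "audio" is just the string RIFFdata standing in for real WAV bytes.

```shell
#!/bin/sh
# Stand-in for a real WAV stream; only the leading "RIFF" matters here.
fake_wav() { printf 'RIFFdata'; }

# Logs to stdout: the log line lands in front of the audio bytes,
# so whoever reads stdout no longer sees "RIFF" first.
noisy_tts() {
    echo 'Input text: hello'
    fake_wav
}

# Same body, but the log line goes to stderr; stdout carries audio only.
clean_tts() {
    echo 'Input text: hello' >&2
    fake_wav
}

# The first four bytes a consumer of stdout would see.
first_four() { head -c 4; }
```

With these definitions, noisy_tts | first_four prints Inpu, while clean_tts 2>/dev/null | first_four prints RIFF. Any command in the real script that writes to standard output without >&2 would have the same effect as the echo in noisy_tts.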
Here is the full script and how to use it (I didn't develop this script, I just adapted it for use with Rhasspy):
If you run Rhasspy in a Docker environment, copy the script to the profile folder and adjust the path.
# Shell script to replace TTS in Rhasspy with AWS Polly
# Install and configure the AWS CLI as per https://docs.aws.amazon.com/polly/latest/dg/getting-started-cli.html
# It is installed in /home/<user>/.local/bin; run "aws configure" and provide the key, secret, etc.
# In the Rhasspy web UI, change the TTS config to Local Command and provide the path to the script, e.g. /home/<user>/..rhasspy/profiles/en/rhasspy_says.sh
# Make the script executable (chmod a+x /home/<user>/..rhasspy/profiles/en/rhasspy_says.sh)
# Install mpg123 (apt-get install mpg123) for the mp3->wav conversion
# Update the following:
# 1. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION
# 2. Path to the cache folder, e.g. /home/<user>/rhasspy/poly (depending on your usage and available space, you may need to clean out this folder)
# 3. Path to awscli (which aws)
# 4. The voice to use (https://docs.aws.amazon.com/polly/latest/dg/voicelist.html)
# 5. The language to use
# The input text and parameters are used to calculate a hash for caching the mp3 files, so only
# "new speech" calls Polly; existing mp3s are converted to wav files directly
# Folder to cache the files - this also contains the .txt file with all generated mp3
# Path to aws binary
# Voice to use
# Lang to use
echo "Lang: $lang" >&2
###### Should not need to change parameters below this
# format to use
# Sample rate to use
# passed text string
echo "Input text: $text" >&2
# random 20-character name for the output file
name=$(tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 20 | head -n 1)
# target file to return to rhasspy-tts (wav)
echo "Output file: $outfile" >&2
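# check/create cache if needed
# (mkdir -v reports on stdout when it creates the folder; send that to stderr
# so that stdout stays free for the wav data)
mkdir -pv "$cache" >&2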
# hash for the string based on params and text
echo "Using string for hash: $md5string" >&2
hash="$(echo -n "$md5string" | md5sum | sed 's/ .*$//')"
echo "Calculated hash: $hash" >&2
echo "Cache file: $cachefile" >&2
# do we have this?
if [ -f "$cachefile" ]
then
    echo "$cachefile found." >&2
    mpg123 -w "$outfile" "$cachefile"
else
    echo "$cachefile not found, running polly" >&2
    # execute polly to get mp3 - check paths, voice set to $voice
    $awscli polly synthesize-speech --output-format "$format" --voice-id "$voice" \
        --sample-rate "$samplerate" --text-type ssml --text "$text" "$cachefile" >&2
    # update index
    echo "$hash" "$md5string" >> "$cache"index.txt
    # execute conversion to wav
    mpg123 -w "$outfile" "$cachefile"
fi