When I start the preprocessing step before training, I get a UnicodeDecodeError. I have already tried different distros and I get the same error each time. I am trying to train with a dataset that contains important Spanish characters such as the letter “ñ”. Is there a solution?
Thanks in advance for any responses.
(.venv) root@debian:~/piper/src/python# python3 -m piper_train.preprocess \
> --language es-419 \
> --input-dir ~/piper/my-dataset \
> --output-dir ~/piper/my-training \
> --dataset-format ljspeech \
> --single-speaker \
> --sample-rate 22050
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/piper/src/python/piper_train/preprocess.py", line 502, in <module>
    main()
  File "/root/piper/src/python/piper_train/preprocess.py", line 143, in main
    for utt in make_dataset(args):
  File "/root/piper/src/python/piper_train/preprocess.py", line 422, in ljspeech_dataset
    for row in reader:
  File "/usr/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 47: invalid continuation byte
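
A note on what the error probably means: byte 0xe9 is "é" in Latin-1 / Windows-1252, and in those encodings "ñ" is the single byte 0xf1. In UTF-8, 0xe9 can only start a three-byte sequence, which is why the decoder complains about an invalid continuation byte. So the transcript file being read is most likely saved in a legacy encoding rather than UTF-8. Below is a minimal sketch for converting it. The function name reencode_to_utf8 is hypothetical, and cp1252 as the source encoding is an assumption; if the file came from another tool or OS, a different codec may be needed.

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> bool:
    """Write a UTF-8 copy of src to dst.

    Returns True if the file had to be converted, False if it was already
    valid UTF-8. source_encoding is an assumption about how the file was
    originally saved; adjust it if the output looks wrong.
    """
    raw = src.read_bytes()
    try:
        # If this succeeds, the file is already valid UTF-8.
        text = raw.decode("utf-8")
        converted = False
    except UnicodeDecodeError:
        # Fall back to the assumed legacy encoding.
        text = raw.decode(source_encoding)
        converted = True
    # Work on bytes so line endings are preserved exactly.
    dst.write_bytes(text.encode("utf-8"))
    return converted
```

A possible way to use it, assuming the transcripts live in a file called metadata.csv inside the dataset directory (that file name is an assumption, not something shown in the traceback): call reencode_to_utf8(Path("metadata.csv"), Path("metadata.utf8.csv")), check that "ñ" and accented vowels look right in the new file, then replace the original with it and rerun the preprocess command.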