Automatic speech recognition in Luxembourgish. A very first model

The recent advent of highly performant models in Machine Learning had a considerable impact also on automatic speech recognition (ASR) systems in general and on low-resource language in particular. Models that have been trained on thousands of hours of labeled (or unlabeled) speech are achieving today error rates that were inconceivable even ten years ago. However, while these models mainly exist for big languages (i.e. mainly English), small and low-resource languages typically were left out as the preparation of appropriate training material was too costly or too complicated due to the lack of the required high amount of text and audio data. At least since the development of self-supervised learning frameworks like wav2vec2, low-resource languages are experiencing also some considerable advancement in speech recognition and related tasks. Instead of developing an ASR system entirely from scratch for a certain small language, one can now use one of the massive multilingual self-supervised models and fine-tune them with a smaller amount of data for a specific target language.

Ganzen Artikel