What is NeMo?
NeMo is NVIDIA’s open-source toolkit for conversational AI. For Automatic Speech Recognition (ASR), it contains a collection of pre-trained acoustic models for automatically transcribing spoken language.
Besides the acoustic models, NVIDIA also offers pre-trained models for Natural Language Processing (NLP) and Text-to-Speech (TTS), but in this tutorial I’m going to write only about ASR.
Installation
Install NeMo:
First, make sure you have Python 3.6+ before you start. Then install a few system libraries and download the “main” branch (not “master”) from GitHub. For more details, see [1].
apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython
python -m pip install git+https://github.com/NVIDIA/NeMo.git@main
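Before running the commands above, you can confirm the interpreter version from Python itself. This is a minimal sketch using only the standard library; the helper name `python_supported` is hypothetical, and the 3.6 minimum is the one stated in this tutorial:

```python
import sys

MINIMUM_PYTHON = (3, 6)  # minimum version stated in this tutorial


def python_supported(version_info=sys.version_info, minimum=MINIMUM_PYTHON) -> bool:
    """Return True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = sys.version.split()[0]
    print(current, "OK" if python_supported() else "is too old for this tutorial")
```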
Then install the NeMo toolkit with pip in your virtual environment, including the ASR extras:
pip install "nemo_toolkit[asr]"
Install the requirements listed in the “requirements.txt” and “requirements_asr.txt” files in the “NeMo/requirements” folder (run these commands from inside that folder):
python -m pip install -r requirements.txt
python -m pip install -r requirements_asr.txt
Install PyTorch:
Next, install PyTorch [2]. Make sure you install version 1.7.1 and not the newest release.
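To check which PyTorch version actually ended up in your environment, you can query the installed package metadata without importing torch. This is a minimal sketch assuming Python 3.8+ (where `importlib.metadata` is part of the standard library); the helper names are hypothetical, and treating a local build tag such as `+cu110` as equivalent to the plain version is my own assumption:

```python
from importlib import metadata  # standard library on Python 3.8+
from typing import Optional

PINNED_TORCH = "1.7.1"  # the version this tutorial asks for


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def torch_version_ok(installed: Optional[str], pinned: str = PINNED_TORCH) -> bool:
    """Compare versions, ignoring a local build tag such as '+cu110'."""
    if installed is None:
        return False
    return installed.split("+", 1)[0] == pinned


if __name__ == "__main__":
    found = installed_version("torch")
    if torch_version_ok(found):
        print(f"PyTorch {found} is installed")
    else:
        print(f"Expected PyTorch {PINNED_TORCH}, found {found}")
```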
Inference with NeMo
We are done with the setup and ready for the final step ;) Let’s try NeMo!
The following lines import NeMo’s ASR collection, then download the pre-trained QuartzNet15x5 model from NVIDIA GPU Cloud (NGC) and instantiate it for you:
import nemo.collections.asr as nemo_asr
quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
Download an English WAV file that is up to 20 seconds long and pass its location to transcribe():
transcription = quartznet.transcribe(paths2audio_files=['/home/user/wavfile.wav'])
print(transcription)
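Since transcribe() takes plain file paths, it can save time to check a clip before passing it in. This is a minimal sketch using Python’s standard `wave` module; `check_wav` is a hypothetical helper, the 20-second limit comes from the step above, and the 16 kHz mono expectation is an assumption on my part, since QuartzNet models are typically trained on 16 kHz audio:

```python
import wave


def check_wav(path: str, max_seconds: float = 20.0, expected_rate: int = 16000) -> list:
    """Return a list of problems with a WAV file; an empty list means it looks usable.

    Note: the wave module only reads uncompressed PCM WAV files.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / float(rate)
    if seconds > max_seconds:
        problems.append(f"clip is {seconds:.1f} s, longer than {max_seconds:.0f} s")
    if rate != expected_rate:  # assumed 16 kHz model input
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:  # assumed mono model input
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

For example, call check_wav('/home/user/wavfile.wav') and pass the file to quartznet.transcribe() only if the returned list is empty.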
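If you have problems with the editdistance package and get the error ModuleNotFoundError: No module named 'editdistance', the package most likely failed to build during installation: editdistance contains compiled C++ code, so pip needs a C++ compiler and the Python development headers.
The solution depends on the operating system you use: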
Windows:
On Windows, install Visual Studio 2019 and the Visual C++ Build Tools [3].
Ubuntu:
On Ubuntu, the solution also depends on your Python version, because the development headers are packaged per version. For Python 3.8, use:
sudo apt-get install python3.8-dev
If you use another Python version, change the version number accordingly (for example, python3.7-dev for Python 3.7).
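To find out whether editdistance is importable and which headers package matches the interpreter you are running, you can ask Python directly. This is a minimal sketch; the helper names are hypothetical, and `dev_package_name` simply assumes Ubuntu’s python3.X-dev naming shown above:

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """True if `import name` would find a module, without actually importing it."""
    return importlib.util.find_spec(name) is not None


def dev_package_name(version_info=sys.version_info) -> str:
    """Ubuntu package name with the headers for this interpreter, e.g. python3.8-dev."""
    return f"python{version_info[0]}.{version_info[1]}-dev"


if __name__ == "__main__":
    if module_available("editdistance"):
        print("editdistance is installed")
    else:
        print(f"Try: sudo apt-get install {dev_package_name()}, then pip install editdistance")
```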
For faster training, you can use a GPU instead of the CPU. To do so, install CUDA [4] (for Ubuntu) and NVIDIA Apex for mixed-precision training:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
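After installing CUDA and Apex, a quick check shows whether this interpreter can actually see them. This is a minimal sketch; `gpu_setup_report` is a hypothetical helper, and a successful lookup of `apex` only means the package is importable, not that the `--cuda_ext` extensions were built:

```python
import importlib.util


def gpu_setup_report() -> dict:
    """Report whether PyTorch, a CUDA device, and Apex are visible to this interpreter."""
    report = {"torch": False, "cuda": False, "apex": False}
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the check also runs on machines without PyTorch

        report["torch"] = True
        report["cuda"] = bool(torch.cuda.is_available())
    report["apex"] = importlib.util.find_spec("apex") is not None
    return report


if __name__ == "__main__":
    for component, ok in gpu_setup_report().items():
        print(f"{component:5s} {'found' if ok else 'missing'}")
```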
End Notes
We went over the basic steps for setting up a working environment for inference with the NeMo toolkit. To learn more about NeMo, check out one of the tutorials NVIDIA provides [5]. If you want to try ASR models in other languages, continue to my next blog post.