Getting Started with NVIDIA NeMo ASR
NVIDIA NeMo — Quick Start Guide

This guide covers the basic steps for getting started with the NeMo toolkit.
What is NeMo?
NeMo is NVIDIA’s framework for Automatic Speech Recognition (ASR). It contains a collection of pre-built acoustic models for automatically transcribing spoken language.
Besides the acoustic models, NVIDIA also offers pre-built models for Natural Language Processing (NLP) and Text-to-Speech (TTS), but in this tutorial I’m going to cover only ASR.
Installation
Install NeMo:
First of all, make sure you have Python 3.6+ before you start. Then install a few libraries and download the “main” branch (not “master”) from GitHub. For more details, read [1].
apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython
python -m pip install git+https://github.com/NVIDIA/NeMo.git@main
Then install the NeMo toolkit with pip in your virtual environment:
pip install nemo-asr
Install the requirements listed in the “requirements.txt” and “requirements_asr.txt” files, both found in the “NeMo/requirements” folder (run the commands from that folder):
python -m pip install -r requirements.txt
python -m pip install -r requirements_asr.txt
Install PyTorch:
The next step is to install PyTorch [2]. Make sure you install version 1.7.1 and not the newest one.
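Before moving on, it can help to confirm that the environment matches the versions mentioned above. The snippet below is a minimal sketch, not part of NeMo: the helper names python_is_supported and is_expected_torch are my own, and it assumes that PyTorch builds may report a local suffix such as "+cu110" after the version number.

```python
import sys


def python_is_supported(version_info=sys.version_info, minimum=(3, 6)):
    """Return True if the interpreter meets the Python 3.6+ requirement."""
    return tuple(version_info[:2]) >= minimum


def is_expected_torch(version, expected="1.7.1"):
    """Compare a PyTorch version string with the expected release.

    Builds such as "1.7.1+cu110" carry a local suffix after "+",
    so only the part before it is compared.
    """
    return version.split("+")[0] == expected


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    try:
        import torch  # only needed for the version check

        print("PyTorch", torch.__version__, "OK:", is_expected_torch(torch.__version__))
    except ImportError:
        print("PyTorch is not installed")
```

If either check reports False, revisit the installation steps above before trying inference.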
Inference with NeMo
We are done and ready for the final step ;) Let’s try NeMo!
The next lines import NeMo’s ASR collection, download the pre-trained QuartzNet15x5 model from NVIDIA GPU Cloud (NGC), and instantiate it for you:
import nemo.collections.asr as nemo_asr
quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
Download an English WAV file that is up to 20 seconds long and pass its location to transcribe:
transcription = quartznet.transcribe(paths2audio_files=['/home/user/wavfile.wav'])
print(transcription)
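If you are not sure whether your clip fits the 20-second limit, you can check it before transcribing. This is a small sketch that uses only Python’s standard wave module, so it assumes an uncompressed PCM WAV file. The helper names wav_duration_seconds and is_short_enough are hypothetical and not part of NeMo.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # duration = number of audio frames / frames per second
        return wav.getnframes() / wav.getframerate()


def is_short_enough(path, max_seconds=20.0):
    """Return True if the clip is within the 20-second limit used in this post."""
    return wav_duration_seconds(path) <= max_seconds
```

For example, is_short_enough('/home/user/wavfile.wav') should return True for the file used above if it is no longer than 20 seconds.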
For those who have problems with the editdistance package and are getting this error: ModuleNotFoundError: No module named ‘editdistance’.
The solution depends on the operating system you use:
Windows:
For Windows, you need to download Visual Studio 2019 and the Visual C++ Build Tools [3].
Ubuntu:
For Ubuntu, the solution also depends on your Python version. For Python 3.8 use:
sudo apt-get install python3.8-dev
If you use another version, change the package name accordingly.
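To see whether you are affected, and which package name matches your interpreter, something like the following can be used. It is only a sketch: the helpers module_available and python_dev_package are my own names, and it assumes your Ubuntu release ships a pythonX.Y-dev package for your Python version.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


def python_dev_package(version_info=sys.version_info):
    """Build the Ubuntu headers package name, e.g. python3.8-dev."""
    return "python{}.{}-dev".format(version_info[0], version_info[1])


if __name__ == "__main__":
    if module_available("editdistance"):
        print("editdistance is already installed")
    else:
        print("editdistance is missing; on Ubuntu try:")
        print("sudo apt-get install", python_dev_package())
```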
For faster training, you can use a GPU instead of the CPU. To do so, download CUDA [4] (for Ubuntu) and install Apex for mixed precision:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
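After installing CUDA, a quick way to confirm that PyTorch can actually see the GPU is torch.cuda.is_available(). The wrapper below is a minimal sketch. The function name gpu_available is my own, and it simply returns False when PyTorch itself is not installed.

```python
def gpu_available():
    """Return True if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run on the GPU.
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("CUDA GPU available:", gpu_available())
```

If this prints False on a machine with an NVIDIA GPU, the CUDA installation or the PyTorch build is the first thing to check.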
End Notes
We went over the basic steps for setting up a working environment for inference with the NeMo toolkit. If you want to read more about NeMo, you can check out one of the tutorials [5] that NVIDIA suggests. If you wish to try other ASR models in more languages, you can continue to my next blog post.