Getting Started with NVIDIA NeMo ASR

Gal Hever
Apr 20, 2021

NVIDIA NeMo — Quick Start Guide

This guide focuses on the basic steps for getting started with NeMo’s toolkit.

What is NeMo?

NeMo is NVIDIA’s framework for Automatic Speech Recognition (ASR). It contains a collection of pre-built acoustic models for automatically transcribing spoken language.

Besides the acoustic models, NVIDIA also offers pre-built models for Natural Language Processing (NLP) and Text-to-Speech (TTS), but in this tutorial I’m going to cover only ASR.

Installation

Install NeMo:

First of all, make sure you have Python 3.6+ before you start. Then install a few system libraries and Cython, and install NeMo from the “main” branch (not “master”) on GitHub. For more details, see [1].

apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython
python -m pip install git+https://github.com/NVIDIA/NeMo.git@main
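Before moving on, it can help to confirm that the prerequisites from this step are actually in place. The snippet below is a minimal sketch using only the standard library; the helper names (meets_min_python, check_prerequisites) are hypothetical, and it only checks that each piece can be found, not which version is installed. It assumes libsndfile is discoverable by ctypes under the name "sndfile", which is how the libsndfile1 package is usually looked up on Linux.

```python
import ctypes.util
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in this guide


def meets_min_python(version_info=sys.version_info, minimum=MIN_PYTHON):
    """True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def check_prerequisites():
    """Report the prerequisites from this step as a dict of booleans.

    - ffmpeg is looked up as an executable on PATH.
    - libsndfile is looked up as a shared library named "sndfile" (assumption).
    - Cython is looked up as an importable module in this interpreter.
    """
    return {
        "python>=3.6": meets_min_python(),
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "libsndfile": ctypes.util.find_library("sndfile") is not None,
        "cython": importlib.util.find_spec("Cython") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print("%-12s %s" % (name, "ok" if ok else "MISSING"))
```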

Then install NeMo’s ASR collection with pip in your virtual environment:

pip install "nemo_toolkit[asr]"

Install the requirements listed in the “requirements.txt” and “requirements_asr.txt” files, which are found in the “NeMo/requirements” folder (run the commands below from inside that folder):

python -m pip install -r requirements.txt

python -m pip install -r requirements_asr.txt
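If you prefer to run this step from a script, the sketch below builds the same two pip commands with explicit paths. It assumes the repository was cloned into a local folder named NeMo (a hypothetical location; adjust repo_root to your clone), and it uses sys.executable so the packages go into the same interpreter, for example the active virtual environment, that runs the script. The helper names are hypothetical, and by default it only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

# Requirement files named in this guide; both live in NeMo/requirements.
REQUIREMENT_FILES = ("requirements.txt", "requirements_asr.txt")


def requirement_commands(repo_root="NeMo", files=REQUIREMENT_FILES):
    """Build one `python -m pip install -r <file>` command per requirement file."""
    req_dir = Path(repo_root) / "requirements"
    return [
        [sys.executable, "-m", "pip", "install", "-r", str(req_dir / name)]
        for name in files
    ]


def install_requirements(repo_root="NeMo", dry_run=True):
    """Print each command; actually run it only when dry_run is False."""
    for cmd in requirement_commands(repo_root):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing install
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    install_requirements(dry_run=True)
```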

Install PyTorch:

The next step is to install PyTorch [2]. Make sure you install version 1.7.1 and not the newest one.
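To double-check which PyTorch version ended up in your environment, a small sketch like this can be used. The helper names are hypothetical; it strips the local build suffix that CUDA-specific wheels carry (such as "1.7.1+cu110") before comparing against the 1.7.1 version this guide asks for, and it reports None when PyTorch is not installed at all.

```python
def base_version(version):
    """Strip a local build suffix, e.g. "1.7.1+cu110" -> "1.7.1"."""
    return str(version).split("+")[0]


def torch_version_matches(expected="1.7.1"):
    """Return (matches, installed_version); installed_version is None if torch is missing."""
    try:
        import torch
    except ImportError:
        return False, None
    installed = base_version(torch.__version__)
    return installed == expected, installed


if __name__ == "__main__":
    ok, found = torch_version_matches()
    print("PyTorch", found, "OK" if ok else "(expected 1.7.1)")
```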

Inference with NeMo

We are done with the setup and good to go to the final step ;) Let’s try NeMo!

The following code downloads the pre-trained QuartzNet15x5 model from NVIDIA GPU Cloud (NGC) and instantiates it for you:

import nemo.collections.asr as nemo_asr

quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")

Download an English WAV file that is up to 20 seconds long and pass its location to transcribe:

transcription = quartznet.transcribe(paths2audio_files=['/home/user/wavfile.wav'])
print(transcription)
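Since transcribe takes a list of paths, you can also feed it several files at once. The sketch below, using only Python’s wave module, collects the WAV files in a folder and skips any longer than the 20 seconds suggested above. The helper names are hypothetical, the module only reads uncompressed PCM WAV, and the sample rate and channel count are reported because QuartzNet checkpoints are typically trained on 16 kHz mono audio (an assumption worth checking against the model you load).

```python
import wave
from pathlib import Path

MAX_SECONDS = 20.0  # length limit suggested in this guide


def wav_info(path):
    """Return (duration_seconds, sample_rate, channels) for a PCM WAV file.

    wave.open raises wave.Error for compressed or non-PCM files.
    """
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        return wf.getnframes() / float(rate), rate, wf.getnchannels()


def collect_wavs(folder, max_seconds=MAX_SECONDS):
    """List WAV files in `folder` (sorted by name) no longer than `max_seconds`."""
    keep = []
    for path in sorted(Path(folder).glob("*.wav")):
        duration, rate, channels = wav_info(path)
        if duration <= max_seconds:
            keep.append(str(path))
        else:
            print("skipping %s (%.1f s)" % (path.name, duration))
    return keep
```

For example, quartznet.transcribe(paths2audio_files=collect_wavs("/home/user/audio")) would transcribe every qualifying file in a hypothetical /home/user/audio folder and return a list of transcriptions, one per file.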

If you have problems with the editdistance package and get the error ModuleNotFoundError: No module named ‘editdistance’, the solution depends on your operating system:

Windows:

On Windows, you need to install Visual Studio 2019 and the Visual C++ Build Tools [3].

Ubuntu:

On Ubuntu, the solution also depends on your Python version. For Python 3.8, run:

sudo apt-get install python3.8-dev

If you use another Python version, change the package name accordingly (for example, python3.7-dev for Python 3.7).
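The checks described above can be combined into a short script that tests whether editdistance is importable and, if not, prints the matching fix. This is a minimal sketch with hypothetical helper names; it treats any Linux system as Ubuntu (an assumption, since other distributions name the headers package differently) and builds the package name from the running interpreter’s version, exactly as the python3.8-dev example suggests.

```python
import importlib.util
import platform
import sys


def has_editdistance():
    """True if the `editdistance` module can be imported in this interpreter."""
    return importlib.util.find_spec("editdistance") is not None


def fix_hint(system=None, version_info=sys.version_info):
    """Suggest the fix from this guide for the given OS and Python version."""
    system = system or platform.system()
    if system == "Windows":
        return "Install Visual Studio 2019 and the Visual C++ Build Tools"
    if system == "Linux":
        # Assumes Ubuntu: the headers package is named after the Python version.
        return "sudo apt-get install python%d.%d-dev" % tuple(version_info[:2])
    return "No OS-specific fix given in this guide for %s" % system


if __name__ == "__main__":
    if has_editdistance():
        print("editdistance is installed")
    else:
        print(fix_hint())
```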

For faster training, you can use a GPU instead of the CPU: install CUDA [4] (for Ubuntu) and Apex for mixed-precision training:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
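After installing CUDA and Apex, a quick sanity check like the following can confirm that PyTorch sees a GPU and that Apex is importable. It is a sketch with a hypothetical helper name; it only checks that the pieces are present, not that mixed-precision training actually works, and it reports CUDA as unavailable when PyTorch itself is missing.

```python
import importlib.util


def gpu_setup_status():
    """Report CUDA availability (via PyTorch) and whether Apex is importable."""
    status = {
        "apex": importlib.util.find_spec("apex") is not None,
        "cuda": False,
    }
    try:
        import torch
        status["cuda"] = bool(torch.cuda.is_available())
    except ImportError:
        pass  # PyTorch not installed: leave CUDA reported as unavailable
    return status


if __name__ == "__main__":
    status = gpu_setup_status()
    print("CUDA available:", status["cuda"])
    print("Apex installed:", status["apex"])
```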

End Notes

We went over the basic steps for setting up a working environment for inference with the NeMo toolkit. If you want to read more about NeMo’s toolkit, check out one of the tutorials [5] that NVIDIA provides. If you wish to try other ASR models in more languages, continue to my next blog post.

References
