Kaldi versus other toolkits

Kaldi is similar in aims and scope to HTK. The goal is to have modern and
flexible code, written in C++, that is easy to modify and extend. Important
features include:

Code-level integration with Finite State Transducers (FSTs)
- We compile against the OpenFst toolkit (using it as a library).
Extensive linear algebra support
- We include a matrix library that wraps standard BLAS and LAPACK routines.
Extensible design
- We aim to provide our algorithms in the most generic form possible. For instance, our decoders are templated on an object that provides a score indexed by a (frame, FST input symbol) pair. This means the decoder can work from any suitable source of scores, such as a neural net.
Open license
- The code is licensed under Apache 2.0, which is one of the least restrictive licenses available.
Complete recipes
- Our goal is to make complete recipes available for building speech recognition systems that work from widely available databases, such as those provided by the Linguistic Data Consortium (LDC).
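The templated-decoder idea in the "Extensible design" entry can be illustrated with a small, self-contained C++ sketch. All names here (`TableScores`, `ConstantScores`, `Arc`, `BestPathScore`) and the toy graph representation are hypothetical and are not Kaldi's actual decoder API; Kaldi's real decoders work on OpenFst graphs through a richer interface. The sketch only shows the principle: the decoder is written against any type that supplies a score for a (frame, input symbol) pair, so a precomputed table and an on-demand scorer (standing in for, e.g., a neural net) plug in without changing the decoder.

```cpp
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical score source #1: a precomputed (frames x symbols) table, e.g.
// per-frame log-likelihoods produced by an acoustic model.
class TableScores {
 public:
  explicit TableScores(std::vector<std::vector<float>> table)
      : table_(std::move(table)) {}
  int NumFrames() const { return static_cast<int>(table_.size()); }
  float Score(int frame, int symbol) const {
    assert(frame >= 0 && frame < NumFrames());
    assert(symbol >= 0 && symbol < static_cast<int>(table_[frame].size()));
    return table_[frame][symbol];
  }

 private:
  std::vector<std::vector<float>> table_;
};

// Hypothetical score source #2: scores computed on demand. Here every score
// is a constant, standing in for something like a neural net forward pass.
struct ConstantScores {
  int frames;
  float value;
  int NumFrames() const { return frames; }
  float Score(int /*frame*/, int /*symbol*/) const { return value; }
};

// Toy graph arc (not an OpenFst type): each arc consumes one frame, reads
// input symbol `ilabel`, and carries a log-domain weight (higher is better).
struct Arc {
  int from;
  int to;
  int ilabel;
  float weight;
};

// Toy Viterbi pass templated on the score source. The decoder only needs
// NumFrames() and Score(frame, symbol); it never knows where scores come from.
// Returns the best total score of reaching `final_state` after consuming all
// frames, or -infinity if that state is unreachable.
template <typename ScoreSource>
float BestPathScore(const std::vector<Arc>& arcs, int num_states, int start,
                    int final_state, const ScoreSource& scores) {
  const float kNegInf = -std::numeric_limits<float>::infinity();
  std::vector<float> cur(num_states, kNegInf);
  cur[start] = 0.0f;
  for (int t = 0; t < scores.NumFrames(); ++t) {
    std::vector<float> next(num_states, kNegInf);
    for (const Arc& arc : arcs) {
      if (cur[arc.from] == kNegInf) continue;  // source state not active
      float cand = cur[arc.from] + arc.weight + scores.Score(t, arc.ilabel);
      if (cand > next[arc.to]) next[arc.to] = cand;
    }
    cur.swap(next);
  }
  return cur[final_state];
}
```

For example, with two frames scored {-1, -2} and {-3, -0.5} over two symbols, and a graph 0 → 1 → 2 whose arcs accept either symbol with zero weight, the best path reads symbol 0 and then symbol 1, for a total of -1.5. Using a template rather than a virtual base class is one possible design choice: the score source is resolved at compile time, so the compiler can inline `Score()` in the inner loop, though a virtual interface would also work.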

Releasing complete recipes is an important aspect of Kaldi. Since the code is
publicly available under a license that permits modification and re-release,
we would like to encourage people to release their code, along with their
script directories, in a format similar to Kaldi’s own example scripts.

We have tried to make Kaldi’s documentation as complete as possible given time
constraints, but in the short term we cannot hope to generate documentation
that is as thorough as HTK’s. In particular, the HTKBook contains a lot of
introductory material explaining statistical speech recognition for the
uninitiated, and that material will probably never appear in Kaldi’s
documentation. Much of Kaldi’s documentation is written in a way that makes it
accessible only to experts. In the future we hope to make it somewhat more
accessible, bearing in mind that our intended audience is speech recognition
researchers or researchers-in-training. In general, Kaldi is not a speech
recognition toolkit “for dummies.” It will allow you to do many kinds of
operations that don’t make sense.
