JavaScript (NodeJS / ElectronJS)
Model
-
class
Model(aModelPath, aNCep, aNContext, aAlphabetConfigPath, aBeamWidth)
An object providing an interface to a trained DeepSpeech model.
- Arguments
aModelPath (string) – The path to the frozen model graph.
aNCep (number) – The number of cepstral coefficients the model was trained with.
aNContext (number) – The context window the model was trained with.
aAlphabetConfigPath (string) – The path to the configuration file specifying the alphabet used by the network. See alphabet.h.
aBeamWidth (number) – The beam width used by the decoder. A larger beam width generates better results at the cost of decoding time.
- Throws
on error
-
Model.createStream(aSampleRate)
Create a new streaming inference state. The streaming state returned by this function can then be passed to Model.feedAudioContent() and Model.finishStream().
- Arguments
aSampleRate (number) – The sample-rate of the audio signal.
- Throws
on error
- Returns
object – an opaque object that represents the streaming state.
-
Model.enableDecoderWithLM(aAlphabetConfigPath, aLMPath, aTriePath, aLMAlpha, aLMBeta)
Enable decoding using beam scoring with a KenLM language model.
- Arguments
aAlphabetConfigPath (string) – The path to the configuration file specifying the alphabet used by the network. See alphabet.h.
aLMPath (string) – The path to the language model binary file.
aTriePath (string) – The path to the trie file built from the same vocabulary as the language model binary.
aLMAlpha (float) – The alpha hyperparameter of the CTC decoder. Language Model weight.
aLMBeta (float) – The beta hyperparameter of the CTC decoder. Word insertion weight.
- Returns
number – Zero on success, non-zero on failure (invalid arguments).
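The constructor and Model.enableDecoderWithLM() report failure differently: the constructor throws, while enableDecoderWithLM() returns a status code. A minimal sketch of handling both follows. It assumes the loaded binding is passed in as `ds` (a hypothetical parameter name, for instance the result of `require('deepspeech')` if the module is installed under that name), and the `cfg` field names are invented for this example.

```javascript
// Sketch only: `ds` is whatever object exposes the Model class documented above.
// The `cfg` field names are hypothetical and chosen for this example.
function createModelWithLM(ds, cfg) {
  // The constructor throws on error, so the exception propagates to the caller.
  const model = new ds.Model(cfg.modelPath, cfg.nCep, cfg.nContext,
                             cfg.alphabetPath, cfg.beamWidth);

  // enableDecoderWithLM() returns zero on success, non-zero on failure.
  const status = model.enableDecoderWithLM(cfg.alphabetPath, cfg.lmPath,
                                           cfg.triePath, cfg.lmAlpha, cfg.lmBeta);
  if (status !== 0) {
    throw new Error('enableDecoderWithLM failed with status ' + status);
  }
  return model;
}
```

Turning the non-zero status into an exception gives callers a single error path for both steps.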
-
Model.feedAudioContent(aSctx, aBuffer, aBufferSize)
Feed audio samples to an ongoing streaming inference.
- Arguments
aSctx (object) – A streaming state returned by Model.createStream().
aBuffer (buffer) – An array of 16-bit, mono raw audio samples at the appropriate sample rate.
aBufferSize (number) – The number of samples in aBuffer.
-
Model.finishStream(aSctx)
Signal the end of an audio signal to an ongoing streaming inference and return the STT result over the whole audio signal.
- Arguments
aSctx (object) – A streaming state returned by Model.createStream().
- Returns
string – The STT result. This method will free the state (aSctx).
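The streaming calls are meant to be used in sequence: create a state, feed it audio, then finish. A minimal sketch of that sequence follows, assuming `chunks` is an iterable of Node.js Buffers of 16-bit mono PCM. The argument lists follow the signatures on this page. Some builds of the binding may derive the sample count from the buffer itself, in which case the explicit size argument would be dropped.

```javascript
// Sketch only: `model` is a Model instance; `chunks` is an iterable of Node.js
// Buffers holding 16-bit mono PCM at `sampleRate`.
function transcribeStream(model, chunks, sampleRate) {
  const sctx = model.createStream(sampleRate); // throws on error
  for (const chunk of chunks) {
    // 16-bit samples occupy two bytes each, so the sample count is half the byte length.
    const nSamples = Math.floor(chunk.length / 2);
    // Argument list as documented on this page (assumption: the build in use
    // accepts an explicit sample count).
    model.feedAudioContent(sctx, chunk, nSamples);
  }
  // finishStream() frees the streaming state, so `sctx` must not be reused afterwards.
  return model.finishStream(sctx);
}
```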
-
Model.finishStreamWithMetadata(aSctx)
Signal the end of an audio signal to an ongoing streaming inference and return per-letter metadata.
- Arguments
aSctx (object) – A streaming state returned by Model.createStream().
- Returns
object – A Metadata() struct of individual letters along with their timing information. The user is responsible for freeing the Metadata by calling FreeMetadata(). This method will free the state (aSctx).
-
Model.intermediateDecode(aSctx)
Compute the intermediate decoding of an ongoing streaming inference. This is an expensive process as the decoder implementation isn’t currently capable of streaming, so it always starts from the beginning of the audio.
- Arguments
aSctx (object) – A streaming state returned by Model.createStream().
- Returns
string – The STT intermediate result.
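Because every intermediate decode restarts from the beginning of the audio, calling it after each chunk makes the cost grow quickly. One possible approach is to decode only every few chunks, as in this sketch. The `every` and `onPartial` parameters are hypothetical names introduced here, and the feedAudioContent() argument list follows the signature as documented on this page.

```javascript
// Sketch only: feeds chunks and reports a partial transcript every `every` chunks.
// `every` and `onPartial` are hypothetical names for this example.
function streamWithPartials(model, chunks, sampleRate, every, onPartial) {
  const sctx = model.createStream(sampleRate);
  let fed = 0;
  for (const chunk of chunks) {
    model.feedAudioContent(sctx, chunk, Math.floor(chunk.length / 2));
    fed += 1;
    // Each intermediate decode restarts from the beginning of the audio,
    // so it is throttled rather than run after every chunk.
    if (fed % every === 0) {
      onPartial(model.intermediateDecode(sctx));
    }
  }
  // The final result still comes from finishStream(), which frees the state.
  return model.finishStream(sctx);
}
```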
-
Model.stt(aBuffer, aBufferSize, aSampleRate)
Use the DeepSpeech model to perform Speech-To-Text.
- Arguments
aBuffer (object) – A 16-bit, mono raw audio signal at the appropriate sample rate.
aBufferSize (number) – The number of samples in the audio signal.
aSampleRate (number) – The sample-rate of the audio signal.
- Returns
string – The STT result. Returns undefined on error.
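Note that aBufferSize counts samples, not bytes. For 16-bit audio held in a Node.js Buffer, that is half the byte length. The sketch below shows this together with the documented "undefined on error" convention. It follows the three-argument signature shown on this page, which is an assumption about the build in use, since some bindings may take the size from the buffer instead.

```javascript
// Number of 16-bit samples held in a Buffer of raw PCM bytes.
function sampleCount(pcm) {
  return Math.floor(pcm.length / 2);
}

// Sketch only: one-shot transcription with the documented error convention.
function transcribe(model, pcm, sampleRate) {
  // Argument list as documented on this page (assumption, see the text above).
  const text = model.stt(pcm, sampleCount(pcm), sampleRate);
  if (text === undefined) {
    throw new Error('stt() reported an error');
  }
  return text;
}
```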
-
Model.sttWithMetadata(aBuffer, aBufferSize, aSampleRate)
Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.
- Arguments
aBuffer (object) – A 16-bit, mono raw audio signal at the appropriate sample rate.
aBufferSize (number) – The number of samples in the audio signal.
aSampleRate (number) – The sample-rate of the audio signal.
- Returns
object – A Metadata() struct of individual letters along with their timing information. The user is responsible for freeing the Metadata by calling FreeMetadata(). Returns undefined on error.
Module exported methods
-
FreeModel(model)
Frees associated resources and destroys the model object.
- Arguments
model (object) – A model pointer returned by Model()
-
FreeStream(stream)
Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.
- Arguments
stream (object) – A streaming state pointer returned by Model.createStream().
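A typical use is cancellation: the caller stops caring about the result partway through, and the state should be released without paying for a decode. A minimal sketch, assuming `ds` is the loaded binding exposing FreeStream() and `shouldStop` is a hypothetical callback supplied by the caller:

```javascript
// Sketch only: `ds` exposes FreeStream(); `shouldStop` is a hypothetical
// callback that returns true when the caller no longer wants a result.
function feedUntilCancelled(ds, model, chunks, sampleRate, shouldStop) {
  const sctx = model.createStream(sampleRate);
  for (const chunk of chunks) {
    if (shouldStop()) {
      // Discard the state without running the decoder.
      ds.FreeStream(sctx);
      return null;
    }
    // Argument list as documented on this page (assumption about the build in use).
    model.feedAudioContent(sctx, chunk, Math.floor(chunk.length / 2));
  }
  return model.finishStream(sctx);
}
```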
-
FreeMetadata(metadata)
Free memory allocated for metadata information.
- Arguments
metadata (object) – Object containing metadata as returned by Model.sttWithMetadata() or Model.finishStreamWithMetadata()
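Since the caller owns the returned metadata, a `try`/`finally` wrapper is one way to make sure FreeMetadata() runs even if processing throws. The sketch below assumes `ds` is the loaded binding and `fn` is a hypothetical callback. The sttWithMetadata() argument list follows the signature as documented on this page.

```javascript
// Sketch only: runs `fn` on the metadata and always frees it afterwards.
function withMetadata(ds, model, pcm, sampleRate, fn) {
  const metadata = model.sttWithMetadata(pcm, Math.floor(pcm.length / 2), sampleRate);
  if (metadata === undefined) {
    throw new Error('sttWithMetadata() reported an error');
  }
  try {
    return fn(metadata);
  } finally {
    // The caller owns the metadata and must release it with FreeMetadata().
    ds.FreeMetadata(metadata);
  }
}
```

The callback should copy out whatever it needs (strings, numbers) rather than return the metadata object itself, because the object is freed as soon as the wrapper returns.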
-
printVersions()
Print the versions of this library and of the linked TensorFlow library to standard output.
Metadata
-
class
Metadata()
Stores the entire CTC output as an array of character metadata objects.
-
Metadata.confidence()
Approximated confidence value for this transcription. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcription.
- Returns
float – Confidence value
-
Metadata.items()
List of items
- Returns
array – List of MetadataItem()
-
Metadata.num_items()
Size of the list of items
- Returns
int – Number of items
-
MetadataItem
-
class
MetadataItem()
Stores each individual character, along with its timing information.
-
MetadataItem.character()
The character generated for transcription
- Returns
string – The character generated
-
MetadataItem.start_time()
Position of the character in seconds
- Returns
float – The position of the character
-
MetadataItem.timestep()
Position of the character in units of 20ms
- Returns
int – The position of the character
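Per-character metadata is often more useful once grouped into words. The sketch below builds a list of words with their start times. It rests on two assumptions that are not stated on this page: that words are separated by a space character in the model's alphabet, and that items() yields something iterable. Because a binding may expose these members either as methods (as written here) or as plain properties, the hypothetical `read` helper accepts both.

```javascript
// Read a member whether the binding exposes it as a method or a plain property.
function read(obj, name) {
  return typeof obj[name] === 'function' ? obj[name]() : obj[name];
}

// Sketch only: group per-character metadata into words with start times.
// Assumes a space character separates words.
function wordsWithTimes(metadata) {
  const words = [];
  let current = null;
  for (const item of read(metadata, 'items')) {
    const ch = read(item, 'character');
    if (ch === ' ') {
      // A space closes the word in progress, if any.
      if (current) words.push(current);
      current = null;
    } else if (current) {
      current.word += ch;
    } else {
      // First character of a new word: record where it starts (in seconds).
      current = { word: ch, start: read(item, 'start_time') };
    }
  }
  if (current) words.push(current);
  return words;
}
```

The returned objects hold only copied strings and numbers, so they remain valid after the metadata has been released with FreeMetadata().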
-