Java

DeepSpeechModel

class DeepSpeechModel

Exposes a DeepSpeech model in Java.

Public Functions

org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.DeepSpeechModel(String modelPath, String alphabetPath, int beam_width)

An object providing an interface to a trained DeepSpeech model.

Parameters

modelPath: The path to the frozen model graph.
alphabetPath: The path to the configuration file specifying the alphabet used by the network. See alphabet.h.
beam_width: The beam width used by the decoder. A larger beam width generates better results at the cost of decoding time.

void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.freeModel(): Frees associated resources and destroys the model object.
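Because freeModel() releases native resources, it should run exactly once, including when an exception interrupts the work done with the model. A minimal sketch of one way to arrange that, using a hypothetical NativeResource class (not part of libdeepspeech) that adapts any release action to Java's try-with-resources:

```java
import java.util.Objects;

// Hypothetical helper (not part of libdeepspeech): wraps a release action,
// such as DeepSpeechModel.freeModel(), so that try-with-resources runs it once.
final class NativeResource implements AutoCloseable {
    private final Runnable release;
    private boolean closed = false;

    NativeResource(Runnable release) {
        this.release = Objects.requireNonNull(release, "release");
    }

    boolean isClosed() {
        return closed;
    }

    // Idempotent: the release action runs on the first close() only,
    // so a second close() cannot free the same native object twice.
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            release.run();
        }
    }
}
```

Opening a block with try (NativeResource guard = new NativeResource(model::freeModel)) would then free the model when the block exits, whether normally or through an exception, and a stray second close() is harmless.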

void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.enableDecoderWihLM(String lm, String trie, float lm_alpha, float lm_beta)

Enable decoding using beam scoring with a KenLM language model.

Return

Zero on success, non-zero on failure (invalid arguments).

Parameters

lm: The path to the language model binary file.
trie: The path to the trie file built from the same vocabulary as the language model binary.
lm_alpha: The alpha hyperparameter of the CTC decoder. Language model weight.
lm_beta: The beta hyperparameter of the CTC decoder. Word insertion weight.
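To make the roles of lm_alpha and lm_beta concrete: CTC beam search decoders that use an external language model conventionally rank a candidate transcription W by log P_ctc(W) + alpha * log P_lm(W) + beta * |W|, where |W| is the number of words. The sketch below assumes that conventional form; the exact scoring inside the DeepSpeech decoder may differ in details such as the logarithm base. The class and method names are hypothetical.

```java
// Hypothetical illustration of how lm_alpha and lm_beta are conventionally
// combined when scoring a beam-search candidate; not a libdeepspeech API.
final class LmScoring {
    private LmScoring() {}

    /**
     * score = logPctc + alpha * logPlm + beta * wordCount
     *
     * @param logPctc   acoustic log-probability of the candidate from the CTC output
     * @param logPlm    log-probability of the candidate under the language model
     * @param wordCount number of words in the candidate
     * @param alpha     language model weight (lm_alpha)
     * @param beta      word insertion weight (lm_beta)
     */
    static double score(double logPctc, double logPlm, int wordCount,
                        double alpha, double beta) {
        return logPctc + alpha * logPlm + beta * wordCount;
    }
}
```

Raising alpha lets the language model's preferences dominate the acoustic evidence. Because every extra word lowers log P_lm, the language model term alone favors shorter transcriptions; a positive beta counteracts that by rewarding each inserted word.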

Metadata org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.sttWithMetadata(short [] buffer, int buffer_size, int sample_rate)

Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.

Return

A Metadata object containing the individual letters along with their timing information.

Parameters

buffer: A 16-bit, mono raw audio signal at the appropriate sample rate.
buffer_size: The number of samples in the audio signal.
sample_rate: The sample rate of the audio signal.
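The buffer argument expects raw 16-bit mono samples, while audio is usually read from disk as bytes. A minimal sketch of the conversion, assuming signed 16-bit little-endian PCM (the sample layout of standard PCM WAV files) with any file header already stripped; the Pcm16 class is hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

// Hypothetical helper: turns raw little-endian 16-bit PCM bytes into the
// short[] samples expected by sttWithMetadata() and feedAudioContent().
final class Pcm16 {
    private Pcm16() {}

    static short[] fromLittleEndian(byte[] pcm) {
        // Two bytes per sample; a trailing odd byte, if any, is ignored.
        ShortBuffer samples = ByteBuffer.wrap(pcm)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer();
        short[] out = new short[samples.remaining()];
        samples.get(out);
        return out;
    }
}
```

Since buffer_size counts samples rather than bytes, the matching call passes out.length (half the byte count) as buffer_size, together with the sample rate the audio was actually recorded at.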

DeepSpeechStreamingState org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.createStream(int sample_rate)

Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().

Return

An opaque object that represents the streaming state.

Parameters

sample_rate: The sample rate of the audio signal.

void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.feedAudioContent(DeepSpeechStreamingState ctx, short [] buffer, int buffer_size)

Feed audio samples to an ongoing streaming inference.

Parameters

ctx: A streaming state pointer returned by createStream().
buffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate.
buffer_size: The number of samples in buffer.
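In a streaming setup, audio typically arrives or is read in fixed-size pieces, and each piece goes to feedAudioContent() with buffer_size set to that piece's sample count. The sketch below shows one way to cut a sample array into chunks of a given duration; the AudioChunker class, its methods, and sizing chunks by milliseconds are illustrative assumptions, not part of the API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits audio into chunks suitable for successive
// feedAudioContent(ctx, chunk, chunk.length) calls.
final class AudioChunker {
    private AudioChunker() {}

    // Number of samples covering `millis` milliseconds at `sampleRate` Hz, rounded down.
    static int samplesFor(int sampleRate, int millis) {
        return (int) ((long) sampleRate * millis / 1000);
    }

    static List<short[]> split(short[] audio, int chunkSamples) {
        if (chunkSamples <= 0) {
            throw new IllegalArgumentException("chunkSamples must be positive");
        }
        List<short[]> chunks = new ArrayList<>();
        int start = 0;
        while (start < audio.length) {
            // The last chunk may be shorter than chunkSamples.
            int end = start + Math.min(chunkSamples, audio.length - start);
            chunks.add(Arrays.copyOfRange(audio, start, end));
            start = end;
        }
        return chunks;
    }
}
```

Each chunk would then be passed as model.feedAudioContent(ctx, chunk, chunk.length) on the state returned by createStream(), followed by a single finishStream(ctx) once the audio ends.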

String org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.intermediateDecode(DeepSpeechStreamingState ctx)

Compute the intermediate decoding of an ongoing streaming inference. This is an expensive operation: the decoder implementation isn’t currently capable of streaming, so each call decodes from the beginning of the audio.

Return

The STT intermediate result.

Parameters

ctx: A streaming state pointer returned by createStream().
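Because intermediateDecode() starts over from the beginning of the audio on every call, calling it after each chunk makes the total decoding work grow roughly quadratically with stream length. A back-of-the-envelope sketch, assuming decoding cost is proportional to the number of samples decoded: with n chunks of c samples and a call after every k-th chunk, the calls cover k*c, 2k*c, ... samples. The class name is hypothetical.

```java
// Hypothetical cost model (an assumption, not a measured property of the
// decoder): each intermediateDecode() call re-decodes all audio fed so far.
final class IntermediateDecodeCost {
    private IntermediateDecodeCost() {}

    /**
     * Total samples re-decoded when intermediateDecode() is called after every
     * `every`-th chunk of a stream made of `chunks` chunks of `chunkSamples` samples.
     */
    static long totalSamplesDecoded(int chunkSamples, int chunks, int every) {
        if (every <= 0) {
            throw new IllegalArgumentException("every must be positive");
        }
        long total = 0;
        for (int fed = every; fed <= chunks; fed += every) {
            // After `fed` chunks the decoder starts over from sample 0.
            total += (long) fed * chunkSamples;
        }
        return total;
    }
}
```

Under this model, one call per chunk over ten chunks costs 55 chunks' worth of decoding, against 15 when decoding only after every fifth chunk, so throttling intermediate results is one way to keep the overhead in check.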

String org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.finishStream(DeepSpeechStreamingState ctx)

Signal the end of an audio signal to an ongoing streaming inference and return the STT result over the whole audio signal.

Return

The STT result.

Note

This method will free the state pointer (ctx).

Parameters

ctx: A streaming state pointer returned by createStream().
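Since finishStream() (like finishStreamWithMetadata()) frees ctx, the same DeepSpeechStreamingState must not be passed to any method afterwards, and nothing in the Java types prevents it. A minimal sketch of a hypothetical StreamGuard (not part of libdeepspeech) that hands out the state for ordinary calls until it is finished, then refuses:

```java
import java.util.Objects;

// Hypothetical guard (not part of libdeepspeech): tracks whether a streaming
// state has already been finished, so reuse fails in Java instead of
// reaching native code with a freed state.
final class StreamGuard<T> {
    private T state;

    StreamGuard(T state) {
        this.state = Objects.requireNonNull(state, "state");
    }

    // For feedAudioContent() / intermediateDecode(): valid until finished.
    T get() {
        if (state == null) {
            throw new IllegalStateException("stream already finished");
        }
        return state;
    }

    // For the single finishStream() / finishStreamWithMetadata() call:
    // returns the state once and invalidates the guard.
    T finish() {
        T s = get();
        state = null;
        return s;
    }

    boolean isFinished() {
        return state == null;
    }
}
```

In use, model.feedAudioContent(guard.get(), chunk, chunk.length) would run during the stream and model.finishStream(guard.finish()) at the end; any later get() or finish() throws IllegalStateException.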

Metadata org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.finishStreamWithMetadata(DeepSpeechStreamingState ctx)

Signal the end of an audio signal to an ongoing streaming inference and return per-letter metadata.

Return

A Metadata object containing the individual letters along with their timing information.

Note

This method will free the state pointer (ctx).

Parameters

ctx: A streaming state pointer returned by createStream().

Metadata

class Metadata

Stores the entire CTC output as an array of character metadata objects.

Public Functions

MetadataItem org.mozilla.deepspeech.libdeepspeech.Metadata.getItems(): List of items

int org.mozilla.deepspeech.libdeepspeech.Metadata.getNum_items(): Size of the list of items

MetadataItem org.mozilla.deepspeech.libdeepspeech.Metadata.getItem(int i)

Retrieve a single MetadataItem element.

Return

The requested MetadataItem, or null.

Parameters

i: Array index of the MetadataItem to get
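getNum_items() and getItem(int) together allow walking the whole CTC output. A minimal sketch that rebuilds the transcript from the per-character items; to stay self-contained, the hypothetical MetadataText helper takes the item accessor and the character getter as functions, so with the real classes it would be called as transcript(md.getNum_items(), md::getItem, MetadataItem::getCharacter).

```java
import java.util.function.Function;
import java.util.function.IntFunction;

// Hypothetical helper (not part of libdeepspeech): concatenates the characters
// of a Metadata-like sequence, skipping indices whose item is null.
final class MetadataText {
    private MetadataText() {}

    static <T> String transcript(int numItems, IntFunction<T> itemAt,
                                 Function<T, String> characterOf) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < numItems; i++) {
            T item = itemAt.apply(i);
            if (item != null) { // getItem(i) is documented to possibly return null
                sb.append(characterOf.apply(item));
            }
        }
        return sb.toString();
    }
}
```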

MetadataItem

class MetadataItem

Stores a single character along with its timing information.

Public Functions

String org.mozilla.deepspeech.libdeepspeech.MetadataItem.getCharacter(): The character generated for transcription

int org.mozilla.deepspeech.libdeepspeech.MetadataItem.getTimestep(): Position of the character in units of 20 ms

float org.mozilla.deepspeech.libdeepspeech.MetadataItem.getStart_time(): Position of the character in seconds
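getTimestep() and getStart_time() express the same position in two units: with 20 ms per timestep, there are 50 timesteps per second. A common next step is grouping characters into words with start times. The sketch below assumes the characters and start times were already collected from the MetadataItem getters into arrays, and that a single space separates words, as is typical for English models; the WordTimings class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for MetadataItem timing data; not part of libdeepspeech.
final class WordTimings {
    private WordTimings() {}

    // getTimestep() counts 20 ms units, i.e. 50 timesteps per second.
    static double timestepToSeconds(int timestep) {
        return timestep / 50.0;
    }

    static final class Word {
        final String text;
        final float startSeconds;

        Word(String text, float startSeconds) {
            this.text = text;
            this.startSeconds = startSeconds;
        }
    }

    // Groups characters into words, assuming " " is the word separator.
    // chars[i] and starts[i] come from getItem(i).getCharacter() and
    // getItem(i).getStart_time() respectively.
    static List<Word> group(String[] chars, float[] starts) {
        if (chars.length != starts.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        List<Word> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        float wordStart = 0f;
        for (int i = 0; i < chars.length; i++) {
            if (" ".equals(chars[i])) {
                flush(words, current, wordStart);
            } else {
                if (current.length() == 0) {
                    wordStart = starts[i]; // first character opens a new word
                }
                current.append(chars[i]);
            }
        }
        flush(words, current, wordStart); // last word has no trailing space
        return words;
    }

    private static void flush(List<Word> words, StringBuilder current, float start) {
        if (current.length() > 0) {
            words.add(new Word(current.toString(), start));
            current.setLength(0);
        }
    }
}
```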