Java
DeepSpeechModel
-
class
DeepSpeechModel: Exposes a DeepSpeech model in Java.
Public Functions
-
org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.DeepSpeechModel(String modelPath, String alphabetPath, int beam_width) An object providing an interface to a trained DeepSpeech model.
- Parameters
modelPath: The path to the frozen model graph.
alphabetPath: The path to the configuration file specifying the alphabet used by the network. See alphabet.h.
beam_width: The beam width used by the decoder. A larger beam width generates better results at the cost of decoding time.
-
void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.freeModel() Frees associated resources and destroys model object.
-
void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.enableDecoderWihLM(String lm, String trie, float lm_alpha, float lm_beta) Enable decoding using beam scoring with a KenLM language model.
- Return
Zero on success, non-zero on failure (invalid arguments).
- Parameters
lm: The path to the language model binary file.
trie: The path to the trie file built from the same vocabulary as the language model binary.
lm_alpha: The alpha hyperparameter of the CTC decoder (language model weight).
lm_beta: The beta hyperparameter of the CTC decoder (word insertion weight).
-
Metadata org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.sttWithMetadata(short [] buffer, int buffer_size, int sample_rate) Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.
- Return
Outputs a Metadata object of individual letters along with their timing information.
- Parameters
buffer: A 16-bit, mono raw audio signal at the appropriate sample rate.
buffer_size: The number of samples in the audio signal.
sample_rate: The sample rate of the audio signal.
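The buffer parameter expects 16-bit samples as a short array, while audio read from a file or socket usually arrives as bytes. The sketch below shows one way to convert them. PcmSamples and fromLittleEndianBytes are hypothetical helper names, not part of the DeepSpeech bindings, and the little-endian byte order is an assumption (it matches typical WAV data, but check your source).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, not part of org.mozilla.deepspeech.libdeepspeech.
final class PcmSamples {
    private PcmSamples() {}

    /** Converts raw 16-bit mono PCM bytes (assumed little-endian) into samples. */
    static short[] fromLittleEndianBytes(byte[] pcm) {
        if (pcm.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        short[] samples = new short[pcm.length / 2];
        // View the bytes as little-endian shorts and copy them out in one call.
        ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(samples);
        return samples;
    }

    public static void main(String[] args) {
        byte[] pcm = {0x01, 0x00, (byte) 0xFF, (byte) 0xFF, 0x00, (byte) 0x80};
        // prints [1, -1, -32768]
        System.out.println(Arrays.toString(fromLittleEndianBytes(pcm)));
    }
}
```

The resulting array and its length could then be passed as buffer and buffer_size.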
-
DeepSpeechStreamingState org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.createStream(int sample_rate) Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().
- Return
An opaque object that represents the streaming state.
- Parameters
sample_rate: The sample rate of the audio signal.
-
void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.feedAudioContent(DeepSpeechStreamingState ctx, short [] buffer, int buffer_size) Feed audio samples to an ongoing streaming inference.
- Parameters
ctx: A streaming state pointer returned by createStream().
buffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate.
buffer_size: The number of samples in buffer.
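Streaming inference is typically driven by feeding audio in fixed-size chunks as it becomes available. The sketch below illustrates only the chunking loop. AudioChunker, ChunkSink and feedInChunks are hypothetical names introduced for illustration; the sink stands in for a call to feedAudioContent(), so the example runs without the native library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the DeepSpeech bindings.
final class AudioChunker {
    /** Stand-in for a consumer such as feedAudioContent(ctx, chunk, chunkSize). */
    interface ChunkSink {
        void accept(short[] chunk, int chunkSize);
    }

    private AudioChunker() {}

    /** Splits samples into chunks of at most chunkSize and passes each to sink. */
    static int feedInChunks(short[] samples, int chunkSize, ChunkSink sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int calls = 0;
        for (int offset = 0; offset < samples.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, samples.length);
            // The last chunk may be shorter than chunkSize.
            short[] chunk = Arrays.copyOfRange(samples, offset, end);
            sink.accept(chunk, chunk.length);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        feedInChunks(new short[10], 4, (chunk, n) -> sizes.add(n));
        System.out.println(sizes);
    }
}
```

With the real bindings, the sink would presumably be something like `(chunk, n) -> model.feedAudioContent(ctx, chunk, n)`, followed by a single finishStream(ctx) call once all audio has been fed.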
-
String org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.intermediateDecode(DeepSpeechStreamingState ctx) Compute the intermediate decoding of an ongoing streaming inference. This is an expensive process as the decoder implementation isn’t currently capable of streaming, so it always starts from the beginning of the audio.
- Return
The STT intermediate result.
- Parameters
ctx: A streaming state pointer returned by createStream().
-
String org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.finishStream(DeepSpeechStreamingState ctx) Signal the end of an audio signal to an ongoing streaming inference, returns the STT result over the whole audio signal.
- Return
The STT result.
- Note
This method will free the state pointer (ctx).
- Parameters
ctx: A streaming state pointer returned by createStream().
-
Metadata org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.finishStreamWithMetadata(DeepSpeechStreamingState ctx) Signal the end of an audio signal to an ongoing streaming inference, returns per-letter metadata.
- Return
Outputs a Metadata object of individual letters along with their timing information.
- Note
This method will free the state pointer (ctx).
- Parameters
ctx: A streaming state pointer returned by createStream().
-
Metadata
-
class
Metadata: Stores the entire CTC output as an array of character metadata objects.
Public Functions
-
MetadataItem org.mozilla.deepspeech.libdeepspeech.Metadata.getItems() List of items
-
int org.mozilla.deepspeech.libdeepspeech.Metadata.getNum_items() Size of the list of items
-
MetadataItem org.mozilla.deepspeech.libdeepspeech.Metadata.getItem(int i) Retrieve one MetadataItem element
- Return
The MetadataItem requested or null
- Parameters
i: Array index of the MetadataItem to get
-
MetadataItem
-
class
MetadataItem: Stores each individual character, along with its timing information.
Public Functions
-
String org.mozilla.deepspeech.libdeepspeech.MetadataItem.getCharacter() The character generated for transcription
-
int org.mozilla.deepspeech.libdeepspeech.MetadataItem.getTimestep() Position of the character in units of 20 ms
-
float org.mozilla.deepspeech.libdeepspeech.MetadataItem.getStart_time() Position of the character in seconds
-
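Because the metadata is per character, callers often want to regroup it into words with a start time each. The sketch below shows one possible grouping. WordTimings, Word and group are hypothetical names, the parallel arrays stand in for values read via getCharacter() and getStart_time(), and treating a single space as the word separator is an assumption that depends on the alphabet in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the DeepSpeech bindings.
final class WordTimings {
    /** A word and the start time (in seconds) of its first character. */
    static final class Word {
        final String text;
        final float startTime;

        Word(String text, float startTime) {
            this.text = text;
            this.startTime = startTime;
        }
    }

    private WordTimings() {}

    /** Groups characters into words, assuming " " separates words. */
    static List<Word> group(String[] characters, float[] startTimes) {
        if (characters.length != startTimes.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        List<Word> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        float wordStart = 0f;
        for (int i = 0; i < characters.length; i++) {
            if (" ".equals(characters[i])) {
                // A separator closes the current word, if there is one.
                if (current.length() > 0) {
                    words.add(new Word(current.toString(), wordStart));
                    current.setLength(0);
                }
            } else {
                if (current.length() == 0) {
                    wordStart = startTimes[i]; // first character of a new word
                }
                current.append(characters[i]);
            }
        }
        if (current.length() > 0) {
            words.add(new Word(current.toString(), wordStart));
        }
        return words;
    }

    public static void main(String[] args) {
        String[] chars = {"h", "i", " ", "y", "o"};
        float[] times = {0.10f, 0.12f, 0.20f, 0.30f, 0.32f};
        for (Word w : group(chars, times)) {
            System.out.println(w.text + " starts at " + w.startTime + " s");
        }
    }
}
```

In real use, the two arrays would be filled by looping from 0 to getNum_items() and calling getItem(i) on the Metadata object.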