SpeechToTextButton Configuration Options
To enhance the speech recognition experience of the Kendo UI for Angular SpeechToTextButton, you can configure how the component handles speech input and provides recognition results.
- Continuous recognition—Enable the SpeechToTextButton to keep listening for additional speech input without requiring the user to click the button again.
- Interim results—Allow the SpeechToTextButton to provide interim results while the user is still speaking.
- Multiple recognition alternatives—Enable the SpeechToTextButton to return multiple recognition alternatives for a given speech input.
- Language recognition—Configure the SpeechToTextButton to recognize speech in different languages by specifying a BCP 47 language tag.
Continuous Recognition
By default, the SpeechToTextButton stops listening after recognizing a single phrase. You can enable continuous recognition to keep listening for additional speech input without requiring the user to click the button again.
To enable continuous recognition, set the continuous property to true:
<button kendoSpeechToTextButton
    [continuous]="true"
    (result)="handleResult($event)">
    Click to speak
</button>
When continuous recognition is enabled:
- The button remains active and continues listening after recognizing speech.
- Multiple phrases can be recognized in sequence.
- The recognition session continues until the user manually stops it or an error occurs.
- The component fires multiple result events as different phrases are recognized.
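Because a continuous session fires one result event per phrase, the handler typically needs to accumulate phrases rather than overwrite the previous one. The following is a minimal sketch of that idea in plain TypeScript. The RecognitionResultLike interface and the PhraseLog class are hypothetical names introduced for illustration; they model only the isFinal and alternatives fields described in this article and are not the library's own types.

```typescript
// Hypothetical event shape, modeled on the fields this article describes
// (isFinal, alternatives[].transcript). Not the library's actual type.
interface RecognitionAlternative {
    transcript: string;
    confidence: number;
}

interface RecognitionResultLike {
    isFinal: boolean;
    alternatives: RecognitionAlternative[];
}

// Collects one entry per finalized phrase during a continuous session.
class PhraseLog {
    public phrases: string[] = [];

    public add(event: RecognitionResultLike): void {
        const first = event.alternatives[0];
        // Skip interim results and events without alternatives.
        if (!event.isFinal || !first) {
            return;
        }
        const text = first.transcript.trim();
        if (text.length > 0) {
            this.phrases.push(text);
        }
    }

    public get text(): string {
        return this.phrases.join(' ');
    }
}

// Simulated session: two phrases arrive as separate result events.
const log = new PhraseLog();
log.add({ isFinal: true, alternatives: [{ transcript: 'open the report', confidence: 0.92 }] });
log.add({ isFinal: true, alternatives: [{ transcript: ' and scroll down', confidence: 0.88 }] });
console.log(log.text); // open the report and scroll down
```

In a component, a handler bound to the result event could forward each event to such a log and bind the accumulated text in the template.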
Interim Results
The SpeechToTextButton can provide interim (in-progress) results while the user is still speaking. This feature is useful for creating real-time speech-to-text experiences where users see their words appear as they speak, for example, voice input in messaging or note-taking apps.
To enable interim results, set the interimResults property to true:
<button kendoSpeechToTextButton
    [interimResults]="true"
    (result)="handleResult($event)">
    Click to speak
</button>
When interim results are enabled:
- The result event fires multiple times during speech recognition.
- Each result event contains the isFinal property, which indicates whether the result is final or interim.
- Interim results may change as the speech recognition engine processes more audio.
- Final results are provided when the engine has completed processing a phrase.
To properly handle interim results, check the isFinal property in your event handler:
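// Assumes the event type is exported by the Buttons package.
import { SpeechToTextResultEvent } from '@progress/kendo-angular-buttons';

export class AppComponent {
    public finalText = '';
    public interimText = '';

    public handleResult(event: SpeechToTextResultEvent): void {
        if (event.alternatives && event.alternatives.length > 0) {
            const transcript = event.alternatives[0].transcript;

            if (event.isFinal) {
                // Commit the finished phrase and clear the in-progress text.
                this.finalText += transcript + ' ';
                this.interimText = '';
            } else {
                // Interim text replaces the previous interim text.
                this.interimText = transcript;
            }
        }
    }
}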
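To see how the final and interim text evolve over one phrase, the same logic can be exercised outside Angular with a hand-written sequence of events. This is only an illustrative sketch: ResultLike, TranscriptState, and applyResult are hypothetical names, and the sample transcripts are made up rather than captured from a real recognition session.

```typescript
// Hypothetical stand-in for the result event described in this article.
interface ResultLike {
    isFinal: boolean;
    alternatives: { transcript: string }[];
}

interface TranscriptState {
    finalText: string;
    interimText: string;
}

// Pure version of the handler logic: returns the next state for one event.
function applyResult(state: TranscriptState, event: ResultLike): TranscriptState {
    const first = event.alternatives[0];
    if (!first) {
        return state;
    }
    return event.isFinal
        ? { finalText: state.finalText + first.transcript + ' ', interimText: '' }
        : { finalText: state.finalText, interimText: first.transcript };
}

// Made-up sequence: two interim results followed by the final one.
const events: ResultLike[] = [
    { isFinal: false, alternatives: [{ transcript: 'hel' }] },
    { isFinal: false, alternatives: [{ transcript: 'hello wor' }] },
    { isFinal: true, alternatives: [{ transcript: 'hello world' }] }
];

let state: TranscriptState = { finalText: '', interimText: '' };
const displayed: string[] = [];
for (const event of events) {
    state = applyResult(state, event);
    // What a template showing both fields side by side would render.
    displayed.push(state.finalText + state.interimText);
}
console.log(displayed);
```

Each interim result replaces the previous interim text, and the final result moves the phrase into finalText and clears interimText.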
Multiple Recognition Alternatives
The speech recognition engine can provide multiple alternative transcripts for the same audio input. This is useful when you want to give users choices or implement custom logic to select the best transcript.
To configure the number of transcripts provided, set the maxAlternatives property:
<button kendoSpeechToTextButton
    [maxAlternatives]="3"
    (result)="handleResult($event)">
    Click to speak
</button>
Each alternative in the alternatives array contains:
- transcript: The recognized text.
- confidence: A confidence score (0-1) indicating the engine's certainty.
Process the multiple alternatives in your event handler:
// Assumes the event type is exported by the Buttons package.
import { SpeechToTextResultEvent } from '@progress/kendo-angular-buttons';

export class AppComponent {
    public selectedTranscript = '';
    public allAlternatives: Array<{ transcript: string, confidence: number }> = [];

    public handleResult(event: SpeechToTextResultEvent): void {
        if (event.alternatives && event.alternatives.length > 0) {
            // Keep a plain copy of every alternative for display.
            this.allAlternatives = event.alternatives.map(alt => ({
                transcript: alt.transcript,
                confidence: alt.confidence
            }));

            // Select the alternative with the highest confidence
            const bestAlternative = this.allAlternatives.reduce((best, current) =>
                current.confidence > best.confidence ? current : best
            );
            this.selectedTranscript = bestAlternative.transcript;
        }
    }
}
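When the goal is to offer choices to the user instead of picking a single winner, the alternatives can be filtered and ordered before they are displayed. The sketch below assumes a hypothetical rankAlternatives helper and an arbitrary confidence threshold of 0.5; neither is part of the component's API, and the sample transcripts are invented.

```typescript
// Hypothetical shape matching the transcript/confidence pair described above.
interface AlternativeLike {
    transcript: string;
    confidence: number;
}

// Drops low-confidence alternatives and orders the rest from most to least likely.
// The default threshold is an arbitrary assumption; tune it for your use case.
function rankAlternatives(alternatives: AlternativeLike[], minConfidence = 0.5): AlternativeLike[] {
    return alternatives
        .filter(alt => alt.confidence >= minConfidence) // filter returns a new array
        .sort((a, b) => b.confidence - a.confidence);
}

const choices = rankAlternatives([
    { transcript: 'wreck a nice beach', confidence: 0.64 },
    { transcript: 'wreck an ice peach', confidence: 0.22 },
    { transcript: 'recognize speech', confidence: 0.81 }
]);
console.log(choices.map(choice => choice.transcript));
// [ 'recognize speech', 'wreck a nice beach' ]
```

Because the engine may return fewer alternatives than requested, code that consumes the ranked list should not assume a fixed length.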
Language Recognition
The SpeechToTextButton supports recognizing speech in different languages by specifying a BCP 47 language tag through the lang property. This allows you to tailor the speech recognition experience to your application's audience and support multilingual scenarios.
Setting the Language
To configure the language for speech recognition, set the lang property to the desired BCP 47 language tag (for example, 'en-US' for American English, 'de-DE' for German, or 'es-ES' for Spanish).
<button kendoSpeechToTextButton lang="es-ES"></button>
If no language is specified, the SpeechToTextButton uses 'en-US' (American English).
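In a multilingual application, the value bound to lang usually has to be derived from the user's preferences. The following sketch shows one possible approach: match the preferred tags against the languages the application chooses to offer, and fall back to the 'en-US' default mentioned above. The pickRecognitionLang helper and the offered list are hypothetical and are not part of the component.

```typescript
const DEFAULT_LANG = 'en-US'; // default stated in this article

// Returns the primary language subtag in lowercase, e.g. 'es' for 'es-MX'.
function primarySubtag(tag: string): string {
    const dash = tag.indexOf('-');
    return (dash === -1 ? tag : tag.slice(0, dash)).toLowerCase();
}

// Hypothetical helper: picks the first offered tag that matches a preferred tag,
// first exactly (BCP 47 tags are case-insensitive), then by primary language.
function pickRecognitionLang(preferred: readonly string[], offered: readonly string[]): string {
    for (const tag of preferred) {
        const exact = offered.find(o => o.toLowerCase() === tag.toLowerCase());
        if (exact) {
            return exact;
        }
        const sameLanguage = offered.find(o => primarySubtag(o) === primarySubtag(tag));
        if (sameLanguage) {
            return sameLanguage;
        }
    }
    return DEFAULT_LANG;
}

// Languages this hypothetical application decides to offer.
const offered = ['en-US', 'de-DE', 'es-ES'];

console.log(pickRecognitionLang(['es-MX', 'en-GB'], offered)); // es-ES
console.log(pickRecognitionLang(['fr-FR'], offered));          // en-US
```

In a browser, the preferred list could come from navigator.languages, and the result could be bound to the button through [lang]. Falling back to a same-language tag with a different region is a design choice that may not suit every application.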
Supported Languages
The available languages depend on the underlying speech recognition engine. For the browser's Web Speech API, refer to the list of supported languages.
Browser Support and Considerations
The advanced features of the Kendo UI for Angular SpeechToTextButton rely on the browser's Web Speech API implementation:
- Continuous Recognition (continuous): Supported in Chrome, Edge, and Safari. May have time limits in some browsers.
- Interim Results (interimResults): Supported in Chrome and Edge. Safari support may vary.
- Multiple Alternatives (maxAlternatives): Supported in Chrome and Edge. The number of alternatives may be limited by the browser.
For the most current browser support information, refer to the Web Speech API compatibility table on MDN.
For cross-browser compatibility, consider:
- Providing fallback behavior when features are not supported.
- Testing in your target browsers.
- Using feature detection to enable/disable functionality.
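As a rough illustration of feature detection, the sketch below checks whether the global object exposes a speech recognition constructor under either the standard name (SpeechRecognition) or the prefixed name (webkitSpeechRecognition) used by the Web Speech API. The supportsSpeechRecognition helper is a hypothetical name, and the check only confirms that a constructor exists; it cannot tell whether individual options such as interim results behave consistently in a given browser.

```typescript
// Hypothetical helper: returns true when the given global-like object exposes
// a Web Speech API recognition constructor (standard or webkit-prefixed).
function supportsSpeechRecognition(scope: object): boolean {
    const globals = scope as Record<string, unknown>;
    return ['SpeechRecognition', 'webkitSpeechRecognition']
        .some(name => typeof globals[name] === 'function');
}

// In a browser, globalThis is the window object. In environments without the
// Web Speech API (for example, server-side rendering), the check returns false.
const speechSupported = supportsSpeechRecognition(globalThis);

console.log(speechSupported
    ? 'Speech recognition is available.'
    : 'Speech recognition is unavailable; hide or disable the button.');
```

Taking the global object as a parameter keeps the check testable and safe to call where window is not defined. The result could drive an *ngIf or a disabled binding so that the button is shown only where recognition can work.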
Known Limitations
- Some browsers may impose time limits on continuous recognition sessions.
- Interim results quality varies between browsers and languages.
- The number of alternatives returned may be less than the requested maxAlternatives.
- Features may not be available when using custom speech recognition providers (integrationMode="none").