SpeechToTextButton Integration
The Kendo UI for Angular SpeechToTextButton provides flexible integration options for speech recognition. By default, it uses the browser's native Web Speech API for speech-to-text functionality, providing a seamless out-of-the-box experience.
Default Integration (Web Speech API)
The SpeechToTextButton works immediately without any additional configuration. Simply add the component to your template and handle the result event:
<button kendoSpeechToTextButton
  (result)="handleResult($event)">
  Click to speak
</button>
export class AppComponent {
  public textAreaValue = '';

  public handleResult(event: SpeechToTextResultEvent): void {
    if (event.alternatives && event.alternatives.length > 0) {
      this.textAreaValue += event.alternatives[0].transcript + ' ';
    }
  }
}
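An event may carry several recognition alternatives, each with a confidence score. A small helper can pick the most confident transcript instead of always taking the first one; this is a sketch assuming the alternative shape of the Web Speech API (transcript plus a 0..1 confidence), and bestTranscript is an illustrative name, not a Kendo API:

```typescript
// Mirrors the Web Speech API's SpeechRecognitionAlternative:
// recognized text plus a confidence score between 0 and 1.
interface RecognitionAlternative {
  transcript: string;
  confidence: number;
}

// Returns the transcript with the highest confidence,
// or null when the event carried no alternatives.
function bestTranscript(alternatives: RecognitionAlternative[]): string | null {
  if (alternatives.length === 0) {
    return null;
  }
  return alternatives.reduce((best, a) =>
    a.confidence > best.confidence ? a : best
  ).transcript;
}
```

You could call this from handleResult with `event.alternatives` and append the returned text when it is not null.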
The Web Speech API provides:
- Built-in speech recognition without external dependencies
- Support for multiple languages
- Real-time processing
- Interim results capability
- Multiple recognition alternatives
For details on enabling features such as continuous recognition, interim results, and changing the recognition language, refer to the Speech Configurations documentation.
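When interim results are enabled, each recognition event may deliver either finalized text or an in-progress hypothesis that can still change. A small reducer keeps the displayed text consistent; this is a sketch assuming each result reports an isFinal flag, as the Web Speech API's SpeechRecognitionResult does, and the type and function names are illustrative:

```typescript
// Accumulated recognition state: permanently committed text
// plus the current in-progress (interim) hypothesis.
interface TranscriptState {
  finalText: string;
  interimText: string;
}

// Folds one recognition result into the state. Final results are
// appended permanently; an interim result replaces the previous one.
function applyResult(
  state: TranscriptState,
  transcript: string,
  isFinal: boolean
): TranscriptState {
  if (isFinal) {
    return { finalText: state.finalText + transcript + ' ', interimText: '' };
  }
  return { ...state, interimText: transcript };
}

// Text to show in the UI: committed text followed by the live hypothesis.
function displayText(state: TranscriptState): string {
  return state.finalText + state.interimText;
}
```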
The following example demonstrates the default integration with the Web Speech API.
Third-Party Integration
For scenarios requiring custom speech recognition providers, you can integrate the component with third-party services by setting the integrationMode property to none.
This approach is ideal when you want to:
- Use a cloud-based speech recognition service (such as Azure, Google, or AWS).
- Integrate with an on-premises or proprietary speech-to-text engine.
- Implement custom business logic for processing audio input.
How Third-Party Integration Works
When integrationMode is set to none, the SpeechToTextButton disables its built-in speech recognition. Instead, you can handle the button's events to implement your own speech recognition logic. Typically, you will:
- Capture audio from the user (using the browser's APIs or a custom solution).
- Send the audio data to your chosen speech-to-text provider.
- Process the provider's response and update your UI accordingly.
When the SpeechToTextButton is used for the first time, the browser will prompt the user for permission to access the microphone. This is required for capturing audio input. Users must grant permission for speech recognition to function.
Integration Flow
When integrationMode="none", the SpeechToTextButton does not emit the result event; only the start and end events are triggered. This means you must handle audio capture and speech recognition entirely within your own logic. Typically, you will:
- Use the start event to begin capturing audio from the user.
- Use the end event to stop audio capture and send the recorded audio to your chosen speech-to-text provider.
- Process the provider's response and update your UI with the recognized text.
The result and error events are not emitted in this mode, as the component does not perform any recognition itself.
All result handling and UI updates should be implemented in your custom event handlers, as the component will not emit recognition results when using third-party or custom integrations.
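The start/end flow can be sketched as a small session object with the recorder and the provider call injected, so the same logic works with MediaRecorder in the browser or with mocks in tests. The Recorder interface, Recognize type, and SpeechSession class below are illustrative abstractions, not part of the Kendo API:

```typescript
// Minimal recorder abstraction: in the browser this would wrap
// MediaRecorder; in tests it can be a simple mock.
interface Recorder {
  start(): void;
  // Resolves with the captured audio once recording stops.
  stop(): Promise<Blob>;
}

// Provider call: sends audio to a speech-to-text backend and resolves
// with the recognized text. Any cloud or on-premises service fits.
type Recognize = (audio: Blob) => Promise<string>;

class SpeechSession {
  constructor(
    private recorder: Recorder,
    private recognize: Recognize,
    private onText: (text: string) => void
  ) {}

  // Wire this to the button's (start) event.
  onStart(): void {
    this.recorder.start();
  }

  // Wire this to the button's (end) event: stop capture,
  // send the audio to the provider, and publish the result.
  async onEnd(): Promise<void> {
    const audio = await this.recorder.stop();
    this.onText(await this.recognize(audio));
  }
}
```

In a component, onText would typically assign to a bound property such as textAreaValue.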
Implementation Example
Below is a template showing where to implement your custom speech recognition logic. Refer to your provider's documentation for the exact API usage and requirements.
@Component({
  // ...existing metadata...
  template: `
    <button kendoSpeechToTextButton
      integrationMode="none"
      (start)="onStart()"
      (end)="onEnd()"
    >{{ isListening ? 'Listening...' : 'Click to speak' }}</button>
    <kendo-textarea [readonly]="true" [value]="textAreaValue"></kendo-textarea>
  `
})
export class AppComponent {
  public textAreaValue = '';
  public isListening = false;

  // Example: add your API key and region here for your provider
  private azureSubscriptionKey = '<YOUR_AZURE_SPEECH_KEY>'; // Azure Portal > Your Speech resource > Keys and Endpoint
  private azureRegion = '<REGION_IDENTIFIER>'; // Example: 'westeurope', 'eastus'
  // Add any other properties required by your integration here

  public onStart(): void {
    this.isListening = true;
    // Start recording audio here.
    // If your provider supports live/streaming results, send audio chunks
    // and update textAreaValue as results arrive.
    // For batch providers, record audio and wait for the final result in onEnd.
  }

  public onEnd(): void {
    this.isListening = false;
    // Stop recording and send the audio to your speech-to-text provider here:
    // 1. Prepare the audio data (for example, as a Blob).
    // 2. Send it to your provider's API endpoint.
    // 3. Handle the response and update textAreaValue with the recognized text.
    //
    // See provider documentation for details:
    // - Azure: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/rest-speech-to-text-short
    // - Google: https://cloud.google.com/speech-to-text/docs/reference/rest
    // - AWS: https://docs.aws.amazon.com/transcribe/latest/dg/API_Reference.html
    //
    // Example (simulated result):
    this.textAreaValue = 'Hello from external speech recognition!';
  }
}
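For the Azure case referenced above, onEnd ultimately needs a concrete HTTP request. The helper below assembles the URL and headers for Azure's short-audio speech-to-text REST endpoint; the endpoint pattern and header names follow Azure's documented API but should be verified against the current Azure documentation, and buildAzureRequest is an illustrative helper, not a Kendo API:

```typescript
// Builds the URL and headers for Azure's short-audio speech-to-text
// REST endpoint. The endpoint shape is an assumption based on Azure's
// documentation -- verify it before use.
function buildAzureRequest(
  region: string,
  subscriptionKey: string,
  language = 'en-US'
): { url: string; headers: Record<string, string> } {
  return {
    url:
      `https://${region}.stt.speech.microsoft.com/speech/recognition/` +
      `conversation/cognitiveservices/v1?language=${encodeURIComponent(language)}`,
    headers: {
      'Ocp-Apim-Subscription-Key': subscriptionKey,
      // The audio body must match this declared format
      // (16 kHz, 16-bit mono PCM WAV).
      'Content-Type': 'audio/wav; codecs=audio/pcm; samplerate=16000',
    },
  };
}
```

You would then pass the result to fetch, with the recorded WAV bytes as the request body.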
When integrating with third-party or custom speech-to-text providers (integrationMode="none"), you are responsible for:
- Capturing and recording audio from the user.
- Converting the audio to the format required by your provider (e.g., WAV/PCM).
- Handling authentication (such as API keys).
- Sending the audio to the provider's API and processing the response.
Refer to your provider's documentation for details on supported formats and API usage.
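The format-conversion step can be done in a few lines of plain TypeScript. This sketch packs mono Float32 samples (as produced by the Web Audio API) into a 16-bit PCM WAV buffer, the format many providers accept; encodeWav is an illustrative helper:

```typescript
// Encodes mono Float32 samples (range -1..1) as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const dataSize = samples.length * 2; // 16-bit = 2 bytes per sample
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // remaining chunk size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);  // fmt chunk size
  view.setUint16(20, 1, true);   // audio format: PCM
  view.setUint16(22, 1, true);   // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);   // block align
  view.setUint16(34, 16, true);  // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```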
The following demo shows how to handle the start and end events to display recognized speech content. While third-party integration is not implemented, the example includes comments that guide you in connecting to external providers (refer to your provider's documentation for the exact API usage and requirements).
Known Limitations
- The Web Speech API is not supported in Firefox or Firefox for Android. For the latest browser compatibility, see the Web Speech API compatibility table.
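Because of this gap, it is worth feature-detecting before relying on the default integration. The check below takes the global object as a parameter so it stays testable outside the browser; hasWebSpeechApi is an illustrative helper:

```typescript
// Returns true when the environment exposes the Web Speech API, either
// under the standard name or the webkit-prefixed one used by
// Chromium-based browsers and Safari.
function hasWebSpeechApi(globalObj: Record<string, unknown>): boolean {
  return 'SpeechRecognition' in globalObj || 'webkitSpeechRecognition' in globalObj;
}

// In a component: if hasWebSpeechApi(window) is false, fall back to
// integrationMode="none" with a custom provider, or hide the button.
```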