Integrating Third-Party Speech-to-Text Providers with SpeechToTextButton
The Kendo UI for Angular SpeechToTextButton supports flexible integration with speech recognition engines. By default, it uses the browser's native Web Speech API for speech-to-text functionality. However, you can connect the component to a custom or third-party speech recognition provider by setting the `integrationMode` property to `none`.
This approach is ideal when you want to:
- Use a cloud-based speech recognition service (such as Azure, Google, or AWS).
- Integrate with an on-premises or proprietary speech-to-text engine.
- Implement custom business logic for processing audio input.
How It Works
When `integrationMode` is set to `none`, the SpeechToTextButton disables its built-in speech recognition. Instead, you handle the button's events to implement your own speech recognition logic. Typically, you will:
- Capture audio from the user (using the browser's APIs or a custom solution).
- Send the audio data to your chosen speech-to-text provider.
- Process the provider's response and update your UI accordingly.
When the SpeechToTextButton is used for the first time, the browser prompts the user for permission to access the microphone. Users must grant this permission for audio capture, and therefore speech recognition, to work.
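If you want to react to the permission state up front (for example, to explain a denied state in the UI instead of failing silently), you can probe the Permissions API before starting capture. This is a minimal sketch; the `'microphone'` permission name is not supported in every browser (Firefox, for instance, may reject the query), hence the fallback. The function name is illustrative, not part of the Kendo API.

```typescript
// Sketch: probe the microphone permission state before starting capture.
async function getMicrophonePermissionState(): Promise<string> {
  // Accessed dynamically so the function also degrades gracefully
  // in environments without a navigator global.
  const nav: any = (globalThis as any).navigator;
  try {
    const status = await nav.permissions.query({ name: 'microphone' });
    return status.state; // 'granted' | 'denied' | 'prompt'
  } catch {
    return 'unknown'; // Permissions API or 'microphone' name unavailable
  }
}
```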
Integration Flow
When `integrationMode="none"`, the SpeechToTextButton does not emit the `result` or `error` events, because the component performs no recognition itself; only the `start` and `end` events are triggered. You must therefore handle audio capture and speech recognition entirely in your own logic. Typically, you will:
- Use the `start` event to begin capturing audio from the user.
- Use the `end` event to stop audio capture and send the recorded audio to your chosen speech-to-text provider.
- Process the provider's response and update your UI with the recognized text.
All result handling and UI updates should be implemented in your custom event handlers, as the component will not emit recognition results when using third-party or custom integrations.
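The capture side of this flow can be sketched with the browser's `MediaRecorder` API. The `AudioCapture` class below is an illustrative helper (not part of the Kendo API): `start()` would be called from the `start` event handler and `stop()` from the `end` handler, yielding a `Blob` to send to your provider.

```typescript
// Sketch: recording microphone audio with the MediaRecorder API.
class AudioCapture {
  private recorder?: MediaRecorder;
  private chunks: Blob[] = [];

  async start(): Promise<void> {
    // Prompts for microphone permission on first use.
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    this.chunks = [];
    this.recorder = new MediaRecorder(stream);
    this.recorder.ondataavailable = (e) => this.chunks.push(e.data);
    this.recorder.start();
  }

  stop(): Promise<Blob> {
    return new Promise((resolve) => {
      const recorder = this.recorder!;
      recorder.onstop = () => {
        // Release the microphone and hand back the recorded audio.
        recorder.stream.getTracks().forEach((t) => t.stop());
        resolve(new Blob(this.chunks, { type: recorder.mimeType }));
      };
      recorder.stop();
    });
  }
}
```

Note that `MediaRecorder` typically produces compressed formats such as WebM/Opus; many speech APIs expect WAV/PCM instead, so a conversion step may be required (see the responsibilities list below).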
Below is a template showing where to implement your custom speech recognition logic. Refer to your provider's documentation for the exact API usage and requirements.
```typescript
import { Component } from '@angular/core';

@Component({
  // ...existing metadata...
  template: `
    <button kendoSpeechToTextButton
      integrationMode="none"
      (start)="onStart()"
      (end)="onEnd()"
    >{{ isListening ? 'Listening...' : 'Click to speak' }}</button>
    <kendo-textarea [readonly]="true" [value]="textAreaValue"></kendo-textarea>
  `
})
export class AppComponent {
  public textAreaValue = '';
  public isListening = false;

  // Example: add your provider's API key and region here
  private azureSubscriptionKey = '<YOUR_AZURE_SPEECH_KEY>'; // Azure Portal > Your Speech resource > Keys and Endpoint
  private azureRegion = '<REGION_IDENTIFIER>'; // For example: 'westeurope', 'eastus'
  // Add any other properties your integration requires

  public onStart(): void {
    this.isListening = true;
    // Start recording audio here.
    // If your provider supports live/streaming results, send audio chunks
    // and update textAreaValue as partial results arrive.
    // For batch providers, record audio and wait for the final result in onEnd.
  }

  public onEnd(): void {
    this.isListening = false;
    // Stop recording and send the audio to your speech-to-text provider:
    // 1. Prepare the audio data (for example, as a Blob).
    // 2. Send it to your provider's API endpoint.
    // 3. Handle the response and update textAreaValue with the recognized text.
    //
    // See the provider documentation for details:
    // - Azure: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/rest-speech-to-text-short
    // - Google: https://cloud.google.com/speech-to-text/docs/reference/rest
    // - AWS: https://docs.aws.amazon.com/transcribe/latest/dg/API_Reference.html
    //
    // Example (simulated result):
    this.textAreaValue = 'Hello from external speech recognition!';
  }
}
```
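As one concrete example of the send-and-process step, the sketch below targets the Azure Speech short-audio REST endpoint (the one linked in the comments above). The helper names are illustrative; other providers use different URLs, headers, and response shapes, so treat this as a pattern, not a definitive implementation.

```typescript
// Sketch: sending recorded audio to the Azure Speech short-audio REST API.
function buildAzureSttUrl(region: string, language: string): string {
  return `https://${region}.stt.speech.microsoft.com` +
    `/speech/recognition/conversation/cognitiveservices/v1?language=${language}`;
}

async function recognizeWithAzure(
  audio: Blob,
  subscriptionKey: string,
  region: string
): Promise<string> {
  const response = await fetch(buildAzureSttUrl(region, 'en-US'), {
    method: 'POST',
    headers: {
      'Ocp-Apim-Subscription-Key': subscriptionKey,
      // The short-audio endpoint expects 16 kHz mono PCM in a WAV container.
      'Content-Type': 'audio/wav; codecs=audio/pcm; samplerate=16000',
    },
    body: audio,
  });
  const result = await response.json();
  // The simple result format carries the transcript in DisplayText.
  return result.RecognitionStatus === 'Success' ? result.DisplayText : '';
}
```

In `onEnd`, you would await such a helper and assign its return value to `textAreaValue`.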
When integrating with third-party or custom speech-to-text providers (`integrationMode="none"`), you are responsible for:
- Capturing and recording audio from the user.
- Converting the audio to the format required by your provider (e.g., WAV/PCM).
- Handling authentication (such as API keys).
- Sending the audio to the provider's API and processing the response.
Refer to your provider's documentation for details on supported formats and API usage.
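The format-conversion responsibility can be illustrated with a small pure helper that wraps raw 16 kHz mono `Float32Array` samples (as produced, for example, by the Web Audio API) in a WAV container. This is a sketch of the standard 44-byte RIFF/WAVE header for 16-bit PCM; the function name is illustrative.

```typescript
// Sketch: encode mono Float32 PCM samples as a 16-bit WAV file.
function encodeWav(samples: Float32Array, sampleRate = 16000): ArrayBuffer {
  const buffer = new ArrayBuffer(44 + samples.length * 2);
  const view = new DataView(buffer);
  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  writeString(0, 'RIFF');
  view.setUint32(4, 36 + samples.length * 2, true); // remaining chunk size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // audio format: PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeString(36, 'data');
  view.setUint32(40, samples.length * 2, true);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] and scale to signed 16-bit.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

The resulting `ArrayBuffer` can be wrapped in a `Blob` and sent as the request body to a provider that expects `audio/wav`.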
The following demo shows how to handle the `start` and `end` events to display recognized speech content. While third-party integration is not implemented, the example includes comments that guide you toward connecting to external providers (refer to your provider's documentation for the exact API usage and requirements).
Known Limitations
- The Web Speech API is not supported in Firefox or Firefox for Android. For the latest browser compatibility, see the Web Speech API compatibility table.