Integrating Third-Party Speech-to-Text Providers with SpeechToTextButton
The Kendo UI for Angular SpeechToTextButton supports flexible integration with speech recognition engines. By default, it uses the browser's native Web Speech API for speech-to-text functionality. However, you can connect the component to a custom or third-party speech recognition provider by setting the `integrationMode` property to `none`.
This approach is ideal when you want to:
- Use a cloud-based speech recognition service (such as Azure, Google, or AWS).
- Integrate with an on-premises or proprietary speech-to-text engine.
- Implement custom business logic for processing audio input.
How It Works
When `integrationMode` is set to `none`, the SpeechToTextButton disables its built-in speech recognition. Instead, you handle the button's events to implement your own speech recognition logic. Typically, you will:
- Capture audio from the user (using the browser's APIs or a custom solution).
- Send the audio data to your chosen speech-to-text provider.
- Process the provider's response and update your UI accordingly.
When the SpeechToTextButton is used for the first time, the browser prompts the user for permission to access the microphone. Users must grant this permission for audio capture, and therefore speech recognition, to work.
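If you want to react to the permission state up front (for example, to explain a denied state in the UI instead of failing silently), you can probe the Permissions API before starting capture. This is a minimal sketch; the `'microphone'` permission name is not supported in every browser (Firefox, for instance, may reject the query), hence the fallback. The function name is illustrative, not part of the Kendo API.

```typescript
// Sketch: probe the microphone permission state before starting capture.
async function getMicrophonePermissionState(): Promise<string> {
  // Accessed dynamically so the function also degrades gracefully
  // in environments without a navigator global.
  const nav: any = (globalThis as any).navigator;
  try {
    const status = await nav.permissions.query({ name: 'microphone' });
    return status.state; // 'granted' | 'denied' | 'prompt'
  } catch {
    return 'unknown'; // Permissions API or 'microphone' name unavailable
  }
}
```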
Integration Flow
When `integrationMode="none"`, the SpeechToTextButton does not emit the `result` or `error` events, because the component performs no recognition itself; only the `start` and `end` events are triggered. You must therefore handle audio capture and speech recognition entirely in your own logic. Typically, you will:
- Use the `start` event to begin capturing audio from the user.
- Use the `end` event to stop audio capture and send the recorded audio to your chosen speech-to-text provider.
- Process the provider's response and update your UI with the recognized text.
All result handling and UI updates should be implemented in your custom event handlers, as the component will not emit recognition results when using third-party or custom integrations.
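The capture side of this flow can be sketched with the browser's `MediaRecorder` API. The `AudioCapture` class below is an illustrative helper (not part of the Kendo API): `start()` would be called from the `start` event handler and `stop()` from the `end` handler, yielding a `Blob` to send to your provider.

```typescript
// Sketch: recording microphone audio with the MediaRecorder API.
class AudioCapture {
  private recorder?: MediaRecorder;
  private chunks: Blob[] = [];

  async start(): Promise<void> {
    // Prompts for microphone permission on first use.
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    this.chunks = [];
    this.recorder = new MediaRecorder(stream);
    this.recorder.ondataavailable = (e) => this.chunks.push(e.data);
    this.recorder.start();
  }

  stop(): Promise<Blob> {
    return new Promise((resolve) => {
      const recorder = this.recorder!;
      recorder.onstop = () => {
        // Release the microphone and hand back the recorded audio.
        recorder.stream.getTracks().forEach((t) => t.stop());
        resolve(new Blob(this.chunks, { type: recorder.mimeType }));
      };
      recorder.stop();
    });
  }
}
```

Note that `MediaRecorder` typically produces compressed formats such as WebM/Opus; many speech APIs expect WAV/PCM instead, so a conversion step may be required (see the responsibilities list below).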
Below is a template showing where to implement your custom speech recognition logic. Refer to your provider's documentation for the exact API usage and requirements.
```typescript
import { Component } from '@angular/core';

@Component({
  // ...existing metadata...
  template: `
    <button kendoSpeechToTextButton
      integrationMode="none"
      (start)="onStart()"
      (end)="onEnd()"
    >{{ isListening ? 'Listening...' : 'Click to speak' }}</button>
    <kendo-textarea [readonly]="true" [value]="textAreaValue"></kendo-textarea>
  `
})
export class AppComponent {
  public textAreaValue = '';
  public isListening = false;

  // Example: add your provider's API key and region here
  private azureSubscriptionKey = '<YOUR_AZURE_SPEECH_KEY>'; // Azure Portal > Your Speech resource > Keys and Endpoint
  private azureRegion = '<REGION_IDENTIFIER>'; // For example: 'westeurope', 'eastus'
  // Add any other properties your integration requires

  public onStart(): void {
    this.isListening = true;
    // Start recording audio here.
    // If your provider supports live/streaming results, send audio chunks
    // and update textAreaValue as partial results arrive.
    // For batch providers, record audio and wait for the final result in onEnd.
  }

  public onEnd(): void {
    this.isListening = false;
    // Stop recording and send the audio to your speech-to-text provider:
    // 1. Prepare the audio data (for example, as a Blob).
    // 2. Send it to your provider's API endpoint.
    // 3. Handle the response and update textAreaValue with the recognized text.
    //
    // See the provider documentation for details:
    // - Azure: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/rest-speech-to-text-short
    // - Google: https://cloud.google.com/speech-to-text/docs/reference/rest
    // - AWS: https://docs.aws.amazon.com/transcribe/latest/dg/API_Reference.html
    //
    // Example (simulated result):
    this.textAreaValue = 'Hello from external speech recognition!';
  }
}
```
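As one concrete example of the send-and-process step, the sketch below targets the Azure Speech short-audio REST endpoint (the one linked in the comments above). The helper names are illustrative; other providers use different URLs, headers, and response shapes, so treat this as a pattern, not a definitive implementation.

```typescript
// Sketch: sending recorded audio to the Azure Speech short-audio REST API.
function buildAzureSttUrl(region: string, language: string): string {
  return `https://${region}.stt.speech.microsoft.com` +
    `/speech/recognition/conversation/cognitiveservices/v1?language=${language}`;
}

async function recognizeWithAzure(
  audio: Blob,
  subscriptionKey: string,
  region: string
): Promise<string> {
  const response = await fetch(buildAzureSttUrl(region, 'en-US'), {
    method: 'POST',
    headers: {
      'Ocp-Apim-Subscription-Key': subscriptionKey,
      // The short-audio endpoint expects 16 kHz mono PCM in a WAV container.
      'Content-Type': 'audio/wav; codecs=audio/pcm; samplerate=16000',
    },
    body: audio,
  });
  const result = await response.json();
  // The simple result format carries the transcript in DisplayText.
  return result.RecognitionStatus === 'Success' ? result.DisplayText : '';
}
```

In `onEnd`, you would await such a helper and assign its return value to `textAreaValue`.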
When integrating with third-party or custom speech-to-text providers (`integrationMode="none"`), you are responsible for:
- Capturing and recording audio from the user.
- Converting the audio to the format required by your provider (e.g., WAV/PCM).
- Handling authentication (such as API keys).
- Sending the audio to the provider's API and processing the response.
Refer to your provider's documentation for details on supported formats and API usage.
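The format-conversion responsibility can be illustrated with a small pure helper that wraps raw 16 kHz mono `Float32Array` samples (as produced, for example, by the Web Audio API) in a WAV container. This is a sketch of the standard 44-byte RIFF/WAVE header for 16-bit PCM; the function name is illustrative.

```typescript
// Sketch: encode mono Float32 PCM samples as a 16-bit WAV file.
function encodeWav(samples: Float32Array, sampleRate = 16000): ArrayBuffer {
  const buffer = new ArrayBuffer(44 + samples.length * 2);
  const view = new DataView(buffer);
  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  writeString(0, 'RIFF');
  view.setUint32(4, 36 + samples.length * 2, true); // remaining chunk size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // audio format: PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeString(36, 'data');
  view.setUint32(40, samples.length * 2, true);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] and scale to signed 16-bit.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

The resulting `ArrayBuffer` can be wrapped in a `Blob` and sent as the request body to a provider that expects `audio/wav`.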
The following demo shows how to handle the `start` and `end` events to display recognized speech content. While third-party integration is not implemented, the example includes comments that guide you toward connecting to external providers (refer to your provider's documentation for the exact API usage and requirements).
Known Limitations
- The Web Speech API is not supported in Firefox or Firefox for Android. For the latest browser compatibility, see the Web Speech API compatibility table.