On-Premises Deployment
The Telerik ASP.NET MVC AI Coding Assistant supports on-premises deployment through a Helm chart that installs the Context API stack into your Kubernetes cluster. The deployment includes the Context API service, a bundled reranking service, and Qdrant for vector storage.
The on-premises package is simplified for customer-hosted environments. Hosted-only concerns such as authentication, rate limiting, and telemetry are removed from this deployment model. If needed, those controls should be handled at the ingress, API gateway, or cluster level.
Prerequisites
Before installing the chart, make sure the cluster provides:
- Persistent storage for Qdrant
- Access to an OpenAI-compatible inference endpoint that exposes the models required by Context API for request handling and embeddings
- A public or internal DNS name for the Context API service
Deployment Steps
- Install the Helm chart in the target cluster.
- Configure the Context API deployment to use the in-cluster Qdrant service.
- Configure the inference endpoint and the model names used by Context API.
- Expose the Context API service on a stable URL.
- Configure the MCP server to use that URL through the CONTEXTAPI_URL environment variable.
Required Context API Configuration
Set the following environment variables for the Context API deployment:
| Variable | Description |
|---|---|
| VECTOR_DB_TYPE | Set to qdrant |
| QDRANT_HOST | The Qdrant service name inside the cluster |
| QDRANT_GRPC_PORT | The Qdrant gRPC port, typically 6334 |
| QDRANT_API_KEY | API key for Qdrant, if required |
| LLM_BASE_URL | The OpenAI-compatible inference endpoint |
| LLM_API_KEY | API key for the inference endpoint, if required |
| LLM_DEFAULT_MODEL | The model name used for request handling |
| LLM_EMBEDDINGS_MODEL | The model name used for embeddings |
| EMBEDDINGS_DIMENSIONS | The dimension size of the embeddings model |
| LLM_REQUEST_TIMEOUT | Request timeout in seconds |
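Before rolling out the deployment, it can help to check that the variables in the table are present and well-formed. The following is a minimal sketch, not part of the product: the function name `check_context_api_env` is hypothetical, and the rules it applies (the two API keys are optional, the port and dimensions are positive integers, the timeout is a positive number) are assumptions drawn from the descriptions in the table rather than from Context API itself.

```python
import os

# Variables this sketch treats as mandatory. QDRANT_API_KEY and LLM_API_KEY are
# left out because the table describes them as "if required".
REQUIRED_VARS = [
    "VECTOR_DB_TYPE",
    "QDRANT_HOST",
    "QDRANT_GRPC_PORT",
    "LLM_BASE_URL",
    "LLM_DEFAULT_MODEL",
    "LLM_EMBEDDINGS_MODEL",
    "EMBEDDINGS_DIMENSIONS",
    "LLM_REQUEST_TIMEOUT",
]

# Assumption: these two values are positive integers.
INTEGER_VARS = ["QDRANT_GRPC_PORT", "EMBEDDINGS_DIMENSIONS"]


def check_context_api_env(env):
    """Return a list of problems found in env; an empty list means none were found."""
    problems = []

    for name in REQUIRED_VARS:
        if not env.get(name, "").strip():
            problems.append(f"{name} is missing or empty")

    vector_db = env.get("VECTOR_DB_TYPE", "").strip()
    if vector_db and vector_db != "qdrant":
        problems.append("VECTOR_DB_TYPE must be set to qdrant")

    for name in INTEGER_VARS:
        value = env.get(name, "").strip()
        if value and not (value.isascii() and value.isdigit() and int(value) > 0):
            problems.append(f"{name} must be a positive integer")

    # Assumption: the timeout may be fractional, so it is parsed as a float.
    timeout = env.get("LLM_REQUEST_TIMEOUT", "").strip()
    if timeout:
        try:
            if float(timeout) <= 0:
                problems.append("LLM_REQUEST_TIMEOUT must be greater than zero")
        except ValueError:
            problems.append("LLM_REQUEST_TIMEOUT must be a number of seconds")

    return problems


if __name__ == "__main__":
    # Check the current process environment and print any problems found.
    for problem in check_context_api_env(os.environ):
        print(problem)
```

Running the script in a shell that has the intended variables exported prints one line per problem and prints nothing when the checks pass.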
Qdrant Configuration
No manual setup is required inside Qdrant for a standard deployment. A fresh Qdrant instance works out of the box as long as it is reachable from Context API and has persistent storage configured.
Context API creates the required collections automatically when they do not already exist.
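Because the only requirement on the Qdrant side is reachability, a quick TCP probe from inside the cluster can rule out basic networking problems. The sketch below is illustrative only: `tcp_reachable` is a hypothetical helper, and a successful connection shows that the port accepts connections, not that Qdrant is healthy or that an API key is valid.

```python
import os
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Only probe when QDRANT_HOST is set; the port falls back to the typical 6334.
    host = os.environ.get("QDRANT_HOST")
    if host:
        port = int(os.environ.get("QDRANT_GRPC_PORT", "6334"))
        print(f"{host}:{port} reachable: {tcp_reachable(host, port)}")
```

Run it from a pod in the same namespace as Context API so that the probe uses the same service DNS and network policies.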
Reranking
Reranking is included in the deployment and does not require a customer-provided model endpoint.
The standard Helm deployment runs reranking through the bundled cross-encoder service, which is deployed together with Context API and works without additional configuration. Set a custom CROSS_ENCODER_URL only if the reranking service is renamed or deployed separately.
MCP Server Configuration
After the Context API service is exposed, configure the MCP server to use that address by setting CONTEXTAPI_URL:
{
  "env": {
    "CONTEXTAPI_URL": "contextapi.customer.example.com:443"
  }
}
CONTEXTAPI_URL must be provided as host:port.
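A common mistake is to paste a full URL with a scheme or path into this setting. The following sketch shows one way to check a value against the host:port shape before putting it in the MCP configuration. The helper `parse_host_port` is hypothetical, and its rules are an interpretation of the host:port requirement; the MCP server's own validation may differ.

```python
def parse_host_port(value):
    """Split a 'host:port' string into (host, port), raising ValueError if malformed."""
    if "://" in value:
        raise ValueError("value must not include a scheme such as https://")

    # Split on the last colon so the port is always the final segment.
    host, sep, port_text = value.rpartition(":")
    if not sep or not host:
        raise ValueError("value must have the form host:port")
    if not (port_text.isascii() and port_text.isdigit()):
        raise ValueError("port must be a number with no trailing path")

    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return host, port


if __name__ == "__main__":
    # The example value from the configuration above.
    print(parse_host_port("contextapi.customer.example.com:443"))
    # prints ('contextapi.customer.example.com', 443)
```

Values such as `https://contextapi.customer.example.com` or `contextapi.customer.example.com:443/api` raise ValueError in this sketch.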
Result
After deployment, the MCP server sends requests to the customer-hosted Context API instance. Context API uses the configured inference endpoint, bundled reranking service, and Qdrant instance to serve retrieval operations inside the customer environment.