Model Self-Deployment

Model Self-Deployment allows you to deploy mainstream large models provided by the platform as dedicated instances to a specified region, giving you an exclusive inference service. Compared to shared APIs, self-deployed instances offer predictable concurrency and more stable response latency, making them ideal for production scenarios that require guaranteed throughput, low latency, or data isolation.

Features

  • Elastic replica configuration: Set the maximum number of replicas. The platform automatically scales on demand — each additional replica increases the maximum concurrency accordingly.
  • Flexible resource specs: Configure CPU Cores, Memory, and TPU Count on demand to match the inference resource requirements of different models.
  • Multi-region deployment: Choose from different deployment regions to meet data locality or proximity requirements.
  • OpenAI-compatible API: Self-deployed instances provide an OpenAI-compatible API endpoint (/v1/chat/completions), enabling drop-in replacement for existing integrations with no code changes.
  • One-click API credentials: After successful deployment, view Base URL, Model ID, and API Key directly from the management page, with ready-to-use example code in CURL, Python, and JavaScript.

Pricing

Model Self-Deployment is billed based on the actual resource specs configured. The cost is displayed in real time when creating a deployment (Configuration Cost: ¥XX/Hour). Billing starts after the instance is successfully created and stops when the instance is released. Refer to the compute marketplace for the latest pricing.
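Because billing runs hourly from creation to release, the total cost of a deployment is simply the hourly configuration cost multiplied by the hours the instance was alive. A quick sketch with a hypothetical rate (the actual rate is shown in the deployment dialog):

```python
# Hypothetical hourly rate from the deployment dialog (¥/hour).
hourly_cost = 3.50

# Instance alive from creation to release, e.g. 3 full days.
hours_alive = 3 * 24

# Billing stops the moment the instance is released.
total_cost = hourly_cost * hours_alive
print(f"Estimated cost: ¥{total_cost:.2f}")  # Estimated cost: ¥252.00
```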

Using Model Self-Deployment

Create a Deployment

Navigate to the Model Deployment page in the console. The page displays all currently available models.

Filtering and Search

  • Use the Category tags at the top to filter by model type. Multiple selections are supported. Click the ✕ on a selected tag to deselect it, or click the clear icon to reset all filters.
  • Use the Provider tags to filter by model provider.
  • Enter a model name in the search box for an exact-match lookup.

Deploy a Model

  1. Click Deploy Model on a model card to open the deployment configuration dialog.

  2. Fill in the following fields:

    | Field | Description |
    | --- | --- |
    | Name | Project name, up to 32 characters. Required |
    | Model Name | Auto-filled, read-only |
    | Select Region | Choose a deployment region. Required |
    | Max Replicas | Maximum number of replicas; determines maximum concurrency. See the range hint in parentheses |
    | Concurrency Limit | Maximum concurrent requests per replica, determined by model specs. Read-only |
    | CPU Cores | Number of CPU cores. See the range hint in parentheses |
    | Memory (GB) | Memory size. See the range hint in parentheses |
    | TPU Count | Number of TPUs. See the range hint in parentheses |
  3. The Configuration Cost is shown in real time at the bottom of the dialog. After confirming, click Deploy Now to create the deployment.
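Since the Concurrency Limit is a per-replica ceiling and each additional replica raises the maximum concurrency accordingly, the overall ceiling works out to the per-replica limit times Max Replicas. A minimal sketch with hypothetical dialog values:

```python
# Hypothetical values from the deployment configuration dialog.
max_replicas = 4        # Max Replicas (user-configured, within the range hint)
per_replica_limit = 8   # Concurrency Limit (read-only, set by model specs)

# Each additional replica raises the ceiling by per_replica_limit,
# so the maximum concurrency at full scale-out is the product.
max_concurrency = max_replicas * per_replica_limit
print(max_concurrency)  # 32
```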

If the message "Insufficient resources. Please adjust the configuration and region, or wait for resources to be released" appears, try switching to a different region or reducing the replica/resource configuration.


Manage Deployments

Navigate to the Self Deploy Control page in the console. All deployment records are displayed as a card list.

Instance Status

| Status | Meaning |
| --- | --- |
| Creating | Instance is being created |
| Deploying | Deployment is in progress |
| Deploy Success | Deployment succeeded; the instance is ready for API calls |
| Deploy Failed | Deployment failed |
| Deleting | Instance is being released |

Card Information

Each card displays: Project Name, Create Time, Project ID, model name and ID, Current Workers, Concurrency Limit, Region, and hourly Config Cost.


Get API Credentials

Once the instance status is Deploy Success, click the Get API button on the right side of the card. A side drawer opens with the following information:

| Field | Description |
| --- | --- |
| Base URL | The base address for API requests |
| API Endpoints | Fixed at /v1/chat/completions |
| Model ID | The model identifier to pass when making API calls |
| API Key | Authentication key |

All fields have a copy icon for quick copying. At the bottom of the drawer, CURL, Python, and JavaScript example code is provided, ready to copy and use.
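The drawer's examples are ready to copy as-is. For reference, a minimal Python sketch of such a call using only the standard library; the Base URL, API Key, and Model ID below are placeholders for the values shown in the drawer:

```python
import json
import urllib.request

BASE_URL = "https://example-region.example.com"  # placeholder Base URL
API_KEY = "sk-your-api-key"                      # placeholder API Key
MODEL_ID = "your-model-id"                       # placeholder Model ID

# Build an OpenAI-compatible chat-completions request against the fixed endpoint.
req = urllib.request.Request(
    BASE_URL + "/v1/chat/completions",
    data=json.dumps({
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": "Say hello."}],
    }).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Uncomment to send the request against a live instance:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request shape is the standard OpenAI chat-completions schema, existing OpenAI client integrations can be pointed at the instance by swapping only the base URL, key, and model ID.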


Release an Instance

If a deployment is no longer needed, click the Release link on the right side of the card, then click Confirm in the confirmation dialog.

Note: Instance content cannot be recovered after release. Make sure you have saved any required configuration before proceeding.