Model Integration Guide

This guide introduces the capabilities of large language models (LLMs), model selection suggestions, interface usage, and common parameter configurations, helping you quickly try out and integrate the platform. All SDK/HTTP examples use https://api.chipltech.com/openai/v1 as the unified gateway prefix so that request paths stay consistent with the platform interface.

Model Capabilities

Large language models are trained with deep learning and natural language processing techniques and can understand, generate, and process human language. Their main capabilities are:

  • Text Generation: Generate coherent content based on context, and adjust writing style according to prompts.
  • Language Understanding: Accurately understand input semantics, supporting multi-turn dialogue and context association.
  • Text Translation: Possess cross-language understanding and generation capabilities.
  • Knowledge Q&A: Answer cultural, scientific, historical, and other questions based on a large-scale corpus.
  • Code Understanding/Generation: Process code in Python, Java, C++, and other languages, including locating errors and suggesting fixes.
  • Text Classification and Summarization: Perform classification, information extraction, and automatic summarization on complex paragraphs.

Model Selection

You can view the list of models supported by the platform on the Model Square page, including their introductions and pricing. Click any model to open its details page.

Interface Call

DLC Cloud is compatible with the OpenAI API standard. You can directly use the following interfaces in your existing applications:

  • ChatCompletion: Supports streaming and non-streaming modes.
  • Completion: Also supports streaming and non-streaming modes.

Integration Steps:

  1. Set the base URL to https://api.chipltech.com/openai/v1 (requests without /v1 return 404).
  2. Get and configure the API Key (see Manage API Keys).
  3. Update the model name as needed, e.g., deepseek/deepseek-r1.
  4. The remaining calling conventions are identical to the OpenAI API.

Code Examples

Python

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.chipltech.com/openai/v1",
    api_key="<Your API Key>",  # Recommended to use environment variable OPENAI_API_KEY
)

model = "cltech/qwen2.5-vl-72b"
stream = True  # Or False
max_tokens = 512

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a professional AI documentation assistant."},
        {"role": "user", "content": "What scenarios can the TPU container instances provided by DLC Cloud be used for?"},
    ],
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)

Curl

bash
export API_KEY="<Your API Key>"

curl "https://api.chipltech.com/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "cltech/qwen2.5-vl-72b",
    "messages": [
      {"role": "system", "content": "You are a professional AI documentation assistant."},
      {"role": "user", "content": "What scenarios can the TPU container instances provided by DLC Cloud be used for?"}
    ],
    "max_tokens": 512
  }'

Key Parameters

Basic Parameters

  • model: The name of the called model, which can be queried on the model details page of the Model Square.
  • max_tokens: The maximum number of Tokens returned in a single request. If the generated content exceeds this value, it will be truncated.
  • stream: Whether to enable streaming output. true returns content while it is generated; false returns it all at once.
  • stop: Generation ends automatically when the output hits the configured string.
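As a minimal sketch, the basic parameters above map directly onto the request body. The model name and stop string here are illustrative examples, not platform recommendations:

```python
# Request payload sketch using the basic parameters above.
# The model name and stop string are illustrative.
payload = {
    "model": "cltech/qwen2.5-vl-72b",  # query names on the Model Square details page
    "max_tokens": 512,                 # output is truncated beyond this Token count
    "stream": False,                   # True = return while generating
    "stop": ["\n\n"],                  # end output when this string is generated
}

# The payload is then passed to the SDK, e.g.:
# client.chat.completions.create(messages=[...], **payload)
```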

ChatCompletion Message Roles

  • messages: Message array, containing all content interacting with the model.
  • role: Message author. Valid values are system (sets the model's role), user (user input), and assistant (model reply; can be used to preset example turns). An optional name field can additionally distinguish different authors sharing the same role.
  • content: Specific text content.
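A sketch of a messages array combining the roles described above; the assistant turn presets an example reply, and the name field distinguishes two users in the same conversation (the names are made up for illustration):

```python
# Messages array combining the three roles; an assistant turn presets
# an example reply, and "name" disambiguates authors of the same role.
messages = [
    {"role": "system", "content": "You are a professional AI documentation assistant."},
    {"role": "user", "name": "alice", "content": "What is streaming output?"},
    {"role": "assistant", "content": "Streaming returns Tokens as they are generated."},
    {"role": "user", "name": "bob", "content": "How do I enable it?"},
]
```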

Completion Prompts

  • prompt: The prompt text used to generate the completion; describe the problem or task to be solved clearly.
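Compared with ChatCompletion, a Completion request replaces the messages array with a single prompt string. The sketch below only builds the request body (the model name and prompt are illustrative); it would be sent with client.completions.create:

```python
# Completion request sketch: "prompt" replaces the "messages" array.
completion_request = {
    "model": "cltech/qwen2.5-vl-72b",
    "prompt": "Write a one-sentence explanation of streaming output.",
    "max_tokens": 128,
}
# Sent via: client.completions.create(**completion_request)
```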

Control Generation Diversity

  • temperature: Sampling temperature. The larger the value, the more diverse the output.
  • top_p: Nucleus sampling probability, controlling the cumulative probability of candidate words.
  • top_k: Upper limit of the number of candidate words.
  • It is recommended to set only one of temperature or top_p.
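A sketch of two sampling configurations following the recommendation above: each sets only one of temperature or top_p, and the values are illustrative starting points. The check_sampling helper is hypothetical, not part of the SDK:

```python
# Two illustrative sampling configurations. Per the recommendation
# above, each sets only one of temperature / top_p.
creative = {"temperature": 1.2, "top_k": 50}  # more diverse output
precise = {"top_p": 0.1}                      # restrict to high-probability Tokens

def check_sampling(params: dict) -> None:
    """Hypothetical guard against setting both knobs at once."""
    if "temperature" in params and "top_p" in params:
        raise ValueError("Set only one of temperature or top_p")

check_sampling(creative)
check_sampling(precise)
```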

Control Content Repetition

  • presence_penalty: Presence penalty, reducing the probability of recurrence of already appeared Tokens.
  • frequency_penalty: Frequency penalty, limiting high-frequency Tokens.
  • repetition_penalty: Repetition penalty, further suppressing repetition.
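The three repetition controls can be merged into the request body together; the values below are illustrative mild settings, not platform recommendations:

```python
# Mild anti-repetition settings, merged into the request body.
anti_repeat = {
    "presence_penalty": 0.5,    # penalize Tokens that have appeared at all
    "frequency_penalty": 0.5,   # penalize Tokens in proportion to their count
    "repetition_penalty": 1.1,  # multiplicative repetition suppression
}
# client.chat.completions.create(model=..., messages=[...], **anti_repeat)
```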

FAQ

  • How to get an API Key? Please refer to Manage API Keys.
  • Can the Community Edition be used long-term? It is suitable for short-term trials; for large-scale calls, switch to the Standard Edition to guarantee resource quotas.
  • What to do if a call fails? Check whether the Token limit is exceeded, whether the API Key is correct, whether base_url contains /v1, and whether the network can reach https://api.chipltech.com.
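The base_url check from the troubleshooting list can be automated with a small pre-flight helper; this function is a hypothetical sketch, not part of the SDK:

```python
def validate_base_url(base_url: str) -> str:
    """Hypothetical pre-flight check for the gateway prefix.

    Raises ValueError when the required /v1 suffix is missing,
    which would otherwise surface as a 404 at call time.
    """
    trimmed = base_url.rstrip("/")
    if not trimmed.endswith("/v1"):
        raise ValueError(f"base_url must end with /v1, got: {base_url}")
    return trimmed

print(validate_base_url("https://api.chipltech.com/openai/v1"))
```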

If you need more examples (JavaScript, Java, etc.) or private deployment support, please contact the technical support team.