Model Square Introduction

Model Square is a one-stop entry point for model inference. It is currently in an internal-testing iteration phase and supports only in-house vLLM-served models accessed through the Open API interface; other vendors' models, visualization tools, and related capabilities will be opened up gradually. This page helps you quickly understand the current feature set and how to integrate.

Currently Available Capabilities

  • vLLM Model Calls: The platform pre-installs several vLLM-served models (such as the customized Qwen/Qwen-VL series) for basic scenarios such as text generation and visual understanding.
  • Open API Interface: Fully compatible with OpenAI-style HTTP/JSON interfaces, so common SDKs (Python, Node.js, etc.) can call it directly.
  • Token-level Billing: All models are billed per Token, making fine-grained cost control easy during development.
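Per-Token billing means the cost of a call scales directly with the token counts each response reports. A minimal sketch of the arithmetic; the unit price below is an invented placeholder, not real Model Square pricing, and the actual per-Token prices are shown in the console:

```python
def estimate_cost(total_tokens: int, price_per_1k_tokens: float) -> float:
    """Rough per-call cost: tokens consumed times the model's unit price.

    price_per_1k_tokens is a placeholder figure -- look up the real
    per-Token price on the model's card in Model Square.
    """
    return total_tokens / 1000 * price_per_1k_tokens

# 1,500 total tokens at a placeholder price of 0.004 per 1K tokens
print(round(estimate_cost(1500, 0.004), 6))  # → 0.006
```

During development, summing this over the token counts shown in "Call Statistics" gives a quick cost projection before committing to a call pattern.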

Note: Model Square will continue to add third-party models such as DeepSeek, Llama, and GLM, along with graphical debugging tools and hosted fine-tuning. New features will be announced in release notes and documentation as they open.

Usage Guide

  1. Get API Key

    • In the console, open "API Key Management" and create a new Key or copy an existing one. Every subsequent call must carry it in the request header.
  2. Select Model

    • In "Model Square", filter vLLM models by category tag (such as Deep Thinking, Text Generation, Visual Understanding) and review each model's per-Token price and capability description.
  3. Call Interface

    • Build requests following the OpenAI standard protocol (e.g., POST /v1/chat/completions), supplying the model name, prompt, and parameters to initiate inference.
    • For quick verification, use the platform's online debugger or official example scripts.
  4. Monitoring and Statistics

    • Use "Call Statistics" to track metrics such as Token usage and failure rate, and adjust call frequency or rate-limiting rules as needed.
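The steps above can be sketched as a single request. A minimal example using only the Python standard library; the endpoint host and model name are placeholders, and the response fields follow the generic OpenAI protocol rather than confirmed Model Square specifics:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"             # step 1: from "API Key Management"
BASE_URL = "https://example.com/v1"  # placeholder endpoint host

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Step 3: assemble an OpenAI-style POST /v1/chat/completions request."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps({
            "model": model,  # step 2: model name chosen in Model Square
            "messages": [{"role": "user", "content": prompt}],
        }).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",  # key carried in the header
            "Content-Type": "application/json",
        },
        method="POST",
    )

# The Bearer header is what authenticates the call:
print(build_request("Qwen-VL", "Hello").get_header("Authorization"))
# → Bearer YOUR_API_KEY

# To actually send it (requires a valid key and the real endpoint):
#   with urllib.request.urlopen(build_request("Qwen-VL", "Hello")) as resp:
#       out = json.load(resp)
#   print(out["choices"][0]["message"]["content"])
#   print(out["usage"])  # step 4: token counts that billing is based on
```

The same request shape works through any OpenAI-compatible SDK by pointing its base URL at the platform endpoint.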

Typical Advantages

  • No self-built inference cluster required: you get an optimized vLLM inference stack out of the box;
  • A unified API shape, so newly added models can be switched in seamlessly;
  • On-demand scaling and rate-limiting policies keep the service stable.

To try more models early or apply for the private beta, contact business or technical support, or watch the official announcements.