Qwen3.7-Plus AI Model
这条记录涉及编程工具或代码能力更新,适合开发者评估工作流变化和可复用价值。
Qwen3.7-Plus is a multimodal agent model from the Qwen team at Alibaba. It was introduced on June 1, 2026 as part of the Qwen3.7 line. The AI model is designed to combine vision and language in one system, with a strong focus on agent-style workflows such as coding, tool use, browser interaction, and productivity tasks.
Unlike a text-only chatbot, Qwen3.7-Plus AI Model is built to handle images and video as inputs as well as text. It can read screens, understand GUI layouts, operate applications, generate code from visual references, and support workflows that move between browser, desktop, and command-line environments. It is described as a “multimodal interactive hybrid agent.”
Main features
- Text, image, and video understanding
- Text output
- 1,000,000-token context window (1 Million)
- Up to 256,000 thinking tokens for complex reasoning.
- Up to 65,536 output tokens
- Screen reading and GUI understanding
- Browser automation and browser-agent behavior
- Mobile app navigation
- Visual question answering
- Multimodal search and knowledge QA
- Multimodal reasoning
- Vision-to-code generation
- Frontend and web prototyping
- Software engineering and coding assistance
- Tool use and agentic workflow support
- Cross-framework generalization
- Real-world scene understanding
- Autonomous driving scene reasoning
- Productivity assistant use cases
Other Information
Qwen3.7-Plus AI Model is built for tasks where visual input matters. It performs well on screen analysis, document parsing, chart understanding, OCR, counting, spatial reasoning, and UI interaction. It is also aimed at coding tasks, including turning screenshots or design references into executable code.