Run a vLLM Server on HF Jobs in One Command

查看原文

推荐理由

官方发布带来Hugging Face 模型更新信号，适合跟踪能力变化、生态影响和后续落地。

Back to Articles

- Prerequisites

- Launch the server

- Query it from anywhere

- Clean up

- Going further: bigger models

- Going further: Chat with it in a UI

- Going further: SSH into the running server

- Going further: Use it as a coding-agent backend with Pi

- HF Jobs or Inference Endpoints?

- Further reading

You can spin up a private, OpenAI-compatible LLM endpoint on Hugging Face infrastructure with a single command — no servers to provision, no Kubernetes, pay-per-second. Once it's up, you can query it from your laptop, a notebook, or anywhere else.

It's the quickest way to stand up a model for tests, evals, or batch generation. (If you're after a managed, production-ready service instead, that's what Inference Endpoints are for — more on when to pick which at the end.)

Here's the whole thing end to end.

Prerequisites

- A payment method or a positive prepaid credit balance (Jobs is billed per‑minute by hardware usage).

- huggingface_hub >= 1.20.0

: pip install -U "huggingface_hub>=1.20.0"

- Logged in locally: hf auth login

Launch the server

hf jobs run

is docker run

for HF infrastructure. We use the official vllm/vllm-openai

主题标签官方公告Hugging Face模型发布

原始关键词#command#server#jobs#vllm#one#run

查看原文huggingface.co

单一官方来源，暂无交叉验证