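Today, the Tongyi Qianwen (Qwen) team officially open-sourced Qwen3, the latest member of the Qwen family of large language models. The Qwen3 series offers two reasoning modes (deep thinking and fast response), supports 119 languages and dialects, and brings stronger agent and code-execution capabilities, targeting complex problem solving and global applications.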

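The flagship model, Qwen3-235B-A22B, achieves highly competitive results on coding, math, and general-capability benchmarks against top models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. The small MoE model Qwen3-30B-A3B outperforms QwQ-32B while activating only about 10% as many parameters, and even a small model such as Qwen3-4B can match the performance of Qwen2.5-72B-Instruct.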

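This release open-sources the weights of two MoE models: Qwen3-235B-A22B, a large model with more than 235 billion total parameters and more than 22 billion activated parameters, and Qwen3-30B-A3B, a small MoE model with about 30 billion total parameters and 3 billion activated parameters. Six dense models are also open-sourced, all under the Apache 2.0 license: Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B.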

Key highlights:

Qwen3 models support two thinking modes:

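Thinking mode: the model reasons step by step and gives its final answer only after careful deliberation. This suits complex problems that require deeper thought.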

Non-thinking mode: the model returns fast, near-instant responses. This suits simpler questions where speed matters more than depth.

Multilingual support

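Qwen3 models support 119 languages and dialects, spanning the Indo-European, Sino-Tibetan, Afro-Asiatic, Austronesian, Dravidian, Turkic, Tai-Kadai, Uralic, and Austroasiatic families, among others. This broad multilingual coverage opens up new possibilities for international applications and lets users worldwide benefit from these models.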

Enhanced agent capabilities

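The agent and coding capabilities of Qwen3 models have been optimized, and support for MCP (Model Context Protocol) has been strengthened. A hands-on tutorial on combining Qwen3 models with MCP appears later in this article.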

Transformers

```python
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switch between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content
try:
    # rindex: find the last occurrence of token 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```
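To see what the parsing step does without loading a model, the sketch below runs the same reverse-search logic on a made-up list of token ids. Only 151668 (the `</think>` id used in the code above) comes from the original; the other ids are placeholders, and `split_at_think_end` is a hypothetical name.

```python
THINK_END_ID = 151668  # </think> token id, as used in the Transformers example above


def split_at_think_end(output_ids):
    """Return (thinking_ids, answer_ids), split just after the LAST </think> token.

    Mirrors the example above: reverse the list, find the first </think>
    (the last one in generation order), and convert that position back to a
    forward index. If no </think> is present, everything counts as the answer.
    """
    try:
        index = len(output_ids) - output_ids[::-1].index(THINK_END_ID)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]


# Placeholder ids: 11, 12, 13 stand in for reasoning tokens, 21, 22 for answer tokens.
thinking, answer = split_at_think_end([11, 12, 13, THINK_END_ID, 21, 22])
print(thinking)  # [11, 12, 13, 151668]
print(answer)    # [21, 22]
```

Note that the thinking slice ends with the `</think>` id itself; the answer slice starts right after it.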

Deployment options

Developers can use sglang>=0.4.6.post1 or vllm>=0.8.4 to create an OpenAI-compatible API endpoint:

To disable thinking mode, remove the `--reasoning-parser` argument (and `--enable-reasoning`).
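A minimal client sketch for such an endpoint. It assumes the server listens on `http://localhost:8000/v1` and serves the model under the name `Qwen/Qwen3-30B-A3B`; both are assumptions to adjust to how the server was actually launched, and `build_chat_request` and `ask` are hypothetical helpers. The `openai` package is imported only when a request is actually sent.

```python
from typing import Any, Dict, List

# Assumptions: adjust to the address and model name your server actually uses.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "Qwen/Qwen3-30B-A3B"


def build_chat_request(prompt: str, max_tokens: int = 1024) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for client.chat.completions.create."""
    messages: List[Dict[str, str]] = [{"role": "user", "content": prompt}]
    return {"model": MODEL_NAME, "messages": messages, "max_tokens": max_tokens}


def ask(prompt: str) -> None:
    from openai import OpenAI  # imported lazily so build_chat_request has no dependency

    # A local server started without an API key ignores this value,
    # but the client still requires one to be set.
    client = OpenAI(base_url=BASE_URL, api_key="EMPTY")
    response = client.chat.completions.create(**build_chat_request(prompt))
    message = response.choices[0].message
    # With a reasoning parser enabled, the reasoning may arrive in a separate,
    # non-standard field; getattr keeps this working when the field is absent.
    print("reasoning:", getattr(message, "reasoning_content", None))
    print("answer:", message.content)
```

Once the server is running, `ask("Give me a short introduction to large language model.")` prints the reasoning (if the server separates it) and the answer.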

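For local development, you can interact with the model through Ollama by running the simple command `ollama run qwen3:30b-a3b`. You can also use LM Studio, or codebases such as llama.cpp and ktransformers, for local development.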

Ollama uses thinking mode by default. To switch to non-thinking mode, append `/no_think` to the end of the prompt. Also make sure Ollama is upgraded to a recent version (v0.6.6 or later).
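A sketch of applying the `/no_think` switch programmatically and sending the prompt to a local Ollama server through its REST chat endpoint. It assumes Ollama's default address `http://localhost:11434` and the `qwen3:30b-a3b` tag from the command above; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local chat endpoint


def with_thinking_switch(prompt: str, thinking: bool) -> str:
    """Append the /no_think switch when thinking should be disabled.

    Thinking is Ollama's default for Qwen3, so nothing is appended otherwise.
    """
    return prompt if thinking else f"{prompt} /no_think"


def build_ollama_payload(prompt: str, thinking: bool = True, model: str = "qwen3:30b-a3b") -> dict:
    """Request body for a single-turn, non-streaming chat call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": with_thinking_switch(prompt, thinking)}],
        "stream": False,
    }


def send(payload: dict) -> str:
    """POST the payload and return the assistant's reply text."""
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

For example, `send(build_ollama_payload("Why is the sky blue?", thinking=False))` asks for a quick answer without the thinking phase.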

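ModelScope's API-Inference service also supported the Qwen3 series from day one, so ModelScope users can call the models directly through the API. For details on using API-Inference, see the instructions on each model page (for example, https://www.modelscope.cn/models/Qwen/Qwen3-32B).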

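Alternatively, see the API-Inference documentation: https://www.modelscope.cn/docs/model-service/API-Inference/intro. Note in particular that Qwen3 models can switch freely between thinking and normal modes; in the API this is controlled through the `extra_body` parameter. `enable_thinking` is on by default and can be turned off as needed. With thinking mode on, you can also cap the length of the thinking process with the `thinking_budget` parameter (we recommend not setting `thinking_budget` too small; 4096 or more is advisable).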
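The `extra_body` rules just described can be captured in a small helper that also flags budgets below the recommended 4096. A sketch only: `make_extra_body` is a hypothetical name, and treating a budget with thinking disabled as an error is this sketch's own choice, not documented API behavior.

```python
import warnings
from typing import Any, Dict, Optional

RECOMMENDED_MIN_BUDGET = 4096  # recommendation from the text above


def make_extra_body(enable_thinking: bool = True,
                    thinking_budget: Optional[int] = None) -> Dict[str, Any]:
    """Build the extra_body dict for a Qwen3 request on ModelScope API-Inference."""
    body: Dict[str, Any] = {"enable_thinking": enable_thinking}
    if thinking_budget is not None:
        # Design choice of this sketch: a budget only makes sense with thinking on.
        if not enable_thinking:
            raise ValueError("thinking_budget only applies when enable_thinking is True")
        if thinking_budget < RECOMMENDED_MIN_BUDGET:
            warnings.warn(
                f"thinking_budget={thinking_budget} is below the recommended {RECOMMENDED_MIN_BUDGET}"
            )
        body["thinking_budget"] = thinking_budget
    return body
```

The result is passed as `extra_body=make_extra_body(True, 8192)` to `client.chat.completions.create`.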

```python
from openai import OpenAI

client = OpenAI(
    base_url='https://api-inference.modelscope.cn/v1/',
    api_key='MODELSCOPE_SDK_TOKEN',  # ModelScope Token
)

# set extra_body for thinking control
extra_body = {
    # enable thinking; set to False to disable
    "enable_thinking": True,
    # use thinking_budget to control the number of tokens used for thinking
    # "thinking_budget": 4096
}

response = client.chat.completions.create(
    model='Qwen/Qwen3-32B',  # ModelScope Model-Id
    messages=[
        {
            'role': 'user',
            'content': 'Which is larger, 9.9 or 9.11?'
        }
    ],
    stream=True,
    extra_body=extra_body
)

done_thinking = False
for chunk in response:
    if not chunk.choices:  # skip chunks that carry no choices
        continue
    delta = chunk.choices[0].delta
    # reasoning_content is a non-standard field; it may be missing or None
    thinking_chunk = getattr(delta, 'reasoning_content', None)
    answer_chunk = delta.content
    if thinking_chunk:
        print(thinking_chunk, end='', flush=True)
    elif answer_chunk:
        if not done_thinking:
            print('\n\n === Final Answer ===\n')
            done_thinking = True
        print(answer_chunk, end='', flush=True)
```
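If you want the thinking and answer text as separate strings rather than printed as they arrive, the streaming loop above can be restated as a small function. A sketch assuming chunks shaped like OpenAI chat-completion chunks, with `reasoning_content` as the non-standard field carrying the thinking text (as in the example above); `collect_stream` and `fake_chunk` are hypothetical names.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def collect_stream(chunks: Iterable) -> Tuple[str, str]:
    """Split a streamed Qwen3 response into (thinking, answer) strings.

    Either delta field may be missing or None; chunks with no choices are skipped.
    """
    thinking_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        thinking = getattr(delta, "reasoning_content", None)
        answer = getattr(delta, "content", None)
        if thinking:
            thinking_parts.append(thinking)
        if answer:
            answer_parts.append(answer)
    return "".join(thinking_parts), "".join(answer_parts)


def fake_chunk(reasoning=None, content=None):
    """Stand-in chunk for offline testing (not part of any SDK)."""
    delta = SimpleNamespace(reasoning_content=reasoning, content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

With a live request, `thinking, answer = collect_stream(response)` replaces the printing loop.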

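Example output (the reasoning is streamed first, followed by the final answer):

Hmm, I need to compare 9.9 and 9.11 to see which is larger. First, I should understand the structure of the two numbers. 9.9 is a decimal, and 9.11 also looks like a decimal, though some people might misread it as a date or something else? But the question clearly asks to compare numerical values, so they should be treated as decimals in the mathematical sense. First, I need to check the position of the decimal point. 9.9 is nine and nine tenths, i.e., 9 + 0.9 = 9.9, while 9.11 is nine and eleven hundredths, i.e., 9 + 0.11 = 9.11. Now I need to compare these two decimals. The usual method is to compare digit by digit from left to right until a differing digit is found. First compare the integer parts: both are 9, so they are equal. Next compare the fractional parts: the first is 0.9 and the second is 0.11. Here I need to pay attention to the number of decimal places. 0.9 can be written as 0.90, because trailing zeros do not change a decimal's value, so the two fractional parts become 0.90 and 0.11. Comparing 0.90 and 0.11, the first decimal digits are 9 and 1; 9 is clearly greater than 1, so 0.90 is greater than 0.11, and therefore 9.9 (i.e., 9.90) is greater than 9.11.

=== Final Answer ===

9.9 and 9.11 can be compared as follows:
1. **Integer part**: both are 9, so they are equal.
2. **Fractional part**:
   - 9.9 can be written as **9.90** (appending a zero does not change the value).
   - Compare the tenths digits: the tenths digit of 9.90 is **9**, and that of 9.11 is **1**.
   - Since $9 > 1$, **9.90 > 9.11**.

**Conclusion**: $$\boxed{9.9 > 9.11}$$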
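The model's reasoning pads 9.9 to 9.90 so that both fractional parts have the same number of digits before comparing them. A minimal sketch of that padding argument for non-negative decimal strings, cross-checked with the standard-library `decimal` module; `compare_by_padding` is a hypothetical name.

```python
from decimal import Decimal


def compare_by_padding(a: str, b: str) -> int:
    """Compare two non-negative decimal strings the way the reasoning above does.

    Returns 1 if a > b, -1 if a < b, 0 if equal. Integer parts are compared
    numerically; fractional parts are right-padded with zeros to equal length
    and then compared digit by digit.
    """
    a_int, _, a_frac = a.partition(".")
    b_int, _, b_frac = b.partition(".")
    if int(a_int) != int(b_int):
        return 1 if int(a_int) > int(b_int) else -1
    width = max(len(a_frac), len(b_frac))
    a_frac, b_frac = a_frac.ljust(width, "0"), b_frac.ljust(width, "0")
    # Equal-length digit strings compare lexicographically in numeric order.
    if a_frac == b_frac:
        return 0
    return 1 if a_frac > b_frac else -1


print(compare_by_padding("9.9", "9.11"))  # 1, because "90" > "11"
print(Decimal("9.9") > Decimal("9.11"))   # True
```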

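Qwen3 models offer better support for agents and tool calling, and can integrate external tools precisely in both thinking and non-thinking modes. We also integrated several Qwen3 models into the ModelScope MCP Playground right away, so you can try Qwen3's tool-calling capabilities in MCP scenarios.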
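When Qwen3 is served behind an OpenAI-compatible endpoint (for example vLLM or SGLang as described earlier, provided tool-call parsing is enabled on the server), tools are commonly described with the OpenAI function-calling schema. A minimal sketch of such a definition and of dispatching a returned call; the `get_weather` tool and its parameters are hypothetical.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool, described in the OpenAI function-calling format
# (JSON Schema for the parameters).
WEATHER_TOOL: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> str:
    # Placeholder implementation; a real tool would call a weather service.
    return f"Sunny in {city}"


REGISTRY: Dict[str, Callable[..., str]] = {"get_weather": get_weather}


def dispatch(name: str, arguments_json: str) -> str:
    """Run a tool call given the function name and its JSON-encoded arguments,
    the shape used by OpenAI-style tool_calls."""
    if name not in REGISTRY:
        raise KeyError(f"unknown tool {name!r}")
    return REGISTRY[name](**json.loads(arguments_json))
```

In a chat loop you would pass `tools=[WEATHER_TOOL]` to `client.chat.completions.create`, call `dispatch(tool_call.function.name, tool_call.function.arguments)` for each returned tool call, and send the result back as a `{"role": "tool", ...}` message carrying the matching `tool_call_id`.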

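Below we show how to use ms-swift to run SFT and GRPO on Qwen/Qwen3-8B, and how to use Megatron-SWIFT to run SFT on Qwen/Qwen3-30B-A3B. ms-swift is the official framework from the ModelScope community for training and deploying large models and multimodal large models.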

SFT

The training script for Qwen3-8B is shown below; it runs in the free GPU Notebook provided by ModelScope:

```shell
# Training GPU memory: 22GB
# You can pass `--dataset AI-ModelScope/alpaca-gpt4-data-zh` to run the experiment end to end
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model Qwen/Qwen3-8B \
    --train_type lora \
    --dataset '<dataset-path>' \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --gradient_accumulation_steps 4 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --packing true \
    --use_liger_kernel true
```
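For orientation, two numbers implied by the SFT script: the effective batch size per optimizer step and the LoRA scaling factor. A sketch assuming a single GPU (as `CUDA_VISIBLE_DEVICES=0` suggests) and the standard LoRA scaling `lora_alpha / lora_rank` (not the rsLoRA variant); with `--packing true`, each "sample" here is a packed sequence of up to `--max_length` tokens. Both helpers are hypothetical.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int) -> int:
    """Samples (or packed sequences) contributing to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def lora_scaling(lora_alpha: float, lora_rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return lora_alpha / lora_rank


# Values from the SFT script above, on one GPU.
print(effective_batch_size(1, 4, 1))  # 4
print(lora_scaling(32, 8))            # 4.0
```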

The custom dataset format is as follows (the system field is optional); just pass `--dataset <dataset_path>`:
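A minimal sketch for producing such a file, assuming the `messages` layout that ms-swift uses for conversation data: one JSON object per line, each holding a list of `{"role", "content"}` turns with an optional leading `system` turn. The helper names and the requirement that an SFT sample end with an assistant turn are this sketch's own sanity checks, not ms-swift rules.

```python
import json
from typing import Dict, List, Optional

VALID_ROLES = {"system", "user", "assistant"}


def make_record(user: str, assistant: str, system: Optional[str] = None) -> Dict[str, List[Dict[str, str]]]:
    """Build one single-turn SFT sample; the system turn is included only if given."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user})
    messages.append({"role": "assistant", "content": assistant})
    return {"messages": messages}


def check_record(record) -> None:
    """Sanity checks: known roles, system only as the first turn, ends with an assistant reply."""
    msgs = record["messages"]
    for i, m in enumerate(msgs):
        if m["role"] not in VALID_ROLES:
            raise ValueError(f"unknown role {m['role']!r}")
        if m["role"] == "system" and i != 0:
            raise ValueError("system turn must come first")
    if msgs[-1]["role"] != "assistant":
        raise ValueError("an SFT sample should end with an assistant turn")


def write_jsonl(records, path: str) -> None:
    """Write one JSON object per line after checking each record."""
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            check_record(r)
            # ensure_ascii=False keeps non-English text readable in the file
            f.write(json.dumps(r, ensure_ascii=False) + "\n")
```

For example, `write_jsonl([make_record("What is 1 + 1?", "It equals 2")], "my_sft_data.jsonl")` (hypothetical file name) followed by `--dataset my_sft_data.jsonl`.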

GRPO

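We use AI-MO/NuminaMath-TIR as the dataset and the accuracy function to compute an accuracy reward for the model's answers. Computing the reward requires installing the following packages: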

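The custom dataset format is similar to the SFT format, except that the assistant turn is optional. If you use the accuracy reward, a `solution` column is required to compute accuracy.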

```
# llm
{"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "Tell me tomorrow's weather"}]}
{"messages": [{"role": "system", "content": "You are a useful and harmless math calculator"}, {"role": "user", "content": "What is 1 + 1?"}, {"role": "assistant", "content": "It equals 2"}, {"role": "user", "content": "What about adding 1?"}]}
{"messages": [{"role": "user", "content": "What is your name?"}]}
# mllm
{"messages": [{"role": "user", "content": "<image>What is the difference between the two images?"}], "images": ["/xxx/x.jpg"]}
{"messages": [{"role": "user", "content": "<image><image>What is the difference between the two images?"}], "images": ["/xxx/y.jpg", "/xxx/z.png"]}
```
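Before launching GRPO, it can help to check that every row has a usable prompt and, when the accuracy reward is used, a `solution` value. A minimal sketch of such a check following the two rules above; `check_grpo_rows` is a hypothetical helper, not part of ms-swift. The sample rows shown above carry no `solution` column, so they would only suit non-accuracy rewards.

```python
from typing import Any, Dict, List


def check_grpo_rows(rows: List[Dict[str, Any]], uses_accuracy_reward: bool) -> List[int]:
    """Return the indices of rows that are unusable under the rules above.

    - 'messages' must be a non-empty list (the assistant turn is optional);
    - with the accuracy reward, each row also needs a non-empty 'solution'.
    """
    bad = []
    for i, row in enumerate(rows):
        msgs = row.get("messages")
        if not isinstance(msgs, list) or not msgs:
            bad.append(i)
            continue
        if uses_accuracy_reward and not row.get("solution"):
            bad.append(i)
    return bad
```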

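You can also train with a custom reward function or reward model. The dataset's columns are passed to the reward function through `**kwargs`; for an example of a custom reward function, see swift/examples/train/grpo/plugin/plugin.py.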
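A hypothetical, deliberately simplified reward in the same spirit: it receives the generated completions plus dataset columns by keyword, and scores 1.0 when the last `\boxed{...}` in a completion exactly matches the last one in the row's `solution`. This is not ms-swift's built-in `accuracy` reward, which performs proper math verification; it assumes each column arrives as a list aligned with the completions, and registering it with ms-swift is not shown (follow plugin.py for that).

```python
from typing import List, Optional


def last_boxed(text: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...} in text (nested braces allowed)."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len("\\boxed{"):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never closed


class BoxedMatchReward:
    """Hypothetical reward: 1.0 when the boxed answers match exactly, else 0.0."""

    def __call__(self, completions: List[str], solution: List[str], **kwargs) -> List[float]:
        # Assumption: columns arrive as lists aligned with completions; `solution`
        # is bound by name, and any other columns land in **kwargs unused.
        rewards = []
        for completion, sol in zip(completions, solution):
            pred, ref = last_boxed(completion), last_boxed(sol)
            rewards.append(1.0 if pred is not None and pred == ref else 0.0)
        return rewards
```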

During training, we use vLLM to speed up sampling. With `num_infer_workers=8`, one vLLM engine is deployed on each device to accelerate sampling.

```shell
# GPU memory: 70GB x 8
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen3-8B \
    --train_type full \
    --dataset AI-MO/NuminaMath-TIR \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --learning_rate 1e-6 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --output_dir output \
    --gradient_accumulation_steps 1 \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --max_completion_length 4096 \
    --vllm_max_model_len 8192 \
    --reward_funcs accuracy \
    --num_generations 16 \
    --use_vllm true \
    --vllm_gpu_memory_utilization 0.4 \
    --sleep_level 1 \
    --offload_model true \
    --offload_optimizer true \
    --gc_collect_after_offload true \
    --deepspeed zero3 \
    --num_infer_workers 8 \
    --tensor_parallel_size 1 \
    --temperature 1.0 \
    --top_p 0.85 \
    --report_to wandb \
    --log_completions true \
    --overlong_filter true
```
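In TRL-style GRPO, the completions in each step are grouped by prompt, so the number of completions per step must be divisible by `num_generations`. Assuming ms-swift's GRPO follows the same constraint (worth confirming in the ms-swift docs for your version, including whether gradient accumulation enters the count), the script's settings give one prompt per step. `grpo_prompts_per_step` is a hypothetical helper.

```python
def grpo_prompts_per_step(per_device_batch: int, num_processes: int,
                          grad_accum_steps: int, num_generations: int) -> int:
    """Distinct prompts per optimizer step under the divisibility assumption above.

    Raises if the completions cannot be split into whole groups of num_generations.
    """
    completions = per_device_batch * num_processes * grad_accum_steps
    if completions % num_generations != 0:
        raise ValueError(
            f"{completions} completions per step is not divisible by num_generations={num_generations}"
        )
    return completions // num_generations


# From the script: 2 per device, 8 processes, 1 accumulation step, 16 generations.
print(grpo_prompts_per_step(2, 8, 1, 16))  # 1
```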

MoE training (Megatron-SWIFT)

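ms-swift incorporates Megatron's parallelism techniques to speed up large-model training, including data, tensor, pipeline, sequence, context, and expert parallelism. It supports pre-training and fine-tuning of models such as Qwen3, Qwen3-MoE, Qwen2.5, Llama3, and the DeepSeek-R1 distilled series.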

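For environment setup (Docker images) and converting model weights between HF and MCore formats, see the Megatron-SWIFT training documentation, which we won't repeat here: https://swift.readthedocs.io/zh-cn/latest/Instruction/Megatron-SWIFT%E8%AE%AD%E7%BB%83.html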

```shell
# https://help.aliyun.com/zh/pai/user-guide/general-environment-variables
# Make sure the checkpoint save path is the same on both nodes
NNODES=$WORLD_SIZE \
NODE_RANK=$RANK \
megatron sft \
    --load Qwen3-30B-A3B-Base-mcore \
    --dataset 'liucong/Chinese-DeepSeek-R1-Distill-data-110k-SFT' \
    --tensor_model_parallel_size 2 \
    --expert_model_parallel_size 8 \
    --moe_grouped_gemm true \
    --moe_shared_expert_overlap true \
    --moe_aux_loss_coeff 0.01 \
    --micro_batch_size 1 \
    --global_batch_size 16 \
    --packing true \
    --recompute_granularity full \
    --recompute_method uniform \
    --recompute_num_layers 1 \
    --train_iters 2000 \
    --eval_iters 50 \
    --finetune true \
    --cross_entropy_loss_fusion true \
    --lr 1e-5 \
    --lr_warmup_iters 100 \
    --min_lr 1e-6 \
    --save megatron_output/Qwen3-30B-A3B-Base \
    --eval_interval 200 \
    --save_interval 200 \
    --max_length 8192 \
    --num_workers 8 \
    --dataset_num_proc 8 \
    --no_save_optim true \
    --no_save_rng true \
    --sequence_parallel true \
    --use_flash_attn true
```
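A quick sanity check of the batch layout in this script. Assuming two nodes with 8 GPUs each (16 GPUs, consistent with the 16-GPU figures in the comparison table) and no pipeline or context parallelism (the script sets neither), the usual Megatron bookkeeping gives the data-parallel size of the dense layers and the number of micro-batches accumulated per step. The helper is hypothetical and ignores how expert parallelism regroups ranks for the MoE layers.

```python
from typing import Tuple


def megatron_layout(world_size: int, tp: int, pp: int = 1, cp: int = 1,
                    micro_batch: int = 1, global_batch: int = 1) -> Tuple[int, int]:
    """Return (data_parallel_size, num_micro_batches) for the dense (non-expert) layers."""
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError("world size must be divisible by tp * pp * cp")
    dp = world_size // model_parallel
    if global_batch % (micro_batch * dp) != 0:
        raise ValueError("global_batch must be divisible by micro_batch * dp")
    return dp, global_batch // (micro_batch * dp)


# 16 GPUs, tensor parallel 2, micro batch 1, global batch 16 (from the script).
print(megatron_layout(16, tp=2, micro_batch=1, global_batch=16))  # (8, 2)
```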

The custom dataset format is the same as for `swift sft` (described earlier in this article); just pass `--dataset <dataset_path>`.

Training speed and GPU memory usage for full-parameter training of Qwen3-30B-A3B with `megatron sft` versus `swift sft`:

Metric	Megatron-LM	DeepSpeed-ZeRO2	DeepSpeed-ZeRO3
Training speed	9.6 s/it	-	91.2 s/it
GPU memory	16 × 60 GiB	OOM	16 × 80 GiB

链载Ai
