CogVideoX-5B: Newly Open-Sourced! You Can Now Run AI Text-to-Video Generation Locally (12 GB VRAM)


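Zhipu has open-sourced CogVideoX-5b, a new model in its CogVideoX series. It delivers higher video generation quality and better visuals than the previously open-sourced CogVideoX-2B.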

(The GIF is a little choppy ...)

Test case 1 (AI-杨百万):

Prompt: Picture this: a sleek, confident cat lounging casually in a sun-drenched room, its fur glistening under the warm rays. But what sets this feline apart is not just its glossy coat or the graceful poise it exudes; it's the pair of stylish sunglasses perched on its nose, adding an air of mystery and coolness to its demeanor. The sunglasses, with their reflective lenses, hide the cat's enigmatic eyes, making it seem as if it's pondering life's mysteries or perhaps just planning its next mischievous adventure. As sunlight filters through the window, casting patterns on the floor, the cat, utterly unfazed by its unusual accessory, gives off a vibe of effortless chic. It sits there, a picture of serenity and detachment, occasionally flicking its tail or letting out a soft purr, completely embodying the essence of cool. This cat doesn't just wear sunglasses; it owns the look, making anyone who glances its way do a double-take, charmed by the sight of such an unexpected yet striking fashion statement.

Test case 2 (AI-杨百万):

Prompt: In a heartwarming scene, a delightful panda bear finds itself in the gentle embrace of a human, engaging in what can only be described as a whimsical dance. The panda, with its striking black and white fur, looks up with trusting, curious eyes, its round face framed by fuzzy ears. The human, filled with joy and awe, carefully supports the panda's soft, plump body, guiding it in a series of gentle, swaying movements. As they move together, the panda's clumsy yet endearing attempts to mimic the rhythm create a moment of pure magic. Its tiny paws occasionally reach out, touching the human's hands, as if trying to understand this novel form of interaction. Around them, the air is filled with laughter and soft music, enhancing the enchantment of their dance. This unique encounter, a blend of nature's innocence and human affection, unfolds like a tender dance of friendship, leaving an indelible mark of joy and connection on all who witness it.

The hardware requirements for inference are as follows:

CogVideoX-2B model:

• FP16 precision:
  • Using diffusers: 12.5 GB VRAM
• INT8 precision:
  • Using diffusers with torchao: 7.8 GB VRAM

CogVideoX-5B model:

• BF16 precision:
  • Using diffusers: 20.7 GB VRAM
• INT8 precision:
  • Using diffusers with torchao: 11.4 GB VRAM
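To make the figures above easier to apply, here is a minimal sketch that lists which model and precision combinations fit a given amount of VRAM. The table values are copied from the list above; the names `REQUIREMENTS_GB` and `configs_that_fit` are hypothetical, and real usage will vary with resolution, frame count, and driver overhead.

```python
# VRAM needed for inference with diffusers, in GB, as quoted in this post.
# The structure and names here are illustrative assumptions, not an official API.
REQUIREMENTS_GB = [
    ("CogVideoX-5B", "BF16", 20.7),
    ("CogVideoX-5B", "INT8", 11.4),
    ("CogVideoX-2B", "FP16", 12.5),
    ("CogVideoX-2B", "INT8", 7.8),
]


def configs_that_fit(vram_gb):
    """Return the (model, precision) pairs whose quoted requirement fits in vram_gb."""
    return [
        (model, precision)
        for model, precision, needed in REQUIREMENTS_GB
        if needed <= vram_gb
    ]


if __name__ == "__main__":
    # A 12 GB card, as in the title of this post.
    for model, precision in configs_that_fit(12.0):
        print(f"{model} at {precision} should fit")
```

With a 12 GB card, only the INT8 variants are within the quoted limits, which matches the 12 GB figure in the title.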

The demo interface looks like this:

Quick start

The model can already be deployed with Hugging Face's diffusers library. Follow the steps below.

Install the required dependencies

# diffusers>=0.30.1
# transformers>=4.44.0
# accelerate>=0.33.0 (installing from source is recommended)
# imageio-ffmpeg>=0.5.1

pip install --upgrade transformers accelerate diffusers imageio-ffmpeg
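After installing, it can help to confirm that the environment meets the minimum versions listed above. The following is a minimal sketch using only the standard library; `MINIMUMS`, `parse_version`, `meets_minimum`, and `check_installed` are hypothetical names, and the version parser is deliberately simple (it only compares the leading numeric components).

```python
import re
from importlib import metadata

# Minimum versions from the list above. transformers uses 4.x version numbers,
# so 4.44.0 is assumed to be the intended minimum.
MINIMUMS = {
    "diffusers": "0.30.1",
    "transformers": "4.44.0",
    "accelerate": "0.33.0",
    "imageio-ffmpeg": "0.5.1",
}


def parse_version(text):
    """Return the leading numeric components, e.g. '0.31.0.dev0' -> (0, 31, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed():
    """Map each package to (installed version or None, whether the minimum is met)."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_installed().items():
        status = "ok" if ok else "needs install/upgrade"
        print(f"{name}: {installed} ({status})")
```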

Run the code (BF16 / FP16)

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool "
    "in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic "
    "guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, "
    "watching curiously and some clapping in rhythm. Sunlight filters through the tall "
    "bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing "
    "concentration and joy as it plays. The background includes a small, flowing stream "
    "and vibrant green foliage, enhancing the peaceful and magical atmosphere of this "
    "unique musical performance."
)

# Load the 5B checkpoint in BF16.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16,
)

# Lower peak VRAM: offload idle sub-models to the CPU and decode the VAE in tiles.
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

video = pipe(
    prompt=prompt,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),  # fixed seed
).frames[0]

# Write the generated frames to an MP4 file at 8 frames per second.
export_to_video(video, "output.mp4", fps=8)
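The heading mentions both BF16 and FP16, while the script loads only the 5B checkpoint in BF16. Below is a minimal sketch of how the checkpoint, dtype, and clip length relate. It assumes the 2B checkpoint is published as `THUDM/CogVideoX-2b` and is run in FP16, as in the hardware list earlier in the post; `VARIANTS`, `variant_settings`, and `clip_seconds` are hypothetical helper names.

```python
# Hypothetical lookup: repo id and torch dtype name per model variant.
# The 2B repo id is an assumption; the 5B one is taken from the script above.
VARIANTS = {
    "5b": {"repo": "THUDM/CogVideoX-5b", "dtype": "bfloat16"},
    "2b": {"repo": "THUDM/CogVideoX-2b", "dtype": "float16"},
}


def variant_settings(variant):
    """Return the repo id and dtype name for '2b' or '5b'."""
    try:
        return VARIANTS[variant.lower()]
    except KeyError:
        raise ValueError(f"unknown variant: {variant!r}") from None


def clip_seconds(num_frames, fps):
    """Length of the exported clip in seconds."""
    return num_frames / fps


if __name__ == "__main__":
    cfg = variant_settings("2b")
    print(cfg["repo"], cfg["dtype"])
    # With the values used in the script (49 frames exported at 8 fps):
    print(clip_seconds(49, 8), "seconds")
    # In the script, these settings would be passed roughly as:
    #   CogVideoXPipeline.from_pretrained(cfg["repo"],
    #                                     torch_dtype=getattr(torch, cfg["dtype"]))
```

With `num_frames=49` and `fps=8`, the exported clip is about six seconds long.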