
Title: Resources | Popular LLM Training and Inference Frameworks

Author: 链载Ai    Time: 2 hours ago


Inference Frameworks

| Date | Title | Paper | Code | Recom |
| --- | --- | --- | --- | --- |
| 2023.03 | [FlexGen] High-Throughput Generative Inference of Large Language Models with a Single GPU (@Stanford University etc.) | https://arxiv.org/pdf/2303.06865.pdf | https://github.com/FMInference/FlexGen | ⭐️ |
| 2023.05 | [SpecInfer] Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification (@Peking University etc.) | https://arxiv.org/pdf/2305.09781.pdf | https://github.com/flexflow/FlexFlow/tree/inference | ⭐️ |
| 2023.05 | [FastServe] Fast Distributed Inference Serving for Large Language Models (@Peking University etc.) | https://arxiv.org/pdf/2305.05920.pdf | ⚠️ | ⭐️ |
| 2023.09 | [vLLM] Efficient Memory Management for Large Language Model Serving with PagedAttention (@UC Berkeley etc.) | https://arxiv.org/pdf/2309.06180.pdf | https://github.com/vllm-project/vllm | ⭐️⭐️ |
| 2023.09 | [StreamingLLM] Efficient Streaming Language Models with Attention Sinks (@Meta AI etc.) | https://arxiv.org/pdf/2309.17453.pdf | https://github.com/mit-han-lab/streaming-llm | ⭐️ |
| 2023.09 | [Medusa] Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads (@Tianle Cai etc.) | https://sites.google.com/view/medusa-llm | https://github.com/FasterDecoding/Medusa | ⭐️ |
| 2023.10 | [TensorRT-LLM] NVIDIA TensorRT-LLM (@NVIDIA) | https://nvidia.github.io/TensorRT-LLM/ | https://github.com/NVIDIA/TensorRT-LLM | ⭐️⭐️ |
| 2023.11 | [DeepSpeed-FastGen] DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference (@Microsoft) | https://arxiv.org/pdf/2401.08671.pdf | https://github.com/microsoft/DeepSpeed | ⭐️⭐️ |
| 2023.12 | [PETALS] Distributed Inference and Fine-tuning of Large Language Models Over The Internet (@HSE University etc.) | https://arxiv.org/pdf/2312.08361.pdf | https://github.com/bigscience-workshop/petals | ⭐️⭐️ |
| 2023.12 | [PowerInfer] PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU (@SJTU) | https://ipads.se.sjtu.edu.cn/_media/publications/powerinfer-20231219.pdf | https://github.com/SJTU-IPADS/PowerInfer | ⭐️ |

A few minimal sketches of the starred techniques (vLLM's API, speculative decoding, attention sinks) follow below.
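Of the starred entries, vLLM is the easiest to try first. Below is a minimal sketch of its offline batch API, assuming `pip install vllm`, a CUDA GPU, and `facebook/opt-125m` purely as a small stand-in model (any Hugging Face causal LM works):

```python
from vllm import LLM, SamplingParams

prompts = [
    "The capital of France is",
    "In one sentence, paged attention means",
]
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# vLLM allocates the KV cache in fixed-size blocks (PagedAttention),
# so it can batch many sequences without reserving worst-case memory per request.
llm = LLM(model="facebook/opt-125m")

for out in llm.generate(prompts, params):
    print(out.prompt, "->", out.outputs[0].text)
```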
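SpecInfer and Medusa both build on speculative decoding: a cheap draft model (or extra decoding heads) proposes several tokens, and the expensive target model verifies them. The greedy-only toy below sketches the idea; `draft_next` and `target_next` are hypothetical stand-ins, not APIs of either project, and a real implementation verifies all k drafts in one batched target forward pass rather than one call per position:

```python
from typing import Callable, List

def speculative_step(
    prefix: List[int],
    draft_next: Callable[[List[int]], int],   # cheap proposer
    target_next: Callable[[List[int]], int],  # expensive verifier
    k: int = 4,
) -> List[int]:
    """Draft k tokens, then keep the longest prefix the target agrees with;
    on the first mismatch, emit the target's token instead and stop."""
    ctx, proposals = list(prefix), []
    for _ in range(k):
        proposals.append(draft_next(ctx))
        ctx.append(proposals[-1])

    ctx, accepted = list(prefix), []
    for t in proposals:
        expected = target_next(ctx)
        if expected != t:
            accepted.append(expected)  # correct the first mismatch, then stop
            break
        accepted.append(t)
        ctx.append(t)
    return list(prefix) + accepted

# Fake "models": draft always says 7; target wants 7 until position 4, then 9.
print(speculative_step([1, 2], lambda c: 7, lambda c: 7 if len(c) < 4 else 9))
# -> [1, 2, 7, 7, 9]: two drafts accepted, the third corrected.
```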
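StreamingLLM's observation is that attention concentrates on the first few tokens ("attention sinks"), so an unbounded stream can be served by keeping those sinks plus a recent window in the KV cache and evicting everything in between. The index arithmetic, as a toy sketch (illustrative names, not the paper's code):

```python
def streaming_cache_indices(seq_len: int, n_sinks: int = 4, window: int = 1020) -> list:
    """Token positions whose K/V entries stay cached under a sinks+window policy."""
    if seq_len <= n_sinks + window:
        return list(range(seq_len))                  # everything still fits
    sinks = list(range(n_sinks))                     # the initial "sink" tokens
    recent = list(range(seq_len - window, seq_len))  # the sliding recent window
    return sinks + recent

# At position 5000 the cache holds tokens 0-3 plus 3980-4999: 1024 entries total.
print(len(streaming_cache_indices(5000)))  # 1024
```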

Training Frameworks

| Date | Title | Paper | Code | Recom |
| --- | --- | --- | --- | --- |
| 2019.09 | [Megatron-LM] Training Multi-Billion Parameter Language Models Using Model Parallelism (@NVIDIA) | https://arxiv.org/pdf/1909.08053.pdf | https://github.com/NVIDIA/Megatron-LM | ⭐️⭐️ |
| 2023.10 | [LightSeq] LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers (@UC Berkeley etc.) | https://arxiv.org/pdf/2310.03294.pdf | https://github.com/RulinShao/LightSeq | ⭐️ |
| 2023.10 | [TensorRT-LLM] NVIDIA TensorRT-LLM (@NVIDIA) | https://nvidia.github.io/TensorRT-LLM/ | https://github.com/NVIDIA/TensorRT-LLM | ⭐️⭐️ |
| 2023.12 | [PETALS] Distributed Inference and Fine-tuning of Large Language Models Over The Internet (@HSE University etc.) | https://arxiv.org/pdf/2312.08361.pdf | https://github.com/bigscience-workshop/petals | ⭐️⭐️ |
| 2024.01 | [Inferflow] Inferflow: An Efficient and Highly Configurable Inference Engine for Large Language Models (@Tencent AI Lab) | https://arxiv.org/pdf/2401.08294.pdf | https://github.com/inferflow/inferflow | ⭐️ |

A toy sketch of Megatron-style tensor parallelism follows below.
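Megatron-LM's core technique is tensor (model) parallelism: the weight matrix of each large layer is sharded across GPUs, every rank computes its slice of the matmul, and the slices are combined. A NumPy toy of a column-parallel linear layer, with plain arrays standing in for per-GPU shards (not Megatron-LM's actual API):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))   # activations, (batch, d_in), replicated on every rank
W = rng.standard_normal((8, 16))  # full weight matrix, (d_in, d_out)

world_size = 4
shards = np.split(W, world_size, axis=1)  # each "rank" holds an (8, 4) column slice

partials = [x @ W_i for W_i in shards]    # fully independent per-rank matmuls
y = np.concatenate(partials, axis=1)      # the all-gather along d_out

assert np.allclose(y, x @ W)              # identical to the unsharded layer
```

Row-parallel layers work the same way but split along d_in and finish with an all-reduce; the paper alternates the two so each transformer MLP block needs only one communication per forward and backward pass.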


Highlights / Actionable Projects