Publications
QLM: Queue Management for SLO-oriented Large Language Model Serving.
SoCC 2024.
- Adopted by vLLM (release notes) and ByteDance AIBrix (blog)
- Media Coverage: IBM, ByteDance, Hugging Face