GLM-4-Voice is an end-to-end spoken large language model developed by Zhipu AI (Z.ai). Unlike traditional “pipeline” systems that chain together separate STT (Speech-to-Text), LLM, and TTS (Text-to-Speech) models, GLM-4-Voice processes audio natively.