Z.ai debuts open source GLM-4.6V, a native tool-calling vision model for multimodal reasoning


Chinese AI startup Zhipu AI, also known as Z.ai, has released its GLM-4.6V series, a new generation of open-source vision-language models (VLMs) optimized for multimodal reasoning, frontend automation, and high-efficiency deployment.

The release includes two models in "large" and "small" sizes:

  1. GLM-4.6V (106B), the larger 106-billion-parameter model, aimed at cloud-scale inference

  2. GLM-4.6V-Flash (9B), the smaller 9-billion-parameter model, designed for low-latency, local applications

Generally speaking, models with more parameters (the internal settings, such as weights and biases, that govern a model's behavior) are more powerful and capable of performing at a higher level across a wider variety of tasks.

However, smaller models can offer better efficiency for edge or real-time applications where latency and resource constraints are critical.
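
To make the cloud-versus-edge split concrete, a weights-only memory estimate shows how parameter count translates into hardware requirements. This is a back-of-the-envelope sketch: the precision options are standard deployment choices, not figures from the announcement, and it ignores activation and KV-cache overhead.

```python
# Rough weights-only memory estimate for the two GLM-4.6V variants.
# Parameter counts come from the article; bytes-per-parameter values
# are standard for the listed precisions. Activations and KV-cache
# overhead are excluded, so real requirements are somewhat higher.

BYTES_PER_PARAM = {"fp16/bf16": 2, "int8": 1, "int4": 0.5}

models = {"GLM-4.6V": 106e9, "GLM-4.6V-Flash": 9e9}

for name, params in models.items():
    for precision, nbytes in BYTES_PER_PARAM.items():
        gb = params * nbytes / 1e9
        print(f"{name:>16} @ {precision:>9}: ~{gb:,.1f} GB of weights")
```

At 16-bit precision, the 106B model needs on the order of 212 GB for its weights alone, putting it firmly in multi-GPU cloud territory; the 9B Flash variant quantized to 4 bits fits in roughly 4.5 GB, within reach of a single consumer GPU.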

The defining innovation in this series is the introduction of native function calling in a vision-language model, enabling direct use of tools such as search, cropping ...
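
In practice, native function calling means the model emits structured tool invocations directly, rather than free-form text that a wrapper must parse. A minimal sketch of what a request could look like against an OpenAI-compatible endpoint follows; the base URL, the model identifier "glm-4.6v", and the crop_image tool are illustrative assumptions, not confirmed details from the article.

```python
# Hypothetical sketch: tool calling with a vision-language model via the
# OpenAI-compatible chat-completions pattern many providers expose.
from openai import OpenAI

# Assumed endpoint and key placeholder -- substitute your provider's values.
client = OpenAI(base_url="https://api.z.ai/api/paas/v4", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "crop_image",  # hypothetical tool the model may call natively
        "description": "Crop a region of the input image for closer inspection.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "integer"},
                "y": {"type": "integer"},
                "width": {"type": "integer"},
                "height": {"type": "integer"},
            },
            "required": ["x", "y", "width", "height"],
        },
    },
}]

response = client.chat.completions.create(
    model="glm-4.6v",  # assumed model identifier, for illustration only
    tools=tools,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does the sign in the top-left corner say?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/street.jpg"}},
        ],
    }],
)

# If the model decides a tool is needed, the invocation arrives as
# structured JSON the serving stack can dispatch directly.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```

The appeal of the native approach for multimodal tools like cropping is that the model grounds its structured arguments (pixel coordinates, in this sketch) in what it actually sees, instead of relying on a separate parsing layer to recover them from generated text.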

