Today, artificial intelligence can describe images, recognize objects, and explain complex relationships. The pace of development is remarkable: So-called vision-language models (VLMs) combine text and image understanding in impressive ways. Yet, of all things, they struggle with a seemingly simple task: counting. Researchers at the Institute for Information Systems (iisys) at Hof University of Applied Sciences are now working to address this issue, with a paper posted to the arXiv preprint server.