Microsoft, in collaboration with Arizona State University, tested leading artificial intelligence models in a new simulation environment, Magentic Marketplace, to study how they behave under competitive and cooperative conditions.
Finway reports.
Features of the Experimental Platform Magentic Marketplace
Researchers built Magentic Marketplace as an open platform for testing how different AI agents interact. In this digital marketplace, hundreds of agents carried out tasks of varying complexity: customer agents placed orders for services such as food delivery, while business agents competed to win those orders. Because the environment is open source, other teams can replicate the experiments and build on the results.
Vulnerabilities and Limitations of Modern AI Agents
Testing revealed a number of significant flaws in leading language models, including GPT-4o, GPT-5, and Gemini 2.5 Flash. In particular, the AI agents proved susceptible to manipulation: they could be steered toward particular vendors, which raises questions about their independence. In addition, the agents' effectiveness declined sharply as the number of options they had to choose from grew, apparently because the larger set of choices overwhelmed them.
Researchers also noted problems with collaboration among agents. Without clear and detailed instructions, the models struggled to divide roles among themselves, which reduced productivity. Even with step-by-step guidelines, the level of autonomous collaboration remained insufficient.
“The key question is whether autonomous systems can effectively interact and negotiate without human oversight,” said Ece Kamar, head of the AI Frontiers Lab at Microsoft Research.
Based on the findings, the researchers concluded that current generative AI models are not yet ready for fully autonomous operation in complex environments. Although the technology is advancing rapidly, truly agentic systems remain a long way off.