Grok 3 Beta achieves SOTA results in finance benchmarks
The US-based company Vals AI specializes in evaluating and benchmarking large language models (LLMs) for specific industry use cases. The goal is to measure the performance of AI models using realistic and practical data. The company has now presented results for several Grok models.

The details
- According to Vals AI, Grok 3 Beta (xAI's largest model) sets a new state of the art (SOTA) on finance, legal, and tax benchmarks. Grok 3 Beta achieves an average accuracy of 78.1% across all benchmarks and a latency of 15.52 seconds.
- Grok 3 Mini Fast Beta (High Reasoning) scored better overall than the larger Grok 3 Beta, with an average accuracy of 81.6% and a latency of 23.34 seconds.
- Grok 3 Mini Fast Beta (Low Reasoning) scored slightly lower overall than the larger Grok 3 Beta, with an average accuracy of 77.3% and a latency of 10.00 seconds.
- xAI has claimed that its Grok models outperform GPT-4o and DeepSeek V3, and these results support that claim: all three tested Grok models beat GPT-4o (67.0%) and DeepSeek V3 (74.7%) in average accuracy.

Our thoughts
It’s impressive how quickly xAI has caught up with OpenAI. As a reminder, the company has only been around for two years. We use Grok daily for research, image generation, and improving our social media posts. For us, AI is a great tool to boost our productivity.
More information: 🔗 Vals AI
Magic AI tool of the week
Today, it is essential to work in a structured and organized manner. There are many tools that you can use to boost your productivity. However, finding the right tool for your needs can be overwhelming.
One of the best tools we’ve ever used is Notion, especially with its powerful AI features. Notion combines a note-taking app, a document editor, a project management tool, and an AI assistant in one place.
Its AI features help you finish your tasks faster and more efficiently. We promise this tool will boost your productivity from day one.
Hand-picked articles of the week
- Run GenAI Models locally with Docker Model Runner
- Build a Local AI Agent to Chat with Financial Charts Using Agno
- Build a financial multi-agent system with AG2 (formerly AutoGen) and Ollama
😀 Do you enjoy our content? If so, why not support us with a small financial contribution? This helps us fund our work to ensure we can stick around long-term.