Is xLSTM better than Transformer models?


More topics: Apple invests in AI infrastructure, and DeepSeek unveils a new open-source LLM

Magic AI News

Hi AI Enthusiasts,

Welcome to this week’s Magic AI, where we bring you exciting updates from the world of Artificial Intelligence (AI) and technology. A lot has happened in the last week.

This week’s Magic AI tool is an automation powerhouse, perfect for meetings, sales calls, coaching sessions, and YouTube videos.

Let’s explore this week’s AI news together. Stay curious! 😎


Top AI news of the week

💬 xLSTM: Language model from Europe

The European AI startup NXAI, founded by AI pioneer Sepp Hochreiter, has presented a new architecture for language models in a paper. The new architecture, called xLSTM (short for Extended Long Short-Term Memory), builds on the familiar LSTM architecture.

LSTMs have a kind of built-in short-term memory. The new xLSTM architecture combines techniques from modern transformer-based LLMs with the LSTM. The research question of the paper is:

“How far do we get in language modeling when scaling LSTMs to billions of parameters, leveraging the latest techniques from modern LLMs, but mitigating known limitations of LSTMs?”

According to the paper, larger xLSTM models could compete with current large language models. The researchers have enhanced the LSTM architecture through exponential gating with memory mixing and a new memory structure, as sketched below. For a more in-depth look at the new architecture, you should check out the full paper.
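To make the exponential gating idea concrete, here is a toy NumPy sketch of one sLSTM-style step, based on our reading of the paper. The stabilizer state m keeps the exponential gates numerically safe; all variable names and the initialization are our own illustration, not NXAI’s code.

```python
import numpy as np

def slstm_step(x, h_prev, c_prev, n_prev, m_prev, W, R, b):
    # W: (4d, d), R: (4d, d), b: (4d,) hold the stacked weights for the
    # cell input (z) and the input (i), forget (f), output (o) gates.
    z_pre, i_pre, f_pre, o_pre = np.split(W @ x + R @ h_prev + b, 4)

    z = np.tanh(z_pre)                  # cell input
    o = 1.0 / (1.0 + np.exp(-o_pre))    # output gate (sigmoid)

    # Exponential input/forget gates, stabilized in log space
    m = np.maximum(f_pre + m_prev, i_pre)   # stabilizer state
    i_gate = np.exp(i_pre - m)
    f_gate = np.exp(f_pre + m_prev - m)

    c = f_gate * c_prev + i_gate * z    # cell state
    n = f_gate * n_prev + i_gate        # normalizer state
    h = o * (c / n)                     # normalized hidden state
    return h, c, n, m

# Tiny smoke test with illustrative sizes
d = 4
rng = np.random.default_rng(0)
W, R = rng.normal(size=(4 * d, d)), rng.normal(size=(4 * d, d))
b = np.zeros(4 * d)
h = c = m = np.zeros(d)
n = np.ones(d)
h, c, n, m = slstm_step(rng.normal(size=d), h, c, n, m, W, R, b)
```

The exponential gates allow the cell to revise stored values much more aggressively than the classic sigmoid gates, while the normalizer state n keeps the output bounded.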

Our thoughts

We have often used the LSTM architecture in the past for time series forecasting or anomaly detection in quality control. The LSTM architecture is powerful, but classic LSTMs cannot compete with state-of-the-art LLMs like Llama 3.

The new architecture extends the LSTM in a way that has the potential to make it a real competitor to current transformer models like Llama 3. We find the approach very exciting and will continue to follow this topic.

✍🏽 What is your opinion on this new LSTM architecture?

📚 Book recommendation (it’s free) to learn more about state-of-the-art Deep Learning models: Dive into Deep Learning

More information

🍎 Apple invests in AI infrastructure

Apple plans to integrate its own chips into servers. The demand for computing power for generative AI is increasing, and Apple is planning several AI features in its upcoming operating systems. Generative AI requires large server farms with enormous computing power.

According to well-known analyst Jeff Pu, Apple’s manufacturing partner Foxconn is building “AI servers” based on Apple’s M2 Ultra SoC. In 2025, the M4 chip is then expected to follow in these servers. A few days ago, Apple unveiled the new iPad Pro, which already uses the new M4 chip. Apple calls the M4 an AI powerhouse.

Our thoughts

Apple has been using its own ARM chips in Macs since 2020. Apple develops hardware and software hand in hand, which is the reason for the impressive performance of Apple computers.

In addition, Apple has developed the efficient machine learning framework MLX for Apple chips. It is similar to PyTorch but optimized for Apple hardware, and it is open source, available on GitHub with many examples.
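As a quick taste, here is a minimal MLX sketch (assuming `pip install mlx` on an Apple silicon Mac): arrays live in unified memory, and computation is lazy until you evaluate it.

```python
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))

c = mx.matmul(a, b)   # builds the computation lazily
mx.eval(c)            # materializes the result on the Apple GPU

print(c.shape)        # (1024, 1024)
```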

✍🏽 What is your opinion on Apple hardware for machine learning?

More information

🤖 DeepSeek unveils a new open-source LLM

The AI startup DeepSeek just presented DeepSeek-V2, an open-source mixture-of-experts (MoE) large language model. This open-source model has 236 billion parameters and supports a context length of 128,000 tokens.

DeepSeek-V2 achieves high performance in benchmarks. DeepSeek has adjusted the mixture-of-experts architecture: of the 236 billion total parameters, only about 21 billion are activated per token, so inference requires significantly less compute than with comparable dense models. This shows the potential of large open-source models. You can check out the official paper to learn more; a toy illustration of MoE routing follows below.
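To make the mixture-of-experts idea concrete, here is a toy top-k routing layer in PyTorch. This is not DeepSeek’s implementation; the expert count, expert size, and top-k value are purely illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # one score per expert
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

moe = ToyMoE()
y = moe(torch.randn(16, 64))   # only 2 of 8 experts run per token
```

Only the selected experts run for each token, which is why a 236-billion-parameter MoE model can be much cheaper per token than a dense model of the same size.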

DeepSeek-V2 Benchmark (Image by DeepSeek)

DeepSeek-V2 is available on Hugging Face. In addition, you can use DeepSeek’s cost-effective API, as shown in the sketch below.
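Here is a minimal sketch of calling the model through the API. We assume the OpenAI-compatible endpoint and the `deepseek-chat` model name; check DeepSeek’s documentation for the current values.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",        # placeholder, use your own key
    base_url="https://api.deepseek.com",    # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize mixture-of-experts in one sentence."}],
)
print(response.choices[0].message.content)
```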

Our thoughts

According to the benchmarks, DeepSeek-V2 seems to be a powerful open-source LLM, with particular strengths in math and coding tasks. But it also has its weaknesses. It works especially well in Chinese because a large share of its training data is Chinese, and first tests show that the model answers questions from a Chinese perspective.

More information


Our books and recommendations for you


Magic AI tool of the week

Do you find writing meeting summaries or follow-up e-mails tedious? Then, you should check out the tool CastMagic! This tool turns your audio into content.

Imagine you have a sales call. Afterwards, you need a summary of the call, including the customer’s name, the questions asked, the sentiment, the next steps, and so on. Right?

CastMagic can do all of this for you. The tool does all the tedious work for you, and you can focus on the conversation with your customer. It’s a win for you and your customers.

👉🏽 Try CastMagic for Free*


Articles of the week


💡 Do you enjoy our content and want to read super-detailed articles about AI? If so, subscribe to our blog and get our popular data science cheat sheets for FREE.


Thanks for reading, and see you next time.

- Tinz Twins

P.S. Have a nice weekend! 😉😉
