Meta Releases Llama 4: Multimodal Capabilities Emerge

Meta has announced the release of Llama 4, a new iteration of its open-source model series, notably featuring multimodal capabilities (text and image understanding, with audio/video potentially to follow). While the release expands the open-source ecosystem, initial assessments suggest it does not represent a fundamental leap over leading models such as DeepSeek V3. Consequently, the estimated progress toward AGI remains at 39%: Llama 4 is a valuable contribution, but it is not currently seen as significantly altering the AGI timeline.

Read more...
Related Links: Official Blog Post: Llama 4 Multimodal Intelligence

Qwen2.5-Omni: First Open Source Omni Model Released

Qwen has introduced voice and video chat capabilities in Qwen Chat, allowing users to interact with the AI as in a phone or video call. The team has open-sourced Qwen2.5-Omni-7B under the Apache 2.0 license, releasing both the model weights and the technical details. This is an omni model: a single model that understands text, audio, images, and video while outputting both text and audio. It features a "thinker-talker" architecture that enables the model to think and talk simultaneously. The team believes AGI will take the form of agents built on omni models.

Read more...
Related Links: Qwen Chat, Demo Video, Demo, Technical Paper, Blog Post, GitHub Repository, Hugging Face Model, ModelScope