OpenAI Plans Social Network to Rival X
OpenAI is developing a social network that could shake up the AI landscape. The project centers on ChatGPT's image generation and includes a social feed, according to sources close to the matter.
InternVL3, a new open-source AI model, matches or beats proprietary giants like GPT-4 and Gemini in understanding images and video. Its key breakthrough lies in how it learns to process visual information alongside language from the start, rather than adding these capabilities later.
The model excels at tasks that would stump many AI systems: tracking objects in videos, analyzing complex charts, understanding technical diagrams, and even helping with mathematical problems that combine text and visuals.
In rigorous testing across multiple benchmarks, InternVL3 proved particularly strong at maintaining accuracy with longer videos and more complex visual scenarios.
What sets InternVL3 apart is its training approach. Unlike most AI models that learn language first and visual skills second, InternVL3 develops both abilities simultaneously. Think of it like raising a bilingual child who learns two languages naturally from birth, rather than learning a second language later in life.
The researchers also tackled a common problem in AI vision: accuracy that drops off with longer videos and complex scenes. They developed a new way to help the AI keep track of spatial relationships over time, much like how humans maintain their sense of space and movement while watching a video.
The largest version of InternVL3 scored 72.2 on the MMMU benchmark, a broad test of how well AI models understand and reason about images alongside text. This puts it ahead of other open-source models and closer to proprietary leaders like Gemini 2.5 Pro.
Most importantly, the researchers are releasing both their code and training data to the public. This move could accelerate progress in AI vision technology by allowing other researchers to build on their work.
The implications extend beyond academic research. Better visual AI could improve everything from medical imaging to autonomous vehicles, making machines better at understanding the visual world the way humans do.