Neuralangelo: NVIDIA's New AI Model Turns 2D Videos into 3D Structures

Leading artificial intelligence (AI) company and chip manufacturer NVIDIA has introduced Neuralangelo, the latest addition to its lineup of AI tools.

Neuralangelo is an AI model designed to turn 2D video clips into highly detailed 3D structures. Powered by neural networks and 3D reconstruction algorithms, it can generate realistic virtual replicas of real-world objects.

The name Neuralangelo pays homage to the renowned Italian sculptor and painter, Michelangelo, whose artistic brilliance during the Renaissance era produced iconic works such as the sculpture of David and the breathtaking paintings adorning the Sistine Chapel ceiling.

In a remarkable demonstration, Neuralangelo showcases its capabilities by recreating a diverse range of objects, from the timeless beauty of Michelangelo’s David to the ordinary yet familiar sight of a flatbed truck.

Neuralangelo bridges the gap between 2D footage and 3D models. It opens up possibilities for industries such as architecture, entertainment, and virtual reality, enabling immersive experiences and accurate virtual representations of physical objects.

NVIDIA's Neuralangelo marks a significant milestone in AI-driven video transformation, bringing the conversion of flat footage into detailed 3D structures within reach. It is poised to reshape the way we perceive and interact with visual content.

The AI model emerged from a study conducted jointly by the NVIDIA Research team and Johns Hopkins University in Maryland, U.S.

Neuralangelo is one of nearly 30 projects by NVIDIA Research to be presented at the Conference on Computer Vision and Pattern Recognition (CVPR), taking place June 18-22 in Vancouver. The papers span topics including pose estimation, 3D reconstruction, and video generation, the company said in a blog post.

Multiple Images Observed From Different Viewpoints

The model observes the depth, shape, and size of the characters or objects in a 2D video from multiple angles. Neuralangelo first creates an initial 3D representation of the scene, then optimizes the render to bring out the intricate details and textures.
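The two-stage idea described above (a rough initial shape, then refinement for detail) can be illustrated with a toy coarse-to-fine fit. This is only a sketch under stated assumptions, not NVIDIA's implementation: it fits a 1D height profile instead of a 3D scene, and the names `target_height`, `refine`, `upsample`, and `coarse_to_fine` are hypothetical helpers invented for this example.

```python
import math

def target_height(x):
    """Hypothetical 'true' surface: a broad shape plus fine ripples."""
    return math.sin(2 * math.pi * x) + 0.2 * math.sin(16 * math.pi * x)

def cell_index(x, n_cells):
    """Map a position in [0, 1) to the grid cell that contains it."""
    return min(int(x * n_cells), n_cells - 1)

def mean_squared_error(grid, samples):
    n = len(grid)
    return sum((grid[cell_index(x, n)] - h) ** 2 for x, h in samples) / len(samples)

def refine(grid, samples, steps=200, lr=0.5):
    """Gradient descent on squared error for a piecewise-constant grid."""
    n = len(grid)
    for _ in range(steps):
        grad = [0.0] * n
        count = [0] * n
        for x, h in samples:
            i = cell_index(x, n)
            grad[i] += grid[i] - h
            count[i] += 1
        for i in range(n):
            if count[i]:
                grid[i] -= lr * grad[i] / count[i]
    return grid

def upsample(grid):
    """Double the resolution, carrying the coarse estimate forward."""
    return [value for value in grid for _ in range(2)]

def coarse_to_fine(samples, start_cells=4, levels=5):
    """Fit a coarse grid first, then repeatedly upsample and refine."""
    grid = [0.0] * start_cells
    errors = []
    for _ in range(levels):
        grid = refine(grid, samples)
        errors.append(mean_squared_error(grid, samples))
        grid = upsample(grid)
    return errors

# Stand-in "observations" of the surface (assumed noise-free for simplicity).
samples = [(i / 256, target_height(i / 256)) for i in range(256)]
errors = coarse_to_fine(samples)
for level, err in enumerate(errors):
    print(f"level {level}: mean squared error = {err:.5f}")
```

The coarse levels capture the overall shape, and the error shrinks as each finer level adds detail that the previous level could not represent.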

Creative professionals can then import the resulting 3D objects into design applications and edit them further for use in art, video game development, and robotics, according to the company's blog post. The model also lets users create digital twins of the real world from footage captured on ubiquitous mobile devices.

Many are wondering what this means for the gaming industry, in which NVIDIA's graphics cards are a market leader. The company recently announced the GeForce RTX 4060 Ti, which follows the RTX 4070.

“The 3D reconstruction capabilities Neuralangelo offers will be a huge benefit to creators, helping them recreate the real world in the digital world,” said Ming-Yu Liu, senior director of research and co-author of the paper, in the blog.

“This tool will eventually enable developers to import detailed objects, whether small statues or massive buildings, into virtual environments for video games or industrial digital twins,” Liu added.

One Twitter user described it as ‘photogrammetry on steroids.’ The neural surface reconstruction methods used in Neuralangelo have shown potential in handling ambiguous observations such as large areas of homogeneous color, repetitive texture patterns, or strong color variations. Photogrammetry is a technique that uses photos as the primary medium for measuring physical objects.

The concept behind Neuralangelo is not new. Last year, NVIDIA Research created NVIDIA 3D MoMa, which allows architects, designers, and game developers to import objects into a graphics engine for digital manipulation.
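As a small worked example of measuring from photos, the classic stereo relation gives the depth of a point from how far it shifts between two side-by-side photos: depth = focal length x baseline / disparity. The sketch below assumes an idealized, rectified camera pair, and the function name and numbers are hypothetical. It illustrates traditional photogrammetry, not Neuralangelo's neural method.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth of a point seen by two rectified, side-by-side cameras.

    focal_length_px: focal length expressed in pixels
    baseline_m: distance between the two camera centers, in meters
    disparity_px: horizontal shift of the point between the two photos
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px

# Illustrative values only: 1000 px focal length, cameras 0.5 m apart,
# and a point that shifts 25 px between the two images.
depth_m = depth_from_disparity(1000.0, 0.5, 25.0)
print(f"estimated depth: {depth_m} m")
```

Points that shift less between the two photos are farther away, which is why distant objects are harder to measure precisely.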
