
Google’s DeepMind Introduces AI System Outperforming Human Fact-Checkers

In a groundbreaking study, Google’s DeepMind research unit has unveiled an artificial intelligence system that outperforms human fact-checkers in assessing the accuracy of information produced by large language models. This innovative system, known as the Search-Augmented Factuality Evaluator (SAFE), leverages a multi-step process to analyze text and verify claims using Google Search results.
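
While DeepMind has released its own implementation on GitHub, the overall flow of such a pipeline can be pictured with a minimal sketch. The helpers `call_llm` and `google_search` below are hypothetical stand-ins for an LLM API and a search API, not DeepMind's actual code.

```python
# Illustrative sketch of a SAFE-style pipeline (not DeepMind's actual code).
# `call_llm` and `google_search` are hypothetical callables standing in for an
# LLM API and a web-search API respectively.

def split_into_claims(response_text: str, call_llm) -> list[str]:
    """Ask an LLM to break a long answer into short, self-contained claims."""
    prompt = (
        "Split the following text into a list of short, self-contained "
        "factual claims, one per line:\n\n" + response_text
    )
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]

def rate_claim(claim: str, call_llm, google_search) -> str:
    """Search for evidence, then ask the LLM whether the claim is supported."""
    evidence = google_search(claim, num_results=5)  # list of result snippets
    prompt = (
        f"Claim: {claim}\n\nSearch results:\n" + "\n".join(evidence) +
        "\n\nAnswer with exactly one word: SUPPORTED, NOT_SUPPORTED, or IRRELEVANT."
    )
    return call_llm(prompt).strip().upper()

def safe_style_check(response_text, call_llm, google_search):
    """Return a verdict for every individual claim in a long-form answer."""
    claims = split_into_claims(response_text, call_llm)
    return {claim: rate_claim(claim, call_llm, google_search) for claim in claims}
```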

Evaluating Superhuman Performance

In the study, titled “Long-form factuality in large language models” and published on arXiv, SAFE showed remarkable accuracy, agreeing with human ratings 72% of the time and, in a sample of cases where it disagreed with human raters, being judged correct 76% of the time. Nevertheless, the notion of “superhuman” performance is sparking lively debate, with some experts noting that the comparison was against crowdworkers rather than expert fact-checkers.

Cost-Effective Verification

One of SAFE’s significant advantages is its cost-effectiveness. The study revealed that utilizing SAFE was approximately 20 times cheaper than employing human fact-checkers. With the exponential growth of information generated by language models, having an affordable and scalable method for verifying claims becomes increasingly crucial.
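
As a rough illustration of what a 20x cost gap means at scale, the toy calculation below uses an assumed per-claim price; the figure is a placeholder for the sake of arithmetic, not one reported in the paper.

```python
# Hypothetical back-of-the-envelope comparison illustrating the reported ~20x
# cost difference; the per-claim price is a made-up placeholder, not a figure
# from the study.
claims_to_verify = 10_000
human_cost_per_claim = 0.40                       # assumed crowdworker cost per claim ($)
safe_cost_per_claim = human_cost_per_claim / 20   # ~20x cheaper, per the study

print(f"Human annotation:    ${claims_to_verify * human_cost_per_claim:,.2f}")
print(f"SAFE (LLM + search): ${claims_to_verify * safe_cost_per_claim:,.2f}")
```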

Benchmarking Top Language Models

The DeepMind team utilized SAFE to evaluate the factual accuracy of 13 leading language models across four families, including Gemini, GPT, Claude, and PaLM-2, on the LongFact benchmark. Larger models generally exhibited fewer factual errors, yet even the top-performing models still generated a significant number of false claims. This emphasizes the importance of automatic fact-checking tools in mitigating the risks associated with misinformation.
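
The paper defines its own aggregate metric for long-form factuality; the snippet below simply illustrates one straightforward way per-claim verdicts could be tallied into a comparable score per model, using made-up data.

```python
# Hypothetical aggregation of SAFE-style verdicts into a simple factual-precision
# score per model; the data and numbers are illustrative only.
from collections import Counter

verdicts = {
    "model_a": ["SUPPORTED", "SUPPORTED", "NOT_SUPPORTED", "SUPPORTED"],
    "model_b": ["SUPPORTED", "NOT_SUPPORTED", "NOT_SUPPORTED"],
}

for model, labels in verdicts.items():
    counts = Counter(labels)
    supported = counts["SUPPORTED"]
    total = supported + counts["NOT_SUPPORTED"]   # irrelevant claims are ignored
    precision = supported / total if total else 0.0
    print(f"{model}: {supported}/{total} claims supported ({precision:.0%})")
```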

Prioritizing Transparency and Accountability

While the SAFE code and LongFact dataset have been made available for scrutiny on GitHub, further transparency is necessary regarding the human baselines used in the study. Understanding the qualifications and processes of crowdworkers is essential for accurately assessing SAFE’s capabilities.

UniSim: Revolutionizing AI Training with Realistic Simulation and Sim-to-Real Bridge

In a groundbreaking collaboration involving Google DeepMind, UC Berkeley, MIT, and the University of Alberta, a novel machine learning model named UniSim has been developed, aiming to usher in a new era of AI training simulations. UniSim is designed to generate highly realistic simulations to train a wide range of AI systems, offering a universal simulator of real-world interactions.

UniSim’s primary objective is to provide realistic experiences in response to actions taken by humans, robots, and interactive agents. While still in its early stages, it represents a significant step towards achieving this ambitious goal, with the potential to revolutionize fields like robotics and autonomous vehicles.

Introducing UniSim

UniSim is a generative model capable of emulating interactions between humans and the surrounding environment. It has the capacity to simulate both high-level instructions, such as “open the drawer,” and low-level controls, like “move by x, y.” This simulated data serves as a valuable resource for training other models that require data mimicking real-world interactions.
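
As a rough illustration of these two kinds of actions, the sketch below shows how a high-level instruction and a low-level control might both be flattened into text conditioning for a generative model. The classes and field names are hypothetical, not UniSim's actual interface.

```python
# Illustrative sketch of the two action types the article describes UniSim
# conditioning on; the dataclasses and field names are hypothetical.
from dataclasses import dataclass

@dataclass
class TextAction:
    """High-level instruction, e.g. 'open the drawer'."""
    instruction: str

@dataclass
class MotorAction:
    """Low-level control, e.g. a small end-effector displacement."""
    dx: float
    dy: float

def to_conditioning(action) -> str:
    """Flatten either action type into text that a generative model can embed."""
    if isinstance(action, TextAction):
        return action.instruction
    return f"move by {action.dx:.3f}, {action.dy:.3f}"

print(to_conditioning(TextAction("open the drawer")))
print(to_conditioning(MotorAction(dx=0.05, dy=-0.02)))
```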

The researchers behind UniSim propose integrating a vast array of data sources into a conditional video generation framework: internet text-image pairs; motion-rich data from navigation, manipulation, human activities, and robotics; and data from simulations and renderings.

UniSim’s distinguishing strength is its ability to merge these diverse data sources and generalize beyond its training examples, enabling fine-grained motion control of otherwise static scenes and objects.

Diverse Data Sources Unified

To achieve its extraordinary capabilities, UniSim underwent training using a diverse dataset drawn from simulation engines, real-world robot data, human activity videos, and image-description pairs. The challenge was to integrate various datasets with different labeling and distinct purposes. For example, text-image pairs offered rich scenes but lacked movement, while video captioning data described high-level activities but lacked detail on low-level movement.

To address this challenge, the researchers homogenized these disparate datasets, using transformer models to create embeddings from text descriptions and from non-visual modalities such as motor controls and camera angles. They trained a diffusion model on the visual observations depicting actions, then conditioned it on those embeddings, connecting observations, actions, and outcomes.
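
The overall structure can be sketched as follows. The encoder and denoising functions below are trivial stand-ins, not the actual UniSim architecture; the sketch only shows how an action embedding might condition a video diffusion rollout.

```python
# Minimal structural sketch (not UniSim's real architecture): embed the action
# description with a text encoder, then condition each denoising step of a
# video diffusion model on that embedding. Function bodies are placeholders.
import numpy as np

def embed_action(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in for a transformer text encoder: hash words into a vector."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

def denoise_step(noisy_frames: np.ndarray, cond: np.ndarray, t: int) -> np.ndarray:
    """Stand-in for one denoising step of a conditional diffusion model."""
    # A real model would predict noise with a U-Net/transformer conditioned on `cond`.
    return noisy_frames * 0.9 + 0.01 * cond.mean()

def generate_video(action_text: str, num_frames=8, height=16, width=16, steps=10):
    cond = embed_action(action_text)
    frames = np.random.randn(num_frames, height, width)   # start from pure noise
    for t in reversed(range(steps)):
        frames = denoise_step(frames, cond, t)
    return frames

video = generate_video("open the drawer")
print(video.shape)   # (8, 16, 16)
```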

The result is a model that can generate photorealistic videos covering a spectrum of activities, including human actions and navigation through environments. It can also run long-horizon simulations while preserving the scene’s structure and the objects it contains.
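
One way to picture a long-horizon simulation is an autoregressive rollout in which each new clip is generated conditioned on the most recent frames and the next action. The sketch below is purely illustrative; `generate_clip` stands in for a UniSim-style model call.

```python
# Illustrative long-horizon rollout: repeatedly generate a short clip conditioned
# on the latest frames and the next action, then append it to the running video.
import numpy as np

def generate_clip(context_frames: np.ndarray, action: str, clip_len: int = 8) -> np.ndarray:
    """Placeholder: a real model would continue the video consistently with `action`."""
    last = context_frames[-1]
    return np.stack([last + 0.01 * (i + 1) for i in range(clip_len)])

actions = ["open the drawer", "pick up the cup", "place the cup on the table"]
video = np.zeros((1, 16, 16))                      # initial observation
for action in actions:
    clip = generate_clip(video[-4:], action)       # condition on recent frames only
    video = np.concatenate([video, clip], axis=0)

print(video.shape)   # (25, 16, 16): 1 initial frame + 3 clips of 8 frames
```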

Bridging the Gap: Sim-to-Real

UniSim’s potential extends to bridging the “sim-to-real gap” in reinforcement learning environments. It can simulate diverse outcomes, particularly in robotics, enabling offline training of models and agents without the need for real-world training. This approach offers several advantages, including access to unlimited environments, real-world-like observations, and flexible temporal control frequencies.
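
A minimal sketch of that idea, assuming a gym-style loop: a world model generates the next observation from the current one plus an action, and a separate reward function scores it, so a policy can be trained without touching a real robot. The class and functions below are hypothetical stand-ins, not an actual UniSim API.

```python
# Hypothetical sketch of wrapping a UniSim-style video model as an offline RL
# environment; the class, model call, and reward function are illustrative only.
import numpy as np

class SimulatedEnv:
    def __init__(self, world_model, reward_fn):
        self.world_model = world_model   # generates next observation from (obs, action)
        self.reward_fn = reward_fn       # scores the generated observation
        self.obs = None

    def reset(self, initial_obs: np.ndarray) -> np.ndarray:
        self.obs = initial_obs
        return self.obs

    def step(self, action: str):
        next_obs = self.world_model(self.obs, action)   # generated frame(s)
        reward = self.reward_fn(next_obs, action)
        self.obs = next_obs
        return next_obs, reward

# Example with trivial stand-ins for the world model and reward
env = SimulatedEnv(
    world_model=lambda obs, action: obs + np.random.randn(*obs.shape) * 0.01,
    reward_fn=lambda obs, action: float(obs.mean()),
)
obs = env.reset(np.zeros((16, 16)))
obs, reward = env.step("move forward")
print(reward)
```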

The high visual quality of UniSim narrows the gap between learning in simulation and the real world, making it possible for models trained with UniSim to generalize to real-world settings in a zero-shot manner.

Applications of UniSim

UniSim has a wide array of applications, including controllable content creation in games and movies, training embodied agents purely in simulations for deployment in the real world, and supporting vision language models like DeepMind’s RT-X models. It has the potential to provide vast amounts of training data for vision-language planners and reinforcement learning policies.

Moreover, UniSim can simulate rare events, a feature crucial in applications like robotics and self-driving cars, where data collection is expensive and risky. Despite its resource-intensive training process, UniSim holds the promise of advancing machine intelligence by instigating interest in real-world simulators.

In conclusion, UniSim represents a groundbreaking development in the realm of AI training simulations, offering the potential to create realistic experiences for AI systems across various fields, ultimately bridging the gap between simulation and the real world. Its capacity to provide diverse and realistic training data makes it a valuable asset for the future of machine learning and artificial intelligence.

Google DeepMind To Power Its AI With AlphaGo

DeepMind, widely regarded as the undisputed leader in artificial intelligence (AI) research for much of the past decade, now claims that its next-generation AI model will surpass OpenAI’s ChatGPT. The revelation was made by company co-founder and CEO Demis Hassabis in an interview with Wired.

DeepMind first made global headlines when Google acquired the company in 2014. At the time, DeepMind was pioneering the use of reinforcement learning to train its AI models, a method that improves an AI system through feedback on its performance, and it began by teaching AI agents to play video games.

Just two years later, DeepMind’s AlphaGo program stunned experts by defeating the human Go champion. Go, one of the world’s oldest board games, is also among the most complex; many assumed computer programs were decades away from such a feat, and DeepMind’s success demonstrated the potential of its approach. Now the company plans to bring the power of AlphaGo to Gemini, its upcoming system.

Google tries to keep up in chatbot race

Even as the DeepMind team kept working on AI models after its roaring success in 2016, Google took a cautious approach to unveiling them. Sam Altman’s OpenAI, by contrast, was far more aggressive in showcasing its GPT models and became a household name after releasing its ChatGPT chatbot last year.

Google’s rushed debut of its Bard chatbot was widely seen as disappointing, and the company can only hope that DeepMind’s upcoming product gives it a real answer to ChatGPT, whose underlying models also power rival Microsoft’s Bing search.

In April, Google formally merged its Brain and DeepMind teams into a single AI unit, combining the work of the two powerhouses in a bid to take on OpenAI.

Beating OpenAI at the game DeepMind knows better

Large language models (LLMs), such as the GPT-4 model that powers ChatGPT, also use reinforcement learning to improve their performance. DeepMind plans to bring the expertise it gained teaching a computer program to master complex games like Go into its upcoming system Gemini, with the aim of significantly surpassing GPT.

For instance, AlphaGo relied on a technique called tree search to explore the consequences of different possible moves on the board. Carried over to language models, a similar search over candidate continuations could help them plan and work through multi-step tasks.
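
A minimal sketch of how such a search might look when applied to text generation, assuming a sampler that proposes continuations and a value function that scores partial answers (both hypothetical stand-ins here, not DeepMind's method):

```python
# Illustrative sketch of game-style search applied to text generation: expand a
# few candidate continuations at each step and keep only the best-scoring ones.
# `propose_continuations` and `score` are hypothetical stand-ins for an LLM
# sampler and a learned value function.
def tree_search(prompt: str, propose_continuations, score, depth: int = 3, width: int = 3) -> str:
    frontier = [prompt]
    for _ in range(depth):
        candidates = []
        for text in frontier:
            for cont in propose_continuations(text, n=width):
                candidates.append(text + cont)
        # keep only the most promising partial answers, as a game tree keeps strong moves
        frontier = sorted(candidates, key=score, reverse=True)[:width]
    return max(frontier, key=score)

# Toy stand-ins so the sketch runs end to end
best = tree_search(
    "Plan: ",
    propose_continuations=lambda text, n: [f"step {len(text) % 7 + i}; " for i in range(n)],
    score=lambda text: -len(text) % 11,
)
print(best)
```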

Beyond language models, DeepMind has also been working in areas such as robotics and neuroscience. Last week, it showcased an algorithm that enables its robotic arms to perform manipulation tasks, a far more complex skill than text prediction.

Hassabis is confident that his team can equip Gemini with the ability to solve problems and plan tasks, making it far superior to OpenAI’s offering. DeepMind’s spending on Gemini is expected to reach tens of millions of dollars, mirroring the investment OpenAI made in its GPT-4 model.

The bigger question is the timeline for Gemini’s unveiling. Hassabis told Wired that the system could still be months away, and OpenAI’s moves in the meantime will have to be watched closely to ascertain whether DeepMind can genuinely beat ChatGPT.