
The Open-Source AI Debate Heats Up with Recent Google and Meta Headlines


The discussion surrounding open-source AI is reaching new levels of intensity within the realm of Big Tech, fueled by recent developments involving Google and Meta.

According to a CNBC report published Tuesday evening, Google’s latest large language model (LLM), PaLM 2, is said to use nearly five times more text data for training than its predecessor. Google, however, had initially claimed that PaLM 2 was smaller than its predecessor while employing a more efficient training technique, and the company did not disclose the size of the training data or other specifics.


While a Google spokesperson declined to comment on the CNBC report, several Google engineers were quick to voice their dissatisfaction with the leak. In a since-deleted tweet, Dmitry (Dima) Lepikhin, a senior staff software engineer at Google DeepMind, directed strong language at whoever leaked the details, stating, “whoever leaked PaLM2 details to cnbc, sincerely fuck you!”

Additionally, Alex Polozov, a senior staff research scientist at Google, shared his thoughts in what he described as a “rant,” highlighting concerns that such leaks will lead to increased siloing of research.

Lucas Beyer, a Google AI researcher based in Zurich, echoed similar sentiments, expressing his dismay not only at the potential accuracy of the token count but also at the broader impact of the leak. Beyer emphasized the erosion of trust and respect resulting from such incidents, which could ultimately lead to more guarded communication, reduced openness over time, and a less favorable work and research environment.

The leaked information has stirred up further debate and intensified the ongoing conversation surrounding open-source AI, with implications that extend beyond the specific details of PaLM 2. The incident raises questions about the delicate balance between transparency and the protection of intellectual property in the fast-paced world of AI development.

Meta’s LeCun: “The platform that will win will be the open one”

Not in response to the Google leak, but in coincidental timing, Meta chief AI scientist Yann LeCun gave an interview to the New York Times, published this morning, focusing on Meta’s open-source AI efforts.

The piece describes Meta’s release of its LLaMA large language model in February as “giving away its AI crown jewels” — since it released the model’s source code to “academics, government researchers and others who gave their email address to Meta [and could then] download the code once the company had vetted the individual.”

“The platform that will win will be the open one,” LeCun said in the interview, later adding that the growing secrecy at Google and OpenAI is a “huge mistake” and a “really bad take on what is happening.”

In a Twitter thread, VentureBeat journalist Sean Michael Kerner pointed out that Meta “actually already gave away one of the most critical AI/ML tools ever created — PyTorch. The foundational stuff needs to be open/and it is. After all, where would OpenAI be without PyTorch?”

Meta’s take on open source is nuanced

But even Meta and LeCun will only go so far in terms of openness. For example, Meta had made LLaMA’s model weights available to academics and researchers on a case-by-case basis, including Stanford for its Alpaca project, but those weights were subsequently leaked on 4chan. It was that leak, not Meta’s release, that first gave developers around the world full access to a GPT-level LLM, since Meta’s release did not permit commercial use of the LLaMA model.

VentureBeat spoke to Meta last month about the nuances of its take on the open- vs. closed-source debate. Joelle Pineau, VP of AI research at Meta, said in our interview that accountability and transparency in AI models is essential.

“More than ever, we need to invite people to see the technology more transparently and lean into transparency,” she said, explaining that the key is to balance the level of access, which can vary depending on the potential harm of the model.

“My hope, and it’s reflected in our strategy for data access, is to figure out how to allow transparency for verifiability audits of these models,” she said.

On the other hand, she said that some levels of openness go too far. “That’s why the LLaMA model had a gated release,” she explained. “Many people would have been very happy to go totally open. I don’t think that’s the responsible thing to do today.”

LeCun remains outspoken: AI risks are overblown

Still, LeCun remains outspoken in favor of open-source AI, and in the New York Times interview argued that the dissemination of misinformation on social media is more dangerous than the latest LLM technology.

“You can’t prevent people from creating nonsense or dangerous information or whatever,” he said. “But you can stop it from being disseminated.”

And while Google and OpenAI may become more closed with their AI research, LeCun insisted he — and Meta — remain committed to open source, saying “progress is faster when it is open.”