
Empower Your Websites: Opting Out of Google’s Bard and Future AI Training


Large language models have been trained on vast amounts of data, much of which was collected without user knowledge or consent. Website owners can now choose whether Google may use their web content as training material for Bard and its future models.

Opting out can be as simple as adding a Disallow rule for the “Google-Extended” user agent to your site’s robots.txt file, which tells automated web crawlers what content they may access.
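As a rough illustration of the opt-out described above, the sketch below embeds a possible robots.txt and checks it with Python's standard `urllib.robotparser`. The rule set, the `is_allowed` helper, and the `example.com` URL are assumptions made for the example, not something taken from Google's post; only the `Google-Extended` token comes from the article.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: block Google-Extended everywhere, leave other
# crawlers (including regular search indexing) unaffected.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-article/"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

A check like this only confirms what the file says; whether a crawler honors it is up to the crawler.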


While Google asserts its commitment to developing AI in an ethical and inclusive manner, the use case for AI training differs significantly from indexing the web.

In a blog post, Danielle Romain, the company’s VP of Trust, acknowledges that web publishers want greater choice and control over how their content is used for emerging generative AI applications, as if this were a revelation.

The word “train” is conspicuously absent from the post, even though this data clearly serves as raw material for training machine learning models.

Instead, the VP of Trust poses the question of whether you are unwilling to “help improve Bard and Vertex AI generative APIs” – an invitation to assist these AI models in becoming more accurate and capable over time.

In this framing, Google is not taking something from you; the question is whether you are willing to contribute.

This may well be the right way to frame the question: consent is a vital part of the equation, and asking for a positive choice to contribute is the ethical way to ask. However, the sincerity of this approach is undermined by the fact that Bard and other models have already been trained on enormous amounts of data obtained from users without their consent.

Google’s actions undeniably reveal that it exploited unrestricted access to web data, obtained what it needed, and is now seeking permission after the fact to appear as though consent and ethical data collection are top priorities. If they truly were, this option would have been available years ago.