
Google Chrome M121 Unveils Game-Changing AI Features for a More Personalized Browsing Experience

Welcome to the next level of browsing! Google Chrome M121 is here with a dazzling array of generative AI features designed to revolutionize your web experience. In this blog post, we’ll explore three groundbreaking features that promise to simplify, enhance, and personalize your browsing journey. Get ready for a Chrome makeover like never before!

Tab Organizer: Declutter Your Digital Space

Tired of drowning in a sea of open tabs? Chrome’s new Tab Organizer feature is your savior. This innovative tool uses advanced machine learning to automatically group and label similar tabs, putting an end to the chaos. Simply right-click on a tab and choose “Organize Similar Tabs” or click the drop-down arrow to the left of the tabs. Chrome even suggests names and emojis for your tab groups, making navigation a breeze.

Create with AI: Your Theme, Your Way

Inject a personalized touch into your Chrome experience with the Create with AI feature. This tool lets you generate custom themes based on your preferred subject, mood, visual style, and color. Want an “aurora borealis” theme in an “animated” style with a “serene” mood? Just click the “Customize Chrome” button, select “Change theme,” and choose “Create with AI.” Watch as Chrome brings your vision to life using a text-to-image diffusion model, previously seen in Android 14 and Pixel devices.

Help Me Write: AI-Powered Text Assistance

Struggling to find the right words? Say hello to Help Me Write, Chrome’s AI-powered text assistance feature. This tool suggests ways to polish, expand, or adjust the tone of your text based on your preferences. Right-click on any text box or field on a website and choose “Help me write” to unleash the power of generative AI in your writing endeavors. Note: this feature is set to arrive in next month’s Chrome release.


Google Chrome’s AI Ambitions and Challenges

As the world’s most popular web browser, Chrome continues to push the boundaries of AI integration. These new features represent Google’s ongoing commitment to innovation. With a global market share of 62.85%, Chrome has already introduced AI features such as real-time video captions, malicious site detection, permission prompt management, and key point generation for web pages.

However, these exciting additions have sparked mixed reviews. While users and experts applaud the convenience and creativity of these AI tools, concerns about privacy, security, and accuracy have been raised. During our exploration of the update, we observed occasional hiccups with the Tab Organizer feature, which sometimes grouped unrelated tabs or failed to function.

Google reassures users that privacy and security are top priorities. The company emphasizes that it neither collects nor stores personal information from these AI features. Constant improvements in AI model quality and reliability are underway, with Google actively seeking user feedback to refine these experimental features.

Google Delays Gemini AI Launch

Google has chosen to delay the launch of its highly anticipated Gemini AI model, intended to compete with OpenAI’s GPT-4, until the following year. According to sources cited by The Information, Google CEO Sundar Pichai made the decision to postpone the scheduled launch events in California, New York, and Washington due to performance issues in languages other than English.

Gemini, designed as a multimodal AI model capable of comprehending and generating text, images, and various data types, has encountered challenges in multilingual functionality. In comparison to GPT-4, Gemini falls short in this aspect, prompting Google engineers to recognize the need for further improvement. While smaller versions of Gemini are undergoing testing, the development of the full-scale Gemini model is still in progress.

This isn’t the first instance of a delay for Gemini; earlier reports indicated a pushback for the cloud version of the model. Consequently, AI-driven products like the Bard chatbot, expected to benefit from Gemini enhancements, will now face a delay until the following year.

Google initially unveiled Gemini at its I/O event, emphasizing its impressive multimodal capabilities and efficiency in tool and API integrations. The company planned to offer Gemini in various sizes, including a mobile-friendly “Gecko” version, with the goal of attracting third-party developers.

The key question remains when Gemini will be seamlessly integrated into Google’s services, such as Bard, Search, and Workspace.

Gemini’s Role in Shaping the Future of Internet Information Flow

The significance of Gemini for Google lies in its potential to demonstrate the company’s ability to rival or surpass OpenAI, shaping a new internet landscape where information flow transitions from traditional search and the World Wide Web to chatbots.

Gemini’s success would also challenge the industry perception that GPT-4 is the ultimate benchmark, showcasing that there is still room for breakthroughs in underlying Transformer technology and scaling principles. While Google holds an advantage in data and computing, its ability to capitalize on this advantage has been hindered, in part, by Microsoft’s partnership with OpenAI.

Since March 2023, no company, whether a major tech player or an innovative startup, whether operating with closed or open source models, has managed to release a model comparable to GPT-4. Instead, the market is flooded with language models at the GPT-3.5 level, a standard that now seems easily attainable.

GPT-4’s advanced capabilities stem from its larger, more complex, and more expensive architecture. It is widely reported to combine several interconnected expert models (a Mixture of Experts) rather than relying on a single large model, and Google’s Gemini is speculated to be built on a similar concept. OpenAI’s CEO, Sam Altman, has hinted at a timeline for the release of GPT-5, expected to be even more advanced.
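
For readers who want a concrete picture of what a Mixture of Experts looks like, here is a minimal, purely illustrative Python sketch: a small gating network routes each token to its top-k “experts” and mixes their outputs. The sizes, routing rule, and random weights are toy assumptions; the actual architectures of GPT-4 and Gemini have not been published.

```python
# Toy Mixture-of-Experts routing (illustrative only; not any vendor's real architecture).
import numpy as np

rng = np.random.default_rng(0)
N_EXPERTS, D_MODEL, TOP_K = 8, 16, 2

# Each "expert" is a small feed-forward layer; here just a weight matrix.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.02 for _ in range(N_EXPERTS)]
router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02  # gating network

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token vector to its top-k experts and mix their outputs."""
    logits = x @ router                                # (tokens, n_experts)
    top_k = np.argsort(logits, axis=-1)[:, -TOP_K:]    # indices of the k best experts
    out = np.zeros_like(x)
    for t, token in enumerate(x):
        chosen = top_k[t]
        weights = np.exp(logits[t, chosen])
        weights /= weights.sum()                       # softmax over the chosen experts
        for w, e in zip(weights, chosen):
            out[t] += w * (token @ experts[e])
    return out

tokens = rng.standard_normal((4, D_MODEL))             # a toy batch of 4 token vectors
print(moe_layer(tokens).shape)                          # (4, 16)
```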

However, the intricate architecture of these models also comes with a high cost for inference. In response, OpenAI is striving to lower prices with models like GPT-4 Turbo, even if it means compromising on some aspects of quality.

Vulnerability of AI Language Models: Manipulation Risks and Security Threats

Researchers from the University of Sheffield recently published a study shedding light on the vulnerability of popular artificial intelligence (AI) applications, such as ChatGPT, to being exploited to craft harmful Structured Query Language (SQL) commands. Their findings show that these AI applications can be used to launch cyber attacks and compromise computer systems.

The study, co-led by Xutan Peng, a PhD student, and his team, targeted Text-to-SQL systems utilized for creating natural language interfaces to databases. Their investigation included applications like BAIDU-UNIT, ChatGPT, AI2SQL, AIHELPERBOT, Text2SQL, and ToolSKE.

Peng emphasized, “Many companies are unaware of these threats, and due to the complexity of chatbots, even within the community, there are aspects not fully understood.” Despite ChatGPT being a standalone system with minimal risks to its own service, the research revealed its susceptibility to producing malicious SQL code that could cause substantial harm to other services.

The vulnerabilities found within these AI applications opened doors for potential cyber threats, allowing the exploitation of systems, theft of sensitive information, manipulation of databases, and execution of Denial-of-Service attacks, rendering machines or networks inaccessible to users.

Peng highlighted an example in which professionals such as nurses use AI models like ChatGPT for productivity: when querying a database of clinical records, a seemingly harmless request could inadvertently generate SQL that mismanages or destroys sensitive data.
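
To make the risk concrete, here is a small, hypothetical illustration (not taken from the Sheffield paper): a Text-to-SQL pipeline that passes model output straight to the database can be steered into a destructive statement, so one simple mitigation is to refuse anything that is not a read-only query. The stand-in model function and table names are invented for this sketch.

```python
# Hypothetical Text-to-SQL guard: never execute generated SQL unchecked.
import sqlite3

def hypothetical_text_to_sql(question: str) -> str:
    # Stand-in for a model call; a crafted question could yield something like:
    return "DELETE FROM clinical_records; --"

ALLOWED_PREFIXES = ("SELECT",)  # simple allow-list: read-only queries only

def run_query(question: str, conn: sqlite3.Connection):
    sql = hypothetical_text_to_sql(question).strip()
    if not sql.upper().startswith(ALLOWED_PREFIXES):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clinical_records (id INTEGER, note TEXT)")
try:
    run_query("Show me everything, then tidy up old rows", conn)
except ValueError as err:
    print(err)  # the guard blocks the destructive statement
```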

Additionally, the researchers identified a concerning issue during the training of Text-to-SQL models, where they could surreptitiously embed harmful code, resembling a Trojan Horse, within the models. This “invisible” code could potentially harm users who utilize these compromised systems.

Dr. Mark Stevenson, a senior lecturer at the University of Sheffield, stressed the complexity of large language models used in Text-to-SQL systems, acknowledging their potency but also their unpredictability. The research team shared their findings with companies like Baidu and OpenAI, leading to the resolution of these vulnerabilities in their AI applications.

The study, published on arXiv, emphasizes the need to recognize and address potential software security risks associated with Natural Language Processing (NLP) algorithms, and it underscores the importance of exploring methods to safeguard against such exploitation in the future.

Study Abstract:

The study conducted by the University of Sheffield revealed vulnerabilities in Text-to-SQL systems within several commercial applications, showcasing the potential exploitation of Natural Language Processing models to produce malicious code. This signifies a significant security threat that could result in data breaches and Denial of Service attacks, posing a serious risk to software security. The research aims to draw attention to these vulnerabilities within NLP algorithms and encourages further exploration into safeguarding strategies to mitigate these risks.

The Role of Generative AI in Empowering Developers and Learning to Code

Simon Willison, creator of Datasette, believes this is an ideal moment to venture into programming, not because AI will take over coding but because it can simplify the learning process. He argues that large language models, far from rendering coding obsolete, actually flatten the learning curve. Willison emphasizes that we must not abandon the craft of coding but rather harness generative AI to enhance the developer experience, regardless of one’s skill level.

Willison’s insights on generative AI, and his emphasis on the “will to learn,” are highly regarded in the developer community. Another notable voice is Mike Loukides of O’Reilly Media, known for his knack for distilling complex topics. On generative AI and coding, Loukides underscores that crafting effective prompts is a nuanced skill: to excel at prompting, one must already possess expertise in the subject matter of the prompt.

The Art of Crafting Effective Prompts for Generative AI

In essence, being a proficient programmer is the key to success. Falling into the trap of thinking that AI is a repository of unparalleled wisdom, beyond human reach, can be counterproductive, warns Loukides. To effectively utilize coding tools like AWS CodeWhisperer or Google Codey, one must guide them towards producing the desired output. To instruct AI step by step in solving development issues, a deep understanding of the problem and the ability to craft precise prompts are essential.

Additionally, evaluating AI’s output when it errs requires a certain level of expertise. Coding assistants are valuable for helping developers take on more ambitious projects, as Willison advocates, but they will not eliminate the need for developers to understand and write code. Nor should we aspire to that, circling back to Willison’s initial argument.

AI’s role in learning to code is particularly beneficial for new developers or those venturing into unfamiliar languages, frameworks, or databases. Willison describes the steep learning curve, where a minor mistake like missing a semicolon can result in a perplexing error message, consuming hours to rectify. This frustration can discourage aspiring programmers, making them doubt their abilities.

Achieving Coding Confidence with AI Support

This is where AI assistants come to the rescue. Willison argues that you should not require a computer science degree to automate repetitive tasks. Language model-backed assistants like ChatGPT can streamline these tedious operations. GitHub engineer Jaana Dogan emphasizes that people often focus on code generation but overlook the utility of large language models in code analysis and other tasks. We don’t need AI to do all the work; as Willison suggests, it should handle mundane, non-critical tasks that, if left to the developer, might erode their confidence.

To embark on the journey of generative AI and software development, the key is to start small. Begin by automating simple, repetitive tasks that you understand but would rather not rewrite continuously. The time saved can be redirected towards tackling more intricate coding challenges. As your expertise grows, you can gradually automate these more complex tasks, ultimately improving your proficiency.
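
As a hypothetical illustration of the kind of small, well-understood chore an assistant might draft for you, consider bulk-renaming exported report files. The file naming scheme below is invented for this sketch, and reviewing and testing whatever the assistant produces remains the developer’s job.

```python
# A small, repetitive task worth delegating: normalize messy CSV export names.
from pathlib import Path

def normalize_report_names(folder: Path, dry_run: bool = True) -> list[tuple[str, str]]:
    """Rename 'Report Final (3).csv' style files to 'report_final_3.csv'."""
    changes = []
    for path in folder.glob("*.csv"):
        new_name = (
            path.stem.lower()
            .replace(" ", "_")
            .replace("(", "")
            .replace(")", "")
            + path.suffix
        )
        changes.append((path.name, new_name))
        if not dry_run:
            path.rename(path.with_name(new_name))
    return changes

if __name__ == "__main__":
    # Dry run first: print what would change before touching any files.
    for old, new in normalize_report_names(Path("."), dry_run=True):
        print(f"{old} -> {new}")
```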

Why We Must Teach AI: The Importance of AI Education

Artificial Intelligence (AI) is a transformative force that has rapidly become ubiquitous in our daily lives. Its applications are extensive, ranging from self-driving cars to healthcare diagnostics. As AI increasingly shapes our world, it becomes crucial to understand why we must teach AI to the next generation. In this blog post, we will explore the significance of AI education and why it is vital for the future.

Empowering the Workforce

Empowering the workforce with AI education is essential. AI is not just a technology; it’s a paradigm shift in problem-solving. A workforce well-versed in AI can harness its potential to increase productivity and efficiency across various industries. AI education equips individuals with the knowledge and skills needed to work collaboratively with AI systems, enhancing their competitiveness in the job market.

Bridging the Digital Divide

The digital divide, which denotes the gap between those with access to technology and those without, is a pressing societal issue. Teaching AI can help bridge this divide. By providing access to AI education, we can level the playing field, ensuring that individuals from diverse backgrounds and varying socioeconomic statuses have the opportunity to understand and utilize this powerful technology.

Ethical AI Development

The ethical dimensions of AI are of growing concern. To create AI systems that respect human values and rights, we need a generation of AI professionals well-versed in ethical considerations. Teaching AI includes educating individuals about the ethical implications of AI, enabling them to make informed decisions when designing, developing, and deploying AI systems.

Fostering Innovation

AI education fosters innovation by encouraging creativity and the application of AI techniques to solve complex problems. With the right knowledge and training, individuals can develop new AI applications that address real-world challenges. This can lead to groundbreaking inventions and improvements across various industries.

Preparing for the Future

The future is undeniably AI-driven. From autonomous systems to AI-powered healthcare, our lives will increasingly intersect with AI technologies. Teaching AI prepares individuals for the future and ensures they are not left behind. It enables them to understand, adapt to, and thrive in a world where AI is deeply embedded in various aspects of society.

Solving Global Challenges

AI has the potential to address some of the most significant global challenges, from climate change and healthcare disparities to poverty and food security. Teaching AI equips individuals with the tools to leverage AI in finding innovative solutions to these complex problems. By empowering a global AI workforce, we can collectively tackle the most pressing issues of our time.

Conclusion

In a world where AI is reshaping industries, economies, and societies, it is imperative that we teach AI. AI education is not just for technology enthusiasts but for everyone who wishes to remain relevant and make a positive impact in an AI-driven world. It empowers individuals, fosters ethical development, and prepares us for the future. As AI continues to evolve, the need for AI education becomes increasingly vital, and by investing in it, we can unlock the full potential of this transformative technology.

UniSim: Revolutionizing AI Training with Realistic Simulation and Sim-to-Real Bridge

In a groundbreaking collaboration involving Google DeepMind, UC Berkeley, MIT, and the University of Alberta, a novel machine learning model named UniSim has been developed, aiming to usher in a new era of AI training simulations. UniSim is designed to generate highly realistic simulations to train a wide range of AI systems, offering a universal simulator of real-world interactions.

UniSim’s primary objective is to provide realistic experiences in response to actions taken by humans, robots, and interactive agents. While still in its early stages, it represents a significant step towards achieving this ambitious goal, with the potential to revolutionize fields like robotics and autonomous vehicles.

Introducing UniSim

UniSim is a generative model capable of emulating interactions between humans and the surrounding environment. It has the capacity to simulate both high-level instructions, such as “open the drawer,” and low-level controls, like “move by x, y.” This simulated data serves as a valuable resource for training other models that require data mimicking real-world interactions.

The researchers behind UniSim propose the integration of a vast array of data sources, including internet text-image pairs, motion-rich data from navigation, manipulation, human activities, robotics, as well as data from simulations and renderings, into a conditional video generation framework.

UniSim’s strength lies in its ability to merge these diverse data sources and generalize beyond its training examples, enabling precise, fine-grained motion control of static scenes and objects.

Diverse Data Sources Unified

To achieve its extraordinary capabilities, UniSim underwent training using a diverse dataset drawn from simulation engines, real-world robot data, human activity videos, and image-description pairs. The challenge was to integrate various datasets with different labeling and distinct purposes. For example, text-image pairs offered rich scenes but lacked movement, while video captioning data described high-level activities but lacked detail on low-level movement.

To address this challenge, the researchers homogenized these disparate datasets, using transformer models to create embeddings from text descriptions and non-visual modalities such as motor controls and camera angles. They trained a diffusion model to encode the visual observations depicting actions, then conditioned the diffusion model on those embeddings, connecting observations, actions, and outcomes.
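
The following is a deliberately tiny sketch of that idea, with a made-up structure rather than UniSim’s actual code: text and action inputs are embedded, and a next-frame predictor is conditioned on those embeddings. The real system is a video diffusion model trained on far richer data; this toy regressor only shows where the conditioning enters.

```python
# Toy conditional next-frame model: frame prediction conditioned on text + action embeddings.
import torch
import torch.nn as nn

class TinyConditionalVideoModel(nn.Module):
    def __init__(self, embed_dim=64, frame_dim=32 * 32 * 3):
        super().__init__()
        self.text_embed = nn.Embedding(1000, embed_dim)      # toy text vocabulary
        self.action_embed = nn.Linear(4, embed_dim)           # e.g. "move by x, y" controls
        self.frame_encoder = nn.Linear(frame_dim, embed_dim)
        self.frame_decoder = nn.Linear(embed_dim, frame_dim)  # predicts the next frame

    def forward(self, prev_frame, text_tokens, action):
        cond = self.text_embed(text_tokens).mean(dim=1) + self.action_embed(action)
        h = self.frame_encoder(prev_frame) + cond              # condition the visual state
        return self.frame_decoder(torch.relu(h))

model = TinyConditionalVideoModel()
prev_frame = torch.randn(2, 32 * 32 * 3)        # a toy batch of 2 flattened frames
text = torch.randint(0, 1000, (2, 5))           # "open the drawer" as token ids
action = torch.randn(2, 4)                      # low-level control vector
next_frame = model(prev_frame, text, action)
print(next_frame.shape)                          # torch.Size([2, 3072])
```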

The result is that UniSim can generate photorealistic videos covering a spectrum of activities, including human actions and environmental navigation. It can also run long-horizon simulations, preserving the scene’s structure and the objects it contains over time.

Bridging the Gap: Sim-to-Real

UniSim’s potential extends to bridging the “sim-to-real gap” in reinforcement learning environments. It can simulate diverse outcomes, particularly in robotics, enabling offline training of models and agents without the need for real-world training. This approach offers several advantages, including access to unlimited environments, real-world-like observations, and flexible temporal control frequencies.

The high visual quality of UniSim narrows the gap between learning in simulation and the real world, making it possible for models trained with UniSim to generalize to real-world settings in a zero-shot manner.
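
Here is a schematic sketch of that offline-training idea, with interfaces invented for illustration rather than taken from UniSim: a policy is rolled out inside a learned simulator to collect experience, so no real-world interaction is needed during training.

```python
# Collecting experience entirely inside a learned world model (stand-in functions).
import numpy as np

rng = np.random.default_rng(0)

def learned_simulator(state, action):
    """Stand-in for a learned world model: predicts the next observation and reward."""
    next_state = state + 0.1 * action + 0.01 * rng.standard_normal(state.shape)
    reward = -float(np.linalg.norm(next_state))   # e.g. "stay near the goal"
    return next_state, reward

def collect_rollouts(policy, episodes=10, horizon=25):
    """Gather (state, action, reward, next_state) tuples entirely in simulation."""
    buffer = []
    for _ in range(episodes):
        state = rng.standard_normal(2)
        for _ in range(horizon):
            action = policy(state)
            next_state, reward = learned_simulator(state, action)
            buffer.append((state, action, reward, next_state))
            state = next_state
    return buffer

random_policy = lambda s: rng.uniform(-1, 1, size=2)
data = collect_rollouts(random_policy)
print(len(data), "transitions collected without touching the real world")
```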

Applications of UniSim

UniSim has a wide array of applications, including controllable content creation in games and movies, training embodied agents purely in simulations for deployment in the real world, and supporting vision language models like DeepMind’s RT-X models. It has the potential to provide vast amounts of training data for vision-language planners and reinforcement learning policies.

Moreover, UniSim can simulate rare events, a feature crucial in applications like robotics and self-driving cars, where data collection is expensive and risky. Despite its resource-intensive training process, UniSim holds the promise of advancing machine intelligence by instigating interest in real-world simulators.

In conclusion, UniSim represents a groundbreaking development in the realm of AI training simulations, offering the potential to create realistic experiences for AI systems across various fields, ultimately bridging the gap between simulation and the real world. Its capacity to provide diverse and realistic training data makes it a valuable asset for the future of machine learning and artificial intelligence.

Microsoft Set to Unveil Its Latest AI Chip, Codenamed ‘Athena,’ Next Month

After years of development, Microsoft is on the cusp of revealing its highly anticipated AI chip, codenamed ‘Athena,’ at its annual ‘Ignite’ event next month. The unveiling marks a significant milestone for the tech giant, signaling a potential shift away from its reliance on GPUs manufactured by NVIDIA, the dominant player in the semiconductor industry.

Microsoft has meticulously crafted its Athena chip to empower its data center servers, tailoring it specifically for training and running large-scale language models. The motivation behind this endeavor stems from the ever-increasing demand for NVIDIA chips to fuel AI systems. However, NVIDIA’s chips are notorious for being both scarce and expensive, with its most powerful AI offering, the H100 chip, commanding a hefty price tag of $40,000.

By venturing into in-house GPU production, Microsoft aims to curb costs and bolster its cloud computing service, Azure. Notably, Microsoft had been covertly working on Athena since 2019, coinciding with its $1 billion investment in OpenAI, the visionary organization behind ChatGPT. Over the years, Microsoft has allocated nearly $13 billion to support OpenAI, further deepening their collaboration.

Athena’s Arrival: Microsoft’s In-House AI Chip Ready for the Spotlight

Besides advancing Microsoft’s own AI ambitions, the chip could help OpenAI address its GPU requirements. OpenAI has recently expressed interest in developing its own AI chip or acquiring a chipmaker capable of crafting chips tailored to its unique needs.

This development holds promise for OpenAI, especially considering the colossal expenses associated with scaling ChatGPT. A Reuters report highlights that expanding ChatGPT to a tenth of Google’s search scale would necessitate an expenditure of approximately $48.1 billion for GPUs, along with an annual $16 billion investment in chips. Sam Altman, the CEO of OpenAI, has previously voiced concerns about GPU shortages affecting the functionality of his products.

To date, ChatGPT has relied on a fleet of 10,000 NVIDIA GPUs integrated into a Microsoft supercomputer. As ChatGPT transitions from being a free service to a commercial one, its demand for computational power is expected to skyrocket, requiring over 30,000 NVIDIA A100 GPUs.

Microsoft’s Athena: A Potential Game-Changer in the Semiconductor Race

The global chip supply shortage has only exacerbated the soaring prices of NVIDIA chips. In response, NVIDIA has announced the upcoming launch of the GH200 chip, featuring the same GPU as the H100 but with triple the memory capacity. Systems equipped with the GH200 are slated to debut in the second quarter of 2024.

Microsoft’s annual gathering of developers and IT professionals, ‘Ignite,’ sets the stage for this momentous revelation. The event, scheduled from November 14 to 17 in Seattle, promises to showcase vital updates across Microsoft’s product spectrum.

Llama 2 Long: Redefining AI for Handling Complex User Queries

Meta Platforms has unveiled a groundbreaking AI model that may have slipped under the radar during its annual Meta Connect event in California. While the tech giant showcased numerous AI-powered features for its popular apps like Facebook, Instagram, and WhatsApp, the real standout innovation is Llama 2 Long, an extraordinary AI model designed to provide coherent and relevant responses to extensive user queries, surpassing some of the leading competitors in the field.

Llama 2 Long is an extension of the previously introduced Llama 2, an open-source AI model from Meta known for its versatility in tasks ranging from coding and mathematics to language comprehension, common-sense reasoning, and conversational abilities. What sets Llama 2 Long apart is its capacity to handle more substantial and complex inputs, making it a formidable rival to models like OpenAI’s GPT-3.5 Turbo and Claude 2, which struggle with extended contextual information.

The inner workings of Llama 2 Long are a testament to Meta’s dedication to pushing the boundaries of AI technology. Meta’s research team started from several versions of Llama 2, spanning 7 billion to 70 billion parameters (the adjustable values that govern how the model learns from data), and continued training them on an additional 400 billion tokens drawn from longer texts than the original Llama 2 dataset.

Furthermore, the architecture of Llama 2 underwent subtle alterations, primarily in how it encodes the position of each token within a sequence. Llama 2 already relies on Rotary Positional Embedding (RoPE), which encodes each token’s position as a rotation so that its relationship to other tokens is preserved; adjusting this encoding improves the model’s accuracy and efficiency on long inputs without demanding much additional memory.

The researchers reduced the rotation angle of the RoPE encoding when going from Llama 2 to Llama 2 Long, enabling the model to accommodate tokens that are more distant or less frequent in its context. Additionally, they employed reinforcement learning from human feedback (RLHF) and synthetic data generated by Llama 2 itself to fine-tune the model’s performance across various tasks.
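
Below is a minimal sketch of rotary positional embeddings with an adjustable base frequency, included purely for intuition: raising the base shrinks the per-position rotation angle, which is the kind of change described above for handling longer contexts. The specific base values are illustrative, not Meta’s published hyperparameters.

```python
# Rotary positional embeddings (RoPE) with a tunable base frequency.
import numpy as np

def rope_angles(seq_len: int, head_dim: int, base: float) -> np.ndarray:
    """Rotation angle for each (position, frequency-pair) in one attention head."""
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    positions = np.arange(seq_len)
    return np.outer(positions, inv_freq)          # (seq_len, head_dim // 2)

def apply_rope(x: np.ndarray, base: float = 10_000.0) -> np.ndarray:
    """Rotate channel pairs of x (seq_len, head_dim) by position-dependent angles."""
    seq_len, head_dim = x.shape
    theta = rope_angles(seq_len, head_dim, base)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = np.random.default_rng(0).standard_normal((8, 64))
short_ctx = apply_rope(x, base=10_000.0)    # typical base for a short-context model
long_ctx = apply_rope(x, base=500_000.0)    # larger base => smaller rotation per step
print(short_ctx.shape, long_ctx.shape)
```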

The paper detailing Llama 2 Long’s capabilities asserts that the model can generate high-quality responses to user queries containing up to 200,000 characters, equivalent to approximately 40 pages of text. The paper provides illustrative examples of Llama 2 Long’s responses across a range of subjects, including history, science, literature, and sports.

Meta’s researchers regard Llama 2 Long as a significant stride towards the development of more versatile and general AI models capable of addressing diverse and intricate user needs. They also acknowledge the ethical and societal implications of such models, emphasizing the need for further research and dialogue to ensure their responsible and beneficial utilization.

In conclusion, Meta’s introduction of Llama 2 Long represents a remarkable advancement in the realm of AI, with the potential to revolutionize how AI models handle complex and extensive user queries while also underlining the importance of ethical considerations in their deployment.

Apple’s AI Chief Says iOS 17 Update Gives Users a Choice of Search Engine

Former high-ranking Google executive John Giannandrea recently highlighted a significant alteration in the latest iPhone software update, iOS 17, which was unveiled on September 25. This update introduces a noteworthy change that allows users to opt for a search engine other than Google when navigating in private mode.

In the wake of growing privacy concerns among users, Google, the tech behemoth, has found itself under increased scrutiny from the public regarding issues of user choice and competition within the search engine market.

The iOS 17 software release has introduced a pivotal feature by adding a second setting that empowers iPhone users to seamlessly switch between Google and alternative search engines. This development was emphasized by the head of Apple’s artificial intelligence division during his testimony in a federal court in Washington as part of the Justice Department’s antitrust lawsuit against Alphabet Inc.’s Google.

This newly added feature simplifies the process of changing search engines with a single tap, a move aimed at addressing concerns surrounding Google’s alleged monopoly in online search. This issue has gained prominence in light of the U.S. government’s antitrust lawsuit, which contends that Google has been unlawfully maintaining its dominant position through agreements with web browsers and mobile device manufacturers, including Apple.

Google initially denied these allegations, asserting in its opening statement that users can switch search engines in a matter of seconds. However, Gabriel Weinberg, the CEO of rival search engine DuckDuckGo, testified on September 28 that Google’s default status on browsers acts as a barrier to users changing their preferences, citing a convoluted process.

Furthermore, Google’s default position as the search engine in Apple’s Safari, the web browser for Apple devices, is a result of contractual obligations between the two tech giants. As part of this arrangement, Google shares a portion of its advertising revenue with Apple, although the exact sum remains confidential. According to reports, the Justice Department has indicated that Google pays Apple an annual amount estimated to be between $4 billion and $7 billion.

Giannandrea clarified in his testimony that Google will continue to be the default search engine for Safari in private mode, which does not store browsing history. However, the new update offers users the flexibility to choose from a range of search engines, including Yahoo Inc., Microsoft Corp.’s Bing, DuckDuckGo, and Ecosia, for their private browsing experience.

John Giannandrea, currently leading Apple’s AI division, previously worked at Google from 2010 to 2018 in the role of Senior Vice President of Engineering. In his current capacity, Giannandrea is spearheading machine learning initiatives at Apple and driving AI-powered endeavors for the company.

OpenAI’s ChatGPT Unveils New Voice and Image Features for Enhanced User Interaction

OpenAI’s ChatGPT, the AI-powered language model, is unveiling a set of exciting new features, allowing users to “see, hear, and speak.” These enhancements are designed to make ChatGPT more user-friendly and versatile, offering a variety of ways for users to interact with the AI model.

OpenAI has announced a phased rollout of voice and image capabilities within ChatGPT over the next two weeks. These features are intended to empower users to engage in voice conversations and visually convey their queries to ChatGPT, making the AI experience even more interactive and accessible.

The primary goal behind these updates is to enhance the utility and user-friendliness of ChatGPT. According to MIT Technology Review, OpenAI has been diligently refining its technology with the aim of providing a comprehensive AI solution through the ChatGPT Plus app. This puts it in direct competition with virtual assistants like Siri, Google Assistant, and Alexa.

OpenAI emphasized the significance of these new features, stating, “Voice and image give you more ways to use ChatGPT in your life. Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it.” The voice feature will be available on both iOS and Android platforms, with the option to opt-in through your settings, while the image feature will be functional across all platforms.

OpenAI went on to explain how users can leverage these capabilities: “You can now use voice to engage in a back-and-forth conversation with your assistant. Speak with it on the go, request a bedtime story for your family, or settle a dinner table debate.”

The image feature had been hinted at earlier in March when GPT-4, the model powering ChatGPT, was introduced. However, it was not accessible to the general public at the time. Now, users can upload images to the app and inquire about the content of those images, expanding the AI’s versatility.
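
As a hedged sketch of the image-question pattern described above, using the OpenAI Python client: the model name, image URL, and prompt below are placeholders chosen for illustration, so check the current API documentation before relying on them.

```python
# Asking a vision-capable model a question about an image (placeholder values).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # a vision-capable model available at the time of writing
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What landmark is shown in this photo?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/landmark.jpg"}},
        ],
    }],
    max_tokens=200,
)
print(response.choices[0].message.content)
```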

MIT Technology Review also noted that this announcement follows the recent integration of DALL-E 3, OpenAI’s image-generation model, into ChatGPT. This integration allows users to instruct the chatbot to generate images based on their input.

Additionally, OpenAI has partnered with Be My Eyes, enabling users to ask ChatGPT questions based on images, further expanding its practical applications.

The voice feature is powered by Whisper, OpenAI’s speech-to-text model, which converts spoken words into text for ChatGPT to process, enabling voice interactions with the AI software. Joanne Jang, a product manager at OpenAI, mentioned that the synthetic voices were created by training the text-to-speech model on the voices of hired actors. OpenAI is also considering allowing users to create their own custom voices in the future.
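
For developers curious how a similar voice flow can be built outside the ChatGPT apps, here is a minimal sketch using the same underlying pieces: Whisper for speech-to-text, then a chat model for the reply. The file name and model choices are assumptions for illustration; the in-app voice feature itself is handled by the ChatGPT mobile apps rather than by this API flow.

```python
# Minimal voice-question pipeline: transcribe audio, then answer the transcript.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Speech to text with Whisper
with open("question.m4a", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2. Answer the transcribed question with a chat model
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
print(reply.choices[0].message.content)
```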

OpenAI is taking privacy, safety, and accessibility concerns seriously with the introduction of these features. They have outlined a multifaceted approach to address these issues, including content moderation, responsible data handling, clear user guidelines, restrictions on sensitive topics, and a strong focus on ethical software use. Furthermore, OpenAI is actively collaborating with external organizations, researchers, and experts to conduct audits and assessments of the system, ensuring that ChatGPT remains a responsible and reliable tool for users.