OpenAI’s New AI Model GPT-4o: Understanding its Capabilities, Benefits and Potential Concerns

Authored By - Dr Yatin Kathuria


Introducing GPT-4o

Imagine an AI assistant that can truly perceive the world around you, understanding not just your words but also the context, emotions, and visual cues you convey. Sounds fictional? Well, it's now a reality. On May 13, 2024, OpenAI revealed its new model "GPT-4o", where "o" stands for "omni", signifying its versatility across various forms of communication. What sets it apart is its ability to understand and process information across multiple modalities: text, audio, and visuals. This means it can not only comprehend written text like its predecessor models, but can also analyze real-time video and audio inputs simultaneously, straight from your smartphone's camera and microphone. Say you're having a video call with GPT-4o and you show it a picture of a historical monument. Not only can GPT-4o analyze and describe the monument in remarkable detail, it can also take in any additional context you provide through your voice.

GPT-4o Capabilities and Benefits

The capabilities of Omni go far beyond simple image recognition and analysis. It can understand complex visual concepts and even interpret real-time video footage, opening up possibilities in fields such as education, healthcare, and the creative industries. GPT-4o can create interactive educational content by combining text, images, and videos; for instance, it can generate engaging science lessons with visual explanations, adapting to individual student needs with personalized explanations, practice problems, and feedback across subjects. In the healthcare sector, GPT-4o can be used to analyze medical images (X-rays, MRIs) and offer preliminary diagnoses, aiding radiologists and doctors, and it can generate patient-friendly explanations of medical conditions, treatment options, and medication instructions. With its upgraded contextual awareness, it can also be a valuable tool for customer interaction across industries: GPT-4o can sustain coherent, contextually relevant conversations over prolonged interactions, improving customer satisfaction without requiring vast human resources.

The model's capabilities are undoubtedly impressive, but one of the most exciting aspects of this technology is its accessibility and affordability. According to OpenAI's CEO, Sam Altman, GPT-4o will be available at half the price of GPT-4 Turbo, the previous flagship model, while offering twice the speed and a rate limit five times higher for third-party developers. This means more companies and developers will be able to integrate GPT-4o into their applications and services, bringing this powerful technology to a wider audience.
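To make the developer-integration point concrete, here is a minimal sketch of how a third-party application might send GPT-4o a combined text-and-image question, in the style of OpenAI's chat-completions API. The helper function name and the image URL are illustrative placeholders, not part of OpenAI's SDK; the actual network call is shown in comments because it requires an API key.

```python
# Sketch: building a multimodal (text + image) request for GPT-4o,
# following the message format of OpenAI's chat-completions API.

def build_multimodal_message(question: str, image_url: str) -> dict:
    """Combine a text question and an image into a single user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

message = build_multimodal_message(
    "What monument is shown in this picture?",
    "https://example.com/monument.jpg",  # placeholder image URL
)

# With the official SDK (pip install openai), the call would look like:
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
#   response = client.chat.completions.create(model="gpt-4o", messages=[message])
#   print(response.choices[0].message.content)
```

The same message structure works for text-only requests by omitting the image entry, which is why a single endpoint can serve both classic chat and the multimodal scenarios described above.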

Human-Like Conversations

OpenAI reports that the GPT-4o model can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, comparable to human response time in a conversation. The model can engage in more natural, human-like conversations, adjusting its tone and delivery based on the emotional context of the interaction. For example, if you're feeling frustrated or anxious, GPT-4o can detect those emotions in your voice and respond in a more empathetic and understanding tone. Imagine having a virtual assistant that can truly connect with you on an emotional level, providing not just factual information but also emotional support and guidance when needed. You may have seen "Her", the thought-provoking science-fiction romantic drama directed by Spike Jonze and released in 2013. The film tells the story of a lonely man who develops an unconventional and deeply emotional relationship with an artificial intelligence operating system named Samantha. As their interactions evolve, the man becomes increasingly drawn to Samantha's intelligence, and their bond grows ever stronger, leading to a unique and intimate connection. OpenAI's new model could turn that fictional story into reality. Beyond enabling such relationships, Omni's functionality could be particularly valuable in fields like mental health, counselling, and customer service, where emotional intelligence is crucial.

Possible Downsides and Concerns

"We recognize that GPT-4o's audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we'll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities."

(A paragraph from the blog post "Introducing GPT-4o" published on OpenAI's website)

Upon the launch of OpenAI's GPT-4o model, accusations of violating individual rights emerged. OpenAI introduced a new voice called "Sky" in its GPT-4o chatbot. The Sky voice was strikingly similar to Scarlett Johansson's, recalling the disembodied AI companion she voiced in the movie "Her" (discussed above under "Human-Like Conversations"). Johansson called for legislation to protect individuals' names, voices, and likenesses from the kind of misappropriation she alleged OpenAI had committed. This incident underscores the ethical complexities surrounding AI models, even before the public has fully explored GPT-4o's capabilities. Johansson's allegations serve as an early indication of the concerns that may arise in the near future. Since OpenAI has itself professed, in the blog post quoted above, that GPT-4o may exhibit risks and challenges, it is critical for users to be aware of the probable risks associated with this new model:

Privacy and Security: One of the primary concerns that the Omni model may intensify is the breach of user privacy and security. OpenAI's privacy page states that the company may store and use content submitted to ChatGPT, including chats with GPT models, to improve model performance. The multimodal capabilities of the Omni model may increase incidents of data breaches and privacy violations: when users provide videos or images, there is a risk of inadvertently sharing sensitive or private information, and the model could potentially recognize faces, locations, or other identifiable details.

Data Quality and Contamination: Another concern with OpenAI's models is the possibility of incorrect outputs. Like any AI model, GPT-4o is trained on data that may contain inherent biases or inaccuracies. Recently, GPT-4o faced an issue with data contamination: as reported by MIT Technology Review, the tokenizer used for processing text was polluted by Chinese spam data, so the model's Chinese token library contained phrases related to pornography and gambling. Such contamination can lead to poor performance and unintended consequences.

Bias and Fairness: Various studies have already confirmed that OpenAI's GPT models are biased because of the data used to train them. For instance, according to a study from researchers at the University of California, the model training process shows a preference for copyrighted works in the public domain, and other studies have identified political bias in ChatGPT's responses. Combining the text, audio, and visual modalities in the Omni model can introduce biases from multiple sources. Safeguarding fairness across these different modalities will therefore be more challenging, and bias detection and mitigation are essential to prevent discriminatory outputs.

Emotional Disconnect: GPT-4o can act as an AI assistant that engages in human-like conversations and adjusts its outputs based on human emotions. This functionality can have serious effects on users' emotional lives. AI assistants can simulate empathy and understanding, but they lack genuine emotions. According to a study published in Scientific Reports, relying on AI for emotional support might lead to a disconnect from authentic human emotions: people may become accustomed to superficial interactions, affecting their ability to empathize with others.

OpenAI states that it has taken the necessary safety interventions and claims that GPT-4o has already undergone extensive external scrutiny through the OpenAI Red Teaming Network, with more than 70 external experts from domains such as social psychology, bias and fairness, and misinformation working to identify risks that may be introduced or amplified by the newly added modalities. Even so, while OpenAI and other developers work to mitigate these issues, it is important for us as responsible users to approach GPT-4o's outputs with a critical eye and to harness the benefits of this innovation for the public good.

References