OpenAI, the creator of ChatGPT, has finally revealed GPT-4, a model capable of accepting text or image inputs.

After months of rumors and speculation, OpenAI has announced GPT-4: the latest in its line of AI language models that power applications like ChatGPT and the new Bing.
The company claims the model is “more creative and collaborative than ever before,” and “can solve difficult problems with greater accuracy, thanks to its broader general knowledge and problem solving abilities.”
OpenAI says it’s already partnered with a number of companies to integrate GPT-4 into their products, including Duolingo, Stripe, and Khan Academy. The new model will also be available to ChatGPT Plus subscribers and as an API (for which there is a waitlist).
In a research blog post, OpenAI said the distinction between GPT-4 and its predecessor GPT-3.5 (the model that powers ChatGPT) is “subtle” in casual conversation, but that the differences between the systems become clear when they are faced with more complex tasks.
The company says these improvements can be seen in GPT-4’s performance on a number of tests and benchmarks, including the Uniform Bar Exam, LSAT, SAT Math, and SAT Evidence-Based Reading & Writing exams. On the exams mentioned, GPT-4 scored in the 88th percentile or above, and OpenAI has published a full list of exams and scores.
Speculation about GPT-4 and its capabilities has been rife over the past year, with many suggesting it would be a huge leap over previous systems. “People are begging to be disappointed and they will be,” said OpenAI CEO Sam Altman in an interview in January. “The hype is just like… We don’t have an actual AGI and that’s sort of what’s expected of us.”
The rumor mill was further energized last week after a Microsoft executive, in an interview with the German press, let slip that the system would launch this week. The executive also suggested the system would be multimodal (that is, able to generate not only text but other media). Many AI researchers believe that multimodal systems that integrate text, audio, and video offer the best path towards building increasingly intelligent AI systems.
GPT-4 is indeed multimodal, but in fewer media than some predicted. OpenAI says the system can accept both text and image inputs and emit text outputs, and that the ability to parse text and images simultaneously allows it to interpret more complex input. In the samples below, you can see the system explaining memes and unusual images:
It’s been a long journey to get to GPT-4, with OpenAI — and AI language models in general — building momentum slowly over several years before rocketing into the mainstream in recent months.
The original research paper describing GPT was published in 2018, with GPT-2 announced in 2019, and GPT-3 in 2020. These models are trained on huge datasets of text, much of it scraped from the internet, which is mined for statistical patterns. These patterns are then used to predict which word is most likely to come next in a sequence. It’s a relatively simple mechanism to describe, but the end result is flexible systems that can generate, summarize, and rephrase writing, as well as perform other text-based tasks like translation or generating code.
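To make the idea of “predicting what comes next from statistical patterns” concrete, here is a deliberately tiny sketch. It is not how GPT models work internally (they use large neural networks trained on tokens, not simple word counts); it swaps in a basic bigram counter, and the function names `train_bigrams` and `predict_next` are hypothetical, chosen only for illustration. The model counts which word follows which in a small sample of text, then predicts the most frequent follower.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word, how often each other word directly follows it.

    Hypothetical illustration: a bigram counter, far simpler than GPT.
    """
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    # Pair each word with the word immediately after it.
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str) -> str | None:
    """Return the word most often seen after `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, more than any other word
print(predict_next(model, "on"))   # "the" is the only word seen after "on"
```

Real language models go much further, weighing the entire preceding context rather than a single word, but the underlying goal of guessing a likely continuation from patterns in training text is the same.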
OpenAI originally delayed the release of its GPT models for fear they would be used for malicious purposes like generating spam and misinformation. But in late 2022, the company launched ChatGPT — a conversational chatbot based on GPT-3.5 that anyone could access. ChatGPT’s launch triggered a frenzy in the tech world, with Microsoft soon following it with its own AI chatbot Bing (part of the Bing search engine) and Google scrambling to catch up.
As predicted, the wider availability of these AI language models has created problems and challenges. The education system is still adapting to the existence of software that writes respectable college essays; Stack Overflow has had to ban AI-generated answers and sci-fi magazine Clarkesworld has had to close submissions due to an influx of AI-generated content; and early uses of AI writing tools in journalism have been rocky at best. But some experts have argued that the harmful effects have still been less than anticipated.