GPT-5.5 Launch: What’s New in OpenAI’s Latest AI Model

Key takeaways⭐

GPT-5.5 solves complex tasks autonomously with high quality and speed.
The model is more expensive than GPT-5.4 but more token-efficient.
Leads Claude Opus 4.7 on Terminal-Bench 2.0 but trails it on SWE-Bench Pro.
GPT-5.5 can generate interactive 3D video games and handle complex spreadsheets.
Outperforms the human baseline on OSWorld-Verified, a benchmark of autonomous computer use.

GPT-5.5 Features: Programming Task Benchmarks

On programming tasks, initial results suggest that GPT-5.5 is among the most powerful models available, although the competition remains fierce. On Terminal-Bench 2.0, for example, GPT-5.5 achieves a state-of-the-art score of 82.7%, clearly surpassing Claude Opus 4.7 at 69.4%.

The picture becomes somewhat more complex when analyzing real-world issue resolution on GitHub.

On SWE-Bench Pro, GPT-5.5 delivers a strong 58.6%, but Claude Opus 4.7 remains in front with 64.3%. So, while GPT-5.5 looks extremely capable in coding environments, the benchmark results also show that leadership can vary depending on the type of programming challenge being tested.

GPT-5.5 also needs the same number of tokens as GPT-5.4, or fewer, to achieve the same results.

Engineers who tested GPT-5.5 during the beta report that the model is better at understanding system architecture and at catching problems early, before they cause issues in later steps and tasks.

📌 To summarize:

In Terminal-Bench 2.0, which tests complex command-line workflows, GPT-5.5 reached 82.7%;
In SWE-Bench Pro, which tests the ability to resolve real GitHub issues, GPT-5.5 reached 58.6%; however, Claude Opus 4.7 reached 64.3% on the same test;
In Expert-SWE, GPT-5.5 outperformed GPT-5.4 on a task with an estimated human completion time of 20 hours.

More capabilities:

Financial Modeling: It can navigate and reason through very large, complex spreadsheets, updating cells without breaking the documents.

3D Video Games: It can generate video games with interactive 3D graphics and advanced detail.

GPT-5.5 Is Best for Programming, but Perfect for Other Tasks As Well

GPT-5.5 can search for information, use tools, check its results, and work out what is relevant to the task you assign it.

In Codex, it includes a "browser usage" function that is well suited to in-app QA: with it, GPT-5.5 can interact with the keyboard and interface the way a human user would.

On GDPval, a benchmark that assesses well-specified knowledge-work tasks across 44 occupations, GPT-5.5 scores 84.9%.

Meanwhile, on OSWorld-Verified, which measures whether the model can operate autonomously in computer environments, it scores 78.7%, compared with a human baseline of 72.4%. In other words, on this benchmark the model outperforms the average human at using a computer.

Beyond Programming and Office Work

Beyond programming and everyday office work, OpenAI presents GPT-5.5 as a kind of “co-scientist” able to support even highly complex areas of research. It does not simply reply to ready-made questions: it can explore ideas, challenge different hypotheses, and help interpret results from a broader perspective.

One interesting example comes from mathematics. An internal version of the model reportedly contributed to finding a new proof connected to Ramsey numbers, a topic in combinatorics. Experts are also starting to see major changes in their day-to-day work.

Derya Unutmaz, a professor of immunology, used GPT-5.5 Pro to analyze a gene-expression dataset made up of 62 samples and almost 28,000 genes.

The model produced a detailed report, highlighted important insights, and quickly completed an analysis that would have taken his team months.

GPT-5.5 Model and Cybersecurity

OpenAI classifies GPT-5.5 as having "high" cybersecurity capability under its internal Preparedness Framework.

According to OpenAI, the new model is better at finding vulnerabilities than its predecessor, although it did not reach the "critical" level in the company's evaluations.

To offset these risks, OpenAI has deployed stricter classifiers to detect problems and risky use, which could mean tighter restrictions for some users while their calibration is refined.

Before release, the model was also tested by external experts in cybersecurity and biology, and the company says it will keep refining its approach as the model's capabilities increase.

More Expensive than GPT-5.4, but More Efficient

GPT-5.5 is available starting today to ChatGPT users with a Plus, Pro, Business, or Enterprise subscription.

In Codex, the context window is 400,000 tokens (1M via the API), and Fast mode generates tokens 1.5 times faster than its predecessor, but at 2.5 times the cost.

In short, GPT-5.5 is more expensive than GPT-5.4, but it is more token-efficient.
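A quick way to reason about this trade-off is to note that the cost of a job scales with the price per token times the number of tokens used. The Python sketch below assumes only the input-price ratio given in this article's pricing section: GPT-5.5 input tokens cost twice as much as the previous model's, and the previous model is taken here to be GPT-5.4. The token ratios it tries are hypothetical scenarios, not measured results. It does not cover output tokens, because the article does not give GPT-5.4's output price.

```python
def relative_cost(price_ratio: float, token_ratio: float) -> float:
    """Cost of a job on GPT-5.5 relative to GPT-5.4.

    price_ratio: GPT-5.5 price per token / GPT-5.4 price per token.
    token_ratio: tokens GPT-5.5 needs / tokens GPT-5.4 needs for the same result.
    """
    return price_ratio * token_ratio


def break_even_token_ratio(price_ratio: float) -> float:
    """Largest token_ratio at which GPT-5.5 costs no more than GPT-5.4."""
    return 1.0 / price_ratio


# From the article: GPT-5.5 input tokens cost twice as much as the previous model's.
INPUT_PRICE_RATIO = 2.0

# Hypothetical scenarios: GPT-5.5 needs 100%, 70%, or 50% of GPT-5.4's tokens.
for token_ratio in (1.0, 0.7, 0.5):
    cost = relative_cost(INPUT_PRICE_RATIO, token_ratio)
    print(f"token ratio {token_ratio:.1f} -> relative input cost {cost:.2f}x")

print(f"break-even token ratio: {break_even_token_ratio(INPUT_PRICE_RATIO):.2f}")
```

Suppose GPT-5.5 uses the same number of tokens. Its input cost is then twice GPT-5.4's. It only matches GPT-5.4's input cost when it needs half as many tokens, and it becomes cheaper below that point.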

GPT-5.5 API Pricing

The GPT-5.5 API pricing is as follows:

Input tokens: $5 per million (twice the cost of the previous model).
Output tokens: $30 per million.

For GPT-5.5 Pro, the cost rises steeply to $30 per million input tokens and $180 per million output tokens.
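To turn these per-million prices into a per-request estimate, multiply each token count by its price and divide by one million. The Python sketch below hard-codes the prices listed above. The `Pricing` class and `request_cost` function are illustrative helpers, not part of any OpenAI SDK. The request size (20,000 input tokens, 2,000 output tokens) is a hypothetical example, not a measured workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical helper: USD prices per million tokens, as listed in the article."""
    input_per_million: float
    output_per_million: float


GPT_5_5 = Pricing(input_per_million=5.0, output_per_million=30.0)
GPT_5_5_PRO = Pricing(input_per_million=30.0, output_per_million=180.0)


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request, given its token counts."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


# Hypothetical request: 20,000 input tokens and 2,000 output tokens.
standard = request_cost(GPT_5_5, 20_000, 2_000)
pro = request_cost(GPT_5_5_PRO, 20_000, 2_000)
print(f"GPT-5.5:     ${standard:.4f}")
print(f"GPT-5.5 Pro: ${pro:.4f}")
```

For this request, the script prints $0.1600 for GPT-5.5 and $0.9600 for GPT-5.5 Pro. Both Pro prices are exactly six times the standard ones. A Pro request therefore always costs six times as much for the same token counts.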

Lorka AI Team
