Guotai Haitong: GPT-5.2 series redefines AI productivity and moves AI from model competition to real-world deployment

Zhitongcaijing

The Zhitong Finance App has learned that Guotai Haitong released a research report stating that the release of the GPT-5.2 series marks the transition of large-model capabilities from technical demonstration to a new stage of large-scale economic production. According to the report, the model has reached human-expert level in abstract reasoning and complex knowledge work, confirming AI's potential to create economic value in high-end professional fields. This is expected to accelerate the shift of industry competition from the underlying models to deployment, including scenario-specific applications, enterprise services, and human-AI collaborative workflows.

Guotai Haitong's main views are as follows:

GPT-5.2 achieved a historic leap in core reasoning and professional work tasks, reaching human-expert level for the first time in a comprehensive assessment

On December 12, on the occasion of its 10th anniversary, OpenAI officially released the GPT-5.2 model series. The series includes three versions, Instant, Thinking, and Pro, designed for tasks of varying complexity. On the ARC-AGI-2 test, known as the “Turing Test for the AI World,” it scored 52.9%, about three times the 17.6% of GPT-5.1, putting its abstract reasoning on par with the recently released Gemini 3. Even more groundbreaking is its performance on the GDPval benchmark, which covers 44 real-world occupations. GPT-5.2 Thinking beat or tied industry experts on 70.9% of tasks, and GPT-5.2 Pro reached 74.1%. This is the first time an AI model has reached top human level overall in a comprehensive knowledge-work assessment. On professional tasks such as investment-banking financial modeling, the average score rose from 59.1% to 68.4%, marking the beginning of AI's deep penetration into core productivity work.

GPT-5.2 also made significant advances in code generation, long context, and visual understanding, providing reliable support for complex multimodal tasks

In the SWE-Bench Pro evaluation, which is closer to real engineering environments, GPT-5.2 Thinking achieved a state-of-the-art (SOTA) score of 55.6% and showed greater potential in front-end and 3D interface generation. Its long-context processing made a qualitative leap: accuracy was close to 100% on the 256K-token “multi-needle retrieval” test, versus only 30% for GPT-5.1, making in-depth analysis of extremely long documents and complex projects possible. On the visual side, error rates on scientific chart question answering (CharXiv Q&A) and GUI understanding (ScreenSpot-Pro) fell by nearly half compared with the previous generation, and spatial localization improved significantly, laying a solid foundation for AI agents to process real-world information.

GPT-5.2's tool-calling reliability has improved greatly, and its safety and deployment strategies have been optimized for enterprise applications

GPT-5.2 scored 98.7% on the multi-turn complex tool-calling test (TAU2-bench). It can independently plan and complete multi-step customer service processes such as reservation changes and compensation, showing strong end-to-end task execution. At the same time, OpenAI continued its iterative deployment strategy, offering the GPT-5.2 series (Instant, Thinking, Pro) to paying ChatGPT users and retaining GPT-5.1 for up to three months to ensure a smooth transition. Although the API price has risen by about 40%, OpenAI emphasized that improved token efficiency can keep total cost manageable. The age prediction and content protection mechanisms still under testing also reflect continued investment in safety.
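The trade-off between a roughly 40% higher API price and better token efficiency can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, it assumes a single blended per-token price, and the 30% token reduction in the usage example is an assumed figure, not one reported by OpenAI or the research report.

```python
def break_even_token_reduction(price_increase: float) -> float:
    """Fraction by which token usage must fall for total cost to stay flat.

    price_increase is a fraction, e.g. 0.40 for a 40% higher per-token price.
    Cost is modeled as price * tokens, so the break-even condition is
    (1 + price_increase) * (1 - reduction) == 1.
    """
    if price_increase <= -1.0:
        raise ValueError("price_increase must be greater than -1")
    return 1.0 - 1.0 / (1.0 + price_increase)


def relative_cost(price_increase: float, token_reduction: float) -> float:
    """New total cost as a multiple of the old cost (1.0 means unchanged)."""
    return (1.0 + price_increase) * (1.0 - token_reduction)


if __name__ == "__main__":
    needed = break_even_token_reduction(0.40)
    print(f"Break-even token reduction: {needed:.1%}")
    # Hypothetical scenario: the same task uses 30% fewer tokens.
    print(f"Relative cost at 30% fewer tokens: {relative_cost(0.40, 0.30):.2f}x")
```

Under this simplified model, a task needs to consume about 29% fewer tokens before a 40% price increase leaves total spending unchanged. Any efficiency gain below that threshold still means a higher bill.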

Risk warning: large models iterating faster than expected, insufficient computing power supply, and data privacy compliance risks.