The Zhitong Finance App learned that Guotai Haitong has released a research report saying that Grok 4, released on July 10, surpasses existing models. Its dominant benchmark results and generational leap in performance mark xAI as the first to enter the next generation of AI. The release will encourage companies across the industry to actively explore integration with cutting-edge technology, accelerate the pace of innovation, and push the entire industry to a higher stage of development. Cloud service providers and data center operators will benefit directly from the growing demand for computing power. AI solution providers with vertical-domain advantages and data barriers will stand out from the competition.
Guotai Haitong's main views are as follows:
Deep thinking and collective decision-making reshape the paradigm of superhuman reasoning and computation
Grok-4's inference compute has achieved a step-change breakthrough. Its pre-training compute and inference compute have each increased more than tenfold over the previous generation, and its training scale has reached 100 times that of Grok-2. On Humanity's Last Exam (HLE), a set of 2,500 doctoral-level questions covering natural science, engineering, and other fields, Grok-4 scored 45%, double the score of Gemini 2.5 Pro, previously the most advanced model. Grok-4 not only comprehensively surpassed the academic ability of human graduate students, but also set new records with perfect scores on authoritative benchmarks such as GPQA and AIME25. Among the variants, Grok-4 Heavy, which uses multiple collaborating agents, combines deep thinking with group collaboration to correct errors, and achieved a perfect score on AIME25. Reasoning at this beyond-human level is making traditional human-designed tests meaningless. The frontier of its capability is driving the discovery of new technologies and laws of physics, and is expected to produce groundbreaking scientific research results within two years.
Closing the full-chain loop in real-world scenarios and validating cross-industry decision execution
In solving real-world problems, Grok-4 showed revolutionary progress. The voice function doubled its response speed and halved its latency, and the Eve voice's speech synthesis gives conversation a natural, emotionally fluid quality, making the user experience significantly better than that of competitors. In the vending machine management test (Vending-Bench), Grok-4 generated a net worth of 4694.15, more than double that of second-place Claude Opus 4, validating its ability to execute long-term strategy. Meanwhile, its API with a 256K context window is already open. In biomedicine, it has helped the ARC Institute screen millions of experimental data points to generate research hypotheses; it has become a preferred tool in financial decision-making; and it even completed the independent development of a first-person shooter game in just 4 hours, proving that it can integrate the tool chain end to end to solve complex tasks across industries.
Focusing on a revolution in pixel-level video generation and building a new ecosystem of human-machine collaborative perception
The one shortcoming is that Grok-4's multimodal capabilities remain a clear weakness. Although progress has been made in image understanding and generation, both still need substantial improvement, and human-level audiovisual perception and interaction have not yet been achieved. The next generation of R&D will focus on breakthroughs in video generation, using end-to-end “pixels in, pixels out” training to close the loop for AI video creation on the X platform. Next year, xAI plans to launch an automatic 3D asset generation system integrated with Unreal Engine to empower the gaming and film industries.
In the short term, xAI will first strengthen its dedicated coding model and optimize image recognition. The ultimate goal is to build a superintelligence that combines deep thinking, real-time response, and multimodal collaboration, and to completely reshape the paradigm of human-machine collaboration.
Risk warning: intensifying technology competition, insufficient computing power supply, and data privacy compliance risks.