On December 5, Volcano Engine officially released Doubao speech recognition model 2.0, which is built on Seed's Mixture-of-Experts (MoE) large language model architecture. According to the official announcement, version 2.0 has stronger reasoning ability and recognizes speech more accurately by drawing on a deeper understanding of context, with overall contextual keyword recall up 20%. It supports multimodal visual recognition: alongside the audio, the model can take visual input such as a single image or multiple images, which improves text recognition accuracy. It supports recognition of 13 overseas languages, including Japanese, Korean, German, and French. The release also improves handling of difficult cases such as proper nouns, personal names, place names, brand names, and easily confused polyphonic characters.

Zhitongcaijing · 12/05/2025 08:01