(Preprint) From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Yuxin Wang 王裕鑫 ¹, Hongtao Xie 谢洪涛 ¹, Shancheng Fang ¹, Jing Wang ², Shenggao Zhu ², Yongdong Zhang 张勇东 ¹
¹ University of Science and Technology of China
中国科技大学
² Huawei Cloud & AI
华为云人工智能
arXiv, 2021-08-22
Abstract

In this paper, we abandon the dominant complex language model and rethink the linguistic learning process in scene text recognition. Unlike previous methods, which handle visual and linguistic information in two separate structures, we propose a Visual Language Modeling Network (VisionLAN), which treats visual and linguistic information as a union by directly endowing the vision model with language capability. Specifically, we introduce the recognition of text from character-wise occluded feature maps in the training stage. This operation guides the vision model to use not only the visual texture of characters but also the linguistic information in the visual context when the visual cues are confusing (e.g., occlusion or noise).
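The character-wise occlusion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the occluded region is located, so this example assumes the column span of each character is already known. The feature layout and the names `occlude_character`, `sample_training_example`, and `char_spans` are hypothetical.

```python
import random


def occlude_character(features, char_spans, char_index):
    """Return a copy of `features` with one character's columns zeroed.

    features   : list of W columns, each a list of C floats
                 (hypothetical layout, height already collapsed)
    char_spans : list of (start, end) column ranges, one per character;
                 assumed to be known here, which is an assumption
    char_index : index of the character to occlude
    """
    start, end = char_spans[char_index]
    return [
        [0.0] * len(col) if start <= x < end else list(col)
        for x, col in enumerate(features)
    ]


def sample_training_example(features, char_spans, label, rng=random):
    """Hide one randomly chosen character; the target is still the full word."""
    idx = rng.randrange(len(label))
    return occlude_character(features, char_spans, idx), label, idx


# Toy input: the word "cat", 6 feature columns, 2 channels, 2 columns per character.
features = [[1.0, 2.0] for _ in range(6)]
char_spans = [(0, 2), (2, 4), (4, 6)]

# Occlude the middle character ("a"); a recognizer trained on this input
# would have to infer it from the surrounding context.
occluded = occlude_character(features, char_spans, 1)
```

Because the target remains the full word, the only way to predict the hidden character is from its neighbours, which is the behaviour the training-stage occlusion is meant to encourage.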

Because the linguistic information is acquired along with the visual features, without the need for an extra language model, VisionLAN improves speed by 39% and adaptively uses the linguistic information to enhance the visual features for accurate recognition. Furthermore, an Occlusion Scene Text (OST) dataset is proposed to evaluate performance when character-wise visual cues are missing. State-of-the-art results on several benchmarks demonstrate the effectiveness of the method.


