(Preprint) From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Yuxin Wang 王裕鑫 ¹, Hongtao Xie 谢洪涛 ¹, Shancheng Fang ¹, Jing Wang ², Shenggao Zhu ², Yongdong Zhang 张勇东 ¹
¹ University of Science and Technology of China
中国科技大学
² Huawei Cloud & AI
华为云人工智能
arXiv, 2021-08-22
Abstract

In this paper, we abandon the dominant complex language model and rethink the linguistic learning process in scene text recognition. Unlike previous methods, which handle visual and linguistic information in two separate structures, we propose a Visual Language Modeling Network (VisionLAN), which treats visual and linguistic information as a union by directly endowing the vision model with language capability. Specifically, we introduce text recognition on character-wise occluded feature maps in the training stage. This operation guides the vision model to use not only the visual texture of characters but also the linguistic information in the visual context for recognition when the visual cues are confusing (e.g., occlusion or noise).

Because the linguistic information is acquired along with the visual features, without the need for an extra language model, VisionLAN significantly improves the speed by 39% and adaptively considers the linguistic information to enhance the visual features for accurate recognition. Furthermore, an Occlusion Scene Text (OST) dataset is proposed to evaluate performance when character-wise visual cues are missing. State-of-the-art results on several benchmarks demonstrate the effectiveness of the approach.
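
The character-wise occlusion mentioned in the abstract can be illustrated with a toy example. This is only a minimal sketch, not the authors' implementation: it assumes that occlusion amounts to multiplying a feature map element-wise by (1 - mask), where the mask is close to 1 over the region of one chosen character. The function name `occlude_character`, the 2x4 feature map, and the mask values are all hypothetical and chosen for illustration.

```python
from typing import List

FeatureMap = List[List[float]]


def occlude_character(features: FeatureMap, char_mask: FeatureMap) -> FeatureMap:
    """Suppress features where char_mask is close to 1.

    Assumption (not taken from the paper): occlusion is modeled as
    features * (1 - mask), applied element-wise.
    """
    same_rows = len(features) == len(char_mask)
    same_cols = all(len(f) == len(m) for f, m in zip(features, char_mask))
    if not (same_rows and same_cols):
        raise ValueError("features and char_mask must have the same shape")
    return [
        [f * (1.0 - m) for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(features, char_mask)
    ]


# Toy 2x4 feature map; columns 1 and 2 hypothetically cover one character.
features = [[0.5, 1.0, 2.0, 4.0],
            [1.0, 3.0, 0.5, 2.0]]
char_mask = [[0.0, 1.0, 1.0, 0.0],
             [0.0, 1.0, 1.0, 0.0]]

occluded = occlude_character(features, char_mask)
print(occluded)
```

Under this assumption, a recognizer trained on `occluded` while still being asked to predict the full word would have to infer the hidden character from the surrounding visual context, which is the behavior the abstract attributes to the training-stage occlusion.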


