(Preprint) CLSEBERT: Contrastive Learning for Syntax Enhanced Code Pre-Trained Model
Xin Wang ¹, Yasheng Wang ², Pingyi Zhou ², Meng Xiao ², Yadao Wang ², Li Li ³, Xiao Liu ⁴, Hao Wu 武浩 ⁵, Jin Liu 刘进 ¹, Xin Jiang ²
¹ School of Computer Science, Wuhan University 武汉大学 计算机学院
² Noah's Ark Lab, Huawei 华为 诺亚方舟实验室
³ Faculty of Information Technology, Monash University
⁴ School of Information Technology, Deakin University
⁵ School of Information Science and Engineering, Yunnan University 云南大学 信息学院
arXiv, 2021-08-10
Abstract

Pre-trained models for programming languages have proven their value in various code-related tasks, such as code search, code clone detection, and code translation. Currently, most pre-trained models treat a code snippet as a sequence of tokens or focus only on the data flow between code identifiers.

However, these approaches ignore rich code syntax and hierarchy, which provide important structural information and semantic rules that can help enhance code representations. In addition, although BERT-based code pre-trained models achieve high performance on many downstream tasks, the sequence representations natively derived from BERT have been shown to be of low quality and to perform poorly on code matching and similarity tasks.

To address these problems, we propose CLSEBERT, a Contrastive Learning Framework for Syntax Enhanced Code Pre-Trained Model, to deal with various code intelligence tasks. In the pre-training stage, we consider the code syntax and hierarchy contained in the Abstract Syntax Tree (AST) and leverage contrastive learning to learn noise-invariant code representations. Besides masked language modeling (MLM), we also introduce two novel pre-training objectives: one predicts the edges between nodes in the abstract syntax tree, and the other predicts the types of code tokens. Extensive experiments on four code intelligence tasks demonstrate the effectiveness of the proposed model.
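
To make the two syntax-oriented objectives more concrete, the following minimal sketch shows the kind of supervision signal they could draw on: parent-child edges from an AST and a type label for each code token. It uses Python's standard `ast` and `tokenize` modules purely for illustration. The helper names `ast_edges` and `token_types`, the choice of Python as the source language, and the label vocabularies are assumptions made here, not the parser, label set, or pipeline used for CLSEBERT.

```python
import ast
import io
import tokenize


def ast_edges(source):
    """Return (parent_type, child_type) pairs for every parent-child AST edge.

    Hypothetical helper: an edge-prediction objective could treat these pairs
    as positive examples and unconnected node pairs as negatives.
    """
    tree = ast.parse(source)
    edges = []
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            edges.append((type(parent).__name__, type(child).__name__))
    return edges


def token_types(source):
    """Return (token_string, token_type_name) pairs, skipping layout tokens.

    Hypothetical helper: a token-type objective could use the type name as
    the label to predict for each token.
    """
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}
    pairs = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in skip:
            pairs.append((tok.string, tokenize.tok_name[tok.type]))
    return pairs


snippet = "def add(a, b):\n    return a + b\n"
edges = ast_edges(snippet)
types = token_types(snippet)

for parent, child in edges:
    print(f"{parent} -> {child}")
for text, label in types:
    print(f"{text!r}: {label}")
```

Python's tokenizer only distinguishes coarse classes such as `NAME` and `OP`; a real pre-training setup would likely use a finer-grained, language-specific type vocabulary.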