Analysis of Code Similarity with Triplet Loss-Based Deep Learning System

Abdellatif, Abdelrahman Taha Abdeltawab; İslamoğlu, Ertuğrul; Nizam, Ali

doi:10.1007/978-3-031-70924-1_26

Files

Abdellatif.pdf (1.38 MB)

Date

2024

Authors

Abdellatif, Abdelrahman Taha Abdeltawab

İslamoğlu, Ertuğrul

Nizam, Ali

Publisher

Springer

Access Rights

info:eu-repo/semantics/embargoedAccess

Abstract

Nowadays, several plagiarism detection tools based on static code features are available for code similarity detection. The application of deep learning in this domain represents an emerging area of research. This research proposes an innovative deep learning system based on triplet loss for detecting code similarity. Our training approach involves generating embeddings for pairs of code snippets to increase the detection accuracy. The system uses a tokenization and embedding mechanism specifically tailored for Java code snippets using CodeBERT, a pre-trained model that combines programming language and natural language processing. After the learning phase, we employed transfer learning with a classifier to detect code similarity. The effectiveness of the proposed system is evaluated by a reduction in loss values and an improvement in accuracy compared to models without the integration of triplet loss. The results indicate that our model can identify code similarities and distinguish between snippets with high accuracy, improving the capability of code similarity detection, clone detection, and source code analysis.
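For readers unfamiliar with the triplet loss named in the abstract, the following is a minimal sketch of the standard formulation, not the authors' implementation. It assumes Euclidean distance and a margin of 1.0, neither of which is stated in this record, and it uses hand-written 2-D vectors as stand-ins for the CodeBERT embeddings. The function names are illustrative only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value and the distance metric are assumptions made
    for illustration; the record does not specify them.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy embeddings (hypothetical): the positive is close to the anchor,
# the negative is far away.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

# Similar snippet is already closer than the dissimilar one by more than
# the margin, so the loss is zero.
loss_easy = triplet_loss(anchor, positive, negative)

# Swapping the roles violates the margin, so the loss is positive.
loss_hard = triplet_loss(anchor, negative, positive)

print(loss_easy, loss_hard)
```

Minimizing this quantity pulls embeddings of similar snippets toward the anchor and pushes dissimilar ones at least a margin further away, which is the property a downstream similarity classifier can then exploit.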

Keywords

Deep Learning, Code Embedding, Code Similarity Analysis, Contrastive Learning, Triplet Loss

Source

Recent Trends and Advances in Artificial Intelligence

Scopus Quartile

Q4

Volume

1138

Citation

ABDELLATİF, Abdelrahman Taha Abdeltawab, Ertuğrul İSLAMOĞLU & Ali NİZAM. "Analysis of Code Similarity with Triplet Loss-Based Deep Learning System". Recent Trends and Advances in Artificial Intelligence, 1138 (2024): 351-361.

Link

https://hdl.handle.net/11352/5121

Collection

Department of Computer Engineering
Scopus Indexed Publications Collection
Software Engineering
