Ön Eğitimli Dil Modellerinin Kokan Kod Sınıflama Performansının Üçlü Kayıp Yöntemiyle İyileştirilmesi

İslamoğlu, Ertuğrul

dc.contributor.advisor	Nizam, Ali
dc.contributor.author	İslamoğlu, Ertuğrul
dc.date.accessioned	2025-08-28T08:30:08Z
dc.date.available	2025-08-28T08:30:08Z
dc.date.issued	2024	en_US
dc.identifier.citation	İSLAMOĞLU, Ertuğrul, Ön Eğitimli Dil Modellerinin Kokan Kod Sınıflama Performansının Üçlü Kayıp Yöntemiyle İyileştirilmesi, Fatih Sultan Mehmet Vakıf Üniversitesi Lisansüstü Eğitim Enstitüsü Bilgisayar Mühendisliği Anabilim Dalı Bilgisayar Mühendisliği Programı, Yayımlanmamış Yüksek Lisans Tezi, İstanbul 2024.	en_US
dc.identifier.uri	https://hdl.handle.net/11352/5373
dc.description.abstract	The aim of this thesis is to build a code smell detection system by using deep learning to analyze the code changes made by developers together with the outputs of code smell detection tools. Using deep learning is intended to bring semantic meaning into code smell detection and to increase accuracy and performance. To that end, the thesis studies how pre-trained language models used in deep learning can be improved with the triplet loss technique to raise their code smell classification performance. The growing volume and variety of code has made code analysis and code management harder. At the same time, repositories such as GitHub, which host large amounts of open-source code, offer significant opportunities for code analysis. A code embedding stores the semantic meaning of code in vector form. Although existing code embedding methods have been used successfully in source code analysis for various software engineering tasks, additional work is needed before they match the performance and functionality of static code analysis tools. Furthermore, to improve the performance of systems that use code embedding models, the preprocessing methods used to refine embeddings need to be standardized, as they have been in image processing. This thesis covers the development of a model for classifying the collected code smells and the application of contrastive learning to code embeddings to improve that model's performance. For code classification tasks such as code smell detection, a triplet-loss-based network is used to strengthen intra-class similarity and increase the distance between different classes. An experimental dataset was created by collecting code from open-source project repositories on GitHub.
In the study, the code embeddings produced by the widely used pre-trained language models BERT, CodeBERT, and GraphCodeBERT are refined with contrastive learning, and the effect of this refinement on classification is evaluated. The results show that using the embeddings generated by the pre-trained models directly yields an accuracy of 80-89%. Contrastive learning improved this accuracy by 7-19%. These results suggest that contrastive learning can offer advantages as a preprocessing step for approaches based on pre-trained code embeddings. Incorporating contrastive learning techniques into the generation of code embedding vectors may therefore provide opportunities to improve performance in code analysis.	en_US
dc.language.iso	tur	en_US
dc.publisher	Fatih Sultan Mehmet Vakıf Üniversitesi	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Derin Öğrenme	en_US
dc.subject	Kötü Kod	en_US
dc.subject	Üçlü Kayıp	en_US
dc.subject	Karşılaştırmalı Öğrenme	en_US
dc.subject	Deep Learning	en_US
dc.subject	Code Smells	en_US
dc.subject	Triplet Loss	en_US
dc.subject	Contrastive Learning	en_US
dc.title	Ön Eğitimli Dil Modellerinin Kokan Kod Sınıflama Performansının Üçlü Kayıp Yöntemiyle İyileştirilmesi	en_US
dc.title.alternative	Optimizing the Code Smell Classification Performance of Pretrained Language Models Using the Triple Loss Method	en_US
dc.type	masterThesis	en_US
dc.contributor.department	FSM Vakıf Üniversitesi, Lisansüstü Eğitim Enstitüsü, Bilgisayar Mühendisliği Ana Bilim Dalı	en_US
dc.relation.publicationcategory	Thesis	en_US
dc.contributor.institutionauthor	İslamoğlu, Ertuğrul
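
The abstract describes a triplet-loss-based network that pulls embeddings of the same class together and pushes different classes apart. The sketch below illustrates the standard triplet loss formulation only; it is not taken from the thesis. The Euclidean distance, the margin value, and the toy 2-D vectors are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings; real code embeddings have far more dimensions.
anchor = [0.0, 0.0]    # embedding of a code snippet
positive = [3.0, 4.0]  # same class as the anchor, distance 5
negative = [6.0, 8.0]  # different class, distance 10

# Well-separated triplet: 5 - 10 + 1 is negative, so the loss is clamped to 0.
print(triplet_loss(anchor, positive, negative))  # 0.0

# Swapping the roles gives a violating triplet: 10 - 5 + 1 = 6.
print(triplet_loss(anchor, negative, positive))  # 6.0
```

In training, a loss of this kind would be averaged over many (anchor, positive, negative) triplets and minimized, so that only triplets violating the margin contribute to the update.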

Files in this item:

Name: İslamoğlu.pdf
Size: 1.959Mb
Format: PDF
Description: Master's Thesis

This item appears in the following collection(s).

Lisansüstü Eğitim Enstitüsü / Institute of Postgraduate Education [1093]
Contains publications of the Institute of Postgraduate Education.
