IIIT Hyderabad Publications
Unsupervised Learning Based Approach for Plagiarism Detection in Programming Assignments

Authors: Jitendra Yasaswi Bharadwaj Katta, Srikailash G, Anil Chilupuri, Suresh Purini, C V Jawahar
Conference: Innovations in Software Engineering Conference, ISEC
Date: 2017-02-05
Report no: IIIT/TR/2017/9

Abstract

In this work, we propose a novel hybrid approach for automatic plagiarism detection in programming assignments. Most well-known plagiarism detectors either employ a text-based approach or use features based on syntactic properties of the program. However, both approaches succumb to code obfuscation, which is a major obstacle for automatic software plagiarism detection. Our proposed method uses static features extracted from the intermediate representation of a program in a compiler infrastructure such as gcc. We demonstrate the use of unsupervised learning techniques on the extracted feature representations and show that our system is robust to code obfuscation. We test our method on assignments from an introductory programming course. Preliminary results show that our system performs better than other popular tools such as MOSS. To visualize the local and global structure of the features, we obtain low-dimensional representations of them using a popular technique called t-SNE, a variation of Stochastic Neighbor Embedding that can preserve neighborhood identity in low dimensions. Based on this idea of preserving neighborhood identity, we mine interesting information such as the diversity in student solution approaches to a given problem. The presence of well-defined clusters in the low-dimensional visualizations demonstrates that our features are capable of capturing interesting programming patterns.

Full paper: pdf
Centre for Software Engineering Research Lab
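As a rough illustration of the idea in the abstract (comparing submissions by feature vectors rather than by source text), the following is a minimal sketch. It is not the authors' method: the abstract does not say which features, similarity measure, or learning technique the paper uses. The feature values, the use of cosine similarity, the 0.99 threshold, and the function names are all assumptions made for this example.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def flag_similar_pairs(features, threshold=0.99):
    """Return (name_a, name_b, score) for every pair scoring >= threshold.

    `features` maps a submission name to its feature vector. The threshold
    is an arbitrary assumption, not a value taken from the paper.
    """
    flagged = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(features.items()), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            flagged.append((name_a, name_b, score))
    # Most similar pairs first
    return sorted(flagged, key=lambda item: -item[2])


# Hypothetical static feature counts per submission, for example numbers of
# basic blocks, loops, calls, and branches in the intermediate representation.
submissions = {
    "student_a": [12, 3, 5, 7],
    "student_b": [12, 3, 5, 7],  # same structure as student_a, e.g. after renaming variables
    "student_c": [4, 0, 9, 1],
}

pairs = flag_similar_pairs(submissions)
for name_a, name_b, score in pairs:
    print(f"{name_a} ~ {name_b}: {score:.3f}")
```

Because such counts would come from the compiled representation rather than the source text, renaming identifiers or reformatting code would leave them unchanged, which is the intuition behind robustness to simple obfuscation. Cosine similarity ignores overall vector magnitude, so a real system would likely need feature normalization and a more careful choice of measure.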