
Unified Deep Semantic Search on Code
Author(s) -
Ashwin Patil,
S. T. Pachpute,
Rushika Bhattad,
Archit Pandit,
Anita Gunjal
Publication year - 2020
Publication title -
International Journal of Engineering and Advanced Technology
Language(s) - English
Resource type - Journals
ISSN - 2249-8958
DOI - 10.35940/ijeat.e9861.069520
Subject(s) - computer science, snippet, source code, code (set theory), programming language, information retrieval, semantic search, artificial intelligence, natural language processing, search engine, set (abstract data type)
A tool that searches a large code corpus directly and returns ranked snippets can be an invaluable resource for programmers who use natural language queries to find similar code. Such a tool must understand the semantics of both source code and queries deeply enough to evaluate their intent correctly. Over the years, many tools that rely on textual similarity between source code and query have proven ineffective because they fail to capture the high-level semantics of either. Previous code search models based on deep neural networks perform well, but most of them are evaluated on a single programming language, usually Java. In this paper, we propose a novel deep neural network model called Unified Code Net that can handle the intricacies of different programming languages. The model borrows several vital features from previous models and builds on those ideas to form a unified model that generates document vector embeddings from source code and, through similarity search against the query vector embedding, returns the most similar code snippets in any language. This tool can drastically reduce the effort a programmer spends looking for an efficient and viable code snippet for the problem at hand, and ideally it can replace the use of search engines for that purpose.
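The retrieval step described in the abstract (embed every snippet, embed the query, rank by vector similarity) can be illustrated with a minimal sketch. This is not the Unified Code Net model: the abstract does not give its architecture, so the `embed` function below is a hypothetical stand-in that hashes tokens into a fixed-size bag-of-words vector. The names `embed`, `cosine`, `search`, `DIM`, and `CORPUS` are all assumptions made for illustration; only the ranking logic mirrors what the abstract describes.

```python
import hashlib
import math
import re

DIM = 64  # hypothetical embedding size, chosen only for this sketch


def embed(text, dim=DIM):
    """Toy stand-in for a learned encoder: hashed bag-of-tokens, L2-normalised.

    A real system would replace this with a neural encoder that maps code
    and queries into a shared vector space.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Stable hash so the same token always lands in the same slot.
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))


def search(query, snippets, top_k=3):
    """Return up to top_k (score, snippet) pairs, most similar first."""
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical mixed-language corpus, for illustration only.
CORPUS = [
    "def read_file(path): return open(path).read()",
    "public static int max(int a, int b) { return a > b ? a : b; }",
    "func readFile(path string) ([]byte, error) { return os.ReadFile(path) }",
]

if __name__ == "__main__":
    for score, snippet in search("read file from path", CORPUS, top_k=2):
        print(f"{score:.3f}  {snippet}")
```

Because the toy encoder only counts shared tokens, it behaves like the textual-similarity tools the abstract criticises; the point of the sketch is the search pipeline around the encoder, which stays the same when a learned model supplies the vectors.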