Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- RepoCoder is a framework to address the challenge of repository-level code completion.
- RepoCoder utilizes a similarity-based retriever and a pre-trained code language model.
- RepoCoder uses a novel iterative retrieval-generation paradigm.
- RepoEval is a new benchmark for testing the performance of RepoCoder.
- RepoCoder significantly improves the zero-shot code completion baseline.
Paper Content
Introduction
- Awareness of other files in repository is important for software production.
- Automated tools should take into account broader context in repository for code completion.
- Interrelated dependencies, such as shared utilities, configurations, and cross-API invocations, exist in code files.
- Repositories have unique naming conventions and coding styles.
- Modularization and customization bring difficulties to automatic code completion tools.
- RepoCoder proposed to bridge gap between retrieval context and intended completion target.
- RepoEval benchmark created to evaluate repository-level code completion.
- RepoCoder significantly outperforms zero-shot code completion paradigm.
Methodology
Overall framework
- Task of code completion using language model M can be characterized as Ŷ = M(X)
- RepoCoder is a framework that integrates code generation and retrieval models
- Repository code files are partitioned into code snippets
- Retrieval model R is used to obtain relevant code snippets from C repo
- Language model M is used to perform retrieval-augmented generation
- New query is established using Ŷ 1 for retrieval
- Final prediction is obtained as Ŷ = M(C 2 ret , X)
Generation-augmented retrieval
- Retriever employed in RepoCoder framework can be any model capable of searching for relevant documents.
- Retrieval database is constructed using a sliding window which scans files and extracts contiguous lines of code.
- Query is formulated using last S w lines of unfinished code X and most similar code snippets are retrieved.
- Generate-then-retrieve paradigm proposed to mitigate issue of indiscriminately shifting all retrieved code snippets.
Retrieval-augmented generation
- Generator used in RepoCoder framework is a pre-trained language model.
- Need to incorporate global and local context for code completion.
- Goal is to find optimal method of prompt construction.
- Code snippets presented in ascending order based on similarity scores.
- Prompt contains both global and local information needed for code completion.
Benchmark construction
- Task of code completion in software repositories is common
- Proposed RepoEval benchmark to evaluate code completion tools
- Benchmark covers 3 levels of code completion granularity: line, API invocation, and function body
- Utilize unit tests to evaluate correctness of completed functions
- RepoEval is publicly available and labeled with source repository, file path, line numbers, and ground truth completion
- 8 repositories selected that meet criteria of open-source license, created after Jan 1, 2022, non-fork, more than 100 stars, >80% Python files, and explicit unit tests
- 1600 test samples for line completion, 1600 for API invocation, and 455 for function body completion
Experimental setup
Implementation details
- Evaluate two distinct retrieval methods for RepoCoder
- First method is a sparse bag-of-words model
- Second method is a dense text embedding model
- Test four pre-trained language models with varying code generation capabilities
- Carefully consider various hyper-parameters to optimize performance
Methods for comparison
- Previous studies have shown that large pre-trained language models can generate code in a zero-shot manner.
- Utilizing intra-file context is valuable for code completion scenarios.
- RepoCoder mitigates the issue of omitted context by adding it to the retrieval database.
- RepoCoder uses generation-augmented retrieval to bridge the gap between retrieval and the intended completion target.
- An oracle retrieval-augmented generation method is used to compare the efficacy of this approach.
Evaluation metrics
- Evaluation of datasets uses Exact Match (EM) and Edit Similarity (ES) metrics
- EM score is binary (1 if predicted code matches ground truth, 0 otherwise)
- ES metric is calculated using Levenshtein distance
- Unit tests used to evaluate function body completion dataset
- Pass Rate (PR) reported (1 if code passes all test cases, 0 otherwise)
Experimental results
Line and api completion datasets
- RepoCoder consistently and significantly improves the zero-shot performance on both datasets across all model sizes and retrieval methods.
- RepoCoder outperforms RG-1 across all settings and gets close to the oracle performance.
- RepoCoder even outperforms the zero-shot CODE-DAVINCI-002 model.
- Simple sparse retriever has equivalent performance to the dense retriever.
Function completion dataset
- Evaluated performance of RepoCoder on function body completion dataset
- Used most powerful CODE-DAVINCI-002 model
- Used sparse retriever
- Performance of RepoCoder similar to line and API completion datasets
- RepoCoder outperforms zero-shot baseline and one Retrieval-Generation iteration
- Performance of RepoCoder close to oracle method
Statistics on retrieved code snippets
- RepoCoder integrates code snippets into the prompt to provide context
- Study examines impact of retrieved code snippets
- Results show positive correlation between higher line overlap and better performance
- Results show high token overlap between retrieved code snippets and ground truth completion
Context locations of effective retrieval
- Code snippets retrieved from prompt bring relevant context from other files in repository
- Study performed to understand impact of different context locations
- Identified 2,364 and 1,866 code snippets for line and API completion datasets
- Classification scheme of five distinct file locations used to locate original source of code snippets
- Majority of code snippets located within defined categories
- Majority of code snippets originate from files with “Similar Import”, “Similar Name”, or “Current Directory” locations
- Stronger need for information obtained from other files in API completion scenarios
Code duplication in repositories
- RepoCoder’s performance is positively correlated with the code duplication ratio of a repository.
- Results show a correlation between RepoCoder’s performance and the code duplication ratio.
- Highest duplication ratio results in highest performance improvement for RepoCoder.
- Correlation between RepoCoder’s performance and the code duplication ratio is not absolute.
Study on failed cases
- RepoCoder effectiveness and limitations investigated
- Results suggest improvement with use of zero-shot, RG-1, RG-2, and oracle methods
- Manual case study reveals majority of failures caused by misguided code retrievals
- Model predictions not always suitable for retrieval
- Language models sensitive to given prompts and small variations in code snippets
- Need for more accurate evaluation methods
Related work
- Global context in code completion is a challenge
- Conventional code completion techniques analyze code and re-rank candidate suggestions
- Code completion can also be approached as a language modeling task
- Pre-trained language models have gained attention in code completion
- Joint modeling of retrieval and generation is being explored for code generation
- In-context joint retrieval and generation is a growing trend
Conclusion and future work
- RepoCoder is a framework for repository-level code completion
- Utilizes a similarity-based retriever and a pre-trained language model
- Iterative retrieval and generation bridge the gap between retrieval context and the intended target
- Experiments show RepoCoder improves zero-shot code completion performance
- Comprehensive analysis provides insights into effectiveness and limitations of RepoCoder