Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • RepoCoder is a framework to address the challenge of repository-level code completion.
  • RepoCoder utilizes a similarity-based retriever and a pre-trained code language model.
  • RepoCoder uses a novel iterative retrieval-generation paradigm.
  • RepoEval is a new benchmark for testing the performance of RepoCoder.
  • RepoCoder significantly improves the zero-shot code completion baseline.

Paper Content

Introduction

  • Awareness of other files in repository is important for software production.
  • Automated tools should take into account broader context in repository for code completion.
  • Interrelated dependencies, such as shared utilities, configurations, and cross-API invocations, exist in code files.
  • Repositories have unique naming conventions and coding styles.
  • Modularization and customization bring difficulties to automatic code completion tools.
  • RepoCoder proposed to bridge gap between retrieval context and intended completion target.
  • RepoEval benchmark created to evaluate repository-level code completion.
  • RepoCoder significantly outperforms zero-shot code completion paradigm.

Methodology

Overall framework

  • Task of code completion using language model M can be characterized as Ŷ = M(X)
  • RepoCoder is a framework that integrates code generation and retrieval models
  • Repository code files are partitioned into code snippets
  • Retrieval model R is used to obtain relevant code snippets from C repo
  • Language model M is used to perform retrieval-augmented generation
  • New query is established using Ŷ 1 for retrieval
  • Final prediction is obtained as Ŷ = M(C 2 ret , X)

Generation-augmented retrieval

  • Retriever employed in RepoCoder framework can be any model capable of searching for relevant documents.
  • Retrieval database is constructed using a sliding window which scans files and extracts contiguous lines of code.
  • Query is formulated using last S w lines of unfinished code X and most similar code snippets are retrieved.
  • Generate-then-retrieve paradigm proposed to mitigate issue of indiscriminately shifting all retrieved code snippets.

Retrieval-augmented generation

  • Generator used in RepoCoder framework is a pre-trained language model.
  • Need to incorporate global and local context for code completion.
  • Goal is to find optimal method of prompt construction.
  • Code snippets presented in ascending order based on similarity scores.
  • Prompt contains both global and local information needed for code completion.

Benchmark construction

  • Task of code completion in software repositories is common
  • Proposed RepoEval benchmark to evaluate code completion tools
  • Benchmark covers 3 levels of code completion granularity: line, API invocation, and function body
  • Utilize unit tests to evaluate correctness of completed functions
  • RepoEval is publicly available and labeled with source repository, file path, line numbers, and ground truth completion
  • 8 repositories selected that meet criteria of open-source license, created after Jan 1, 2022, non-fork, more than 100 stars, >80% Python files, and explicit unit tests
  • 1600 test samples for line completion, 1600 for API invocation, and 455 for function body completion

Experimental setup

Implementation details

  • Evaluate two distinct retrieval methods for RepoCoder
  • First method is a sparse bag-of-words model
  • Second method is a dense text embedding model
  • Test four pre-trained language models with varying code generation capabilities
  • Carefully consider various hyper-parameters to optimize performance

Methods for comparison

  • Previous studies have shown that large pre-trained language models can generate code in a zero-shot manner.
  • Utilizing intra-file context is valuable for code completion scenarios.
  • RepoCoder mitigates the issue of omitted context by adding it to the retrieval database.
  • RepoCoder uses generation-augmented retrieval to bridge the gap between retrieval and the intended completion target.
  • An oracle retrieval-augmented generation method is used to compare the efficacy of this approach.

Evaluation metrics

  • Evaluation of datasets uses Exact Match (EM) and Edit Similarity (ES) metrics
  • EM score is binary (1 if predicted code matches ground truth, 0 otherwise)
  • ES metric is calculated using Levenshtein distance
  • Unit tests used to evaluate function body completion dataset
  • Pass Rate (PR) reported (1 if code passes all test cases, 0 otherwise)

Experimental results

Line and api completion datasets

  • RepoCoder consistently and significantly improves the zero-shot performance on both datasets across all model sizes and retrieval methods.
  • RepoCoder outperforms RG-1 across all settings and gets close to the oracle performance.
  • RepoCoder even outperforms the zero-shot CODE-DAVINCI-002 model.
  • Simple sparse retriever has equivalent performance to the dense retriever.

Function completion dataset

  • Evaluated performance of RepoCoder on function body completion dataset
  • Used most powerful CODE-DAVINCI-002 model
  • Used sparse retriever
  • Performance of RepoCoder similar to line and API completion datasets
  • RepoCoder outperforms zero-shot baseline and one Retrieval-Generation iteration
  • Performance of RepoCoder close to oracle method

Statistics on retrieved code snippets

  • RepoCoder integrates code snippets into the prompt to provide context
  • Study examines impact of retrieved code snippets
  • Results show positive correlation between higher line overlap and better performance
  • Results show high token overlap between retrieved code snippets and ground truth completion

Context locations of effective retrieval

  • Code snippets retrieved from prompt bring relevant context from other files in repository
  • Study performed to understand impact of different context locations
  • Identified 2,364 and 1,866 code snippets for line and API completion datasets
  • Classification scheme of five distinct file locations used to locate original source of code snippets
  • Majority of code snippets located within defined categories
  • Majority of code snippets originate from files with “Similar Import”, “Similar Name”, or “Current Directory” locations
  • Stronger need for information obtained from other files in API completion scenarios

Code duplication in repositories

  • RepoCoder’s performance is positively correlated with the code duplication ratio of a repository.
  • Results show a correlation between RepoCoder’s performance and the code duplication ratio.
  • Highest duplication ratio results in highest performance improvement for RepoCoder.
  • Correlation between RepoCoder’s performance and the code duplication ratio is not absolute.

Study on failed cases

  • RepoCoder effectiveness and limitations investigated
  • Results suggest improvement with use of zero-shot, RG-1, RG-2, and oracle methods
  • Manual case study reveals majority of failures caused by misguided code retrievals
  • Model predictions not always suitable for retrieval
  • Language models sensitive to given prompts and small variations in code snippets
  • Need for more accurate evaluation methods
  • Global context in code completion is a challenge
  • Conventional code completion techniques analyze code and re-rank candidate suggestions
  • Code completion can also be approached as a language modeling task
  • Pre-trained language models have gained attention in code completion
  • Joint modeling of retrieval and generation is being explored for code generation
  • In-context joint retrieval and generation is a growing trend

Conclusion and future work

  • RepoCoder is a framework for repository-level code completion
  • Utilizes a similarity-based retriever and a pre-trained language model
  • Iterative retrieval and generation bridge the gap between retrieval context and the intended target
  • Experiments show RepoCoder improves zero-shot code completion performance
  • Comprehensive analysis provides insights into effectiveness and limitations of RepoCoder