Artificial Intelligence-Assisted Programming Tasks: Comparison
Please note this is a comparison between Version 1 by Man Fai Wong and Version 2 by Peter Tang.

Artificial intelligence (AI)-assisted programming enables software engineers to work more efficiently and effectively with existing software tools such as OpenAI ChatGPT, GitHub Copilot, DeepMind AlphaCode, Amazon CodeWhisperer, Replit Ghostwriter, Microsoft IntelliCode, and Codeium, especially in situations involving complex algorithms and large amounts of code (i.e., the Big Code regime). It also strikes a balance between productivity and ensuring safety, security, and reliability within the programming development environment. AI-assisted programming tasks related to software naturalness fall into two main categories: generation and understanding. The former includes code generation, code completion, code translation, code refinement, and code summarization. The latter is concerned with understanding code and includes defect detection and clone detection.

  • software naturalness
  • large language models
  • AI-assisted programming

1. Code Generation

Program synthesis, also known as source code generation, is the process of automatically generating source code in a programming language from user-specified constraints [1][2]. This research focuses on text-to-code generation; code-to-code generation is referred to as code translation. The history of code generation dates back to the use of theorem provers, which construct a proof of user-provided specifications and extract the corresponding logical programs [3][4]. With the increasing popularity of deep learning methods, neural methods, including Long Short-Term Memory (LSTM) [5] and Recursive-Reverse-Recursive Neural Networks [6], have been adopted to generate output programs with specific inductive biases given sufficient program samples. More recently, transformer-based LLMs such as GPT-3 [7] and T5 [8] have shown impressive performance in code generation tasks by leveraging contextual representations learned from large amounts of code, public code sources, and natural language data to improve program synthesis. These approaches combine systematic pre-training and fine-tuning to develop a deep understanding of code structure and meaning, making them well suited for software development tasks. Code generation models are evaluated with metrics such as pass@k [9], which measures the percentage of problems solved using k generated programs per problem, BLEU-4 [10], exact match accuracy, and CodeBLEU [8], which considers syntactic and semantic matches based on code structure in addition to N-gram matches, on program synthesis benchmarks such as APPS [11] and MBPP [12].
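The pass@k metric mentioned above is commonly computed with an unbiased estimator: generate n ≥ k samples per problem, count the c samples that pass the unit tests, and estimate the probability that a random size-k subset contains at least one passing sample. A minimal sketch, not tied to any particular benchmark harness:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k), i.e. the
    probability that at least one of k samples drawn (without
    replacement) from n generations, c of which are correct, passes."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any size-k draw
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 2 generations, 1 correct -> pass@1 = 0.5
print(pass_at_k(2, 1, 1))
```

Averaging this estimate over all problems in a benchmark gives the reported pass@k score.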

2. Code Completion

Code completion, also known as autocompletion, is a software development feature that suggests possible completions as a programmer types [13]. Its goal is to save time and reduce errors by suggesting method names, variable names, and even entire code snippets [14]. Early research on code completion used statistical language models [15][16]. Later, LSTM-based deep learning approaches were applied to the task, aiming to learn the semantic information of source code without considering its syntactic structure [17]. To address the limitations of LSTM-based language models, the transformer architecture was introduced for code completion. Typically, language models for code completion are trained with a causal language modeling objective, predicting the next token from the sequence of preceding tokens. Recent work on code completion using LLMs [9][18] has shown impressive performance on benchmarks such as CodeXGLUE [19] compared to existing statistical language models and deep learning approaches.
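The causal "predict the next token" objective described above can be illustrated at toy scale with a statistical bigram model, the simplest instance of the statistical language models that early completion research used; an LLM replaces the count table with learned contextual representations, but the greedy decoding loop is the same. The token stream below is a hypothetical example:

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count next-token frequencies: a minimal causal language model."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def complete(counts, prefix, max_new=5):
    """Greedy decoding: repeatedly append the most likely next token."""
    out = list(prefix)
    for _ in range(max_new):
        dist = counts.get(out[-1])
        if not dist:
            break  # unseen context; a real model backs off instead
        out.append(dist.most_common(1)[0][0])
    return out

corpus = ["for", "i", "in", "range", "(", "n", ")", ":"]
model = train_bigram(corpus)
print(complete(model, ["for"], max_new=3))  # ['for', 'i', 'in', 'range']
```

Real completion engines condition on a much longer context than one token and rank multiple candidates rather than committing greedily.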

3. Code Translation

Code translation is the process of converting code from one programming language to another, typically with the goal of migrating legacy software. While theoretically possible, building a code translator is challenging due to differences in syntax and platform APIs between programming languages. Most current translation tools are rule-based: handcrafted rewrite rules are applied to an abstract syntax tree (AST) derived from the input source code. Creating such tools, however, demands significant expertise in both the source and target languages. Recent studies have explored statistical machine translation [20][21] as well as deep learning approaches [22][23] for programming language translation. The quality of generated functions is often evaluated with the BLEU score, while exact match is used to compare generated output against the reference ground truth.
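The rule-based approach described above can be sketched with Python's `ast` module: handcrafted rewrite rules walk the abstract syntax tree and emit source in the target language. This toy transpiler covers only a few arithmetic expression forms and is illustrative, not a real translation tool:

```python
import ast

# Handcrafted rewrite rules mapping Python binary operators to their
# JavaScript spellings (identical for this tiny subset).
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_js(node) -> str:
    """Recursively rewrite a Python expression AST as JavaScript source."""
    if isinstance(node, ast.Expression):
        return to_js(node.body)
    if isinstance(node, ast.BinOp):
        return f"({to_js(node.left)} {OPS[type(node.op)]} {to_js(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise NotImplementedError(type(node).__name__)

print(to_js(ast.parse("a * (b + 2)", mode="eval")))  # (a * (b + 2))
```

A production-grade translator needs rules for every construct plus mappings between the platforms' standard libraries, which is precisely why the paragraph above notes that building such tools demands expertise in both languages.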

4. Code Refinement

Code refinement, also referred to as automated program repair (APR), is the process of automatically fixing bugs or vulnerabilities by converting a buggy function into a correct one. Deep learning models have a strong learning capability that enables them to learn patterns for transforming buggy programs into patched ones from large code corpora. Many studies [24][25] have demonstrated the superior performance of deep learning-based techniques over traditional template-based [26][27], heuristic-based [28][29][30], and constraint-based [31][32] APR techniques. LLMs are used to generate plausible patches or modifications for a given incorrect piece of code. A model can be trained on a large corpus of correct code to learn its patterns and structures; given faulty code, it then generates suggestions for correcting it as a downstream task. LLMs for code refinement can be evaluated on abstracted code with CodeXGLUE [19] or HumanEval [9], or on real-world code with classical APR benchmarks such as Defects4J [33] and QuixBugs [34], although the understanding and generation of concrete variable and function names remains mandatory and challenging [35].
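A minimal sketch of how generated patches are typically validated: candidate fixes (e.g., sampled from a model) are run against a test suite, and the first plausible patch that passes every case is accepted. All function names and the toy bug below are hypothetical:

```python
def validate_patches(candidates, tests):
    """Return the first candidate function that passes every test case,
    mimicking how APR pipelines filter model-generated patches."""
    for patch in candidates:
        if all(patch(*args) == expected for args, expected in tests):
            return patch
    return None

# Hypothetical buggy function: subtracts instead of adding.
buggy = lambda a, b: a - b

# Plausible patches, e.g. sampled from an LLM.
candidates = [lambda a, b: a * b, lambda a, b: a + b]
tests = [((1, 2), 3), ((0, 5), 5)]

fixed = validate_patches(candidates, tests)
print(fixed(2, 3))  # 5
```

Benchmarks such as Defects4J follow the same test-based oracle at much larger scale, running each candidate patch against the project's real test suite.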

5. Code Summarization

Code summarization is a technique for generating natural language descriptions of code snippets at the function level, which can then be used to produce documentation. Typically, it takes source code as input and produces a natural language summary as output. In AI-assisted programming tools, code summarization can also be used to analyze code and identify optimization opportunities, such as replacing a traditional modular arithmetic-based algorithm with a binary Euclid algorithm, which can significantly improve software performance. In recent years, there has been promising research into the automatic generation of natural language descriptions of programs, with studies such as [36][37][38] making notable progress in this area. The rise of deep learning, coupled with the abundance of data from open-source repositories, has made automatic code summarization an active area of research. Many neural approaches [39][40] use a sequence-to-sequence model to generate source code summaries, with some models first converting the source code into intermediate representations, such as token-based [41][42], tree-based [43][44], and graph-based [45][46] forms, before passing it through language models.
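The token-based representation mentioned above can be illustrated with Python's standard `tokenize` module, which yields the flat token stream that a sequence-to-sequence summarizer might consume as input. This is a simplified sketch; real pipelines also handle subword splitting and identifier normalization:

```python
import io
import tokenize

KEEP = (tokenize.NAME, tokenize.OP, tokenize.NUMBER, tokenize.STRING)

def code_tokens(source: str):
    """Flatten a function body into its lexical tokens, dropping
    layout-only tokens (newlines, indentation, end markers)."""
    return [tok.string
            for tok in tokenize.generate_tokens(io.StringIO(source).readline)
            if tok.type in KEEP]

print(code_tokens("def add(a, b):\n    return a + b\n"))
# ['def', 'add', '(', 'a', ',', 'b', ')', ':', 'return', 'a', '+', 'b']
```

Tree-based and graph-based representations instead feed the parser's AST or a data-flow graph to the model, trading simplicity for more structural signal.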

6. Defect Detection

As software systems grow in complexity, identifying errors becomes more challenging. Defect detection aims to enhance software reliability by predicting whether a piece of code is susceptible to bugs, thereby surfacing previously unknown errors. Rule-based approaches in existing defect detection frameworks infer likely programming rules from various sources such as code, version histories, and comments [23][47][48]. Statistical language models based on N-grams have also been widely used in this area [49][50][51]. More recently, many deep learning-based solutions [27][52][53][54][55][56][57] have been proposed to bridge the gap by suggesting different feature sets from which the detection framework can learn, attempting to imitate how a practitioner looks for vulnerabilities. LLMs such as CodeBERT [58] have emerged as a promising technique in this field due to their ability to understand code structure. These models can be trained on a large corpus of error-free code and then used, as a binary classification task, to identify patterns and structures in source code that deviate from the learned error-free code [59][60]. Model predictions can be evaluated with accuracy, precision, recall, and F1 scores.
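The evaluation metrics listed above can be computed directly from a model's binary predictions. A minimal sketch, using 1 for "defective" and 0 for "clean":

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary defect detection."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# One true positive, one false negative, one false positive, one true negative.
print(classification_metrics([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5, 0.5, 0.5)
```

Precision and recall matter more than raw accuracy here because defective functions are typically a small minority of any codebase.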

7. Clone Detection

Clone detection involves identifying identical or similar code fragments, known as clones, within or across software systems. Its goal is to measure the similarity between two code snippets and determine whether they have the same functionality. Clones can be classified into four types [61][62]: types 1–3 are syntactic clones that differ in minor ways, while type 4 clones, known as semantic clones, are difficult to detect since they have different syntax but the same semantics and thus require manual validation. With the increasing amount of source code, large-scale and automatic clone detection has become essential. Several tools have been developed for clone detection [63][64][65][66][67][68], using techniques such as comparing ASTs, tokens, or source code text. Notable clone detection datasets include BigCloneBench [69], which contains Java code snippets.
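A crude token-based clone signal can be sketched as the Jaccard similarity between the token sets of two snippets. As noted above, this kind of syntactic comparison catches near-textual clones but misses type 4 (semantic) clones, which share no surface form. The threshold and function names are illustrative:

```python
import io
import tokenize

def token_set(source: str):
    """Lexical tokens of a snippet, ignoring layout-only tokens."""
    keep = (tokenize.NAME, tokenize.OP, tokenize.NUMBER, tokenize.STRING)
    return {tok.string
            for tok in tokenize.generate_tokens(io.StringIO(source).readline)
            if tok.type in keep}

def jaccard_clone_score(code_a: str, code_b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between token sets."""
    a, b = token_set(code_a), token_set(code_b)
    return len(a & b) / len(a | b) if a | b else 1.0

identical = jaccard_clone_score("x = 1 + 2\n", "x = 1 + 2\n")
different = jaccard_clone_score("a = 1\n", "b = 2\n")
print(identical, different)  # 1.0 for identical snippets, lower otherwise
```

Tools built on ASTs or learned embeddings replace this set comparison with structural or semantic similarity to push beyond types 1–2.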
Table 1. Summary of language models for AI-assisted programming tasks.