AutoCompiler: Automatic Repo-Level Compilation Tool Based on AI Agents

TL;DR: AutoCompiler is a tool that automatically compiles projects from their source code repositories. Going beyond our paper, this post will show some extended experiments focusing on the application of AutoCompiler to various downstream tasks.

Resources:

  • You can access the source code of this project at Ch3nYe/AutoCompiler.
  • The study of methodology behind this tool can be found in our paper:

    CompileAgent: Automated Real-World Repo-Level Compilation with Tool-Integrated LLM-based Agent System
    Hu Li, Ch3nYe(Co-first Author), et al. arXiv:2505.04254

Introduction

There are many scenarios where compilation of code repositories needs to be performed, such as CI/CD pipelines, automated fuzzing tests, code security checks, malware code audits, and so on. However, the compilation process can be time-consuming and error-prone, especially when dealing with large codebases or complex dependencies.

To address this issue, we propose AutoCompiler, the first automatic repo-level compilation tool based on AI agents. AutoCompiler leverages state-of-the-art AI models to analyze a project's source code, build its artifacts automatically, and handle errors that occur during compilation. This tool can significantly reduce the time and effort required to compile code repositories, making it easier to integrate compilation tasks into various workflows.

Currently, AutoCompiler is built for C/C++ projects, which are especially prone to compilation issues, and it is designed to be extensible to other programming languages. If you are interested in extending AutoCompiler to support other languages, please feel free to do so.

Challenges

First, we need to identify the challenges of automatic compilation in code repositories. We therefore conducted an extensive study of the compilation process of 100 popular C/C++ projects on GitHub. In general, compiling code repositories is challenging for the following reasons:

  • Challenge 1: The compilation process is highly variable and unstructured. Developers have significant freedom in designing the compilation process, making it fundamentally difficult to automate project-level compilation.
  • Challenge 2: Compilation guides are often poorly documented, with issues such as varying guide formats and styles, outdated information, and missing documentation. These problems prevent the automation of the compilation process through simple guide parsing.
  • Challenge 3: The compilation environment and dependencies are complex. Issues such as missing dependencies, inconsistently managed dependency sources, and conflicts between dependencies and the current environment frequently lead to compilation failures.

Methodology

Here, we just briefly introduce the methodology of AutoCompiler. For more details, please refer to our paper.


Figure 1: The overview of AutoCompiler.

Figure 1 illustrates the workflow of AutoCompiler, from circled step 1 to circled step 16. In general, AutoCompiler employs a master agent that takes charge of the whole compilation process and calls various specialized tools to handle different requirements. The reasoning skills and heuristics of LLM-based agents, along with their ability to understand unstructured input, make them well suited to tasks like automated compilation, in contrast to traditional hard-coded automation programs. In other words, the LLM-driven approach fundamentally mitigates Challenge 1.

There are two main modules:

  • CompileNavigator Module: locates the correct compilation instructions within a code repository by employing tools like Shell, File Navigator, and Instruction Extractor. Its role is to accurately identify files containing compilation instructions and extract those instructions to facilitate the compilation process. It is designed to mitigate Challenge 2.
  • ErrorSolver Module: addresses compilation errors during the build process by utilizing tools such as Website Search and Multi-Agent Discussion. It aims to resolve errors like missing dependencies or configuration conflicts, either through discussion and consensus-building among multiple agents or by searching for solutions online. It is designed to mitigate Challenge 3.
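
To make the master-agent idea concrete, here is a minimal sketch of a decide, act, observe loop that dispatches to named tools. It is not the AutoCompiler implementation: the class, the stub tools, the scripted policy that stands in for the LLM, and the COMPILATION-FAIL label are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a tool maps an input string to an observation string,
# and a policy (the LLM in a real system) picks the next action.
Tool = Callable[[str], str]
Step = Tuple[str, str, str]  # (action, action_input, observation)
Policy = Callable[[List[Step]], Tuple[str, str]]


@dataclass
class MasterAgent:
    tools: Dict[str, Tool]
    max_steps: int = 20
    history: List[Step] = field(default_factory=list)

    def run(self, policy: Policy) -> str:
        """Loop decide -> act -> observe until the policy returns a final answer."""
        for _ in range(self.max_steps):
            action, action_input = policy(self.history)
            if action == "FINAL":
                return action_input
            tool = self.tools.get(action)
            observation = tool(action_input) if tool else f"unknown tool: {action}"
            self.history.append((action, action_input, observation))
        return "COMPILATION-FAIL"  # hypothetical label for an exhausted step budget


# Stub tools and a scripted policy stand in for the real tools and the LLM.
tools = {
    "CompileNavigator": lambda files: "run `make` in the repository root",
    "Shell": lambda command: f"$ {command}\n(exit code 0)",
}
script = [("CompileNavigator", "."), ("Shell", "make"), ("FINAL", "COMPILATION-SUCCESS")]
agent = MasterAgent(tools)
result = agent.run(lambda history: script[len(history)])
print(result)
```

Keeping the full history of (action, input, observation) triples is what lets the policy react to unstructured tool output, which a fixed script cannot do.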

Evaluation

We conducted experiments to evaluate the performance of AutoCompiler in practical scenarios, including fuzzing tests, code security checks, and malware code audits. The backend LLM is GPT-4o in all of the following experiments.

Overall Effectiveness

For the overall tests, we collected 100 popular C/C++ GitHub projects covering 14 topics, ranging from crypto to networking. We collected the software artifacts for these projects from software sources and derived the compilation targets from them. A compilation was considered successful if the targets were built.

We include OSS-Fuzz-Gen as the only baseline. It is a state-of-the-art fuzzing tool that automatically generates fuzz tests for C/C++ projects. More specifically, oss-fuzz-gen/experimental/c-cpp provides 14 different auto-build heuristics based on the files that exist in the target repo. For example, if the repo has a Makefile, the heuristic with a make command will be used to build the project.
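
A file-based build heuristic of this kind can be sketched as a lookup from marker files to commands. The table below is hypothetical and much smaller than the 14 heuristics in OSS-Fuzz-Gen; only the Makefile-to-make rule comes from the description above.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical marker-file table, checked in order. Only the Makefile rule is
# taken from the text; the other entries are illustrative assumptions.
HEURISTICS: List[Tuple[str, List[str]]] = [
    ("CMakeLists.txt", ["cmake -S . -B build", "cmake --build build"]),
    ("configure", ["./configure", "make"]),
    ("Makefile", ["make"]),
]


def pick_build_commands(repo: Path) -> Optional[List[str]]:
    """Return the commands of the first heuristic whose marker file exists."""
    for marker, commands in HEURISTICS:
        if (repo / marker).is_file():
            return commands
    return None  # no heuristic applies


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    no_match = pick_build_commands(repo)       # empty repo: nothing applies
    (repo / "Makefile").write_text("all:\n\ttrue\n")
    makefile_match = pick_build_commands(repo)  # Makefile present

print(no_match, makefile_match)
```

A fixed table like this cannot adapt when a project deviates from the expected layout or needs extra dependencies, which is the limitation the failure reasons in Table 2 point to.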

Table 1: Overall Effectiveness Compared with Baseline

Method          Success Rate   Cost (USD)   Time
OSS-Fuzz-Gen*   25%            /            53h
AutoCompiler    89%            $16.53       8.4h

*The automatic building component of OSS-Fuzz-Gen works without relying on LLMs.

Table 1 shows that AutoCompiler outperforms the baseline in both success rate (89% versus 25%) and time (8.4 hours versus 53 hours), at an LLM cost of $16.53.

Downstream Task: auto-fuzzing

To further explore the possibility of deploying AutoCompiler in the auto-fuzzing context, we replaced the original auto-build heuristics in OSS-Fuzz-Gen with AutoCompiler.
More specifically, manager.py in oss-fuzz-gen/experimental/c-cpp attempts an automatic build before generating fuzz drivers, and it deems the attempt successful if a static library is built and linked with an empty fuzz driver. We use the same success criterion in this evaluation. The following table shows the results:

Table 2: Building Results in Auto-Fuzzing Compared with Baseline

                     OSS-Fuzz-Gen                                              AutoCompiler
Repo      Repo-Type  success  failure reason                      time*        success  failure reason  time
mbedtls   crypto     ×        extra-deps(pypi)                    /            ✓        /               159s
blis      math       ×        not-follow-inst                     /            ✓        /               51s
zstd      compress   ✓        /                                   28mins(12)   ✓        /               147s
lvgl      graphics   ×        not-follow-inst, extra-deps(ninja)  /            ✓        /               55s
sqlite3   database   ×        not-follow-inst, extra-deps(dev)    /            ✓        /               178s
libvips   image      ×        not-follow-inst, extra-deps(tool)   /            ✓        /               182s
miniaudio audio      ×        no-static-option                    /            ✓        /               32s
curl      network    ✓        /                                   12mins(13)   ✓        /               173s
libexpat  format     ×        default-not-static                  /            ✓        /               119s

*The number in the bracket is the number of auto-build-heuristics invoked.

Table 2 leads to the same conclusion as the overall effectiveness evaluation: AutoCompiler outperforms the baseline. For further analysis, we summarize the failure reasons of the auto-build heuristics as follows:

  • extra-deps: the project requires extra dependencies that are not installed in the environment, such as dev libraries, PyPI packages, and other tools.
  • not-follow-inst: the build attempt does not follow the standard instructions provided in the project.
  • no-static-option: the project does not provide an option for building a static library.
  • default-not-static: the project's default build configuration does not produce a static library.
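
The success criterion used in this task (a static library linked with an empty fuzz driver) can be sketched as follows. This is not the code in manager.py: the helper names are hypothetical, and the sketch assumes clang with libFuzzer support, whose entry point is LLVMFuzzerTestOneInput and whose driver is linked with -fsanitize=fuzzer.

```python
from pathlib import Path
from typing import List

# An empty libFuzzer driver: it accepts any input and does nothing with it.
EMPTY_DRIVER = """#include <stddef.h>
#include <stdint.h>

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)data;
    (void)size;
    return 0;
}
"""


def find_static_libs(build_dir: Path) -> List[Path]:
    """Collect static libraries (*.a) produced anywhere under the build directory."""
    return sorted(build_dir.rglob("*.a"))


def link_command(driver: Path, libs: List[Path], output: Path) -> List[str]:
    """Build the argv that links the empty driver against the static libraries."""
    return ["clang", "-fsanitize=fuzzer", str(driver), *map(str, libs), "-o", str(output)]


cmd = link_command(Path("empty_driver.c"), [Path("libz.a")], Path("empty_fuzzer"))
print(" ".join(cmd))
```

Under this criterion a build that only produces shared libraries fails the check, which is consistent with the no-static-option and default-not-static failure reasons.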

Downstream Task: code security checks

Many code security checks are performed at or after compile time. When the number of code repositories waiting to be checked scales up (to hundreds or thousands), manual compilation becomes infeasible.
Therefore, we also conducted experiments that include code security checks in the compilation process with AutoCompiler:

  • gcc-sanitizer: the built-in gcc code sanitizers, enabled with the -fsanitize=address -fsanitize=leak flags.
  • scan-build: the static code analyzer that comes with clang; the build command is wrapped with scan-build to perform static analysis.
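
The two setups above can be sketched as small wrappers around an existing build command. The helper names are hypothetical, and passing the sanitizer flags through CFLAGS, CXXFLAGS, and LDFLAGS is an assumption that only works for build systems that honor these variables; the post does not say how the flags were injected in the experiments.

```python
import shlex
from typing import Dict, Tuple

SANITIZER_FLAGS = "-fsanitize=address -fsanitize=leak"


def with_gcc_sanitizer(build_cmd: str, env: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Keep the command unchanged and append the sanitizer flags to the flag variables."""
    new_env = dict(env)
    for var in ("CFLAGS", "CXXFLAGS", "LDFLAGS"):
        new_env[var] = (env.get(var, "") + " " + SANITIZER_FLAGS).strip()
    return build_cmd, new_env


def with_scan_build(build_cmd: str, report_dir: str) -> str:
    """Prefix the command with scan-build; -o sets the report output directory."""
    return "scan-build -o " + shlex.quote(report_dir) + " " + build_cmd


cmd, env = with_gcc_sanitizer("make", {"CFLAGS": "-O2"})
print(cmd, env["CFLAGS"])
print(with_scan_build("make", "reports"))
```

Because scan-build wraps the whole build command, it can be added without touching the project's build files, whereas the sanitizer flags have to reach every compile and link step.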

Table 3: Experiment Results in Code Security Check

Repo      Repo-Type  w/ gcc sanitizer*  Time     w/ scan-build  Bug found**  Time
mbedtls   crypto     ✓                  161s     ✓              23           212s
blis      math       ✓                  99s      ✓              0            101s
zstd      compress   ✓                  121s     ✓              20           186s
lvgl      graphics   ✓                  63s      ✓              8            105s
sqlite3   database   ✓                  257s     ×              ≥107         ≥30mins(timeout)
libvips   image      ✓                  9mins    ✓              15           203s
miniaudio audio      ✓                  71s      ✓              0            40s
curl      network    ✓                  153s     ✓              4            210s
libexpat  format     ✓                  222s     ✓              0            152s

*The gcc sanitizers check code safety at runtime. We therefore did not run the binaries manually, because full coverage is hard to reach.

**We report the number of bugs logged by scan-build during compilation.

Table 3 presents the compilation results with the gcc sanitizers and scan-build. Although building takes longer, most of the projects can still be compiled with the checkers; the exception is sqlite3, which times out with scan-build. The bugs found by the static analyzer are nevertheless valuable for code security checks. Notably, a bug reported by scan-build does not always indicate a dangerous vulnerability and may be just a warning. The number of reported bugs is not small; for example, at least 107 are reported for sqlite3. In my opinion, designing agent-based methods to identify and fix these bugs is a very promising direction.

Downstream Task: malware code audits

In malware audits, the source code of malware may be obtained in various ways. Unlike well-maintained software projects, such source code typically has no comprehensive documentation. Therefore, we evaluated the effectiveness of AutoCompiler in the automatic compilation of malware source code. Here, we present a case study that documents the reasoning process of the agents during the compilation procedure.

We selected an open-source trojan repository, Backdoor.Linux.Bashdoor, a project that lacks documentation of its build process. The following are the important reasoning steps, organized by the author:

step 1: understanding and planning

To compile the project "Linux.Bash-door" from source code, I will first examine the project structure to understand its layout. Then, I will use the `CompileNavigtor` tool to find out the compilation instructions. After that, ......

Action: SHELL
Action Input: tree . -L 2
Observation:
root@c0mpi1er-c0nta1ner:/work# tree . -L 2
.
├── Makefile
├── Readme
├── SeCshell.c
├── bashdoor.c
└── bashp

1 directory, 5 files

step 2: identify the compilation instructions

The project structure shows a `Makefile`, which is commonly used for compilation. I will use the `CompileNavigator` tool to analyze the project structure and determine the compilation instructions.

Action: CompileNavigator
Action Input: { "files": ["Makefile", "Readme", "SeCshell.c", "bashdoor.c", "bashp"] }
Observation:
To compile the Linux.Bash-door project, use the following instructions:  

1. Open a terminal and navigate to the project directory.  
2. Compile the C source files using `gcc`:  

   gcc -o bashdoor bashdoor.c SeCshell.c  

3. The compiled binary will be named `bashdoor`.  

These instructions assume the use of the `gcc` compiler and standard compilation practices for C projects. If additional dependencies or flags are required, they should be specified in the project documentation or Makefile.

step 3: try to build

Based on the CompileNavigator's output, the project can be compiled using the `gcc` command. I will now execute the compilation command in the Shell.

Action: Shell  
Action Input: gcc -o bashdoor bashdoor.c SeCshell.c  
Observation: 
root@c0mpi1er-c0nta1ner:/work# gcc -o bashdoor bashdoor.c SeCshell.c

bashdoor.c: In function ‘bash’:
bashdoor.c:32:1: warning: implicit declaration of function ‘sleep’ [-Wimplicit-function-declaration]
   32 | sleep(3);
      | ^~~~~
bashdoor.c:37:1: warning: implicit declaration of function ‘chdir’ ...
...

step 4: try to solve the errors

The compilation process encountered errors, specifically related to undefined references to functions like `crypt`, `getpass`, `sleep`, `chdir`, `getuid`, `setuid`, `setgid`, and `execl`. These functions are part of standard libraries, and their headers might not be included in the source files. Additionally, there is a linker error due to multiple definitions of the `main` function, which suggests that both `bashdoor.c` and `SeCshell.c` contain a `main` function, which is not allowed in a single executable.

To resolve these issues, I need to:
1. Include the necessary headers for the missing functions.
2. Fix the multiple `main` function issue by ensuring only one `main` function exists in the project.

I will first check if the necessary headers are included in the source files. If not, I will add them. Then, I will modify the project to ensure only one `main` function exists.

Action: Shell  
Action Input: cat bashdoor.c SeCshell.c  
Observation: 
root@c0mpi1er-c0nta1ner:/work# cat bashdoor.c SeCshell.c
/*
 * Bash-door.c - By bob for www.dtors.net 
 ......
*/

The source files `bashdoor.c` and `SeCshell.c` both contain a `main` function, which is causing the linker error. Additionally, the missing function declarations (e.g., `sleep`, `chdir`, `getuid`, `crypt`, etc.) are due to the absence of necessary header files. To resolve these issues:

1. I will modify the project to ensure only one `main` function exists. Since `bashdoor.c` appears to be the primary file, I will remove the `main` function from `SeCshell.c` and make it a library or helper function.
2. I will include the necessary headers (`<unistd.h>` for `sleep`, `chdir`, `getuid`, `setuid`, `setgid`, `execl`, and `<crypt.h>` for `crypt`).

Let me proceed with these changes.

Action: Shell  
Action Input: sed -i '/int main()/,/^}/d' SeCshell.c && echo '#include <unistd.h>' >> SeCshell.c && echo '#include <crypt.h>' >> SeCshell.c  
Observation: 
root@c0mpi1er-c0nta1ner:/work# sed -i '/int main()/,/^}/d' SeCshell.c && echo '#include <unistd.h>' >> SeCshell.c && echo '#include <crypt.h>' >> SeCshell.c

I have removed the `main` function from `SeCshell.c` and added the necessary headers. Now, I will recompile the project using the `gcc` command.

Action: Shell  
Action Input: gcc -o bashdoor bashdoor.c SeCshell.c -lcrypt  
Observation: 
root@c0mpi1er-c0nta1ner:/work# gcc -o bashdoor bashdoor.c SeCshell.c -lcrypt

bashdoor.c: In function ‘bash’:
bashdoor.c:32:1: warning: implicit declaration of function ‘sleep’ [-Wimplicit-function-declaration]
   32 | sleep(3);
      | ^~~~~
......

step 5: invoke ErrorSolver

Discussion Final Solutions:

The compilation is still failing due to conflicting types for `sleep` and `getuid`. This is likely because the `<unistd.h>` header was added after the function declarations were already implicitly defined. To resolve this, I need to ensure that the `<unistd.h>` header is included at the top of `bashdoor.c`, before any function calls. I will move the `#include <unistd.h>` line to the beginning of the file.

step 6: try to build

Action: Shell  
Action Input: gcc -o bashdoor bashdoor.c SeCshell.c -lcrypt  
Observation: 
root@c0mpi1er-c0nta1ner:/work# gcc -o bashdoor bashdoor.c SeCshell.c -lcrypt

The compilation command executed without any errors or warnings, which indicates that the project has been successfully compiled. The output binary `bashdoor` should now exist in the `/work/` directory. I will verify this by checking the contents of the directory.

(Several steps are skipped here for clarity.)

step ?: check if the compilation is successful

Action: Shell  
Action Input: ls  
Observation: 
root@c0mpi1er-c0nta1ner:/work# ls
Makefile  Readme  SeCshell.c  bashdoor  bashdoor.c  bashp

The `ls` command confirms that the `bashdoor` binary has been successfully created in the `/work/` directory. This indicates that the compilation process was successful.

Final Answer: COMPILATION-SUCCESS

Although errors arose during the compilation process, AutoCompiler handled them, called tools to obtain possible solutions, and finally compiled the project successfully.
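
The fix proposed in step 5 hinges on where the #include line sits: a header appended to the end of a file comes after the calls that need its declarations. The following sketch shows one way to place an include at the top of a C source string. The helper is hypothetical and is not the command the agent ran.

```python
def include_at_top(source: str, header: str) -> str:
    """Return the source with `#include <header>` as its first line.

    Any existing copy of the same directive elsewhere in the file is dropped,
    so a header that was appended at the end is effectively moved to the top.
    """
    directive = f"#include <{header}>"
    body = [line for line in source.splitlines() if line.strip() != directive]
    return "\n".join([directive, *body]) + "\n"


# Toy input: the include was appended after the code that calls sleep().
broken = "int main(void) { sleep(1); return 0; }\n#include <unistd.h>\n"
fixed = include_at_top(broken, "unistd.h")
print(fixed)
```

A C compiler processes a file from top to bottom, so the declaration must be visible before the first call that uses it.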

Discussion and Future Work

Extension to More Languages

Extending AutoCompiler to support more compiled programming languages is a natural progression. The current implementation focuses on C/C++ projects, but the underlying principles can be applied to other languages with similar compilation processes.

Extension to Multiple Architectures

In high-level programming language ecosystems (such as C/C++/Rust/Go), cross-architecture compilation toolchains are often available, allowing the same source code to be compiled into executables for different architectures to run on various machines. However, some projects use low-level architecture-specific code to enhance performance/security, which hinders their multi-architecture scalability. Considering these factors, our evaluation did not perform multi-architecture compilation for the selected projects.

Fine-Grained Compilation Configuration

In some scenarios, we need to set compiler parameters or use compilation options beyond the default configuration to build different binaries. For example, a release build needs stripping and the -Os option, while a debug build needs binaries with debugging information. Additionally, enabling dynamic compilation, special algorithm implementations, or compiling different distributions may require configurations related to specific user needs, such as compiling static libs for fuzzing.
While we did not conduct experiments on AutoCompiler with different configurations due to the inability to exhaust all these needs, we have open-sourced AutoCompiler and rely on the community to test it in various configurations.
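
As an illustration of such configurations, the sketch below maps a named profile to make variable overrides. The profile table and function name are hypothetical, AutoCompiler does not necessarily expose anything like this, and the sketch assumes a Makefile that honors CFLAGS and LDFLAGS. The flags themselves are standard gcc options: -Os optimizes for size, -s strips symbols at link time, and -g emits debugging information.

```python
from typing import Dict, List

# Hypothetical build profiles expressed as make variable overrides.
PROFILES: Dict[str, Dict[str, str]] = {
    "release": {"CFLAGS": "-Os", "LDFLAGS": "-s"},
    "debug": {"CFLAGS": "-O0 -g"},
}


def make_command(profile: str) -> List[str]:
    """Return an argv list for make; each override is a single NAME=value argument."""
    overrides = PROFILES[profile]
    return ["make", *[f"{name}={value}" for name, value in overrides.items()]]


print(make_command("release"))
print(make_command("debug"))
```

Returning an argv list rather than a shell string keeps values that contain spaces, such as "-O0 -g", in a single argument when the list is passed to a process launcher.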

Refine Compilation Steps

AutoCompiler generates many trial-and-error steps during the compilation process. These steps are erroneous branches in the compilation workflow, causing additional overhead and reducing compilation efficiency. Taking measures to prune erroneous actions during the compilation process would be beneficial for maintaining a feasible compilation orchestration, avoiding the inclusion of incorrect commands in the replay. Such optimizations do not affect the current effectiveness of AutoCompiler but merely refine the automatic compilation results. Therefore, we consider these optimizations as future work.
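
One simple pruning rule can be sketched as follows: drop commands that failed and commands that only inspected state, and keep the rest in order. This is an assumption about what such an optimization could look like, not a feature of AutoCompiler. The Step record and its read_only flag are hypothetical, and the rule ignores the fact that a failed command may still leave side effects behind.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    command: str
    exit_code: int
    read_only: bool = False  # e.g. ls, cat, tree: inspects state, changes nothing


def prune_for_replay(steps: List[Step]) -> List[str]:
    """Keep successful, state-changing commands and skip immediate repeats."""
    replay: List[str] = []
    for step in steps:
        if step.exit_code != 0 or step.read_only:
            continue
        if replay and replay[-1] == step.command:
            continue
        replay.append(step.command)
    return replay


# Hypothetical trace, loosely modeled on the shape of the case study above.
trace = [
    Step("tree . -L 2", 0, read_only=True),
    Step("gcc -o app main.c util.c", 1),
    Step("sed -i '1i #include <unistd.h>' main.c", 0),
    Step("gcc -o app main.c util.c", 0),
    Step("ls", 0, read_only=True),
]
replay = prune_for_replay(trace)
print(replay)
```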

Conclusion

In this post, we introduced AutoCompiler, the first automatic repo-level compilation tool based on AI agents. Extensive experiments demonstrate the effectiveness and practicality of AutoCompiler in various downstream tasks. We hope that AutoCompiler will be a valuable tool for automating compilation tasks in various scenarios, making it easier to integrate compilation tasks into various workflows.