This is a short introduction to what Clang is and how it can be used.
The Clang project provides a language front end and tooling infrastructure for languages in the C language family (C, C++, Objective-C/C++, OpenCL, CUDA, and RenderScript) for the LLVM project. Clang can parse and analyze source code in any of these languages, and its modular design makes it easy to use. Clang provides both a GCC-compatible compiler driver (clang) and an MSVC-compatible compiler driver (clang-cl.exe). It is very useful for static analysis and has decent documentation. The Clang mailing list is also very active and helpful if you ever find yourself stuck on something.
The goal of this tutorial is to explain what Clang is and how it can be used.
B. How does Clang work?
Clang is not only a compiler for the C language family but also an infrastructure for building tools. Its library-based architecture makes its components easy to reuse and to integrate into other projects.
Like many other compilers, Clang follows a three-phase design:
- The front end: parses the source code, checks it for errors, and builds a language-specific Abstract Syntax Tree (AST) to represent the input code.
- The optimizer: applies optimizations to the intermediate representation (LLVM IR) that the front end generates from the AST.
- The back end: generates the final machine code, which depends on the target.
The optimizer and the back end are handled by LLVM.
In most cases, Clang runs the preprocessor (expanding all macros) and parses your source code into an Abstract Syntax Tree (AST). The AST is a lot easier to work with than the raw C source text, but you can always refer back to the original code with ease. In fact, every data structure Clang uses to represent the code (AST, CFG, etc.) can be related back to the original source, which is useful for many kinds of analysis and for refactoring.
If you need to analyze or modify code at the source level, Clang is a better fit than LLVM. Doing analysis with LLVM means working with LLVM’s intermediate representation (IR) of the code, which is similar to assembly.
C. Clang Abstract Syntax Tree (AST)
Almost every compiler and static analysis tool uses an AST to represent the source code. Clang’s AST is very detailed and complex, but you’ll actually enjoy learning about all the different classes of AST elements. Here’s an introduction to the Clang AST, but the easiest way to learn about it is to just dump the AST for a simple source file to see how the AST is laid out. A few of the node classes you will meet most often:
- FunctionDecl — Represents a function declaration or definition.
- BinaryOperator — A builtin binary operation expression such as “x + y” or “x <= y”.
- CallExpr — Represents a function call, such as foo(x,2).
Most classes in the AST are fairly self-explanatory, such as ForStmt, IfStmt, and ReturnStmt. You’ll get the hang of the AST after playing with it for a few minutes. You can usually find the documentation for a class by searching for something like “clang functiondecl”.
D. How Can I Use Clang?
Clang can be used as a drop-in replacement for gcc and it offers some cool built-in static analysis tools. As a programmer (not just a normal user!), you can access the full power of Clang by using it as a library in one of three ways, depending on how you wish to program.
First, go check out Clang’s own description of each interface. In addition to everything stated on that site, we’ve highlighted some other significant differences between the multiple Clang interfaces below.
D.1 LibClang
LibClang is a stable, high-level C interface to Clang. When in doubt, LibClang is probably the interface you want to use. Consider the other interfaces only when you have a good reason not to use LibClang. Clang changes periodically, and if you use a Plugin or LibTooling, you might have to update your code to match Clang’s changes (but don’t let that discourage you!). If you need to access Clang’s API from a language other than C++ (like Python), you must use LibClang. But if you want full control over the AST, then Plugins and LibTooling are better choices.
D.2 Clang Plugin
Clang Plugins make it possible to run extra user-defined actions during a compilation. Your code is the plugin itself and is run as a completely new instance for each source file, meaning you cannot keep any global information or other contextual information across different source files (but you can still run it on multiple files sequentially). A plugin is run by passing some options to your build system (Clang, Make, etc.) via command-line arguments. It’s almost like enabling an optimization in GCC (e.g., “-O1”). You won’t be able to run any custom task before or after a source file is analyzed. If you want full control over how Clang is set up, then LibTooling is a better choice.
D.3 LibTooling (Clang Tool)
Your code is a normal C++ program; it has a normal main() function as the entry point. LibTooling is usually for running analysis on some source code (multiple files, if you want) separately from your normal build process. A new instance of your analysis code (and a new AST) will be created for each new source file (much like a Clang Plugin), but you are able to maintain contextual information across each source file because data items like global variables will persist. Since you have a main() function, you can also run tasks before or after Clang has finished analyzing all of your source files.
E. Getting Started with Clang
Now that you know a bit about the basics, let’s get started! For reference, these instructions should work on most Linux distributions (and probably Mac OS X) but were tested only on Ubuntu 16.04. You can follow the tutorial on Building LLVM with OpenMP support to obtain LLVM, Clang and OpenMP.