Inside Clang C++ Compile Time Evaluation: AST vs Bytecode

Introduction

C++ has been around for over 41 years. Over the years, many features have been added, and one of the most powerful is compile-time computation. C++ actually had this capability before its standardization in 1998, particularly after templates were introduced in 1990 with the Cfront 3.0 compiler. Later, in 1994, Erwin Unruh accidentally discovered that C++ templates are Turing complete, demonstrating this with his famous prime number sequence example:

//*
//* This is not the exact literal example but it holds the same semantics 
//*

template <int i>struct D { D(void*); };

template <int p, int i> struct is_prime {
  enum { res = (p % i) && is_prime<p, i - 1>::res };
};

template <int p> struct is_prime<p, 1> {
  enum { res = 1 };
};

template <int i> struct prime_print {
  prime_print<i - 1> a;
  enum { v = is_prime<i, i - 1>::res };
  D<i> d = v ? 1 : 0;
};

template<> struct prime_print<1> { };

prime_print<10> x;

Compiling this with Clang produces:

main.cpp:27:12: error: no viable conversion from 'int' to 'D<7>'
   27 |   D<i> d = v ? 1 : 0;
      |            ^~~~~~~~~

main.cpp:27:12: error: no viable conversion from 'int' to 'D<5>'
   27 |   D<i> d = v ? 1 : 0;
      |            ^~~~~~~~~

main.cpp:27:12: error: no viable conversion from 'int' to 'D<3>'
   27 |   D<i> d = v ? 1 : 0;
      |            ^~~~~~~~~

main.cpp:27:12: error: no viable conversion from 'int' to 'D<2>'
   27 |   D<i> d = v ? 1 : 0;
      |            ^~~~~~~~~

This technique is known as Template Metaprogramming (TMP), and it is still used for compile-time computation today. But let's be honest: any C++ programmer who has done serious TMP knows how ugly it can get. To address this, C++11 introduced constexpr, making compile-time computation possible with ordinary functions rather than relying solely on TMP magic. This made writing compile-time code significantly easier and, most importantly, sound—because all constexpr code evaluated at compile time must be free of Undefined Behavior (UB).
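
To make the contrast concrete, here is a minimal sketch of the same primality test written as an ordinary constexpr function instead of a template recursion. The helper count_primes is my own addition for illustration, and the loops assume C++14 or later.

```cpp
// The primality test from the template example, as a plain function.
// Loops inside constexpr functions require C++14 or later.
constexpr bool is_prime(int n) {
  if (n < 2)
    return false;
  for (int i = 2; i * i <= n; ++i) {
    if (n % i == 0)
      return false;
  }
  return true;
}

// Hypothetical helper: counts the primes in [2, limit].
constexpr int count_primes(int limit) {
  int count = 0;
  for (int n = 2; n <= limit; ++n) {
    if (is_prime(n))
      ++count;
  }
  return count;
}

// The compiler evaluates these calls while compiling.
static_assert(is_prime(7), "7 is prime");
static_assert(!is_prime(9), "9 is not prime");
static_assert(count_primes(10) == 4, "2, 3, 5 and 7");
```

No error messages are abused to print results here: the values are available to the program as ordinary constants.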

Initially, constexpr was quite limited. With each new C++ version, it expanded to cover more computational facilities, like branches, loops, and if constexpr for conditional compilation. In C++20, consteval was introduced. Unlike constexpr—which tells the compiler a function can be evaluated at compile time if given constant dependencies, but otherwise runs at runtime—consteval strictly enforces evaluation at compile time.
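
The difference between the two modes of a constexpr function can be sketched as follows. The names square, kArea, and square_at_runtime are made up for this illustration, and the code sticks to C++17 so the consteval behavior is only described in a comment.

```cpp
// A constexpr function may run at compile time or at run time,
// depending on how it is called.
constexpr int square(int x) { return x * x; }

// Constant context: the compiler has to evaluate the call itself.
static_assert(square(7) == 49, "evaluated during compilation");
constexpr int kArea = square(12);

// Ordinary context: x is not a constant, so the very same function is
// compiled to machine code and executed at run time.
int square_at_runtime(int x) { return square(x); }

// With C++20, declaring square as consteval instead would make
// square_at_runtime ill-formed, because x is not a constant expression.
```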

Why Two Evaluators?

If you look at the current constant expression evaluator (ExprConstant.cpp), you will see a massive recursive tree walker evaluating the AST of constant expressions. Let's be real: a C++ AST (regardless of the compiler) can be massive. Walking the tree and doing all that pointer chasing during evaluation is expensive in terms of memory and CPU cycles. Furthermore, iterative or recursive code forces the evaluator to revisit the same nodes and rebuild their context N times. It is not the most resource-efficient evaluator on earth.
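
As a rough mental model only, a tree-walking evaluator looks like the sketch below. Kind, Node, and eval are hypothetical names for a toy expression tree and have nothing to do with Clang's real AST classes; the point is that every evaluation chases the child pointers again.

```cpp
// Hypothetical node kinds for a tiny expression tree (not Clang's classes).
enum class Kind { Literal, Add, Mul };

struct Node {
  Kind kind;
  int value;        // used by Literal only
  const Node *lhs;  // used by Add and Mul
  const Node *rhs;  // used by Add and Mul
};

// Recursive tree walk: each call follows the child pointers of its node.
constexpr int eval(const Node &n) {
  switch (n.kind) {
  case Kind::Literal:
    return n.value;
  case Kind::Add:
    return eval(*n.lhs) + eval(*n.rhs);
  case Kind::Mul:
    return eval(*n.lhs) * eval(*n.rhs);
  }
  return 0; // not reached for well-formed trees
}

// The tree for (2 + 3) * 4.
constexpr Node two{Kind::Literal, 2, nullptr, nullptr};
constexpr Node three{Kind::Literal, 3, nullptr, nullptr};
constexpr Node four{Kind::Literal, 4, nullptr, nullptr};
constexpr Node sum{Kind::Add, 0, &two, &three};
constexpr Node product{Kind::Mul, 0, &sum, &four};

static_assert(eval(product) == 20, "(2 + 3) * 4");
```

If the same subtree sits inside a loop body, a walker like this visits it once per iteration, which is the repeated work described above.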

But the Clang community didn't give up. In 2019, a new bytecode interpreter was introduced by Nandor Licker, and later a Red Hat team led by Timm Bäder (along with other contributors) worked heavily to bring it to its current state. The new evaluator is faster and has a lower memory footprint.

The bytecode interpreter has matured and is still maturing, but as of the time of writing, it is still experimental. It doesn’t yet fully cover all C++ constant expression evaluation, so the interpreter may still fall back to the tree evaluator. Note that the bytecode interpreter is currently an opt-in feature.

AST Evaluator

I will use a simple example to demonstrate how both evaluators work: a function that computes the factorial of a number N. I will also use Compiler Explorer to inspect the different outputs.

constexpr int factorial(int n){
  return n <= 1 ? 1 : n * factorial(n-1);
}

int main(){
  constexpr auto x = factorial(5);
  return x;  
}

If you compile this code with Clang and look at the assembly output, you will notice that the factorial of 5 (which is 120) is moved into the return register %eax before the stack frame is cleaned up and the function returns. The code for computing the factorial was not generated; it was evaluated at compile time, as expected.

Now let's take a look at the Clang AST for the code above. You can generate it with clang -Xclang -ast-dump -fcolor-diagnostics -fsyntax-only factorial.cpp, or inspect it on godbolt.

The tree evaluator walks the tree recursively to evaluate a constant expression. In our example, when the compiler sees the constexpr variable initialized with factorial(5), it evaluates that expression by visiting all of its terms and subexpressions. The evaluator sees the AST node CallExpr <col:23, col:34> 'int', knows that it refers to a constexpr function called with the constant argument 5, visits the declaration referenced by the expression (the function factorial), and evaluates the function from there: FunctionDecl <line:1:1, line:3:1> line:1:15 used constexpr factorial 'int (int)' impl.

In the AST you can see -value: Int 120 as a child of the VarDecl. This is because the result of constant expression evaluation is written back to the AST node. In our example that node is VarDecl <col:5, col:34> col:19 referenced n 'const int' constexpr cinit, and the value is written to its initializer (Init) child node. This applies to both evaluators.

APValue *VarDecl::getEvaluatedValue() const {
  if (EvaluatedStmt *Eval = getEvaluatedStmt())
    if (Eval->WasEvaluated)
      return &Eval->Evaluated;

  return nullptr;
}

EvaluatedStmt *VarDecl::getEvaluatedStmt() const {
  return dyn_cast_if_present<EvaluatedStmt *>(Init);
}

Bytecode Interpreter

The bytecode interpretation pipeline has two stages:

1. Generate the bytecode: Clang contains a second, smaller compiler that emits bytecode in memory. It walks the AST and emits bytecode for constexpr/consteval functions once they have been parsed (e.g. constexpr int factorial(int n)). It also runs at evaluation points, such as a constexpr variable initializer (e.g. constexpr auto x = factorial(5)), where instructions are emitted and executed on the fly; for instance, visiting a CallExpr AST node emits and executes a Call.
2. Interpret the bytecode: execute the code generated in step 1 at the evaluation point to evaluate the constant expression. The devil is in the details here, and I cover them in depth below.
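
The two stages can be sketched with a toy pipeline. Everything here (Kind, Node, Op, Instr, Program, compile, run, compileAndRun) is a hypothetical miniature of the idea, written for C++17, and is not how Clang structures its compiler or interpreter.

```cpp
#include <array>
#include <cstddef>

// Hypothetical toy AST and instruction set; none of these names are Clang's.
enum class Kind { Literal, Add, Mul };
struct Node { Kind kind; int value; const Node *lhs; const Node *rhs; };

enum class Op { Push, Add, Mul, Ret };
struct Instr { Op op; int operand; };

struct Program {
  std::array<Instr, 32> code{};
  std::size_t size = 0;
  constexpr void emit(Op op, int operand = 0) { code[size++] = Instr{op, operand}; }
};

// Stage 1: a post-order walk turns the tree into a flat instruction list.
constexpr void compile(const Node &n, Program &p) {
  if (n.kind == Kind::Literal) {
    p.emit(Op::Push, n.value);
    return;
  }
  compile(*n.lhs, p);
  compile(*n.rhs, p);
  p.emit(n.kind == Kind::Add ? Op::Add : Op::Mul);
}

// Stage 2: a linear loop over the instructions with an operand stack.
constexpr int run(const Program &p) {
  std::array<int, 32> stack{};
  std::size_t sp = 0;
  for (std::size_t pc = 0; pc < p.size; ++pc) {
    const Instr &in = p.code[pc];
    switch (in.op) {
    case Op::Push:
      stack[sp++] = in.operand;
      break;
    case Op::Add: {
      const int rhs = stack[--sp];
      const int lhs = stack[--sp];
      stack[sp++] = lhs + rhs;
      break;
    }
    case Op::Mul: {
      const int rhs = stack[--sp];
      const int lhs = stack[--sp];
      stack[sp++] = lhs * rhs;
      break;
    }
    case Op::Ret:
      return stack[--sp];
    }
  }
  return 0; // a well-formed program always ends with Ret
}

constexpr int compileAndRun(const Node &root) {
  Program p{};
  compile(root, p);
  p.emit(Op::Ret);
  return run(p);
}

// The tree for (2 + 3) * 4.
constexpr Node two{Kind::Literal, 2, nullptr, nullptr};
constexpr Node three{Kind::Literal, 3, nullptr, nullptr};
constexpr Node four{Kind::Literal, 4, nullptr, nullptr};
constexpr Node sum{Kind::Add, 0, &two, &three};
constexpr Node product{Kind::Mul, 0, &sum, &four};

static_assert(compileAndRun(product) == 20, "(2 + 3) * 4");
```

Once the tree has been flattened, running the program again touches only the instruction array, not the tree.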

The Compilation

Before we start: the interpreter has a modest number of instructions, which you can find in the source in Opcodes.td.

Now let's walk through the Clang source code and see how it evaluates our factorial example. To use the bytecode interpreter, pass the -fexperimental-new-constant-interpreter flag to Clang, for example: clang -cc1 -triple x86_64-pc-windows-msvc19.44.35217 <YOUR_FLAGS> -fexperimental-new-constant-interpreter factorial.cpp

It all starts from clang::Sema::CheckConstexprFunctionDefinition, which Clang triggers after it finishes parsing a function body. This leads to the following call chain: CheckConstexprFunctionBody → clang::Expr::isPotentialConstantExpr.

bool Expr::isPotentialConstantExpr(const FunctionDecl *FD,
                                   SmallVectorImpl<
                                     PartialDiagnosticAt> &Diags) {
  // FIXME: It would be useful to check constexpr function templates, but at the
  // moment the constant expression evaluator cannot cope with the non-rigorous
  // ASTs which we build for dependent expressions.
  if (FD->isDependentContext())
    return true;
 //.....
 //.....
 //.....
  if (Info.EnableNewConstInterp) {
    Info.Ctx.getInterpContext().isPotentialConstantExpr(Info, FD);
    return Diags.empty();
  }
  //....

Info.EnableNewConstInterp is set to true when the -fexperimental-new-constant-interpreter flag is passed, in which case Info.Ctx.getInterpContext().isPotentialConstantExpr(Info, FD); gets called:

bool Context::isPotentialConstantExpr(State &Parent, const FunctionDecl *FD) {
  assert(Stk.empty());

  // Get a function handle.
  const Function *Func = getOrCreateFunction(FD);
  if (!Func)
    return false;

  // Compile the function.
  Compiler<ByteCodeEmitter>(*this, *P).compileFunc(
      FD, const_cast<Function *>(Func));

  if (!Func->isValid())
    return false;

  ++EvalID;
  // And run it.
  return Run(Parent, Func);
}

As I mentioned earlier, Clang verifies whether a function is constant after parsing it. If it is, the function is compiled using compileFunc:

void ByteCodeEmitter::compileFunc(const FunctionDecl *FuncDecl,
                                  Function *Func) {
  assert(FuncDecl);
  assert(Func);
  assert(FuncDecl->isThisDeclarationADefinition());

  // Manually created functions that haven't been assigned proper
  // parameters yet.
  if (!FuncDecl->param_empty() && !FuncDecl->param_begin())
    return;

  // Set up lambda captures.
  if (const auto *MD = dyn_cast<CXXMethodDecl>(FuncDecl);
      MD && isLambdaCallOperator(MD)) {
    // Set up lambda capture to closure record field mapping.
    .
    .
    .
    .
  }
    .
    .
    .
  // Compile the function body.
  if (!IsEligibleForCompilation || !visitFunc(FuncDecl)) {
    Func->setIsFullyCompiled(true);
    return;
  }

  // Create scopes from descriptors.
  llvm::SmallVector<Scope, 2> Scopes;
  for (auto &DS : Descriptors) {
    Scopes.emplace_back(std::move(DS));
  }

  // Set the function's code.
  Func->setCode(FuncDecl, NextLocalOffset, std::move(Code), std::move(SrcMap),
                std::move(Scopes), FuncDecl->hasBody(), IsValid);
  Func->setIsFullyCompiled(true);
}

compileFunc walks the tree and emits the instructions. If compilation succeeds, Clang then checks whether the code is well-formed for at least one argument value (more about that later).

Now let's look at the generated bytecode:

factorial 0x121b6ccc7a0
frame size: 0
arg size:   8
rvo:        0
this arg:   0
0      InitScope         0                    
16     GetParamSint32    0                    
32     ConstSint32       1                    
48     LESint32                               
56     Jf                32                --+
72     ConstSint32       1                   |
88     Jmp               88                  |  --+
104    GetParamSint32    0                 <-+    |
120    GetParamSint32    0                        |
136    ConstSint32       1                        |
152    SubSint32                                  |
160    Call              0x121b6ccc7a0 0          |
184    MulSint32                                  |
192    Destroy           0                      <-+
208    RetSint32         
216    Destroy           0 
232    NoRet

I will explain some of the "interesting" instructions and the semantics of a few others.

InitScope: starts a scope for locals. A scope is necessary so the interpreter knows when to invoke the destructors of objects that are not trivially destructible, or to deallocate the storage of trivial objects, once the scope ends.
Our example presents a highly interesting case due to the conditional expression (? :). A scope is created specifically to destroy any temporaries created in either or both conditional arms (the true arm is 1, and the false arm is n * factorial(n-1)). During execution, the interpreter tracks which branch was taken and only executes the destructors associated with that active branch.

From Clang source code:

 template <class Emitter>
 bool Compiler<Emitter>::VisitAbstractConditionalOperator(
     const AbstractConditionalOperator *E) {
   const Expr *Condition = E->getCond();
   const Expr *TrueExpr = E->getTrueExpr();
   const Expr *FalseExpr = E->getFalseExpr();

   if (std::optional<bool> BoolValue = getBoolValue(Condition)) {
     if (*BoolValue)
       return this->delegate(TrueExpr);
     return this->delegate(FalseExpr);
   }

   // Force-init the scope, which creates a InitScope op. This is necessary so
   // the scope is not only initialized in one arm of the conditional operator.
   this->VarScope->forceInit();
   // The TrueExpr and FalseExpr of a conditional operator do _not_ create a
   // scope, which means the local variables created within them unconditionally
   // always exist. However, we need to later differentiate which branch was
   // taken and only destroy the varibles of the active branch. This is what the
   // "enabled" flags on local variables are used for.
   llvm::SaveAndRestore LAAA(this->VarScope->LocalsAlwaysEnabled,
                             /*NewValue=*/false);

Also notice how the compiler attempts sparse conditional constant folding on the fly with getBoolValue(Condition), so that it generates code only for the branch that will be taken.

Jf: stands for "Jump if False". The jump operand is a relative offset and is simple to calculate: target address - jump instruction address - jump instruction width. In our example, Jf 32 corresponds to 104 - 56 - 16.
This formula is a simplification; the compiler computes the offset differently (see getOffset below).

Note: the width of a jump instruction depends on the machine Clang is running on! The machine I am using is an x86-64 machine.
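
The simplified formula can be checked against both jumps in the factorial listing above. The function names below are mine, and the 16-byte jump width is the one observed on my x86-64 machine, not a universal constant.

```cpp
// All values are byte offsets into the function's bytecode.
constexpr int jump_operand(int target, int jump_address, int jump_width) {
  return target - jump_address - jump_width;
}

// Inverse view: the operand is relative to the address of the next instruction.
constexpr int jump_target(int jump_address, int jump_width, int operand) {
  return jump_address + jump_width + operand;
}

// Values taken from the factorial listing (16-byte jump instructions).
static_assert(jump_operand(104, 56, 16) == 32, "Jf 32 at offset 56");
static_assert(jump_operand(192, 88, 16) == 88, "Jmp 88 at offset 88");
static_assert(jump_target(56, 16, 32) == 104, "Jf lands on the false arm");
```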

When the compiler emits a jump instruction like Jf:

 bool ByteCodeEmitter::jumpFalse(const LabelTy &Label) {
   return emitJf(getOffset(Label), SourceInfo{});
 }

It cannot emit the operand eagerly, because the address of a forward jump target is not known yet. So Clang records a relocation entry and emits a placeholder:

 int32_t ByteCodeEmitter::getOffset(LabelTy Label) {
   // Compute the PC offset which the jump is relative to.
   const int64_t Position =
       Code.size() + align(sizeof(Opcode)) + align(sizeof(int32_t));
   assert(aligned(Position));

   // If target is known, compute jump offset.
   if (auto It = LabelOffsets.find(Label); It != LabelOffsets.end())
     return It->second - Position;

   // Otherwise, record relocation and return dummy offset.
   LabelRelocs[Label].push_back(Position);
   return 0ull;
 }

Later, when it visits the AST expression of the target branch, it emits the label, that is, it replaces the placeholder with the correct offset. The label emitter performs this replacement for every jump instruction targeting the label.

 void ByteCodeEmitter::emitLabel(LabelTy Label) {
   const size_t Target = Code.size();
   LabelOffsets.insert({Label, Target});

   if (auto It = LabelRelocs.find(Label); It != LabelRelocs.end()) {
     for (unsigned Reloc : It->second) {
       using namespace llvm::support;

       // Rewrite the operand of all jumps to this label.
       void *Location = Code.data() + Reloc - align(sizeof(int32_t));
       assert(aligned(Location));
       const int32_t Offset = Target - static_cast<int64_t>(Reloc);
       endian::write<int32_t, llvm::endianness::native>(Location, Offset);
     }
     LabelRelocs.erase(It);
   }
 }
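
The relocation scheme can be reproduced in miniature. ToyEmitter and its members are hypothetical and far simpler than ByteCodeEmitter: the code is a flat array of ints, a jump occupies two slots (opcode, then relative operand), and pending relocations are never erased. The sketch assumes C++17.

```cpp
#include <array>
#include <cstddef>

// Hypothetical toy emitter that mirrors the idea of getOffset/emitLabel.
constexpr int kNop = 0;
constexpr int kJmp = 1;
constexpr int kUnknown = -1;

struct ToyEmitter {
  std::array<int, 16> code{};
  std::size_t size = 0;
  std::array<int, 4> labelTarget{kUnknown, kUnknown, kUnknown, kUnknown};
  // Pending forward jumps: the awaited label and the position after the jump.
  std::array<int, 8> relocLabel{};
  std::array<int, 8> relocPos{};
  std::size_t relocCount = 0;

  constexpr void emitNop() { code[size++] = kNop; }

  constexpr void emitJump(int label) {
    const int position = static_cast<int>(size) + 2; // PC after the jump
    int operand = 0;                                  // placeholder
    if (labelTarget[label] != kUnknown) {
      operand = labelTarget[label] - position;        // target already known
    } else {
      relocLabel[relocCount] = label;                 // forward jump: patch later
      relocPos[relocCount] = position;
      ++relocCount;
    }
    code[size++] = kJmp;
    code[size++] = operand;
  }

  constexpr void emitLabel(int label) {
    const int target = static_cast<int>(size);
    labelTarget[label] = target;
    // Rewrite the operand of every pending jump to this label.
    for (std::size_t i = 0; i < relocCount; ++i) {
      if (relocLabel[i] == label)
        code[relocPos[i] - 1] = target - relocPos[i];
    }
  }
};

constexpr ToyEmitter emitExample() {
  ToyEmitter e{};
  e.emitJump(0);  // slots 0-1: forward jump, operand is a placeholder
  e.emitNop();    // slot 2
  e.emitNop();    // slot 3
  e.emitLabel(0); // label 0 is slot 4, so the placeholder becomes 4 - 2 = 2
  e.emitJump(0);  // slots 4-5: backward jump, operand is 4 - 6 = -2
  return e;
}

constexpr ToyEmitter kExample = emitExample();
static_assert(kExample.code[1] == 2, "forward jump was patched");
static_assert(kExample.code[5] == -2, "backward jump resolved immediately");
```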

Destroy: ends the lifetime of all local variables in a scope. Its argument is the index of the scope being destroyed. "Cleanup" here means the abstract deallocation of the memory block representing a variable's underlying storage (stack or heap). For locals that are not trivially destructible, explicit calls to their destructors are emitted, but in many cases the destruction logic is handled implicitly as a result of executing Destroy. In our bytecode, the first Destroy cleans up the scope opened by InitScope (handling the conditional expression's temporaries), while the second Destroy is part of the standard frame cleanup and is emitted unconditionally, whether it is reachable or not.
```
 template <class Emitter>
 bool Compiler<Emitter>::visitReturnStmt(const ReturnStmt *RS) {
   if (this->InStmtExpr)
     return this->emitUnsupported(RS);

   if (const Expr *RE = RS->getRetValue()) {
     LocalScope<Emitter> RetScope(this);
     if (ReturnType) {
       // Primitive types are simply returned.
       if (!this->visit(RE))
         return false;
       this->emitCleanup();
       return this->emitRet(*ReturnType, RS);
     }
   .
   .
   .
 }
```
The first Destroy is emitted by this->emitCleanup(); to cover cases like ours (conditional expressions). Then, when RetScope's lifetime ends, another Destroy is emitted for frame cleanup.

NoRet: a guard emitted at the end of every non-void function. If execution ever reaches it, a code path fell off the end of the function without returning, which is UB in a value-returning function, so Clang can diagnose it during evaluation:

 template <class Emitter>
 bool Compiler<Emitter>::visitFunc(const FunctionDecl *F) {
 .
 .
 .
 .
 .
 // Emit a guard return to protect against a code path missing one.
   if (F->getReturnType()->isVoidType())
     return this->emitRetVoid(SourceInfo{});
   return this->emitNoRet(SourceInfo{});
 }

The Interpretation

Abstractly speaking, this is a stack machine. Tracing any program is just a matter of following what each instruction does and how it manipulates the evaluation stack. The evaluation stack, which holds temporaries and call arguments, is an InterpStack, while each frame on the call stack is an InterpFrame. These frames manage local variables and are crucial for emitting accurate stack traces in compiler diagnostics.

A critical element during interpretation is InterpState, which holds the global state of the interpreter (the call stack, current active frame, bottom frame, source mapping, etc.). Because it is a C++ operational semantic interpreter, it carries heavy responsibilities: enforcing standard C++ semantics, diagnosing Undefined Behavior, and generating accurate error messages.

This is the main interpretation loop from Interp.cpp:

bool Interpret(InterpState &S) {
  // The current stack frame when we started Interpret().
  // This is being used by the ops to determine wheter
  // to return from this function and thus terminate
  // interpretation.
  const InterpFrame *StartFrame = S.Current;
  assert(!S.Current->isRoot());
  CodePtr PC = S.Current->getPC();

  // Empty program.
  if (!PC)
    return true;

  for (;;) {
    auto Op = PC.read<Opcode>();
    CodePtr OpPC = PC;

    switch (Op) {
#define GET_INTERP
    #include "Opcodes.inc"
#undef GET_INTERP
    }
  }
}
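
To see a fetch/dispatch loop of this shape at work, here is a toy stack machine that runs a hand-written version of the factorial bytecode. It is only a sketch under several assumptions: the opcodes are made-up stand-ins for the real ones, InitScope/Destroy/NoRet are omitted, jumps count instructions rather than bytes, and each call gets its own small operand stack instead of sharing one.

```cpp
#include <array>
#include <cstddef>

// Hypothetical opcodes modelled on the factorial listing.
enum class Op { GetParam, Const, LE, Jf, Jmp, Sub, Mul, Call, Ret };
struct Instr { Op op; int operand; };

constexpr std::array<Instr, 13> kFactorial{{
    {Op::GetParam, 0}, //  0: push n
    {Op::Const, 1},    //  1: push 1
    {Op::LE, 0},       //  2: push (n <= 1)
    {Op::Jf, 2},       //  3: if false, skip the true arm (2 instructions)
    {Op::Const, 1},    //  4: true arm: push 1
    {Op::Jmp, 6},      //  5: skip the false arm (6 instructions)
    {Op::GetParam, 0}, //  6: false arm: push n
    {Op::GetParam, 0}, //  7: push n
    {Op::Const, 1},    //  8: push 1
    {Op::Sub, 0},      //  9: push n - 1
    {Op::Call, 0},     // 10: push factorial(n - 1)
    {Op::Mul, 0},      // 11: push n * factorial(n - 1)
    {Op::Ret, 0},      // 12: return the top of the stack
}};

constexpr int run(int param) {
  std::array<int, 8> stack{};
  std::size_t sp = 0;
  std::size_t pc = 0;
  for (;;) {
    const Instr in = kFactorial[pc++]; // fetch; pc now names the next instruction
    switch (in.op) {
    case Op::GetParam: stack[sp++] = param; break;
    case Op::Const:    stack[sp++] = in.operand; break;
    case Op::LE: {
      const int rhs = stack[--sp];
      const int lhs = stack[--sp];
      stack[sp++] = lhs <= rhs ? 1 : 0;
      break;
    }
    case Op::Jf:
      if (stack[--sp] == 0)
        pc += static_cast<std::size_t>(in.operand);
      break;
    case Op::Jmp:
      pc += static_cast<std::size_t>(in.operand);
      break;
    case Op::Sub: {
      const int rhs = stack[--sp];
      const int lhs = stack[--sp];
      stack[sp++] = lhs - rhs;
      break;
    }
    case Op::Mul: {
      const int rhs = stack[--sp];
      const int lhs = stack[--sp];
      stack[sp++] = lhs * rhs;
      break;
    }
    case Op::Call: {
      const int arg = stack[--sp];
      stack[sp++] = run(arg); // a new frame, modelled as a recursive call
      break;
    }
    case Op::Ret:
      return stack[--sp];
    }
  }
}

static_assert(run(5) == 120, "factorial(5)");
```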

A careful reader may have noticed that, in the compilation phase above, isPotentialConstantExpr already runs the interpreter, even though nothing has invoked or used the function yet. How so?

This is not an evaluation run but a verification run. Clang checks that there is at least one control-flow path on which the function forms a well-formed constant expression, per the C++ standard's definition (before C++23). If such a path exists, the compiler is free to optimistically assume the function is fine for all other inputs, even when it is not; Clang is not required to emit a diagnostic in that situation, although it does in some cases. If the function is ill-formed on every path, an error is generated.

Quoting from C++ standard [dcl.constexpr] p6 in N4868

For a constexpr function or constexpr constructor that is neither defaulted nor a template, if no argument values exist such that an invocation of the function or constructor could be an evaluated subexpression of a core constant expression, or, for a constructor, an evaluated subexpression of the initialization full-expression of some constant-initialized object ([basic.start.static]), the program is ill-formed, no diagnostic required
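
A minimal sketch of a function that satisfies this rule: checked_div (a made-up name) cannot be constant-evaluated for b == 0, but argument values exist for which it can, so the definition is fine.

```cpp
#include <stdexcept>

// Well-formed: at least one set of arguments (any b != 0) yields a
// constant expression, even though b == 0 never does.
constexpr int checked_div(int a, int b) {
  return b == 0 ? throw std::invalid_argument("division by zero") : a / b;
}

static_assert(checked_div(10, 2) == 5, "a usable path exists");

// The failing path is rejected only where it is actually evaluated:
// constexpr int bad = checked_div(1, 0); // error: not a constant expression
```

Called at run time with b == 0, the same function simply throws.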

IMPORTANT (15/02/2026):
There is a notable disparity between the old and new evaluators regarding this rule. The new bytecode interpreter bails out of the well-formedness check early, for example when it loads a parameter (e.g. GetParamSint32), so it never checks the true/false arms of a ternary expression. As a result, it will NOT generate an error when no argument exists for which the function can be evaluated as a constant expression. The AST tree evaluator does emit an error. The C++ standard does not require one, but all of the big three compilers (GCC, MSVC, and Clang with the tree evaluator) emit it.

So both evaluators are theoretically correct, but I think this is a Quality of Implementation (QoI) regression, since the new evaluator is expected to emit at least the same diagnostics.

constexpr void f(bool b) {
    return b ? throw 0: throw 1;
}

// GCC, MSVC, and Clang(Tree Evaluator) all generate an error.
// Clang(Bytecode Evaluator) will emit nothing.
// Error below produced by Clang (Tree Evaluator):
error: constexpr function never produces a
      constant expression [-Winvalid-constexpr]
   12 | constexpr void f(bool b)
      |                ^

(Note: in C++23 and later this rule was relaxed and compilers are no longer asked to perform the check. From C++11 through C++20 the diagnostic was optional, but compilers emitted it anyway.)

Now let's take a look at the evaluation starting point in the context of our example:

bool Expr::EvaluateAsInitializer(APValue &Value, const ASTContext &Ctx,
                                 const VarDecl *VD,
                                 SmallVectorImpl<PartialDiagnosticAt> &Notes,
                                 bool IsConstantInitialization) const {
  assert(!isValueDependent() &&
         "Expression evaluator can't be called on a dependent expression.");
  assert(VD && "Need a valid VarDecl");

  llvm::TimeTraceScope TimeScope("EvaluateAsInitializer", [&] {
    std::string Name;
    llvm::raw_string_ostream OS(Name);
    VD->printQualifiedName(OS);
    return Name;
  });
  .
  .
  .
  .
  if (Info.EnableNewConstInterp) {
    auto &InterpCtx = const_cast<ASTContext &>(Ctx).getInterpContext();
    if (!InterpCtx.evaluateAsInitializer(Info, VD, this, Value))
      return false;

    return CheckConstantExpression(Info, DeclLoc, DeclTy, Value,
                                   ConstantExprKind::Normal);
  }
  .
  .
  .
  . 
}

This eventually calls bool Interpret(InterpState &S), the interpretation loop, to evaluate the initializer expression. The result is stored back into the variable declaration's AST node.

//Context.cpp (where caching of result happens)
bool Context::evaluateAsInitializer(State &Parent, const VarDecl *VD,
                                    const Expr *Init, APValue &Result) {
  ++EvalID;
  bool Recursing = !Stk.empty();
  .
  .
  .
  .
  Result = Res.stealAPValue();
  return true;
}

//Decl.cpp
APValue *VarDecl::evaluateValueImpl(SmallVectorImpl<PartialDiagnosticAt> &Notes,
                                    bool IsConstantInitialization) const {
  EvaluatedStmt *Eval = ensureEvaluatedStmt(); //return pointer to AST node Init 
  .
  .
  .
  .
  ASTContext &Ctx = getASTContext();
  bool Result = Init->EvaluateAsInitializer(Eval->Evaluated, Ctx, this, Notes,
                                            IsConstantInitialization);
}

But this is not the end of the story. The compiled function is evaluated again so that the result can be cached for the arguments passed: constant-evaluated functions in C++ are pure, so they return the same result for the same arguments. factorial(5) always returns 120, so the value is cached in case it is needed later, which avoids reinterpreting the call.

This is triggered after the parser has finished parsing and analyzing a variable declaration. The call expression factorial(5) is a ConstantExpr AST node with a CallExpr as its child, and because it is a ConstantExpr it is handed to EvaluateAndDiagnoseImmediateInvocation:

static void EvaluateAndDiagnoseImmediateInvocation(
    Sema &SemaRef, Sema::ImmediateInvocationCandidate Candidate) {
  llvm::SmallVector<PartialDiagnosticAt, 8> Notes;
  Expr::EvalResult Eval;
  Eval.Diag = &Notes;
  ConstantExpr *CE = Candidate.getPointer();
  //eval here 
  bool Result = CE->EvaluateAsConstantExpr( 
      Eval, SemaRef.getASTContext(), ConstantExprKind::ImmediateInvocation);
  .
  .
  .
  .
  //cache here
  CE->MoveIntoResult(Eval.Val, SemaRef.getASTContext());
}

Now I will talk a little about how the new bytecode interpreter models memory, especially Blocks and Descriptors. They are essential for understanding how memory blocks, pointers, references, and so on are modeled and tracked in order to detect UB.

Blocks: a block represents a piece of storage in the program. It can live on the stack, on the heap, or in static storage (globals, static members, etc.). As long as a block is valid and alive, so are the pointers to it. Only blocks representing stack slots get invalidated; static blocks stay alive for as long as the program does. Each block tracks the chain of pointers that refer to it. When a block is invalidated, for instance because its stack frame (InterpFrame) ends, it is deallocated and copied to a dead block that is managed by the interpreter rather than by the stack frame, and every pointer to the original block is redirected to the dead block. With this in mind, you can see how dereferencing a dangling pointer in the source program is detected the moment it happens and reported as an error, since it is UB. A block can hold a primitive, an array of primitives, an array of composite objects, or a record. Apart from primitives, all blocks carry descriptors with the metadata needed to track the state of their elements or fields.
```
  /// A memory block, either on the stack or in the heap.
  ///
  /// The storage described by the block is immediately followed by
  /// optional metadata, which is followed by the actual data.
  ///
  /// Block*        rawData()                  data()
  /// │               │                         │
  /// │               │                         │
  /// ▼               ▼                         ▼
  /// ┌───────────────┬─────────────────────────┬─────────────────┐
  /// │ Block         │ Metadata                │ Data            │
  /// │ sizeof(Block) │ Desc->getMetadataSize() │ Desc->getSize() │
  /// └───────────────┴─────────────────────────┴─────────────────┘
  ///
  /// Desc->getAllocSize() describes the size after the Block, i.e.
  /// the data size and the metadata size.
  ///
  class Block final {
  .
  .
  .
  .
  }
```
Descriptors: a descriptor describes a memory block. It provides information such as the size of the data, the size of the metadata, source location info, the type of the underlying block (int, float, etc.), and more. There are more specialized descriptors for arrays, records, and others, such as:
- GlobalInlineDescriptor: describes a block of static variables.
- InlineDescriptor: used when creating blocks for arrays and records, where the data of each element or field is preceded by such a descriptor, hence the name "inline". It records the offset of the element or field within the array or record, and whether the field is const, volatile, immutable, etc.

Pointers: I don't have much to add over the Clang docs. Essentially they model a C++ pointer, which can reference and track a block of memory, a function, a member, an integer-as-pointer (intptr), or a typeid opaque type.
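
The liveness idea behind blocks and pointers can be illustrated with a deliberately tiny model. ToyBlock, ToyPointer, LoadResult, and load are hypothetical names, and a single alive flag stands in for the real mechanism of redirecting pointers to a dead block.

```cpp
// Hypothetical toy model; the names echo the concepts above, not Clang's classes.
struct ToyBlock {
  int value;
  bool alive;
};

struct ToyPointer {
  const ToyBlock *block;
};

struct LoadResult {
  bool ok;   // false models a diagnosed read outside the object's lifetime
  int value;
};

// Every load first checks that the pointee block is still alive.
constexpr LoadResult load(ToyPointer p) {
  if (p.block == nullptr || !p.block->alive)
    return LoadResult{false, 0};
  return LoadResult{true, p.block->value};
}

// Reading through a pointer while its block is alive succeeds.
constexpr LoadResult read_while_alive() {
  ToyBlock local{42, true};
  ToyPointer p{&local};
  return load(p);
}

// Models taking the address of a local, ending its frame, then dereferencing.
constexpr LoadResult read_after_frame_ends() {
  ToyBlock local{42, true};
  ToyPointer p{&local};
  local.alive = false; // the frame is popped: the block is marked dead
  return load(p);
}

static_assert(read_while_alive().ok, "live block can be read");
static_assert(read_while_alive().value == 42, "value is preserved");
static_assert(!read_after_frame_ends().ok, "dangling read is detected");
```

Because the check happens on every access, the failure is reported at the exact dereference instead of silently producing garbage.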

Comparing Performance

The Clang/LLVM source tree has no microbenchmarks comparing the two evaluators, but we can do a simple comparison using a couple of code samples to see how each evaluator affects compilation speed as a whole. I will use five samples and a not-so-perfect methodology:

- Compile each sample 5 times.
- Take the median of the full compilation time for each sample (to get a rough idea of overall compilation).
- Take the median of the constant expression evaluation pass, EvaluateAsConstantExpr, for each sample, as reported by Clang with the -ftime-trace flag. (I know this is not fair, since it does not account for bytecode compilation time, but again, the goal is a rough idea.)
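
For clarity, this is how I read the two computations behind the numbers: the median of five runs, and the percentage by which the new time undercuts the old one (positive means faster). The helper names are mine and the code assumes C++17; the check at the end uses the Fibonacci totals reported further down.

```cpp
#include <array>
#include <cstddef>

// Median of five timings, using a small insertion sort so the function
// stays usable in constant expressions.
constexpr double median_of_five(std::array<double, 5> runs) {
  for (std::size_t i = 1; i < runs.size(); ++i) {
    const double key = runs[i];
    std::size_t j = i;
    while (j > 0 && runs[j - 1] > key) {
      runs[j] = runs[j - 1];
      --j;
    }
    runs[j] = key;
  }
  return runs[2];
}

// Positive result: the new interpreter took less time than the old one.
constexpr double improvement_percent(double old_ms, double new_ms) {
  return (old_ms - new_ms) / old_ms * 100.0;
}

// Comparison with a tolerance of 0.01 percentage points.
constexpr bool close_to(double a, double b) {
  return (a > b ? a - b : b - a) < 0.01;
}

// Made-up timings, only to show the median selection.
static_assert(median_of_five({5.0, 1.0, 4.0, 2.0, 3.0}) == 3.0, "middle value");
// Fibonacci total compilation time: 290.58 ms old, 233.22 ms new.
static_assert(close_to(improvement_percent(290.58, 233.22), 19.74), "+19.74%");
```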

I will list each sample followed by a table with its numbers (the samples were generated by AI).

Platform: Windows11-X64
Clang Version: 21.1.8
Iterations per test: 5
C++ Standard: C++20
CPU: Raptor Lake (i9-13900K)
Compiler Flags: clang++ -std=c++20 -fconstexpr-steps=50000000 -fconstexpr-depth=2048 -ftime-trace -fuse-ld=lld -Xlinker /subsystem:console <SAMPLE>.cpp -o <SAMPLE>.exe

Fibonacci:

#include <cstdint>

consteval uint64_t fib(int n) {
    if (n <= 1) return n;
    return fib(n - 1) + fib(n - 2);
}

consteval auto compute() {
    return fib(25);
}

int main() {
    constexpr auto result = compute();
    return result & 1;
}

| Metric | Old Interpreter (median) | New Interpreter (median) | Change (positive = faster) |
| --- | --- | --- | --- |
| Total Compilation Time | 290.58 ms | 233.22 ms | +19.74% |
| EvaluateAsConstantExpr | 108.9 ms | 79.9 ms | +26.62% |

Primes:

#include <array>
#include <cstddef>

consteval bool is_prime(int n) {
    if (n < 2) return false;
    for (int i = 2; i * i <= n; ++i) {
        if (n % i == 0) return false;
    }
    return true;
}

consteval auto generate_primes() {
    std::array<int, 1000> primes{};
    int count = 0;
    for (int i = 2; count < 1000; ++i) {
        if (is_prime(i)) {
            primes[count++] = i;
        }
    }
    return primes;
}

int main() {
    constexpr auto primes = generate_primes();
    return primes[999];
}

| Metric | Old Interpreter (median) | New Interpreter (median) | Change (positive = faster) |
| --- | --- | --- | --- |
| Total Compilation Time | 377.11 ms | 352.9 ms | +6.42% |
| EvaluateAsConstantExpr | 55.87 ms | 39.88 ms | +28.62% |

Matrix:

#include <array>
#include <cstddef>

template<size_t N>
consteval auto matrix_mult() {
    std::array<std::array<int, N>, N> a{}, b{}, result{};

    for (size_t i = 0; i < N; ++i) {
        for (size_t j = 0; j < N; ++j) {
            a[i][j] = i * N + j;
            b[i][j] = j * N + i;
        }
    }

    for (size_t i = 0; i < N; ++i) {
        for (size_t j = 0; j < N; ++j) {
            int sum = 0;
            for (size_t k = 0; k < N; ++k) {
                sum += a[i][k] * b[k][j];
            }
            result[i][j] = sum;
        }
    }

    return result;
}

int main() {
    constexpr auto mat = matrix_mult<50>();
    return mat[49][49] & 0xFF;
}

| Metric | Old Interpreter (median) | New Interpreter (median) | Change (positive = faster) |
| --- | --- | --- | --- |
| Total Compilation Time | 1315.54 ms | 1001.35 ms | +23.88% |
| EvaluateAsConstantExpr | 517.68 ms | 365.78 ms | +29.34% |

siphash:

#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

inline constexpr std::size_t HASH_ITER  = 500'000;

consteval uint64_t rotl64(uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

consteval uint64_t siphash(uint64_t seed) {
    uint64_t v0 = seed ^ 0x736f6d6570736575ULL;
    uint64_t v1 = seed ^ 0x646f72616e646f6dULL;
    uint64_t v2 = seed ^ 0x6c7967656e657261ULL;
    uint64_t v3 = seed ^ 0x7465646279746573ULL;

    for (std::size_t i = 0; i < HASH_ITER; ++i) {
        v0 += v1; v1 = rotl64(v1, 13); v1 ^= v0; v0 = rotl64(v0, 32);
        v2 += v3; v3 = rotl64(v3, 16); v3 ^= v2;
        v0 += v3; v3 = rotl64(v3, 21); v3 ^= v0;
        v2 += v1; v1 = rotl64(v1, 17); v1 ^= v2; v2 = rotl64(v2, 32);

        v0 += v1; v1 = rotl64(v1, 13); v1 ^= v0; v0 = rotl64(v0, 32);
        v2 += v3; v3 = rotl64(v3, 16); v3 ^= v2;
        v0 += v3; v3 = rotl64(v3, 21); v3 ^= v0;
        v2 += v1; v1 = rotl64(v1, 17); v1 ^= v2; v2 = rotl64(v2, 32);
    }
    return v0 ^ v1 ^ v2 ^ v3;
}

int main() {
    constexpr auto hash = siphash(0x42);
    return hash & 0xFF;
}

| Metric | Old Interpreter (median) | New Interpreter (median) | Change (positive = faster) |
| --- | --- | --- | --- |
| Total Compilation Time | 26144.88 ms | 12091.97 ms | +53.75% |
| EvaluateAsConstantExpr | 12849 ms | 5886.13 ms | +54.19% |

combinatorics:

#include <array>
#include <cstdint>

consteval int64_t factorial(int n) {
    int64_t result = 1;
    for (int i = 2; i <= n; ++i) {
        result *= i;
    }
    return result;
}

consteval int64_t binomial(int n, int k) {
    if (k > n) return 0;
    if (k == 0 || k == n) return 1;
    return factorial(n) / (factorial(k) * factorial(n - k));
}

consteval auto pascal_triangle() {
    constexpr auto t = 21;    
    std::array<std::array<int64_t, t>, t> triangle{};
    for (int n = 0; n < t; ++n) {
        for (int k = 0; k <= n; ++k) {
            triangle[n][k] = binomial(n, k);
        }
    }
    return triangle;
}

int main() {
    constexpr auto triangle = pascal_triangle();
    return triangle[20][10] & 0xFF; // stay within the 21 x 21 triangle
}

| Metric | Old Interpreter (median) | New Interpreter (median) | Change (positive = faster) |
| --- | --- | --- | --- |
| Total Compilation Time | 271.93 ms | 274.58 ms | -0.98% |
| EvaluateAsConstantExpr | 3.21 ms | 3.28 ms | -2.18% |

Total:

| Metric | Average Change (positive = faster) |
| --- | --- |
| Total Compilation Time | +20.56% |
| EvaluateAsConstantExpr | +27.32% |

As we can see, the new bytecode interpreter shows a significant improvement in most cases. The only sample that showed a minor degradation was the combinatorics test; I haven't profiled it deeply, so it could just be environmental noise. This exercise isn't rigorous, but it gives a rough idea of the performance gains and of how the new interpreter could improve C++ compile times in the future.
