How to implement a constant-expression counter in C++

Created	Mon, 11 May 2015 11:13:27 +0200
Modified	Wed, 2 Jun 2021 14:15:00 +0200
Tags	c++, c++11, constant-expression, stateful meta-programming

Disclaimer: The technique described in this post is primarily meant as "just another clever hack, diving into the dark corners of C++". I do not recommend anyone to incorporate the contents of this post into production code without considering its caveats.

Note: I have not had time to review the contents of this post as much as I have been wanting to; please use the comment section if you find anything that is hard to understand, or misleading.

Introduction

In the previous post we went through the basic idea of adding state to the world of constant-expressions, and with that; adding state to the phase of translation.

The technique described introduced something similar to the intrinsic type bool, only having the additional property of being able to change its value during the compilation of our program (instead of just during runtime).

We were presented with, and solved, the following challenge:

// <insert solution here>

int main () {
  int constexpr a = f ();
  int constexpr b = f ();

  static_assert (a != b, "try again");
}

Note: One could make the above compile using #define f() __LINE__, but that is quite obviously not the sentiment of the post (even though it is a somewhat clever way to circumvent the wording of the problem).

So, what now?

In this post we will take the concept even further, effectively making the following snippet compile with the desired semantics (not firing the static_assert):

// <insert solution here>

int main () {
  constexpr int a = next ();
  constexpr int b = next ();
  constexpr int c = next ();

  static_assert (a == 1 && b == a+1 && c == b+1, "try again");
}

We will also deal with the somewhat hidden caveats of using the technique described, and how to circumvent them in order to write code that follows the guarantees of the ISO C++ Standard (N3290, aka. C++11).

Introduction
Prerequisites
Solution
- Implementation
- Elaboration
Caveats
Conclusion
Appendix

Prerequisites

Besides the contents of the previous post in this series, one must understand the following concepts in order to fully grasp the upcoming implementation:

Argument-Dependent Lookup (ADL)
The evaluation of default arguments
The gist of SFINAE
The fundamental concept of value representation

Note: The contents of this section is meant to be used as an introduction to the technical aspects related to the solution at the end of this post, as such it could potentially be skipped by more experienced (or eager) readers.

If you are only here for the cake, please, skip to the solution.

The semantics of Argument-Dependent Lookup

When a compiler sees an unqualified-id, a name that does not contain the scope operator (::), as the postfix-expression in a function call; the rules of C++ says that the compiler shall potentially search for additional functions that might not be available in the immediate context of said function call.

The search for additional functions is governed by the types of the arguments involved in the function call itself, where each and every argument potentially gives birth to additional scopes to search in order to find the best viable function.

Consider the following example:

namespace N {
  struct  X { };
  void func (X, int);    // (1)
}

void func (int, int);    // (2)

namespace M {
  struct Y {
    operator int ();
  };

  void func (N::X, int); // (3)
}

int main () {
  N::X x;
  M::Y y;

     func  (x, 0); //           ok, ADL finds  (1)
    (func) (x, 0); //   ill-formed, would call (2)
     func  (x, y); //    ambiguous, ADL finds  (1) and (3)
  M::func  (x, y); // qualified-id, calls      (3)
}

Note: (func) is not a name, but an expression, as such argument dependent lookup doesn't apply.

What are the additional scopes associated with an argument of type `T`?

Note: A complete list is available at [basic.lookup.argdep]p2 in the ISO C++ Standard, but the below can be used as a reference when dealing with ADL in the most common cases.

If the argument type, T, is;

an intrinsic type, there will be no additional scopes to search.
a pointer to U or an array of U, the additional scopes are those associated with U.
a function type, the additional scopes include:
- the scopes associated with its parameter types, and;
- the scopes associated with its return type.
an enumeration type, the additional scopes include:
- the namespace in which it is declared, and;
- the class in which it is a member (if any).
a class type (or pointer to such), the additional scopes include:
- in case it is a template specialization;
  - the scopes associated with each template parameter of the specialization,
- the class itself,
- the class of which it is a member (if any),
- its direct and indirect base classes (if any), and;
- the namespaces in which the class, and its bases (if any), are declared.

References: [basic.lookup.argdep]p1-3

The evaluation of default arguments

C++ has two tightly coupled types of default arguments, one associated with template parameters, and the other with function parameters. Even though vastly different in their use case, they have many rules in common.

A default argument is:

an expression (not a value),
evaluated each time it is required,
cannot be specified for a template parameter pack, and;
in case it refers to a dependent name;
- such name is not instantiated unless used.

The fact that default-arguments are evaluated each time they are used may not come as a shock to anyone. Having code such as the below, we expect "hello world\n" to be printed three times:

int  g () {
  std::cout << "hello world\n";
  return 0;
}

void f (int = g ()) {
  /* intentionally left blank */  
}

int main () {
  for (int i = 0; i < 3; ++i)
    f ();
}

Going further, it is not overly surprising that the below is legal C++ even though it may look ludicrous to the untrained eye:

template<int N>
struct A {
  int arr[-N];
};

template<int N>
void f (int arg = A<N>{}) {
  /* intentionally left blank */
}

int main () {
  f<123> (0);
}

Note: The default-argument for arg is never evaluated nor instantiated, which inherently means that there is no issue with the non-existing instantiation of A<123> (where A<123>::arr would have a negative size) being ill-formed.

References: [dcl.fct.default]p1-9, [temp.deduct]p5, [temp.point]p2, [temp.inst]p10

The gist of SFINAE

Having a set of specializations, for class- or function templates, it is sometimes desired to have different behavior depending on whether some quality of an argument passed into the template has been met.

Let us consider a minor, contrived, example:

namespace detail {
  template<class T, class = char[(sizeof(T) > 4 ? 1 : -1)]>
  constexpr bool is_big (int) {              // ^-.
    return true;                             //   '- the check
  }

  template<class T>
  constexpr bool is_big (float) {
    return false;
  }
}

template<class T>
constexpr bool is_big () {
  return detail::is_big<T> (0); // call most viable function
}

int main () {
  static_assert (is_big<  char> () == false, "!!"); // ok
  static_assert (is_big<double> () ==  true, "!!"); // ok
}

There are two important concepts dealt with in the above example;

Overload Resolution

Disregarding everything that has to do with templates, we see that we have two functions named is_big in namespace detail, where one accepts an int as its argument, and the other a float.
```
bool overloaded (int);   // (1)
bool overloaded (float); // (2)
```
The basics of Overload Resolution says that a compiler shall find the best candidate function (considering its parameters) whenever it encounters a call to a function name that is associated with more than one overload.

In simple terms, the best candidate is the one that would require the least amount of work (conversions, type deduction, etc.), in regard to the arguments passed, in order to be called.
```
overloaded (0);          // calls (1)
```
If there is more than one function that is an equally good fit (ie. requires the same amount of work in order to be called), the program is ill-formed.
```
overloaded ('0');        // ill-formed, ambiguity between (1) and (2)
```
Substitution Failure Is Not An Error

When working with templates we might stumble into cases where a certain specialization can only be instantiated with a certain set of arguments.

If a failure emerges during deduction or substitution of the template parameters involved would mean that the entire compilation process comes to a halt, we would not be able to express the following:
```
template<class T>
void print (T const& val) {       // (1)
  std::cout << val << std::endl;
}
```
```
template<class T>
void print (T* ptr) {             // (2)
  print (*ptr);
}
```
```
int main () {
  int const   x = 123;
  int const * p = &x; 

  print (x); // calls (1)
  print (p); // calls (2)
}
```
Even though there is no way to form a pointer-to-const-T when looking at (2) having print(x) in main, the ISO C++ Standard does not consider the application to be ill-formed – instead it states that such error simply discards the specialization in favor of other viable alternatives.

Moving on, the standard explicitly states that an array type cannot have a negative size, which further means that detail::is_big<T> (int) can only be instantiated if sizeof(T) > 4:
```
namespace detail {
  template<class T, class = char[(sizeof(T) > 4 ? 1 : -1)]>
  constexpr bool is_big (int) {              // ^-.
    return true;                             //   '- the check
  }

  template<class T>
  constexpr bool is_big (float) {
    return false;
  }
}
```
When the implementation is looking for the function to call in detail::is_big<T> (0), it will try to instantiate every possible alternative, and discard alternatives that are not viable.

There are two potential outcomes having detail::is_big<T> (0) (depending on type T):
- If both templates are viable after template argument substitution, the one taking an int is a better match for 0 than the one taking a float and as such; overload resolution will pick the former.
- If only the latter template can be instantiated, overload resolution simply picks the function taking a float (even though it would require a conversion of the argument passed) — as there are no other viable alternatives.

References: [over.match], [dcl.array]p1, [temp.deduct]p1-9, [temp.deduct.call]p1-6

The fundamental concept of value representation

Note: This section is meant as a light introduction to representing values that cannot fit in one given entity (by using several).

The usage of "value representation" is not to be confused with what the ISO C++ Standard says about value representation (even though the two terms share some similarities).

An object of the intrinsic type bool can be toggled between two different states, it's either false or true, and this is similar to the "variable" we introduced in the previous post.

There is however one important difference; a bool can switch between the two states in whatever order we want, whereas our "variable" can only go from one state to the other - but not back.

If we want to represent values beyond those that can be stored in these entities, we will need to combine several of such to get where we want - and because of their differences, the approach differs.

Representing `N` states using `bools`

If we were to represent N different states (where N > 0) using a set of bools (or bits), we would need that set to have a size of at least log2(N).

  N = 4
SET = { bool A, bool B } // size = log2(N) => 2

         A      B
       -----  -----
0 => { false, false }
1 => { false, true  }
2 => {  true, false }
3 => {  true, true  }

Representing `N` states using `constexpr` functions

The previous sub-section shows how the value of B toggles from one state back to the other, but as previously stated, our "variable" does not support this concept.

In order to represent N different states by conditionally defining constexpr functions (abbreviated as cexprf in the below), we would need that set of functions to have a size of at least N-1.

  N = 4
SET = { cexprf X, cexprf Y, cexprf Z } // size = N-1 => 3

           X          Y          Z
       ---------  ---------  ---------
0 => { undefined, undefined, undefined }
1 => {   defined, undefined, undefined }
2 => {   defined,   defined, undefined }
3 => {   defined,   defined,   defined }

Solution

Implementation

template<int N>
struct flag {
  friend constexpr int adl_flag (flag<N>);
};

template<int N>
struct writer {
  friend constexpr int adl_flag (flag<N>) {
    return N;
  }

  static constexpr int value = N;
};

template<int N, int = adl_flag (flag<N> {})>
int constexpr reader (int, flag<N>) {
  return N;
}

template<int N>
int constexpr reader (float, flag<N>, int R = reader (0, flag<N-1> {})) {
  return R;
}

int constexpr reader (float, flag<0>) {
  return 0;
}

template<int N = 1>
int constexpr next (int R = writer<reader (0, flag<32> {}) + N>::value) {
  return R;
}

int main () {
  constexpr int a = next ();
  constexpr int b = next ();
  constexpr int c = next ();

  static_assert (a == 1 && b == a+1 && c == b+1, "try again");
}

Note: Because of bugs in vc++ (Visual C++, Microsoft), it will not yield the desired behavior when fed the above snippet; see the appendix for a vc++ specific workaround.

Elaboration

The "variables"

Following what was stated in The fundamental concept of value representation, we quickly realize that in order to represent N different states using the technique described in the previous post, we need variables — plural.

It would be tedious if we, as developers, had to explicitly write out N different names to act as "variables" when implementing our counter. Instead we will use a class template so that the compiler can generate the different entities for us — saving us a tremendous amount of typing.

template<int N>
struct flag {
  friend constexpr int adl_flag (flag<N>);
};

When a specialization of the above primary class template is instantiated, the compiler will effectively introduce a function named int adl_flag (flag<N>) that can only be found through Argument Dependent-Lookup.

The Writers

Having more than one "variable", we of course need more than one "writer", and following the same logic as applied in the previous section; we use a class template to create a definition for adl_flag (flag<N>).

template<int N>
struct writer {
  friend constexpr int adl_flag (flag<N>) {
    return N;
  }

  static constexpr int value = N;
};

Since we need a simple way to force instantiation of writer<N>, each specialization of writer<N> has a static constexpr data-member that simply has the value of the passed non-type template-parameter N.

The Readers

This is by far the trickiest part of the implementation, but it is not as complicated as it may seem at first glance.

In order to query which functions has been given a definition, so that we can get the current count, we will have to walk through the different overloads of adl_flag and check their status; an easy task if we use the rules of overload resolution.

The different overloads of reader will always be invoked with two arguments, the first being int (0), and the second a specialization of template<int> struct flag, as in the below:

reader (0, flag<N> {});

The Match
```
template<int N, int = adl_flag (flag<N> {})>
int constexpr reader (int, flag<N>) {
  return N;
}
```
Without explicitly passing template-arguments to the above template, a specialization will only be viable if adl_flag (flag<N> {}) is usable where a constant-expression is required, in other words; it can only be called if the appropriate overload of adl_flag has been defined.
The Searcher
```
template<int N>
int constexpr reader (float, flag<N>, int R = reader (0, flag<N-1> {})) {
  return R;
}
```
The Matcher accepts an int as its first argument, whereas The Seacher takes a float. Knowing the rules of overload resolution we quickly see that passing 0 as the first argument will pick the former, if it is viable, otherwise it will pick the latter.

In case a Searcher is picked, the default-argument specified for R will be evaluated. This will effectively walk down our different overloads until it finds a function that will stop the recursion — yielding the current count.
The Base
```
int constexpr reader (float, flag<0>) {
  return 0;
}
```
Since the searcher walks down the path by looking at the next overload matching a value less than N, we must have a base-case so that we do not run into infinite recursion (in case reader (int, flag<N>) can never be instantiated).

The Helper

The readers has granted us the power to get the current count, the only thing left is to implement a function that will, depending on the current count, instantiate the next writer<N> - effectively increasing the count.

template<int N = 1>
int constexpr next (int R = writer<reader (0, flag<32> {}) + N>::value) {
  return R;
}

We will have the readers start searching at 32, and then work themselves down towards 0, this in turn means that 32 will be the maximum value of our counter.

Note: The implementation could be reversed so that we start searching at 0, and walk ourselves up towards inf – making it possible to have an "infinite" counter.

Since such implementation is somewhat more complex, I will leave it as an exercise for the reader.

Caveats

It is of utter importance that one remembers that what we are currently dealing with involves some quite complex rules of the language. To top it off; some of these rules were not intended to be used as we are (ab)using them.

As always when dealing with complex rules, there are caveats that are easily triggered unless we thoroughly think about what the standard does, and does not, guarantee.

The order of template instantiation

In the previous post we spent a fair share of time talking about the points of instantiation when it comes to templates, and how function templates have more than one point of instantiation.

What we did not explicitly address is the fact that since a function template may have several points of instantation, the relative order of code generation for two different entities are not guaranteed.

template<class, int Dummy = 1>
constexpr int indirection () {
  return next<Dummy> ();
}

int main () {
  constexpr int a = indirection<bool> (); // (1)
  constexpr int b = indirection<void> (); // (2)

  static_assert (a <  b, "!!"); // not guaranteed
  static_assert (a != b, "!!"); // ok
}

Note: Even though (2) appears after (1), a compiler is free to generate code for the bodies of the specializations in any order it feels appropriate.

The order of code generation is normally not observable, but when we are dealing with changing the global state of constant-expressions, it suddenly matters a lot, mainly because we rely on the fact that some change happens before another.

Basically there is no guarantee that a < b yields true in the previous snippet because the order of code generation is implementation-defined.

How to circumvent the issue

The solution is quite simple; don't rely on changes of the global state inside the body of a function template, only use them in contexts where the order of evaluation is guaranteed, such as:

As the default-argument for template- and function parameters,
directly inside class template specializations (these have at most one point of instantiation),
in the declaration of a function template, or;
in the declaration or definition of a non-template function.

The order of template argument substitution

In C++11 the order of template argument substitution (between non-dependent expressions) is implementation-defined, this means that we can run into cases where we logically think something is guaranteed to happen in some order — even though it is not.

The wording in C++14 changed to make the order defined across implementations, and I have written a Q&A that explains why the order of template argument substitution matters (not only to the technique described in this post):

Why does the order of template argument substitution matter?

Translation Units

It is very easy to run into cases where using the technique breaks the One Definition Rule (ODR), if used across different translation units (TU).

The problem boils down to that the technique relies on conditionally providing a definition of a function, by checking if it has already been defined within the current TU.

Since a function can have at most one definition (ODR), and there is no way to check whether a function has a definition in some other TU — one needs to properly consider this when using the technique described.

What about using `inline` functions?

An alternative to using an anonymous namespace is to rely on the wording that an inline function may have more than one definition, as long as it is the same across different translation units.

3.2p3 One definition rule [basic.def.odr]p3

Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program; no diagnostic required. The definition can appear explicitly in the program, it can be found in the standard or a user-defined library, or (when appropriate) it is implicitly defined (see 12.1, 12.4 and 12.8). An inline function shall be defined in every translation unit in which it is odr-used.

Even though this is a suitable workaround, it is an extra mental barrier that — according to me — is best to be avoided; the easier it is to reason about a program, the better.

How to circumvent the issue

Only use the technique in contexts that are local to the current translation unit, such as by wrapping the boiler plate inside an unanonmyous namespace to prevent the definition(s) from causing conflicts with some other definition for the same name, in some other translation unit.

Conclusion

This post has explained the technique involved to implement a counter that is usable where a constant-expression is required, and the somewhat hidden caveats of using the implementation (if one doesn't properly consider its side-effects (pun intended)).

Together with the previous post, it effectively shows that constant-expressions are not "stateless", and that one can rely on the changing state to solve "impossible" problems.

What's next?

We can view our counter implementation as a way to increment a counter in the global scope. The next post will show how to also associate an arbitrary value associated with each state — effectively making it possible to write something as the below.

using LX = ...;

LX::push<void, void, void, void> ();
LX::set<0, class Hello> ();
LX::set<2, class World> ();
LX::pop ();

LX::value<> x; // type_list<class Hello, void, class World>

Further Down The Road

In upcoming posts we will also dive deep into the ISO C++ Standard to discuss whether the behavior we are (ab)using is something worth keeping around — and how to potentially form wording to prevent the black magic this technique is using.

We will also discuss other techniques that boils down to effectively solving the same sort of problems (word of warning; they are quite complex).

Appendix

Workaround for `vc++`

vc++ has trouble with the original solution due to the following:

vc++ does not have proper support for expression SFINAE
vc++ does not support aggregate-initialization in constant-expressions
vc++ does not re-evaluate expressions used as template-arguments in function parameter default-arguments

template<int N>
struct flag {
  friend constexpr int adl_flag (flag<N>);
};

template<int N>
struct writer {
  friend constexpr int adl_flag (flag<N>) {
    return N;
  }

  static constexpr int value = N;
};

template<int N, class = char[noexcept(adl_flag(flag<N> ()))?+1:-1]>
int constexpr reader (int, flag<N>) {
  return N;
}

template<int N>
int constexpr reader (float, flag<N>, int R = reader (0, flag<N-1> ())) {
  return R;
}

int constexpr reader (float, flag<0>) {
  return 0;
}

template<int N = 1, int C = reader (0, flag<32> ())>
int constexpr next (int R = writer<C + N>::value) {
  return R;
}

int main () {
  constexpr int a = next ();
  constexpr int b = next ();
  constexpr int c = next ();

  static_assert (a == 1 && b == a+1 && c == b+1, "try again");
}

Additional Credit

First and foremost, I would like to take the opportunity to thank those who have donated to Project Keep-Alive, effectively helping me from going home- — and with that — internet-less.

All donors have asked to remain anonymous; but you know who you are, and I am very thankful for the aid provided! When struggling to put food on the table it is hard to find strength to write posts such as this.

In other words; the contents of this blog would not have been possible without you guys, and once again; thanks!

Shout-outs

Mara Bos, for:
- Helping me with the wording of this post.
- Keeping my spirit up when I sometimes have felt like rm -rf-ing the whole thing (because of Writer's Block).
Didrik Nordström, for:
- Asking very interesting questions regarding the contents of the post.
- Keeping me company during late nights at The Royal Insitutite of Technology.
"Columbu", for:
- In a way not quoting the standard and throwing standardese in my face, though this post is not as techinical as the previous one, I appreciate his review of the contents.

Author	Filip Roséen
Contact	filip.roseen@gmail.com

How to implement a constant-expression counter in C++

Introduction

So, what now?

Table of Contents

Prerequisites

The semantics of Argument-Dependent Lookup

What are the additional scopes associated with an argument of type `T`?

The evaluation of default arguments

The gist of SFINAE

The fundamental concept of value representation

Representing `N` states using `bools`

Representing `N` states using `constexpr` functions

Solution

Implementation

Elaboration

The "variables"

The Writers

The Readers

The Match

The Searcher

The Base

The Helper

Caveats

The order of template instantiation

How to circumvent the issue

The order of template argument substitution

Translation Units

What about using `inline` functions?

`3.2p3` One definition rule `[basic.def.odr]p3`

How to circumvent the issue

Conclusion

What's next?

Further Down The Road

Appendix

Workaround for `vc++`

Additional Credit

Shout-outs

`Mara Bos`, for:

`Didrik Nordström`, for:

`"Columbu"`, for:

How to implement a constant-expression counter in C++

Introduction

So, what now?

Table of Contents

Prerequisites

The semantics of Argument-Dependent Lookup

What are the additional scopes associated with an argument of type T?

The evaluation of default arguments

The gist of SFINAE

The fundamental concept of value representation

Representing N states using bools

Representing N states using constexpr functions

Solution

Implementation

Elaboration

The "variables"

The Writers

The Readers

The Match

The Searcher

The Base

The Helper

Caveats

The order of template instantiation

How to circumvent the issue

The order of template argument substitution

Translation Units

What about using inline functions?

3.2p3 One definition rule [basic.def.odr]p3

How to circumvent the issue

Conclusion

What's next?

Further Down The Road

Appendix

Workaround for vc++

Additional Credit

Shout-outs

Mara Bos, for:

Didrik Nordström, for:

"Columbu", for:

What are the additional scopes associated with an argument of type `T`?

Representing `N` states using `bools`

Representing `N` states using `constexpr` functions

What about using `inline` functions?

`3.2p3` One definition rule `[basic.def.odr]p3`

Workaround for `vc++`

`Mara Bos`, for:

`Didrik Nordström`, for:

`"Columbu"`, for: