How to implement a constant-expression counter in C++

Created Mon, 11 May 2015 11:13:27 +0200
Modified Sat, 30 May 2015 16:05:04 +0200
Tags c++, c++11, constant-expression, stateful meta-programming

Disclaimer: The technique described in this post is primarily meant as "just another clever hack, diving into the dark corners of C++". I do not recommend anyone to incorporate the contents of this post into production code without considering its caveats.

Note: I have not had time to review the contents of this post as much as I have been wanting to; please use the comment section if you find anything that is hard to understand, or misleading.

Introduction

In the previous post we went through the basic idea of adding state to the world of constant-expressions, and with that; adding state to the phase of translation.

The technique described introduced something similar to the intrinsic type bool, only having the additional property of being able to change its value during the compilation of our program (instead of just during runtime).

We were presented with, and solved, the following challenge:

// <insert solution here>
int main () {
  int constexpr a = f ();
  int constexpr b = f ();

  static_assert (a != b, "try again");
}

Note: One could make the above compile using #define f() __LINE__, but that is quite obviously not the sentiment of the post (even though it is a somewhat clever way to circumvent the wording of the problem).

So, what now?

In this post we will take the concept even further, effectively making the following snippet compile with the desired semantics (not firing the static_assert):

// <insert solution here>
int main () {
  constexpr int a = next ();
  constexpr int b = next ();
  constexpr int c = next ();

  static_assert (a == 1 && b == a+1 && c == b+1, "try again");
}

We will also deal with the somewhat hidden caveats of using the technique described, and how to circumvent them in order to write code that follows the guarantees of the ISO C++ Standard (N3290, aka. C++11).

Table of Contents

Prerequisites

Besides the contents of the previous post in this series, one must understand the following concepts in order to fully grasp the upcoming implementation:

Note: The contents of this section is meant to be used as an introduction to the technical aspects related to the solution at the end of this post, as such it could potentially be skipped by more experienced (or eager) readers.

If you are only here for the cake, please, skip to the solution.

The semantics of Argument-Dependent Lookup

When a compiler sees an unqualified-id, a name that does not contain the scope operator (::), as the postfix-expression in a function call; the rules of C++ says that the compiler shall potentially search for additional functions that might not be available in the immediate context of said function call.

The search for additional functions is governed by the types of the arguments involved in the function call itself, where each and every argument potentially gives birth to additional scopes to search in order to find the best viable function.

Consider the following example:

namespace N {
  struct  X { };
  void func (X, int);    // (1)
}
void func (int, int);    // (2)
namespace M {
  struct Y {
    operator int ();
  };

  void func (N::X, int); // (3)
}
int main () {
  N::X x;
  M::Y y;

     func  (x, 0); //           ok, ADL finds  (1)
    (func) (x, 0); //   ill-formed, would call (2)
     func  (x, y); //    ambiguous, ADL finds  (1) and (3)
  M::func  (x, y); // qualified-id, calls      (3)
}

Note: (func) is not a name, but an expression, as such argument dependent lookup doesn't apply.

What are the additional scopes associated with an argument of type T?

Note: A complete list is available at [basic.lookup.argdep]p2 in the ISO C++ Standard, but the below can be used as a reference when dealing with ADL in the most common cases.

If the argument type, T, is;

References: [basic.lookup.argdep]p1-3

The evaluation of default arguments

C++ has two tightly coupled types of default arguments, one associated with template parameters, and the other with function parameters. Even though vastly different in their use case, they have many rules in common.

A default argument is:

The fact that default-arguments are evaluated each time they are used may not come as a shock to anyone. Having code such as the below, we expect "hello world\n" to be printed three times:

int  g () {
  std::cout << "hello world\n";
  return 0;
}
void f (int = g ()) {
  /* intentionally left blank */  
}
int main () {
  for (int i = 0; i < 3; ++i)
    f ();
}

Going further, it is not overly surprising that the below is legal C++ even though it may look ludicrous to the untrained eye:

template<int N>
struct A {
  int arr[-N];
};
template<int N>
void f (int arg = A<N>{}) {
  /* intentionally left blank */
}
int main () {
  f<123> (0);
}

Note: The default-argument for arg is never evaluated nor instantiated, which inherently means that there is no issue with the non-existing instantiation of A<123> (where A<123>::arr would have a negative size) being ill-formed.

References: [dcl.fct.default]p1-9, [temp.deduct]p5, [temp.point]p2, [temp.inst]p10

The gist of SFINAE

Having a set of specializations, for class- or function templates, it is sometimes desired to have different behavior depending on whether some quality of an argument passed into the template has been met.

Let us consider a minor, contrived, example:

namespace detail {
  template<class T, class = char[(sizeof(T) > 4 ? 1 : -1)]>
  constexpr bool is_big (int) {              // ^-.
    return true;                             //   '- the check
  }

  template<class T>
  constexpr bool is_big (float) {
    return false;
  }
}
template<class T>
constexpr bool is_big () {
  return detail::is_big<T> (0); // call most viable function
}
int main () {
  static_assert (is_big<  char> () == false, "!!"); // ok
  static_assert (is_big<double> () ==  true, "!!"); // ok
}

There are two important concepts dealt with in the above example;

  1. Overload Resolution

    Disregarding everything that has to do with templates, we see that we have two functions named is_big in namespace detail, where one accepts an int as its argument, and the other a float.

    bool overloaded (int);   // (1)
    bool overloaded (float); // (2)
    

    The basics of Overload Resolution says that a compiler shall find the best candidate function (considering its parameters) whenever it encounters a call to a function name that is associated with more than one overload.

    In simple terms, the best candidate is the one that would require the least amount of work (conversions, type deduction, etc.), in regard to the arguments passed, in order to be called.

    overloaded (0);          // calls (1)
    

    If there is more than one function that is an equally good fit (ie. requires the same amount of work in order to be called), the program is ill-formed.

    overloaded ('0');        // ill-formed, ambiguity between (1) and (2)
    
  1. Substitution Failure Is Not An Error

    When working with templates we might stumble into cases where a certain specialization can only be instantiated with a certain set of arguments.

    If a failure emerges during deduction or substitution of the template parameters involved would mean that the entire compilation process comes to a halt, we would not be able to express the following:

    template<class T>
    void print (T const& val) {       // (1)
      std::cout << val << std::endl;
    }
    
    template<class T>
    void print (T* ptr) {             // (2)
      print (*ptr);
    }
    
    int main () {
      int const   x = 123;
      int const * p = &x; 
    
      print (x); // calls (1)
      print (p); // calls (2)
    }
    

    Even though there is no way to form a pointer-to-const-T when looking at (2) having print(x) in main, the ISO C++ Standard does not consider the application to be ill-formed – instead it states that such error simply discards the specialization in favor of other viable alternatives.

    Moving on, the standard explicitly states that an array type cannot have a negative size, which further means that detail::is_big<T> (int) can only be instantiated if sizeof(T) > 4:

    namespace detail {
      template<class T, class = char[(sizeof(T) > 4 ? 1 : -1)]>
      constexpr bool is_big (int) {              // ^-.
        return true;                             //   '- the check
      }
    
      template<class T>
      constexpr bool is_big (float) {
        return false;
      }
    }
    

    When the implementation is looking for the function to call in detail::is_big<T> (0), it will try to instantiate every possible alternative, and discard alternatives that are not viable.

    There are two potential outcomes having detail::is_big<T> (0) (depending on type T):

    • If both templates are viable after template argument substitution, the one taking an int is a better match for 0 than the one taking a float and as such; overload resolution will pick the former.

    • If only the latter template can be instantiated, overload resolution simply picks the function taking a float (even though it would require a conversion of the argument passed) — as there are no other viable alternatives.

References: [over.match], [dcl.array]p1, [temp.deduct]p1-9, [temp.deduct.call]p1-6

The fundamental concept of value representation

Note: This section is meant as a light introduction to representing values that cannot fit in one given entity (by using several).

The usage of "value representation" is not to be confused with what the ISO C++ Standard says about value representation (even though the two terms share some similarities).

An object of the intrinsic type bool can be toggled between two different states, it's either false or true, and this is similar to the "variable" we introduced in the previous post.

There is however one important difference; a bool can switch between the two states in whatever order we want, whereas our "variable" can only go from one state to the other - but not back.

If we want to represent values beyond those that can be stored in these entities, we will need to combine several of such to get where we want - and because of their differences, the approach differs.

Representing N states using bools

If we were to represent N different states (where N > 0) using a set of bools (or bits), we would need that set to have a size of at least log2(N).

  N = 4
SET = { bool A, bool B } // size = log2(N) => 2
         A      B
       -----  -----
0 => { false, false }
1 => { false, true  }
2 => {  true, false }
3 => {  true, true  }

Representing N states using constexpr functions

The previous sub-section shows how the value of B toggles from one state back to the other, but as previously stated, our "variable" does not support this concept.

In order to represent N different states by conditionally defining constexpr functions (abbreviated as cexprf in the below), we would need that set of functions to have a size of at least N-1.

  N = 4
SET = { cexprf X, cexprf Y, cexprf Z } // size = N-1 => 3
           X          Y          Z
       ---------  ---------  ---------
0 => { undefined, undefined, undefined }
1 => {   defined, undefined, undefined }
2 => {   defined,   defined, undefined }
3 => {   defined,   defined,   defined }

Solution

Implementation

template<int N>
struct flag {
  friend constexpr int adl_flag (flag<N>);
};
template<int N>
struct writer {
  friend constexpr int adl_flag (flag<N>) {
    return N;
  }

  static constexpr int value = N;
};
template<int N, int = adl_flag (flag<N> {})>
int constexpr reader (int, flag<N>) {
  return N;
}

template<int N>
int constexpr reader (float, flag<N>, int R = reader (0, flag<N-1> {})) {
  return R;
}

int constexpr reader (float, flag<0>) {
  return 0;
}
template<int N = 1>
int constexpr next (int R = writer<reader (0, flag<32> {}) + N>::value) {
  return R;
}
int main () {
  constexpr int a = next ();
  constexpr int b = next ();
  constexpr int c = next ();

  static_assert (a == 1 && b == a+1 && c == b+1, "try again");
}

Note: Because of bugs in vc++ (Visual C++, Microsoft), it will not yield the desired behavior when fed the above snippet; see the appendix for a vc++ specific workaround.

Elaboration

The "variables"

Following what was stated in The fundamental concept of value representation, we quickly realize that in order to represent N different states using the technique described in the previous post, we need variables — plural.

It would be tedious if we, as developers, had to explicitly write out N different names to act as "variables" when implementing our counter. Instead we will use a class template so that the compiler can generate the different entities for us — saving us a tremendous amount of typing.

template<int N>
struct flag {
  friend constexpr int adl_flag (flag<N>);
};

When a specialization of the above primary class template is instantiated, the compiler will effectively introduce a function named int adl_flag (flag<N>) that can only be found through Argument Dependent-Lookup.

The Writers

Having more than one "variable", we of course need more than one "writer", and following the same logic as applied in the previous section; we use a class template to create a definition for adl_flag (flag<N>).

template<int N>
struct writer {
  friend constexpr int adl_flag (flag<N>) {
    return N;
  }

  static constexpr int value = N;
};

Since we need a simple way to force instantiation of writer<N>, each specialization of writer<N> has a static constexpr data-member that simply has the value of the passed non-type template-parameter N.

The Readers

This is by far the trickiest part of the implementation, but it is not as complicated as it may seem at first glance.

In order to query which functions has been given a definition, so that we can get the current count, we will have to walk through the different overloads of adl_flag and check their status; an easy task if we use the rules of overload resolution.

The different overloads of reader will always be invoked with two arguments, the first being int (0), and the second a specialization of template<int> struct flag, as in the below:

reader (0, flag<N> {});

The Helper

The readers has granted us the power to get the current count, the only thing left is to implement a function that will, depending on the current count, instantiate the next writer<N> - effectively increasing the count.

template<int N = 1>
int constexpr next (int R = writer<reader (0, flag<32> {}) + N>::value) {
  return R;
}

We will have the readers start searching at 32, and then work themselves down towards 0, this in turn means that 32 will be the maximum value of our counter.

Note: The implementation could be reversed so that we start searching at 0, and walk ourselves up towards inf – making it possible to have an "infinite" counter.

Since such implementation is somewhat more complex, I will leave it as an exercise for the reader.

Caveats

It is of utter importance that one remembers that what we are currently dealing with involves some quite complex rules of the language. To top it off; some of these rules were not intended to be used as we are (ab)using them.

As always when dealing with complex rules, there are caveats that are easily triggered unless we thoroughly think about what the standard does, and does not, guarantee.

The order of template instantiation

In the previous post we spent a fair share of time talking about the points of instantiation when it comes to templates, and how function templates have more than one point of instantiation.

What we did not explicitly address is the fact that since a function template may have several points of instantation, the relative order of code generation for two different entities are not guaranteed.

template<class, int Dummy = 1>
constexpr int indirection () {
  return next<Dummy> ();
}
int main () {
  constexpr int a = indirection<bool> (); // (1)
  constexpr int b = indirection<void> (); // (2)

  static_assert (a <  b, "!!"); // not guaranteed
  static_assert (a != b, "!!"); // ok
}

Note: Even though (2) appears after (1), a compiler is free to generate code for the bodies of the specializations in any order it feels appropriate.

The order of code generation is normally not observable, but when we are dealing with changing the global state of constant-expressions, it suddenly matters a lot, mainly because we rely on the fact that some change happens before another.

Basically there is no guarantee that a < b yields true in the previous snippet because the order of code generation is implementation-defined.

How to circumvent the issue

The solution is quite simple; don't rely on changes of the global state inside the body of a function template, only use them in contexts where the order of evaluation is guaranteed, such as:

The order of template argument substitution

In C++11 the order of template argument substitution (between non-dependent expressions) is implementation-defined, this means that we can run into cases where we logically think something is guaranteed to happen in some order — even though it is not.

The wording in C++14 changed to make the order defined across implementations, and I have written a Q&A that explains why the order of template argument substitution matters (not only to the technique described in this post):

Translation Units

It is very easy to run into cases where using the technique breaks the One Definition Rule (ODR), if used across different translation units (TU).

The problem boils down to that the technique relies on conditionally providing a definition of a function, by checking if it has already been defined within the current TU.

Since a function can have at most one definition (ODR), and there is no way to check whether a function has a definition in some other TU — one needs to properly consider this when using the technique described.

What about using inline functions?

An alternative to using an anonymous namespace is to rely on the wording that an inline function may have more than one definition, as long as it is the same across different translation units.

3.2p3 One definition rule [basic.def.odr]p3

Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program; no diagnostic required. The definition can appear explicitly in the program, it can be found in the standard or a user-defined library, or (when appropriate) it is implicitly defined (see 12.1, 12.4 and 12.8). An inline function shall be defined in every translation unit in which it is odr-used.

Even though this is a suitable workaround, it is an extra mental barrier that — according to me — is best to be avoided; the easier it is to reason about a program, the better.

How to circumvent the issue

Only use the technique in contexts that are local to the current translation unit, such as by wrapping the boiler plate inside an unanonmyous namespace to prevent the definition(s) from causing conflicts with some other definition for the same name, in some other translation unit.

Further reading:

Conclusion

This post has explained the technique involved to implement a counter that is usable where a constant-expression is required, and the somewhat hidden caveats of using the implementation (if one doesn't properly consider its side-effects (pun intended)).

Together with the previous post, it effectively shows that constant-expressions are not "stateless", and that one can rely on the changing state to solve "impossible" problems.

What's next?

We can view our counter implementation as a way to increment a counter in the global scope. The next post will show how to also associate an arbitrary value associated with each state — effectively making it possible to write something as the below.

using LX = ...;
LX::push<void, void, void, void> ();
LX::set<0, class Hello> ();
LX::set<2, class World> ();
LX::pop ();
LX::value<> x; // type_list<class Hello, void, class World>

Further Down The Road

In upcoming posts we will also dive deep into the ISO C++ Standard to discuss whether the behavior we are (ab)using is something worth keeping around — and how to potentially form wording to prevent the black magic this technique is using.

We will also discuss other techniques that boils down to effectively solving the same sort of problems (word of warning; they are quite complex).

Appendix

Workaround for vc++

vc++ has trouble with the original solution due to the following:

template<int N>
struct flag {
  friend constexpr int adl_flag (flag<N>);
};
template<int N>
struct writer {
  friend constexpr int adl_flag (flag<N>) {
    return N;
  }

  static constexpr int value = N;
};
template<int N, class = char[noexcept(adl_flag(flag<N> ()))?+1:-1]>
int constexpr reader (int, flag<N>) {
  return N;
}
template<int N>
int constexpr reader (float, flag<N>, int R = reader (0, flag<N-1> ())) {
  return R;
}
int constexpr reader (float, flag<0>) {
  return 0;
}
template<int N = 1, int C = reader (0, flag<32> ())>
int constexpr next (int R = writer<C + N>::value) {
  return R;
}
int main () {
  constexpr int a = next ();
  constexpr int b = next ();
  constexpr int c = next ();

  static_assert (a == 1 && b == a+1 && c == b+1, "try again");
}

Additional Credit

First and foremost, I would like to take the opportunity to thank those who have donated to Project Keep-Alive, effectively helping me from going home- — and with that — internet-less.

All donors have asked to remain anonymous; but you know who you are, and I am very thankful for the aid provided! When struggling to put food on the table it is hard to find strength to write posts such as this.

In other words; the contents of this blog would not have been possible without you guys, and once again; thanks!

Shout-outs

Author Filip Roséen
Contact filip.roseen@gmail.com

Note: If you appreciate the contents of this blog, please feel free to make a donation (paypal) to show your support.

Powered by scorpiondata.com