Created | Mon, 11 May 2015 11:13:27 +0200 |
---|---|
Modified | Wed, 2 Mar 2022 06:55:49 +0100 |
Tags | c++, c++11, constant-expression, stateful meta-programming |
Disclaimer: The technique described in this post is primarily meant as "just another clever hack, diving into the dark corners of C++". I do not recommend anyone to incorporate the contents of this post into production code without considering its caveats.
Note: I have not had time to review the contents of this post as much as I have been wanting to; please use the comment section if you find anything that is hard to understand, or misleading.
In the previous post we went through the basic idea of adding state to the world of constant-expressions, and with that; adding state to the phase of translation.
The technique described introduced something similar to the intrinsic type
bool
, only having the additional property of being able to change its value
during the compilation of our program (instead of just during runtime).
We were presented with, and solved, the following challenge:
// <insert solution here>
int main () {
int constexpr a = f ();
int constexpr b = f ();
static_assert (a != b, "try again");
}
Note: One could make the above compile using
#define f() __LINE__
, but that is quite obviously not the sentiment of the post (even though it is a somewhat clever way to circumvent the wording of the problem).
In this post we will take the concept even further, effectively making the
following snippet compile with the desired semantics (not firing the
static_assert
):
// <insert solution here>
int main () {
constexpr int a = next ();
constexpr int b = next ();
constexpr int c = next ();
static_assert (a == 1 && b == a+1 && c == b+1, "try again");
}
We will also deal with the somewhat hidden caveats of using the technique described, and how to circumvent them in order to write code that follows the guarantees of the ISO C++ Standard (N3290, aka. C++11).
Besides the contents of the previous post in this series, one must understand the following concepts in order to fully grasp the upcoming implementation:
Note: The contents of this section is meant to be used as an introduction to the technical aspects related to the solution at the end of this post, as such it could potentially be skipped by more experienced (or eager) readers.
If you are only here for the cake, please, skip to the solution.
When a compiler sees an unqualified-id, a name that does not contain the
scope operator (::
), as the postfix-expression in a function call; the rules
of C++ says that the compiler shall potentially search for additional functions
that might not be available in the immediate context of said function call.
The search for additional functions is governed by the types of the arguments involved in the function call itself, where each and every argument potentially gives birth to additional scopes to search in order to find the best viable function.
Consider the following example:
namespace N {
struct X { };
void func (X, int); // (1)
}
void func (int, int); // (2)
namespace M {
struct Y {
operator int ();
};
void func (N::X, int); // (3)
}
int main () {
N::X x;
M::Y y;
func (x, 0); // ok, ADL finds (1)
(func) (x, 0); // ill-formed, would call (2)
func (x, y); // ambiguous, ADL finds (1) and (3)
M::func (x, y); // qualified-id, calls (3)
}
Note:
(func)
is not a name, but an expression, as such argument dependent lookup doesn't apply.
T
?Note: A complete list is available at
[basic.lookup.argdep]p2
in the ISO C++ Standard, but the below can be used as a reference when dealing with ADL in the most common cases.
If the argument type, T
, is;
U
or an array of U
, the additional scopes are those
associated with U
.References:
[basic.lookup.argdep]p1-3
C++ has two tightly coupled types of default arguments, one associated with template parameters, and the other with function parameters. Even though vastly different in their use case, they have many rules in common.
A default argument is:
The fact that default-arguments are evaluated each time they are used may not
come as a shock to anyone. Having code such as the below, we expect "hello world\n"
to be printed three times:
int g () {
std::cout << "hello world\n";
return 0;
}
void f (int = g ()) {
/* intentionally left blank */
}
int main () {
for (int i = 0; i < 3; ++i)
f ();
}
Going further, it is not overly surprising that the below is legal C++ even though it may look ludicrous to the untrained eye:
template<int N>
struct A {
int arr[-N];
};
template<int N>
void f (int arg = A<N>{}) {
/* intentionally left blank */
}
int main () {
f<123> (0);
}
Note: The default-argument for
arg
is never evaluated nor instantiated, which inherently means that there is no issue with the non-existing instantiation ofA<123>
(whereA<123>::arr
would have a negative size) being ill-formed.
References:
[dcl.fct.default]p1-9
,[temp.deduct]p5
,[temp.point]p2
,[temp.inst]p10
Having a set of specializations, for class- or function templates, it is sometimes desired to have different behavior depending on whether some quality of an argument passed into the template has been met.
Let us consider a minor, contrived, example:
namespace detail {
template<class T, class = char[(sizeof(T) > 4 ? 1 : -1)]>
constexpr bool is_big (int) { // ^-.
return true; // '- the check
}
template<class T>
constexpr bool is_big (float) {
return false;
}
}
template<class T>
constexpr bool is_big () {
return detail::is_big<T> (0); // call most viable function
}
int main () {
static_assert (is_big< char> () == false, "!!"); // ok
static_assert (is_big<double> () == true, "!!"); // ok
}
There are two important concepts dealt with in the above example;
Overload Resolution
Disregarding everything that has to do with templates, we see that we have
two functions named is_big
in namespace detail
, where one accepts an
int
as its argument, and the other a float
.
bool overloaded (int); // (1)
bool overloaded (float); // (2)
The basics of Overload Resolution says that a compiler shall find the best candidate function (considering its parameters) whenever it encounters a call to a function name that is associated with more than one overload.
In simple terms, the best candidate is the one that would require the least amount of work (conversions, type deduction, etc.), in regard to the arguments passed, in order to be called.
overloaded (0); // calls (1)
If there is more than one function that is an equally good fit (ie. requires the same amount of work in order to be called), the program is ill-formed.
overloaded ('0'); // ill-formed, ambiguity between (1) and (2)
Substitution Failure Is Not An Error
When working with templates we might stumble into cases where a certain specialization can only be instantiated with a certain set of arguments.
If a failure emerges during deduction or substitution of the template parameters involved would mean that the entire compilation process comes to a halt, we would not be able to express the following:
template<class T>
void print (T const& val) { // (1)
std::cout << val << std::endl;
}
template<class T>
void print (T* ptr) { // (2)
print (*ptr);
}
int main () {
int const x = 123;
int const * p = &x;
print (x); // calls (1)
print (p); // calls (2)
}
Even though there is no way to form a pointer-to-const-T
when
looking at (2)
having print(x)
in main
, the ISO C++ Standard does not
consider the application to be ill-formed – instead it states that
such error simply discards the specialization in favor of other viable
alternatives.
Moving on, the standard explicitly states that an array type cannot have
a negative size, which further means that detail::is_big<T> (int)
can
only be instantiated if sizeof(T) > 4
:
namespace detail {
template<class T, class = char[(sizeof(T) > 4 ? 1 : -1)]>
constexpr bool is_big (int) { // ^-.
return true; // '- the check
}
template<class T>
constexpr bool is_big (float) {
return false;
}
}
When the implementation is looking for the function to call in
detail::is_big<T> (0)
, it will try to instantiate every possible
alternative, and discard alternatives that are not viable.
There are two potential outcomes having detail::is_big<T> (0)
(depending
on type T
):
If both templates are viable after template argument substitution, the
one taking an int
is a better match for 0
than the one taking a
float
and as such; overload resolution will pick the former.
If only the latter template can be instantiated, overload resolution
simply picks the function taking a float
(even though it would
require a conversion of the argument passed) — as there are no
other viable alternatives.
References:
[over.match]
,[dcl.array]p1
,[temp.deduct]p1-9
,[temp.deduct.call]p1-6
Note: This section is meant as a light introduction to representing values that cannot fit in one given entity (by using several).
The usage of "value representation" is not to be confused with what the ISO C++ Standard says about value representation (even though the two terms share some similarities).
An object of the intrinsic type bool
can be toggled between two different
states, it's either false
or true
, and this is similar to the "variable" we
introduced in the previous post.
There is however one important difference; a bool
can switch between the two
states in whatever order we want, whereas our "variable" can only go from one
state to the other - but not back.
If we want to represent values beyond those that can be stored in these entities, we will need to combine several of such to get where we want - and because of their differences, the approach differs.
N
states using bools
If we were to represent N
different states (where N > 0
) using a set of
bools
(or bits), we would need that set to have a size of at least log2(N)
.
N = 4
SET = { bool A, bool B } // size = log2(N) => 2
A B
----- -----
0 => { false, false }
1 => { false, true }
2 => { true, false }
3 => { true, true }
N
states using constexpr
functionsThe previous sub-section shows how the value of B
toggles from one state back
to the other, but as previously stated, our "variable" does not support this
concept.
In order to represent N
different states by conditionally defining constexpr
functions (abbreviated as cexprf in the below), we would need that set of
functions to have a size of at least N-1
.
N = 4
SET = { cexprf X, cexprf Y, cexprf Z } // size = N-1 => 3
X Y Z
--------- --------- ---------
0 => { undefined, undefined, undefined }
1 => { defined, undefined, undefined }
2 => { defined, defined, undefined }
3 => { defined, defined, defined }
template<int N>
struct flag {
friend constexpr int adl_flag (flag<N>);
};
template<int N>
struct writer {
friend constexpr int adl_flag (flag<N>) {
return N;
}
static constexpr int value = N;
};
template<int N, int = adl_flag (flag<N> {})>
int constexpr reader (int, flag<N>) {
return N;
}
template<int N>
int constexpr reader (float, flag<N>, int R = reader (0, flag<N-1> {})) {
return R;
}
int constexpr reader (float, flag<0>) {
return 0;
}
template<int N = 1>
int constexpr next (int R = writer<reader (0, flag<32> {}) + N>::value) {
return R;
}
int main () {
constexpr int a = next ();
constexpr int b = next ();
constexpr int c = next ();
static_assert (a == 1 && b == a+1 && c == b+1, "try again");
}
Note: Because of bugs in
vc++
(Visual C++, Microsoft), it will not yield the desired behavior when fed the above snippet; see the appendix for avc++
specific workaround.
Following what was stated in The fundamental concept of value
representation, we quickly realize that in order to represent
N
different states using the technique described in the previous
post, we need variables — plural.
It would be tedious if we, as developers, had to explicitly write out N
different names to act as "variables" when implementing our counter. Instead we
will use a class template so that the compiler can generate the different
entities for us — saving us a tremendous amount of typing.
template<int N>
struct flag {
friend constexpr int adl_flag (flag<N>);
};
When a specialization of the above primary class template is instantiated,
the compiler will effectively introduce a function named int adl_flag (flag<N>)
that can only be found through Argument Dependent-Lookup.
Having more than one "variable", we of course need more than one "writer", and
following the same logic as applied in the previous section; we use a class
template to create a definition for adl_flag (flag<N>)
.
template<int N>
struct writer {
friend constexpr int adl_flag (flag<N>) {
return N;
}
static constexpr int value = N;
};
Since we need a simple way to force instantiation of writer<N>
, each
specialization of writer<N>
has a static
constexpr
data-member that
simply has the value of the passed non-type template-parameter N
.
This is by far the trickiest part of the implementation, but it is not as complicated as it may seem at first glance.
In order to query which functions has been given a definition, so that we can
get the current count, we will have to walk through the different overloads of
adl_flag
and check their status; an easy task if we use the rules of overload
resolution.
The different overloads of reader
will always be invoked with two arguments,
the first being int
(0
), and the second a specialization of template<int> struct flag
, as in the below:
reader (0, flag<N> {});
template<int N, int = adl_flag (flag<N> {})>
int constexpr reader (int, flag<N>) {
return N;
}
Without explicitly passing template-arguments to the above template,
a specialization will only be viable if adl_flag (flag<N> {})
is
usable where a constant-expression is required, in other words; it can
only be called if the appropriate overload of adl_flag
has been defined.
template<int N>
int constexpr reader (float, flag<N>, int R = reader (0, flag<N-1> {})) {
return R;
}
The Matcher accepts an int
as its first argument, whereas The Seacher
takes a float
. Knowing the rules of overload resolution we quickly see
that passing 0
as the first argument will pick the former, if it is
viable, otherwise it will pick the latter.
In case a Searcher is picked, the default-argument specified for R
will be evaluated. This will effectively walk down our different overloads
until it finds a function that will stop the recursion — yielding the
current count.
int constexpr reader (float, flag<0>) {
return 0;
}
Since the searcher walks down the path by looking at the next overload
matching a value less than N
, we must have a base-case so that we do not
run into infinite recursion (in case reader (int, flag<N>)
can never be
instantiated).
The readers has granted us the power to get the current count, the only
thing left is to implement a function that will, depending on the current
count, instantiate the next writer<N>
- effectively increasing the count.
template<int N = 1>
int constexpr next (int R = writer<reader (0, flag<32> {}) + N>::value) {
return R;
}
We will have the readers start searching at 32
, and then work themselves
down towards 0
, this in turn means that 32
will be the maximum value of our
counter.
Note: The implementation could be reversed so that we start searching at
0
, and walk ourselves up towardsinf
– making it possible to have an "infinite" counter.Since such implementation is somewhat more complex, I will leave it as an exercise for the reader.
It is of utter importance that one remembers that what we are currently dealing with involves some quite complex rules of the language. To top it off; some of these rules were not intended to be used as we are (ab)using them.
As always when dealing with complex rules, there are caveats that are easily triggered unless we thoroughly think about what the standard does, and does not, guarantee.
In the previous post we spent a fair share of time talking about the points of instantiation when it comes to templates, and how function templates have more than one point of instantiation.
What we did not explicitly address is the fact that since a function template may have several points of instantation, the relative order of code generation for two different entities are not guaranteed.
template<class, int Dummy = 1>
constexpr int indirection () {
return next<Dummy> ();
}
int main () {
constexpr int a = indirection<bool> (); // (1)
constexpr int b = indirection<void> (); // (2)
static_assert (a < b, "!!"); // not guaranteed
static_assert (a != b, "!!"); // ok
}
Note: Even though
(2)
appears after(1)
, a compiler is free to generate code for the bodies of the specializations in any order it feels appropriate.
The order of code generation is normally not observable, but when we are dealing with changing the global state of constant-expressions, it suddenly matters a lot, mainly because we rely on the fact that some change happens before another.
Basically there is no guarantee that a < b
yields true
in the previous
snippet because the order of code generation is implementation-defined.
The solution is quite simple; don't rely on changes of the global state inside the body of a function template, only use them in contexts where the order of evaluation is guaranteed, such as:
In C++11 the order of template argument substitution (between non-dependent expressions) is implementation-defined, this means that we can run into cases where we logically think something is guaranteed to happen in some order — even though it is not.
The wording in C++14 changed to make the order defined across implementations, and I have written a Q&A that explains why the order of template argument substitution matters (not only to the technique described in this post):
It is very easy to run into cases where using the technique breaks the One Definition Rule (ODR), if used across different translation units (TU).
The problem boils down to that the technique relies on conditionally providing a definition of a function, by checking if it has already been defined within the current TU.
Since a function can have at most one definition (ODR), and there is no way to check whether a function has a definition in some other TU — one needs to properly consider this when using the technique described.
inline
functions?An alternative to using an anonymous namespace is to rely on the wording that
an inline
function may have more than one definition, as long as it is the
same across different translation units.
3.2p3
One definition rule[basic.def.odr]p3
Every program shall contain exactly one definition of every non-
inline
function or variable that is odr-used in that program; no diagnostic required. The definition can appear explicitly in the program, it can be found in the standard or a user-defined library, or (when appropriate) it is implicitly defined (see 12.1, 12.4 and 12.8). Aninline
function shall be defined in every translation unit in which it is odr-used.
Even though this is a suitable workaround, it is an extra mental barrier that — according to me — is best to be avoided; the easier it is to reason about a program, the better.
Only use the technique in contexts that are local to the current translation unit, such as by wrapping the boiler plate inside an unanonmyous namespace to prevent the definition(s) from causing conflicts with some other definition for the same name, in some other translation unit.
Further reading:
This post has explained the technique involved to implement a counter that is usable where a constant-expression is required, and the somewhat hidden caveats of using the implementation (if one doesn't properly consider its side-effects (pun intended)).
Together with the previous post, it effectively shows that constant-expressions are not "stateless", and that one can rely on the changing state to solve "impossible" problems.
We can view our counter implementation as a way to increment a counter in the global scope. The next post will show how to also associate an arbitrary value associated with each state — effectively making it possible to write something as the below.
using LX = ...;
LX::push<void, void, void, void> ();
LX::set<0, class Hello> ();
LX::set<2, class World> ();
LX::pop ();
LX::value<> x; // type_list<class Hello, void, class World>
In upcoming posts we will also dive deep into the ISO C++ Standard to discuss whether the behavior we are (ab)using is something worth keeping around — and how to potentially form wording to prevent the black magic this technique is using.
We will also discuss other techniques that boils down to effectively solving the same sort of problems (word of warning; they are quite complex).
vc++
vc++
has trouble with the original solution due to the following:
vc++
does not have proper support for expression SFINAEvc++
does not support aggregate-initialization in constant-expressionsvc++
does not re-evaluate expressions used as template-arguments in function parameter default-argumentstemplate<int N>
struct flag {
friend constexpr int adl_flag (flag<N>);
};
template<int N>
struct writer {
friend constexpr int adl_flag (flag<N>) {
return N;
}
static constexpr int value = N;
};
template<int N, class = char[noexcept(adl_flag(flag<N> ()))?+1:-1]>
int constexpr reader (int, flag<N>) {
return N;
}
template<int N>
int constexpr reader (float, flag<N>, int R = reader (0, flag<N-1> ())) {
return R;
}
int constexpr reader (float, flag<0>) {
return 0;
}
template<int N = 1, int C = reader (0, flag<32> ())>
int constexpr next (int R = writer<C + N>::value) {
return R;
}
int main () {
constexpr int a = next ();
constexpr int b = next ();
constexpr int c = next ();
static_assert (a == 1 && b == a+1 && c == b+1, "try again");
}
First and foremost, I would like to take the opportunity to thank those who have donated to Project Keep-Alive, effectively helping me from going home- — and with that — internet-less.
All donors have asked to remain anonymous; but you know who you are, and I am very thankful for the aid provided! When struggling to put food on the table it is hard to find strength to write posts such as this.
In other words; the contents of this blog would not have been possible without you guys, and once again; thanks!
Mara Bos
, for:rm -rf
-ing the whole
thing (because of Writer's Block).Didrik Nordström
, for:"Columbu"
, for:Author | Filip Roséen |
---|---|
Contact | filip.roseen@gmail.com |
Note: If you appreciate the contents of this blog, please feel free to make a donation (paypal) to show your support.