Saturday, August 27, 2016

C++'s syntactic sugars

Let's first look at the Wikipedia definition of syntactic sugar:
Syntactic sugar is syntax within a programming language that is designed to make things easier to read or to express.
For the first syntactic sugar we have to go back in time. In C, a[i] is equivalent to *(a + i), which allows us to write this:
int a[5];
a[2] = 1;
but also this:
3[a] = 5;
Since 3[a] is just *(3 + a) and addition is commutative, it names the same element as a[3], which illustrates perfectly what syntactic sugar is.
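To make the equivalence concrete, here is a minimal sketch. The function names write_reversed_read_normal and same_element are invented for this illustration.

```cpp
// Writes through the reversed spelling and reads back through the
// usual one; both name element 3 of the same array.
int write_reversed_read_normal() {
    int a[5] = {0, 0, 0, 0, 0};
    3[a] = 5;          // parsed as *(3 + a)
    return a[3];       // parsed as *(a + 3)
}

// True when the two subscript spellings and plain pointer
// arithmetic all yield the same address.
bool same_element() {
    int a[5] = {0, 0, 0, 0, 0};
    return &a[2] == &2[a] && &a[2] == a + 2;
}
```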

C++11 introduced the range-based for loop:
std::vector<int> v = {0, 1, 2, 3, 4, 5};
for (const int& i : v) // access by const reference
    std::cout << i << ' '; 
And it is syntactically equivalent to:

{
    auto && __range = range_expression ;
    for (auto __begin = begin_expr, __end = end_expr;
         __begin != __end; ++__begin) {
        range_declaration = *__begin;
        loop_statement
    }
}

where:
  • If range_expression is an expression of built-in array type with bound __bound
    1. begin_expr is __range
    2. end_expr is (__range + __bound)

  • If range_expression is of a class type C that has a member named begin or end
    1. begin_expr is __range.begin()
    2. end_expr is __range.end()

  • Otherwise (begin and end are found by argument-dependent lookup)
    1. begin_expr is begin(__range)
    2. end_expr is end(__range)
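As a rough illustration of the class-type case, the sketch below writes the same loop twice: once with the range-based for and once expanded by hand. The names range, first and last stand in for the compiler's __range, __begin and __end, and the two function names are made up for this example.

```cpp
#include <vector>

// Sum written with the range-based for loop.
int sum_with_range_for(const std::vector<int>& v) {
    int total = 0;
    for (const int& i : v)
        total += i;
    return total;
}

// The same loop expanded by hand, following the C++11 rewrite for a
// class type that has begin() and end() members.
int sum_expanded(const std::vector<int>& v) {
    int total = 0;
    {
        auto&& range = v;
        for (auto first = range.begin(), last = range.end();
             first != last; ++first) {
            const int& i = *first;   // the range_declaration
            total += i;              // the loop_statement
        }
    }
    return total;
}
```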
But it was realized that:
"The existing range-based for loop is over-constrained. The end iterator is never incremented, decremented, or dereferenced. Requiring it to be an iterator serves no practical purpose." (P0184R0)
and from C++17 the range-based for loop will be syntactically equivalent to:

{
    auto && __range = range_expression ;
    auto __begin = begin_expr ;
    auto __end = end_expr ;
    for ( ; __begin != __end; ++__begin) {
        range_declaration = *__begin;
        loop_statement
    }
}

Check this Stack Overflow question for more details of how this is useful.
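One use of the relaxed rule is a sentinel: an end marker whose type differs from the iterator's type. The sketch below assumes a C++17 compiler; NullSentinel, CString and count_chars are names invented here, not part of any library.

```cpp
#include <cstddef>

// Hypothetical sentinel type: marks the end of a null-terminated string.
struct NullSentinel {};

// A char pointer has not reached the end while it points at a non-'\0' char.
inline bool operator!=(const char* p, NullSentinel) { return *p != '\0'; }

// Hypothetical range wrapper: begin() is a pointer, end() is the sentinel.
struct CString {
    const char* text;
    const char* begin() const { return text; }
    NullSentinel end() const { return {}; }
};

// Counts characters without ever computing a pointer to the end.
std::size_t count_chars(const char* s) {
    std::size_t n = 0;
    for (char c : CString{s}) {   // __begin and __end have different types
        (void)c;
        ++n;
    }
    return n;
}
```

With the C++11 rewrite this loop would not compile, because __begin and __end were declared in a single auto declaration and therefore had to share one type.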

The upcoming C++17 brings us two more syntactic sugars: structured bindings and if/switch statements with an initializer.

Structured bindings look like this:
auto [x, y, z] = expression;
where this line is roughly equivalent to one of these three cases (heavily copy/pasted from the proposal):

Built-in array:

auto __a = expression;
auto x = __a[0];
auto y = __a[1];
auto z = __a[2];

get<> for std::tuple, std::pair and std::array:

auto __a = expression;
tuple_element<0, decltype(__a)>::type x = get<0>(__a);
tuple_element<1, decltype(__a)>::type y = get<1>(__a);
tuple_element<2, decltype(__a)>::type z = get<2>(__a);

Public data members for C-style structs:

auto __a = expression;
auto x = __a.mem1;
auto y = __a.mem2;
auto z = __a.mem3;

And here are some examples:

tuple<T1, T2, T3>  f();
auto [x, y, z] = f(); // types are: T1, T2, T3

struct mystruct { int i; string s; double d; };
mystruct s = { 1, "xyzzy"s, 3.14 };
auto [x, y, z] = s; // types are: int, string, double



for (auto&& [first,second] : mymap) {
// use first and second
}

auto tuple = std::make_tuple(1, 'a', 2.3);
auto& [ i, c, d ] = tuple; // NB! references to the elements inside the tuple
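The reference case deserves a small runnable sketch, since it is the one that differs most from a plain copy. It assumes C++17; the function names bump_first and total_length are invented for this illustration.

```cpp
#include <map>
#include <string>
#include <tuple>

// Binding by reference: i, c and d alias the tuple's own elements,
// so writing to i changes the tuple passed in.
int bump_first(std::tuple<int, char, double>& t) {
    auto& [i, c, d] = t;
    i += 1;
    return std::get<0>(t);   // already reflects the increment
}

// Binding each map element to a key and a value while iterating.
int total_length(const std::map<int, std::string>& m) {
    int total = 0;
    for (const auto& [key, value] : m) {
        total += static_cast<int>(value.size());
    }
    return total;
}
```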

The if and switch statements with an initializer look like this:

if constexpr(optional) ( init-statement condition )
    statement-true
else
    statement-false

which is equivalent to:

{
    init-statement
    if constexpr(optional) ( condition )
        statement-true
    else
        statement-false
}

One benefit is that variables declared in the init-statement, just like those declared in a for loop's init-statement, do not leak into the enclosing scope:

auto it = m.find(10);
if (it != m.end()) {
    return it->size();
} // "it" stays visible in the enclosing scope after the if

if (auto it = m.find(10); it != m.end()) {
    return it->size();
} // "it" is destroyed here and the name goes out of scope
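A compilable variant of this lookup, as a sketch: it assumes the container is a std::map<int, std::string>, which the snippet above does not specify, and the function name length_or_zero is invented.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Returns the length of the string stored under key, or 0 if the key
// is absent.
std::size_t length_or_zero(const std::map<int, std::string>& m, int key) {
    if (auto it = m.find(key); it != m.end()) {
        return it->second.size();
    }
    // `it` is no longer in scope here, so the name could be reused.
    return 0;
}
```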
And some examples (again heavily copy/pasted from the proposal):
if (std::lock_guard<std::mutex> lock(mx); shared_flag) {
unsafe_ping();
shared_flag = false;
} // on exiting scope the lock_guard's destructor is called and the mutex is unlocked

if (auto it = m.find(10); it != m.end()) {
return it->size();
}

if (status_code c = bar(); c != SUCCESS) {
return c;
}

if (auto [it, inserted] = m.try_emplace(key, value); !inserted) {
    FATAL("Element already registered");
} else {
    process(it->second);
}
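The combination of an initializer with a structured binding can be tried out with the following sketch. It assumes C++17 and a std::map<int, std::string>; the function register_or_existing and its return strings are made up for the example and replace the FATAL and process placeholders.

```cpp
#include <map>
#include <string>

// Tries to register value under key and reports what happened.
// try_emplace returns a pair of (iterator, bool); the binding splits
// it into two names that live only for the duration of the if/else.
std::string register_or_existing(std::map<int, std::string>& m,
                                 int key, const std::string& value) {
    if (auto [it, inserted] = m.try_emplace(key, value); !inserted) {
        return "already registered: " + it->second;
    } else {
        return "registered: " + it->second;
    }
}
```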
Same goes for the switch statement:

switch (Foo x = make_foo(); x.status())
{
case Foo::FINE: /* ... */
case Foo::GOOD: /* ... */
case Foo::NEAT: /* ... */
default: /* ... */
} 
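Here is a self-contained sketch of the switch form. Foo, Status and make_foo are hypothetical stand-ins for the types used above, and describe is an invented name; only the switch syntax itself comes from C++17.

```cpp
#include <string>

enum class Status { Fine, Good, Neat, Broken };

// Hypothetical stand-in for the Foo type in the snippet above.
struct Foo {
    Status s;
    Status status() const { return s; }
};

// Hypothetical factory: code 0 means everything is fine.
Foo make_foo(int code) {
    return Foo{code == 0 ? Status::Fine : Status::Broken};
}

std::string describe(int code) {
    // x exists only inside the switch statement.
    switch (Foo x = make_foo(code); x.status()) {
        case Status::Fine: return "fine";
        case Status::Good: return "good";
        case Status::Neat: return "neat";
        default:           return "not ok";
    }
}
```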

Lastly, lambda functions can also be considered syntactic sugar for functors. Sort of. For example, this lambda (adapted from here):
[](X& item){ item.DoTheJob(); }

std::for_each(par, items.begin(), items.end(),
[](X& item){ item.DoTheJob(); });
will probably be replaced by the compiler with something that looks like this, plus even more compiler-generated code:
class _CompilerGeneratedNotReadable_
{
public:
    void operator() (X& item) const
    {
        item.DoTheJob();
    }
};

std::for_each(par, items.begin(), items.end(), _CompilerGeneratedNotReadable_{});
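To check that the two forms really behave the same, here is a small sketch. X is a hypothetical item type, DoTheJobFunctor is a handwritten stand-in for the compiler-generated class, and the par execution policy is left out so the example needs nothing beyond <algorithm>.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical item type standing in for X.
struct X {
    bool done = false;
    void DoTheJob() { done = true; }
};

// Handwritten approximation of the closure type a compiler might generate.
class DoTheJobFunctor {
public:
    void operator()(X& item) const { item.DoTheJob(); }
};

void run_with_lambda(std::vector<X>& items) {
    std::for_each(items.begin(), items.end(),
                  [](X& item) { item.DoTheJob(); });
}

void run_with_functor(std::vector<X>& items) {
    std::for_each(items.begin(), items.end(), DoTheJobFunctor{});
}

bool all_done(const std::vector<X>& items) {
    return std::all_of(items.begin(), items.end(),
                       [](const X& x) { return x.done; });
}
```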
Capturing variables from outside:

[multiplier, &sum](X& item){ sum += item.Width() * multiplier; }


is replaced by something like this:


class _CompilerGeneratedNotReadable_
{
public:
    _CompilerGeneratedNotReadable_(int& s, int m) : sum_{s}, multiplier_{m} {}
    void operator() (X& item) const
    {
        sum_ += item.Width() * multiplier_;
    }
private:
    int& sum_;
    int multiplier_;
};
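The capturing case can be compared side by side in the same way. In this sketch X is again a hypothetical type, and WidthSummer is a handwritten approximation of the generated class. Note that operator() can be const and still update the sum, because the member is a reference and const applies to the reference, not to the int it refers to.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical item type standing in for X.
struct X {
    int w;
    int Width() const { return w; }
};

// Handwritten approximation of the closure type: the by-value capture
// becomes a copy, the by-reference capture becomes a reference member.
class WidthSummer {
public:
    WidthSummer(int& s, int m) : sum_{s}, multiplier_{m} {}
    void operator()(const X& item) const {
        sum_ += item.Width() * multiplier_;
    }
private:
    int& sum_;
    int multiplier_;
};

int total_with_lambda(const std::vector<X>& items, int multiplier) {
    int sum = 0;
    std::for_each(items.begin(), items.end(),
                  [multiplier, &sum](const X& item) {
                      sum += item.Width() * multiplier;
                  });
    return sum;
}

int total_with_functor(const std::vector<X>& items, int multiplier) {
    int sum = 0;
    std::for_each(items.begin(), items.end(), WidthSummer{sum, multiplier});
    return sum;
}
```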

C++14 introduced generic lambdas:

for_each( begin(v), end(v), [](const auto& x) { cout << x; } );

sort( begin(w), end(w), [](const auto& a, const auto& b) { return *a<*b; } );

auto size = [](const auto& m) { return m.size(); };

Note that the last one works with any class that has a size() method.
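A quick sketch of that last point, assuming C++14 or later. The names size_of and total_size are invented here; the lambda is named size_of rather than size only to avoid clashing with std::size.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One generic lambda, usable with any type that has a size() member.
const auto size_of = [](const auto& m) { return m.size(); };

// Calls the same lambda with three unrelated container types.
std::size_t total_size() {
    std::vector<int> v = {1, 2, 3};
    std::string s = "sugar";
    std::map<int, int> m = {{1, 2}};
    return size_of(v) + size_of(s) + size_of(m);   // 3 + 5 + 1
}
```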

The underlying implementation is probably something like the following. This lambda:

int multiplier = 2, sum = 0;
[multiplier, &sum](auto& item){
sum += item.Width() * multiplier;
}

is replaced by something like this:

class _CompilerGeneratedNotReadable_
{
public:
    _CompilerGeneratedNotReadable_(int& s, int m) : sum_{s}, multiplier_{m} {}

    template<class T>
    void operator() (T& item) const
    {
        sum_ += item.Width() * multiplier_;
    }
private:
    int& sum_;
    int multiplier_;
};
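The templated call operator is what lets one closure object accept unrelated argument types. The sketch below shows this with a handwritten class; Box, Line, GenericWidthSummer and total_width are all names invented for the illustration.

```cpp
#include <algorithm>
#include <vector>

// Two unrelated hypothetical types that both offer Width().
struct Box  { int w;   int Width() const { return w; } };
struct Line { int len; int Width() const { return len; } };

// Handwritten approximation of a generic lambda's closure type:
// the auto parameter becomes a template parameter of operator().
class GenericWidthSummer {
public:
    GenericWidthSummer(int& sum, int multiplier)
        : sum_{sum}, multiplier_{multiplier} {}

    template <class T>
    void operator()(T& item) const {
        sum_ += item.Width() * multiplier_;
    }
private:
    int& sum_;
    int multiplier_;
};

// Works for any range whose elements have a Width() member.
template <class Range>
int total_width(Range& items, int multiplier) {
    int sum = 0;
    std::for_each(items.begin(), items.end(),
                  GenericWidthSummer{sum, multiplier});
    return sum;
}
```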


In conclusion: know your syntactic sugars. If I missed something, please tell me in the comments.

P.S. This article is based on a presentation I did at C++ User Group Sofia Meeting 7
