jeudi 4 décembre 2014

C++: erasing elements of std::vector using a lambda

Removing elements from a vector is a task that one can encounter pretty often and that isn't as easy as one could think.

The simplest case is when the index of the unwanted element is known. The std::vector class provides a first form of the erase() member function that takes an (const) iterator as argument.

Thus, if I want to remove, say the 10th element, it's as easy as:

    std::vector<whatever> myVec;
//... fill with more than 10 elements
    myVec.erase( myVec.begin() + 9 );

And if you want to remove the 3 elements between positions 10 and 12, it will be the second form of this function, which has two arguments:

    myVec.erase( myVec.begin() + 9, myVec.begin() + 12 );

(Yes, the second argument defines the first one you want to keep)

But what happens when you want to remove elements based on their value ? Say remove all elements that have value foo (assuming that value is of type whatever).

This is a task for std::remove(). It actually does not remove anything, it just switches element around so that the ones to be erased will be at the end, and it returns an iterator pointing on the first element to be erased. The next step is to feed that iterator to std::vector::erase().

The code will using its second form:

    myVec.erase(
        std::remove(            // returns iterator on
            myVec.begin(),      // first element to
            myVec.end(),        // be removed
            foo
        ),
        myVec.end()
    );

(This is known as the Erase–remove idiom.)

Next, what if you want to remove elements based on some property they have ? Consider for example a vector of vectors:

   std::vector<std::vector<Whatever>> myVec2;

And now the task is to remove elements that hold less than 2 elements. Okay, so we need to check every element, and decide to remove it or not.

This is a task for the second form of that same algorithm, remove_if(). Instead of a value, it takes as third argument a predicate, and will "remove" (move, actually) the considered element if that predicate returns "true". A predicate is usually implemented as a functor, which is an object of some class that defines the operator() and that returns a bool, based on the given value.

At first, this seems like a harsh constraint, as no one wants to declare a class for such a trivial task. But before C++11 came out, that was required (unless, maybe, using some Boost library). Or else, we needed to iterate through the vector and test each element, copy it or not, and swap:

   vector<vector<whatever>> newv;
   newv.reserve( myVec.size() ); // to avoid resizing when using push_back
      for( size_t i=0; i < myVec.size(); i++ )
         if( myVec[i].size()<MinSize )
            newv.push_back( myVec[i] );
   std::swap( myVec, newv );

This is where C++11 and lambdas come in. A lambda can be seen as a sort of "anonymous inline function", that captures variables in scope. Here, as the function iterates over all the elements, each of them will be a std::vector.

A lambda is made of three parts:

  • [how capturing variables happens],
  • (the functions arguments),
  • {The body of the function}.

The complete code:

   std::size_t MinSize = ... (some value);
   myVec2.erase(
      std::remove_if(
         myVec2.begin(),
         myVec2.end(),
         [&]( const std::vector<whatever>& vw ) // lambda
            { return vw.size() < minsize; } 
     ),
     myVec2.end()
   );

More on c++ lambdas.

1 commentaire: