Using C++23 Ranges to Sort Parallel Arrays

Published on Sep 28, 2023 4 min read

I often encounter variations of the following problem: You are given two or more lists (as e.g. std::vector) of elements and you want to sort them “in parallel”, using one of the lists as keys. I always found the solutions to these kind of problems to be clunky - but with C++23 (or the excellent range-v3 library) we get a very elegant way of doing this.

The Problem

Say you have two std::vector (call them vecA and vecB) filled with elements that “belong together”, i.e., vecA[i] and vecB[i] somehow belong together. You want them to be sorted by the values in vecA, but the elements should stay in sync, i.e., vecA[j] and vecB[j] should later still belong together.

Figure 1: Two parallel arrays being sorted

This problem arises surprisingly often when building high-performance algorithms. Taking an example from my day job (I currently build a public transit router for a living), say you have all of the calls of vehicles at a particular station in a particular day. These calls have various attributes, e.g. an arrival time, a departure time, one or several identifier(s) for the vehicle involved, and some other stuff. One thing your algorithm will need to do often is iterate through these in chronological order of departure time. You could just build a large vector of all of these objects and sort that vector by e.g. departure time.

However, this vector would be rather large because the contained objects are large. This means that your iterative algorithm needs to walk all that memory just to iterate the departure times. This is very inefficient - all that memory needs to be loaded into the CPU. It would be a lot more efficient if you had a - much smaller - vector of only the departure times, then you could just iterate in that smaller vector and when you have found what you are looking for, you could use the index of your found element to access whatever you were interested in in the first place.

This is what is often done in practice: All of the attributes of your objects (i.e., calls in this example) are stored in separate vectors, and these vectors are being kept in sync.

The C++23 / `range-v3` Solution

Let’s get to the point. You’re here for solutions, not problems. This is the neat C++23 / range-v3 solution (obligatory Godbolt link):

 1#include <vector>
 2#include <iostream>
 3#include <ranges>
 4#include <algorithm>
 5
 6int main() {
 7	std::vector<size_t> keys   { 1,   4,   3,   0,   2,   8,   6,   5,   7,   9 };
 8	std::vector<char> values   {'E', 'O', 'L', 'H', 'L', 'L', 'O', 'W', 'R', 'D'};
 9
10	// A zip view creates a range of (x,y) tuple-like objects from one range of x's and one range of
11	// y's
12	auto zip = std::ranges::views::zip(keys, values);
13	// ranges::sort (or std::ranges::sort) basically behaves like std::sort, but does take a range
14	// instead of an iterator pair
15	std::ranges::sort(zip, [](const auto & lhs, const auto & rhs) {
16		// lhs and rhs are elements of the zip range, i.e., 'tuple-like objects'
17		return std::get<0>(lhs) < std::get<0>(rhs);
18	});
19
20	for (char c : values) {
21		std::cout << c;
22	}
23}

Note that the actual sorting only takes four lines, without any “magic”: In line 12, a zip view is created that transforms the two vectors into a single range of tuple-like (well, pair-like in this example) objects, and this is then sorted in lines 15-18. No temporary vectors are created, no objects are (unnecessarily) copied. So elegant!

The end result is that the values vector now contains the characters as sorted by keys (which should spell HELLOWORLD).

Using C++23 Ranges to Sort Parallel Arrays

The Problem

The C++23 / `range-v3` Solution

Comments

Reply to tinloaf's post

Using C++23 Ranges to Sort Parallel Arrays

The Problem

The C++23 / range-v3 Solution

Comments

The C++23 / `range-v3` Solution