Mapping between combinations and indices: A combinatoric puzzle

By .
Posted at 20 October 2024, 6:51. Last updated at 22 October 2024, 1:21.

Consider the set S={0,…,Nβˆ’1} of N values. In this article, we solve the following combinatorial problem: given a value M, 0≀M≀N, find two functions to_position and from_position such that

  1. to_position maps each subset πšƒβŠ†S holding exactly M values (|πšƒ|=M) to a unique index in the range 0,…,xβˆ’1 with x the total number of distinct subsets of S holding exactly M values; and

  2. from_position maps indices in the range 0,…,xβˆ’1 back to the subset πšƒβŠ†S holding exactly M with from_position(to_position(T)) == T.

The construction of highly-efficient versions of both to_position and from_position boils down to solving a combinatoric puzzle.

First, in SectionΒ 1, we will provide a high-level sketch of my use-case for the functions to_position and from_position. Next, in SectionΒ 2, we detail the necessary mathematical background to solve our combinatoric puzzle. Finally, in SectionΒ 3 we present our solution.

Background

Consider a costly function complex_task that takes as input N optional values of which exactly M, M≀N, need to be provided. Based on only which M input values are provided (e.g., the first, third, and fifth value), function complex_task will first compute a helper object helper that will detail how to deal with these specific M input values. Next, function complex_task will use helper together with the provided input values to compute the result of the function. In the following code fragment, we sketch the function complex_task:

auto complex_task(std::span<std::optional<value_t>, N> values)
{
    /* Check which @{M} values are present in @{values}. After this loop,
     * @{indices[i]} is true if the @{i}-th input in @{values} is available. */
    std::bitset<N> indices = {};
    for (size_t i = 0; i < N; ++i) {
        indices.set(i, values[i].has_value());
    }
    assert(indices.count() == M);

    /* Compute a helper object to deal with the provided input values. */
    auto helper = compute_helper(indices);

    /* Solve the actual problem using this helper. */        
    return solve_problem(values, helper);
}

In this function, we represent the values that are provided by a bitset indices in which the i-th bit is set if the i-th input value is provided. As one can see in the sketch, the helper object helper is computed using only the information in the bitset indices that details which input values in values are present.

The function complex_task will be used very frequently (thousands of times per second). Hence, we are interested in optimizing this function. Inspection of the function shows that the computation performed by compute_helper, the computation of the helper object helper, is rather taxing. Luckily, helper is only dependent on which input values are providedβ€”as represented by indicesβ€”and not on the content of these input values. Hence, to optimize complex_task, we can pre-compute all possible helper objects once and use these pre-computed values.

As an initial attempt toward optimizing complex_task, we can try something like the following:

using mapping_t = std::unordered_map<std::bitset<N>, helper_t>;
const mapping_t pre_computed_helpers = compute_helpers();

auto complex_task(std::span<std::optional<value_t>, N> values)
{
    /* Check which @{M} values are present in @{values}. After this loop,
     * @{indices[i]} is true if the @{i}-th input in @{values} is available. */
    std::bitset<N> indices = {};
    for (size_t i = 0; i < N; ++i) {
        indices.set(i, values[i].has_value());
    }
    assert(indices.count() == M);

    /* Solve the actual problem using this helper. */        
    return solve_problem(values, pre_computed_helpers.at(indices));
}

In the above, compute_helpers is a function that computes a mapping between possible values for indices and the corresponding helper objects. The following sketch illustrates how we can implement compute_helpers if we can enumerate all possible N-bit bitsets.

auto compute_helpers()
{
    mapping_t result;
    for (/* each possible @{N}-bit bitset indices */) {
        if (indices.count() == M) {
            result.emplace(indices, compute_helper(indices));
        }
    }
    return result;
}

To complete compute_helpers, we will provide an initial approach toward enumerating over all N-bit bitsets. In C++, bitsets can be constructed out of unsigned long long integers. Assume, for example, that unsigned long long is a 32-bit value. In this case, each unsigned long long value is a sequence b31b30…b0 of 32 bits that represents the value b0β‹…20+b1β‹…21+…+b31β‹…231. For example, the value 357 has the binary representation 101100101 and, indeed, 1β‹…20+1β‹…22+1β‹…25+1β‹…26++1β‹…28=357. In this representation, all N-bit bitsets, N<32, are represented by the values 0,1,…,2Nβˆ’1. In addition, using the binary representation of unsigned values to our advantage, we can compute 2N by setting the bit bN using a bit-shift 1ull << N. Hence, for sufficiently-small values of N, we can complete compute_helpers as follows:

auto compute_helpers()
{
    static_assert(N < std::numeric_limits<unsigned long long>::digits,
                  "N not sufficiently small!");

    mapping_t result;
    for (unsigned long long v = 0; v < (1ull << N); ++v) {
        std::bitset<N> indices(v);
        if (indices.count() == M) {
            result.emplace(indices, compute_helper(indices));
        }
    }
    return result;
}

We note that the above program is not necessary very fast: compute_helpers will check 2N distinct values. Hence, the worst-case runtime complexity of compute_helpers is at-least Ξ©(2N). For example, with N=30, compute_helpers will already check 230=1073741824 distinct values v, which is more than a billion values! Assuming N is not too large, the complexity of compute_helpers is not necessarily a huge issue, however: we only need to call compute_helpers once. Still, later in this article we will develop a much more efficient version of compute_helpers.

The optimization outlined above will reduces the cost of complex_task significantly. We can do better, however: std::unordered_map requires a considerable amount of storage overhead to store our helper objects (namely the entire hash table structure used by std::map internally); requires multiple memory operations to find a specific helper object (namely to look up the hash bucket and to search in that bucket); and requires dynamic memory allocation, thereby ruling out the capability to pre-compute pre_computed_helpers at compile time via a constexpr function. Similar drawbacks apply if we would switch to std::map (a binary search tree).

Instead of using standard dictionaries such as std::map or std::unordered_map with their inherent drawbacks, we are looking for a perfect solution. Assume pre_computed_helpers will end up holding x key-value pairs. A perfect representation of pre_computed_helpers should aim for the following three properties:

  1. Store the minimal amount of data we can potentially store: exactly x helper objects. To do so, we want to represent pre_computed_helpers by an array holding exactly x helper objects (without storing any other data).

  2. Perform the minimal number of memory operations to look up a helper object for a given bitset indices. To do so, we want a function that can turn bitset indices into the position p such that pre_computed_helpers[p] is the helper obect corresponding to indices.

  3. A highly-efficient function to construct pre_computed_helpers that only considers x distinct bitsets (instead of the 2N bitsets considered by compute_helpers above). In addition, pre_computed_helpers is preferably constructed at compile time if the helper objects support compile time construction.

For the second and third property, we will use the functions to_position and from_position we described in the introduction. The construction of these functions requires an elementary background in combinatorics, which we present next.

An overview of elementary combinatorics

Consider a set S={v1,…,vN} of N distinct values. First, we consider the number of distinct sequences one can create out of set S. When creating a single such sequence (permutation), we have N options for the first value in the sequence. After choosing the first value, we have Nβˆ’1 options left for the second value in the sequence. Likewise, after choosing the second value, we have Nβˆ’2 options left. We continue until we end up with a single option, which becomes the last value in our sequence. Hence, we can make N!=Nβ‹…(Nβˆ’1)β‹…(Nβˆ’2)⋅…⋅1 distinct sequences of length N out of the set S. These N! distinct sequences are typically called the permutations of S. As an example, consider the four values {a,b,c,d}. We have a total of 4!=24 permutations: [a,b,c,d],[a,b,d,c],[a,c,b,d],[a,c,d,b],[a,d,b,c],[a,d,c,b],[b,a,c,d],[b,a,d,c],[b,c,a,d],[b,c,d,a],[b,d,a,c],[b,d,c,a],[c,a,b,d],[c,a,d,b],[c,b,a,d],[c,b,d,a],[c,d,a,b],[c,d,b,a],[d,a,b,c],[d,a,c,b],[d,b,a,c],[d,b,c,a],[d,c,a,b],[d,c,b,a].

Next, we consider the number distinct sequences of length M, M≀N, we can create out of S. The construction of such a sequence of length M is similar to the construction of a permutation: we repeatedly choose one of the values from S until the sequence has M values, after which we still have Nβˆ’M values left in S. Hence, we can make P(N,M)=Nβ‹…(Nβˆ’1)β‹…(Nβˆ’2)⋅…⋅(Nβˆ’M+1) distinct sequences of length M out of the set S. These P(N,M) distinct sequences are typically called the M-permutations of S. As an example, consider the four values {a,b,c,d}. We have a total of P(4,2)=4β‹…3=12 two-permutations of {a,b,c,d}: [a,b],[a,c],[a,d],[b,a],[b,c],[b,d],[c,a],[c,b],[c,d],[d,a],[d,b],[d,c]. Note that P(N,M) is equivalent to the first M terms of N!: the remaining terms of N! are (Nβˆ’M)!=(Nβˆ’M)β‹…(Nβˆ’Mβˆ’1)β‹…(Nβˆ’Mβˆ’2)⋅…⋅1. Hence, P(N,M)=N!(Nβˆ’M)!.

Lastly, we consider the number of distinct subsets we can create out of set S holding M values, M≀N. In sets, we do not consider the order of values. Hence, the number of subsets of M values is not equal to the number of M-permutations P(N,M).

Now consider an M-permutation p of set S. This permutation p holds a set PβŠ†S of |P|=M values. As P is a set of |P|=M values, we can construct M! permutations of P. Each of these M! permutations holds the same values as P and, hence, represents the same set of M values. Hence, we can make (NM)=P(N,M)M!=Nβ‹…(Nβˆ’1)β‹…(Nβˆ’2)⋅…⋅(Nβˆ’M+1)Mβ‹…(Mβˆ’1)β‹…(Mβˆ’2)⋅…⋅1=N!M!(Nβˆ’M)!. distinct subsets out of set S. These (NM) distinct subsets are typically called the M-combinations of S. As an example, consider the four values {a,b,c,d}. We have a total of (42)=6 two-combinations of {a,b,c,d}: {a,b},{a,c},{a,d},{b,c},{b,d},{c,d}.

Solving our combinatorics puzzle

Our functions to_position and from_position will map between N-bit bitsets indices and positions in {0,…,xβˆ’1} with x the number of distinct values for indices. Note that each of these N-bit bitsets represents an M-combination of the values {0,…,Nβˆ’1}. Hence, we have x=(NM) and our functions will have the following declarations:

/*
 * Return a unique position for the @{M}-combination of values 0, ..., @{N} - 1
 * represented by @{indices}. Requires @{indices.count() == M}.
 */
constexpr std::size_t to_position(std::bitset<N> indices);

/*
 * Return the unique @{M}-combination of values 0, ..., @{N} - 1 represented by
 * the position @{pos}. Requires @{0 <= pos < x} with @{x} the number of
 * @{M}-combinations of a set of @{N} values.
 */
constexpr std::bitset<N> from_position(std::size_t pos);

First, we need to decide which position we wish to assign to a given bitset indices. We will simply assign an ordering on all possible bitsets and assign positions with respect to this order. In specific, we will use the following ordering: a bitset X comes-before bitset Y if X≠Y and the largest value in X\Y is smaller than the largest value in Y\X. For example, {5,3,2} comes-before {6,3,2} (because 5 is smaller than 6) and {7,3,1} comes-before {7,3,2} (because 1 is smaller than 2).

The function to_position

First, we look at the function to_position. We start with a recursive version to_position_rec with the following declaration:

/*
 * Assume that @{m <= n <= N} and @{m <= M}. Return a unique position for the
 * m-combination of values 0, ..., @{n} - 1 represented by @{indices}. Requires
 * @{indices.count() == m}.
 */
std::size_t to_position_rec(std::bitset<N> indices, unsigned n, unsigned m);

One calls this recursive function initially with n=N and m=M. The base case for this recursive function is the case n=m. In this case, (nm)=(nn)=1: there is only one n-combination out of a set of n values. Hence, in the base case, we can assign the position 0.

Next, we consider the recursive cases. In the recursive case, we only look at whether the value nβˆ’1 is represented by the bitset indices. We distinguish two cases:

  1. The value nβˆ’1 is in indices.

    As nβˆ’1 is the largest value we are considering, the position of any bitset that holds value nβˆ’1 must come after the position of all bitsets that do not hold value nβˆ’1. The bitsets that do not hold value nβˆ’1 represent m-combinations over the set {0,…,nβˆ’2} of nβˆ’1 values. Hence, we have a total of (nβˆ’1m) such bitsets.

    As such, we start the position of all bitsets that hold value nβˆ’1 at pbase=(nβˆ’1m). Next, to further determine the position of the bitset represented by indices, we have to determine a suitable position relative to pbase for the remaining mβˆ’1 values in indices. These remaining mβˆ’1 values are taken from the set {0,…,nβˆ’2} of nβˆ’1 values. Hence, the remaining mβˆ’1 values in indices represent an (mβˆ’1)-combination of the values {0,…,nβˆ’2}. As such, we can use a recursive call to_position_rec(rem, n - 1, m - 1), with rem the bitset representing the remaining mβˆ’1 values in indices, to compute the suitable position that we will use relative to pbase.

  2. The value nβˆ’1 is not in indices.

    As argued in the previous case, the bitsets that do not hold value nβˆ’1 represent m-combinations over the set {0,…,nβˆ’2} of nβˆ’1 values and we have a total of (nβˆ’1m) such bitsets. As the previous case ensures that all bitsets that do hold value nβˆ’1 start at position pbase=(nβˆ’1m), we can simply compute a position in this case by interpreting indices as an m-combination over the set {0,…,nβˆ’2}. As such, we can use a recursive call to_position_rec(indices, n - 1, m) to compute a suitable position for this bitset.

The above recursive construction leads to the following implementation:

/*
 * Assume that @{m <= n <= N} and @{m <= M}. Return a unique position for the
 * @{m}-combination of values 0, ..., @{n} - 1 represented by @{indices}.
 * Requires @{indices.count() == m}.
 */
constexpr std::size_t to_position_rec(std::bitset<N> indices,
                                      unsigned n, unsigned m)
{
    assert(m <= n && n <= N && m <= M && M <= N);
    assert(indices.count() == m);

    if (m == n) {
        /* Base case. */
        return 0;
    }
    else {
        /* Recursive cases.  */
        if (indices.test(n - 1)) {
            auto rem = indices;
            rem.set(n - 1, false);
            auto p_base = combinations(n - 1, m);
            return p_base + to_position_rec(rem, n - 1, m - 1);
        }
        else {
            return to_position_rec(indices, n - 1, m);
        }
    }
}

The above function requires a function combinations to compute (nβˆ’1m). Based on the definition given in SectionΒ 2, we derive

/*
 * Compute the number of subsets of size @{m} of a set of @{n} values, according
 * to the definition of @{m}-combinations.
 *
 * This version only works for small values of @{n} and @{m}, as otherwise the
 * computation will produce values that do not fit in the type @{std::size_t}.
 * Later we will look at a version of this function that can better deal with
 * large inputs.
 */
constexpr std::size_t combinations(unsigned n, unsigned m)
{
    assert(m <= n);
    const auto max_v = std::numeric_limits<std::size_t>::max();

    std::size_t numerator = 1;
    std::size_t denominator = 1;
    while (m >= 1) {
        assert(numerator <= (max_v / n)); 
        numerator *= n--;
        denominator *= m--;
    }
    return numerator / denominator;
}

We can simplify the above function to_position_rec in two ways. First, we notice that there is no need to construct rem: we inspect each value only once. Second, we can easily replace the recursion by a simple loop. By doing so, we obtain:

constexpr std::size_t to_position(std::bitset<N> indices)
{
    assert(M <= N);
    assert(indices.count() == M);

    auto n = N;
    auto m = M;

    std::size_t pos = 0;
    while (m != n) {
        if (indices.test(n - 1)) {
            pos += combinations(n - 1, m);
            --n;
            --m;
        }
        else {
            --n;
        }
    }
    return pos;
}

The complexity of this function is easy to determine: we inspect all N bits once and we call combinations up-to-M times. The complexity of the call combinations(n - 1, m) is entirely determined by the parameter m: the function combinations performs m steps in its main loop. The up-to-M calls to combinations use values m=M,(Mβˆ’1),(Mβˆ’2),…,1. Hence, the total complexity of this version of to_position is π’ͺ(N+M2).

We can further reduce the cost of to_position to π’ͺ(N+M) by eliminating the repeated calls to combinations. We can do so by maintaining the value (nm) during the algorithm. Assume we have a variable n_choose_m holding the value of (nm). We need to update n_choose_m in two cases:

  1. The value nβˆ’1 is in indices.

    In this case, the if-case in the loop decreases the value of both n and m by one. Hence, we need to compute (nβˆ’1mβˆ’1) from (nm). We have (nm)=nβ‹…(nβˆ’1)β‹…(nβˆ’2)⋅…⋅(nβˆ’m+1)mβ‹…(mβˆ’1)β‹…(mβˆ’2)⋅…⋅1=nmβ‹…(nβˆ’1mβˆ’1). Hence, we have (nβˆ’1mβˆ’1)=(nm)β‹…mn.

    In addition, in this case we need the value (nβˆ’1m). We have (nβˆ’1m)=(nβˆ’1)β‹…(nβˆ’2)⋅…⋅(nβˆ’m+1)β‹…(nβˆ’m)mβ‹…(mβˆ’1)β‹…(mβˆ’2)⋅…⋅1. Hence, we have nβ‹…(nβˆ’1m)=(nm)β‹…(nβˆ’m) and we conclude (nβˆ’1m)=(nm)β‹…(nβˆ’m)n.

  2. The value nβˆ’1 is not in indices.

    In this case, the else-case in the loop decreases the value of n by one. Hence, we need to compute (nβˆ’1m) from (nm). In the previous case, we already showed that: (nβˆ’1m)=(nm)β‹…(nβˆ’m)n.

We apply the above strategy to maintain (nm) to to_position and we obtain:

constexpr std::size_t to_position(std::bitset<N> indices)
{
    assert(M <= N);
    assert(indices.count() == M);

    auto n = N;
    auto m = M;

    std::size_t pos = 0;
    std::size_t n_choose_m = combinations(n, m);
    while (m != n) {
        auto nd_choose_m = (n_choose_m * (n - m)) / n;
        auto nd_choose_md = (n_choose_m * m) / n;
        if (indices.test(n - 1)) {
            pos += nd_choose_m;
            n_choose_m = nd_choose_md;
            --n;
            --m;
        }
        else {
            n_choose_m = nd_choose_m;
            --n;
        }
    }
    return pos;
}

This improved and final version of to_position has a complexity of only π’ͺ(N+M)=π’ͺ(N).

The function from_position

Next, we look at the function from_position. The design of this function mirrors the design of to_position. In specific, given a position pos, we will reconstruct the bitset indices by trying to add values starting at the largest value Nβˆ’1 and ending with 0. Consider we try to add the maximum value Nβˆ’1. We again distinguish two cases:

  1. The value Nβˆ’1 must be added to indices.

    By the construction detailed the previous section, we must have (Nβˆ’1M)β‰€πš™πš˜πšœ. The remaining position πš™πš˜πšœβˆ’(Nβˆ’1M) is the position of the (Mβˆ’1)-combination over the set {0,1,…,Nβˆ’2} of all Mβˆ’1 values less-than Nβˆ’1 that must be added to indices.

  2. The value Nβˆ’1 must not be added to indices.

    By the construction detailed the previous section, we must have πš™πš˜πšœ<(Nβˆ’1M). The position pos is the position of the M-combination over the set {0,1,…,Nβˆ’2} of all M values less-than Nβˆ’1 that must be added to indices.

We apply the above strategy and we obtain:

constexpr std::bitset<N> from_position(std::size_t pos)
{
    assert(pos < combinations(N, M));

    auto n = N;
    auto m = M;

    std::bitset<N> indices;
    std::size_t n_choose_m = combinations(n, m);
    while (m != 0) {
        auto nd_choose_m = (n_choose_m * (n - m)) / n;
        auto nd_choose_md = (n_choose_m * m) / n;
        if (nd_choose_m <= pos) {
            pos -= nd_choose_m;
            indices.set(n - 1, true);
            n_choose_m = nd_choose_md;
            --n;
            --m;
        }
        else {
            n_choose_m = nd_choose_m;
            --n;
        }
    }
    return indices;
}

This only and final version of from_position has a complexity of only π’ͺ(N+M)=π’ͺ(N).

An efficient function compute_helpers

The original compute_helpers of SectionΒ 1 inspects all 2N subsets of {0,…,Nβˆ’1} for a worst-case running time of at-least Ξ©(2N). For any non-trivial value of N, this will take unreasonably long. Luckily, we can use the tools we just developed to implement compute_helpers more efficiently.

We observe that we must compute x=(NM) helper objects. These helper objects will correspond with positions 0,…,xβˆ’1. For each of these positions, we can compute the corresponding bitset using from_position. We obtain:

constexpr auto compute_helpers()
{
    constexpr auto x = combinations(N, M);
    std::array<helper_t, x> result;

    for (auto pos = 0; pos < x; ++pos) {
        result[pos] = compute_helper(from_position(pos));
    }

    return result;
}

The complexity of this version of compute_helpers is π’ͺ(xβ‹…N) which, for most values of M, is significantly better than π’ͺ(2N).

Computing large values (NM)

The function combinations of SectionΒ 3.1 is very limited: the computation of numerator and denominator will quickly overflow and not fit in a fixed-size unsigned integer. Indeed, upon completion the value denominator will hold m!. For m=13, we already have m!=6227020800, which would not fit in a 32-bit unsigned integer, and for m=21, we have m!=51090942171709440000, which would not fit in a 64-bit unsigned integer. The value numerator is even larger than the value denominator. Still, there are plenty of cases in which (nm) is a small value, while the function of SectionΒ 3.1 will fail to compute it. For example, (3525)=183579396, which easily fits in a 32-bit unsigned integer, but cannot be computed by combinations even if std::size_t is a 64-bit unsigned integer type.

To see how we can improve combinations, we consider the computation of (3525). By the definition of (3525), we have (3525)=35β‹…34β‹…33⋅…⋅1125β‹…24β‹…23⋅…⋅1. Assume we are computing the numerator v and denominator w of this fraction using 32-bit unsigned integers. Hence, the maximum value v and w can hold is 232βˆ’1=4294967295. With this maximum, we can only compute the first six terms of v, as we have 35β‹…34β‹…33β‹…32β‹…31β‹…30=1168675200. After computing the first six terms of v and w, we have v=1168675200 and w=25β‹…24β‹…23β‹…22β‹…21β‹…20=127512000 and we have (3525)=vβ‹…29β‹…28β‹…27⋅…⋅11wβ‹…20β‹…19β‹…18⋅…⋅1=vwβ‹…(29β‹…28β‹…27⋅…⋅1119β‹…18β‹…17⋅…⋅1).

We can simplify the above fraction by reducing the values v and w. Remember, we can simplify a any fraction by removing common terms from their numerator and denominator. For example, we can simplify the fraction 210462 to 210462=105231 by dividing both the numerator 210 and the denominator 462, which are both dividable by two, by two. Likewise, we can further simplify 210462=105231=3577=511 by dividing both the numerator and the denominator by three and seven. We cannot simplify 511 further, however, as the 5 and 11 do not have a common divisor.

We can apply the same simplification to the computation of (3525): we simplify the term vw by dividing both v and w by a large common divisor. Note that in this case v and w are guaranteed to have a large common divisor: as both v and w are the multiplication of six consecutive integers, at-least three of these consecutive integers are divisible by two, two are divisible by three, and one is divisible by five. Hence, we can at-least use the common divisor 23β‹…32β‹…5=360. This is not the largest common divisor of v and w, however: the largest possible divisor is known as the greatest common divisor, which can be computed using Euclid’s Algorithm (or any of the other well-known greatest common divisor algorithms). Conveniently, C++ already provides an algorithm to compute the greatest common divisor via std::gcd.

As it turns out, the greatest common divisor of v and w is gcd(v,w)=1108800. Hence, after dividing both v and w by 1108800, we can continue the computation with a few more terms without overflowing either v or w.

Note that the computation of (3525) involves the multiplication of 25 terms. We can reduce this to only ten terms using the definition of (NM). Let K=Nβˆ’M. We have M=Nβˆ’K and, by the definition of (NM), we have (NM)=N!M!(Nβˆ’M!)=N!(Nβˆ’M)!M!=N!K!(Nβˆ’K)!=(NK)=(NNβˆ’M). Hence, (3525)=(3535βˆ’25)=(3510), the computation of which only involves ten terms.

Applying the above two ideas will lead to the following improved variant of combinations:

/*
 * Compute the number of subsets of size @{m} of a set of n values, according to
 * the definition of @{m}-combinations. Return @{0} when the combination cannot
 * be represented by a fixed-sized value.
 */
constexpr std::size_t combinations(unsigned n, unsigned m)
{
    /* Internally, we use the largest unsigned integer type availabe to reduce
     * the number of overflows of @{numerator}. */
    const auto max_v = std::numeric_limits<std::uintmax_t>::max();

    std::uintmax_t numerator = 1;
    std::uintmax_t denominator = 1;

    /* Minimize the number of terms we will compute. */
    m = std::min(m, n - m);

    while (m >= 1) {

        /* Check whether multiplying would overflow @{numerator}. */
        if ((max_v / n) < numerator) {
            /* Multiplying the @{numerator} by @{n} would overflow. We remove
             * the common divisor @{cd} from @{numerator} and @{denominator}. */
            auto cd = std::gcd(numerator, denominator);
            numerator /= cd;
            denominator /= cd;

            /* Check whether removing the common divisor prevents overflow. */
            if ((max_v / n) < numerator) {
                return 0;
            }
        }

        numerator *= n--;
        denominator *= m--;
    }
    
    auto result = numerator / denominator;
    
    /* Check whether the result fits in the output type. */
    const auto max_size_v = std::numeric_limits<std::size_t>::max();
    if (max_size_v < result) {
        return 0;
    }
    else {
        return result;
    }
}

We note that the resulting function combinations still cannot compute all possible values (nm): these values can easily grow very large. For example, (3618)=9075135300 does not fit in a 32-bit unsigned integer and (6834)=28453041475240576740 does not fit in a 64-bit unsigned integer. To be able to compute and represent (nm) for arbitrary large values of n and m, one will require a custom arbitrary large integer representation.

The full solution

Below we provide the full solution in a single program. In this full solution, we have made a single change: we made the maximum value for N a template parameter and we made the values N and M function arguments. By doing so, to_position and from_position can be used with arbitrary values of N and M. To facilitate the usage of this solution, I release all code on this page under the 2-Clause BSD License:

/*
 * Copyright (c) 2024 Jelle Hellings
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 * 1. Redistributions of source code must retain the above copyright notice,
 *    this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright notice,
 *    this list of conditions and the following disclaimer in the documentation
 *    and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */



/* First, the functions combinations, to_position, and from_position. */
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <bitset>
#include <limits>
#include <numeric>


/* By default, we only support up-to-64-bit bitsets. */
constexpr std::size_t default_max_N = 64;


/*
 * Compute the number of subsets of size @{m} of a set of @{n} values, according
 * to the definition of @{m}-combinations. Return @{0} when the combination
 * cannot be represented by a fixed-sized value.
 */
constexpr std::size_t combinations(unsigned n, unsigned m)
{
    /* Internally, we use the largest unsigned integer type availabe to reduce
     * the number of overflows of @{numerator}. */
    const auto max_v = std::numeric_limits<std::uintmax_t>::max();

    std::uintmax_t numerator = 1;
    std::uintmax_t denominator = 1;

    /* Minimize the number of terms we will compute. */
    m = std::min(m, n - m);

    while (m >= 1) {

        /* Check whether multiplying would overflow @{numerator}. */
        if ((max_v / n) < numerator) {
            /* Multiplying the @{numerator} by @{n} would overflow. We remove
             * the common divisor @{cd} from @{numerator} and @{denominator}. */
            auto cd = std::gcd(numerator, denominator);
            numerator /= cd;
            denominator /= cd;

            /* Check whether removing the common divisor prevents overflow. */
            if ((max_v / n) < numerator) {
                return 0;
            }
        }

        numerator *= n--;
        denominator *= m--;
    }
    
    auto result = numerator / denominator;
    
    /* Check whether the result fits in the output type. */
    const auto max_size_v = std::numeric_limits<std::size_t>::max();
    if (max_size_v < result) {
        return 0;
    }
    else {
        return result;
    }
}


/*
 * Return a unique position for the @{m}-combination of values 0, ..., @{n} - 1
 * represented by @{indices}. Requires @{indices.count() == m}.
 */
template<std::size_t MaxN = default_max_N>
constexpr std::size_t to_position(std::bitset<MaxN> indices,
                                  unsigned n, unsigned m)
{
    assert(n <= MaxN);
    assert(m <= n);
    assert(indices.count() == m);

    std::size_t pos = 0;
    std::size_t n_choose_m = combinations(n, m);
    assert(n_choose_m != 0);

    while (m != n) {
        auto nd_choose_m = (n_choose_m * (n - m)) / n;
        auto nd_choose_md = (n_choose_m * m) / n;
        if (indices.test(n - 1)) {
            pos += nd_choose_m;
            n_choose_m = nd_choose_md;
            --n;
            --m;
        }
        else {
            n_choose_m = nd_choose_m;
            --n;
        }
    }
    return pos;
}


/*
 * Return the unique @{m}-combination of values 0, ..., @{n} - 1 represented by
 * the position @{pos}. Requires @{0 <= pos < x} with x the number of
 * @{m}-combinations of a set of @{n} values.
 */
template<std::size_t MaxN = default_max_N>
constexpr std::bitset<MaxN> from_position(std::size_t pos,
                                          unsigned n, unsigned m)
{
    assert(n <= MaxN);
    assert(pos < combinations(n, m));

    std::bitset<MaxN> indices;
    std::size_t n_choose_m = combinations(n, m);
    assert(n_choose_m != 0);

    while (m != 0) {
        auto nd_choose_m = (n_choose_m * (n - m)) / n;
        auto nd_choose_md = (n_choose_m * m) / n;
        if (nd_choose_m <= pos) {
            pos -= nd_choose_m;
            indices.set(n - 1, true);
            n_choose_m = nd_choose_md;
            --n;
            --m;
        }
        else {
            n_choose_m = nd_choose_m;
            --n;
        }
    }
    return indices;
}



/* Next, the the demonstration application involving complex_task. */
#include <array>
#include <optional>
#include <span>
#include <string>


/* A dummy value type, just as an example. */
using value_t = std::string;

/* A dummy helper type, just as an example. This helper can store an internal
 * representation of a bitset. */
using helper_t = unsigned long long;


/*
 * A dummy function to construct a helper for indices @{indices}.
 */
template<std::size_t N>
constexpr helper_t compute_helper(std::bitset<N> indices)
{
    /* Return an internal representation of the indices @{indices}. */
    return indices.to_ullong();
}


/*
 * The dummy problem being solved.
 */
template<std::size_t N>
auto solve_problem(std::span<std::optional<value_t>, N> values,
                   const helper_t& helper)
{
    /* Simply return the helper, which represent the bitset @{indices}. */
    return helper;
}


/*
 * Pre-compute the helpers used by @{complex_task}. Note that here the template
 * parameters @{N} and @{M} are required as the size of the resulting array
 * is a function of these parameters.
 */
template<std::size_t N, std::size_t M>
constexpr auto compute_helpers()
{
    constexpr auto x = combinations(N, M);
    std::array<helper_t, x> result;

    for (auto pos = 0; pos < x; ++pos) {
        result[pos] = compute_helper(from_position(pos, N, M));
    }

    return result;
}


/* The pre-computed helpers used by @{complex_task}. */
template<std::size_t N, std::size_t M>
constexpr auto pre_computed_helpers = compute_helpers<N, M>();


/*
 * The complex task that takes @{N} optional values, of which @{M} must be
 * provided.
 */
template<std::size_t N, std::size_t M>
requires (N != std::dynamic_extent)
auto complex_task(std::span<std::optional<value_t>, N> values)
{
    /* Check which @{M} values are present in @{values}. After this loop,
     * @{indices[i]} is true if the @{i}-th input in @{values} is available. */
    std::bitset<N> indices = {};
    for (size_t i = 0; i < N; ++i) {
        indices.set(i, values[i].has_value());
    }
    assert(indices.count() == M);

    /* Compute a helper object to deal with the provided input values. */
    auto helper = pre_computed_helpers<N, M>[to_position(indices, N, M)];

    /* Solve the actual problem using this helper. */        
    return solve_problem(values, helper);
}



/* Finally, a main entry point demonstrating this library. */
#include <iostream>

int main(int argc, char* argv[])
{
    /* First, lets illustrate that the library works. */
    const std::size_t N = 7;
    const std::size_t M = 3;
    auto x = combinations(N, M);

    /* We enumerate each M-combination of 0, 1, ..., 6. */
    std::cout << "There are "  << x <<
                 " 3-combination of 0, 1, ..., 6."
                 " These combinations are:" << std::endl;
    for (auto pos = 0; pos < x; ++pos) {
        auto indices = from_position(pos, N, M);
        
        std::cout << " * the " << pos << "-th combination is: ";
        auto delim = "{";
        for (auto index = 0; index < N; ++index) {
            if (indices.test(index)) {
                std::cout << delim << index;
                delim = ", ";
            }
        }
        std::cout << "}, ";
        
        std::cout << " (position: "
                  << to_position(indices, N, M)
                  << ")" << std::endl;
    }
    std::cout << std::endl;
    
    /* Now lets see if @{complex_task} works. */
    std::array<std::optional<value_t>, N> vs = {
        std::nullopt,
        "second",
        std::nullopt,
        "third",
        "fourth",
        std::nullopt,
        std::nullopt
    };

    /* The dummy solve_problem simply returns a representation of the bitset
     * representing the values in the above array @{vs}. Hence, the below should
     * print the 7-bit value @{0011010}. */
    std::cout << std::bitset<N>{complex_task<N, M>(vs)} << std::endl;
}
Join the discussion