Question

Is it legal to initialize an array via a functor which takes the array itself as a parameter by reference?

In the question "Idiom for initializing an std::array using a generator function taking the index?", which basically asks how one can initialize an array of an arbitrary type that is not necessarily default constructible, I came up with the following (highly unorthodox) solution.

#include <cstddef>
#include <utility>
#include <array>
#include <iostream>

struct Int{
    int v;
    Int(int v):v{v}{}
};

int main()
{
    auto gen = [](size_t i) { return  Int(11*(i+1)); };
    std::array<Int, 500000> arr = [&arr, &gen](){
        for(std::size_t i=0; i < arr.size(); i++)
            new (arr.data() + i) Int(gen(i));
        return arr;
    }();
    for(auto i : arr) { std::cout << i.v << ' ';}
    std::cout << '\n';
}

In the solution, a functor (in this case a lambda, but I'm interested in the general case) is used to initialize the array. The functor takes the array to be initialized by reference, constructs the non-default-constructible elements via placement new, and then returns the array.

I am not entirely sure whether this is really legitimate. GCC, Clang and MSVC all seem to suggest this is valid. For GCC and Clang I have also turned on the sanitizers so that undefined behavior can be detected. The access to arr.size() seems fine, as it is just a compile-time constant. The use of arr.data() also seems fine because the lifetime of the array arr starts after the = in std::array<int, 500000> arr= and arr has a well-defined address, which should be exactly what arr.data() returns because arr is an aggregate, but I'm not entirely sure. I'm also not sure whether the placement new is valid from the standard's perspective. For the arr = [&arr,&gen]{...; return arr;} I am also not sure whether the new rvalue semantics in C++17 are necessary to make the snippet valid, or whether it is also legitimate in earlier C++ standards (e.g. C++14).

So the question is, is it legitimate to access the to-be-initialized array this way in a functor that is used to initialize itself, and why?


For additional context, I have read the answers to the question that inspired this one (linked above) and to the suggested duplicates there. Those answers basically boil down to one of two things:

  • Generating a local array in the lambda and returning it. This is helpful in the case where the initialization has difficulty due to const, but not in the case where the type is not default constructible.
  • Using some sort of variant based on std::index_sequence. This suffers from the implementation limit of templates and won't work for large array sizes.

Based on these, I believe the solution here has its practical value unless someone can come up with a solution that does not suffer from the above two limitations, and this is not just a question of theoretical interest.


Solution

 12

This is UB, and doesn't work correctly for most types.

The use of arr.data() also seems fine because the lifetime of the array arr starts after the = in std::array<int, 500000> arr=

No, the lifetime starts when the initialization is complete. After = the name merely becomes visible. This makes arr.data() UB.

Returning arr calls the copy constructor of std::array with itself as the argument (because of mandatory copy elision, the object being initialized by the return is arr itself). That in turn copy-constructs every element from itself, which is not something type authors ever expect (e.g. adding std::string x; as a data member to Int causes a segfault for me).

You can confirm this by adding logging to the copy constructor:

Int(const Int &x) {std::cout << this << ' ' << &x << '\n';}

For each element, this prints the same address twice.


Pre-C++17, when there was no mandatory copy elision, another possible behavior was for arr to be copied into a temporary by return, which is then moved back into arr. That move uses a move constructor rather than move assignment, which overwrites existing elements without calling their destructors, which is also problematic.

2024-06-26
HolyBlackCat

Solution

 7

As HolyBlackCat points out, formally arr.data() is UB. In practice, this is going to be equal to the address of arr itself, and I can't think of any actual, reasonable implementation where it would fail.

The bigger issue (also pointed out by HolyBlackCat) is that arr is being returned, which triggers the copy constructor to initialize arr itself again. This leads to the copy constructor of Int being invoked with itself as the argument.

Int::Int(const Int& other) {
    // &other == this
}

That is something no one expects. While it may be benign for Int, it may cause real, harmful undefined behavior with types implemented by someone else. Consider for example:

#include <array>
#include <cstddef> // std::size_t
#include <iostream>
#include <memory>

int construct = 0;
int destruct = 0;

struct Foo{
    Foo() { ++construct; }
    ~Foo() { ++destruct; }
};
using FooPtr = std::shared_ptr<Foo>;

int main() {
    {
        auto gen = []() { return std::make_shared<Foo>(); };
        std::array<FooPtr, 5> arr = [&arr, &gen](){
            for(std::size_t i=0; i < arr.size(); i++)
                new (arr.data() + i) FooPtr(gen());
            return arr;
        }();
    } // arr dies here
    std::cout << "constructs=" << construct << " destructs=" << destruct << "\n";
}

This actually gives me

constructs=5 destructs=0

with all three major compilers (clang, gcc, msvc)

godbolt example

2024-06-26
CygnusX1

Solution

 0

As an alternative, even if the element type is not default-constructible, you can still initialize a buffer of uninitialized memory, and create a std::array from that using std::to_array and a pointer cast.

Be warned: The elements of the temporary array are not destructed after being moved from! Moving leaves most types in a state where skipping the destructor is harmless, but this is not true in every case.

Sample code:

#include <array>
#include <memory> // uninitialized_move_n
#include <ranges>
#include <utility>

using std::size_t;

struct Int{
    int v;
    constexpr Int(int v) noexcept : v{v} {}
};

constexpr size_t array_len = 24;

std::array<Int, array_len> generate_array() noexcept {
    // Generator for the elements of the returned array:
    constexpr auto source = std::ranges::transform_view(
        std::ranges::iota_view(0, static_cast<int>(array_len)),
        [](const int i)constexpr{return Int(11*(i+1));}
    );
    // Buffer of uninitialized memory:
    alignas(Int) std::array<char, sizeof(Int)*array_len> scratch;
 
    std::uninitialized_move_n(
        source.begin(),
        array_len,
        reinterpret_cast<Int*>(scratch.data())
    );
    return std::to_array<Int, array_len>(std::move(*reinterpret_cast<Int(*)[array_len]>(scratch.data())));
}

#include <cstdlib> // EXIT_SUCCESS
#include <iostream>

int main() {
    static const auto xs = generate_array();
    for (const Int& x : xs) {
        std::cout << x.v << ' ';
    }
    std::cout << '\n';
    return EXIT_SUCCESS;
}

The current version of MSVC on the Compiler Explorer lacks std::uninitialized_move_n, but Microsoft does support it. As a workaround, you might fall back on std::uninitialized_move.

You could also use the generator above to construct a std::vector and copy your std::array from that. For large arrays, which might cause an (ahem) stack overflow, this has the advantage that it constructs the temporary copy on the heap.

#include <array>
#include <cassert>
#include <ranges>
#include <utility>
#include <vector>

using std::size_t;

struct Int{
    int v;
    constexpr Int(int v) noexcept : v{v} {}
};

constexpr size_t array_len = 24;

std::array<Int, array_len> generate_array() { // Can now throw bad_alloc.
    // Generator for the elements of the returned array:
    constexpr auto source = std::ranges::transform_view(
        std::ranges::iota_view(0, static_cast<int>(array_len)),
        [](const int i)constexpr{return Int(11*(i+1));}
    );
    std::vector<Int> scratch(source.begin(), source.end());
    assert(scratch.size() == array_len);

    return std::to_array<Int, array_len>(std::move(*reinterpret_cast<Int(*)[array_len]>(scratch.data())));
}

#include <cstdlib> // EXIT_SUCCESS
#include <iostream>

int main() {
    static const auto xs = generate_array();
    for (const Int& x : xs) {
        std::cout << x.v << ' ';
    }
    std::cout << '\n';
    return EXIT_SUCCESS;
}

Try it on the Godbolt compiler explorer.

Update

As of 2024, a more-useful, but potentially non-portable, solution for large arrays is to bypass std::to_array and directly reinterpret_cast the buffer as a pointer to a std::array. This causes the return object to be move-constructed, which for a large array of this type calls memcpy. For whatever reason, if you use std::to_array, some compilers will spend an excessive amount of time and memory doing template metaprogramming on a large array.

Although the Standard effectively guarantees that a built-in array T[N], a range of pointers from the start to the end of the array, and the range [ data(), data()+size() ) of a std::array or std::vector have the same value representation, the first element of an array is not formally pointer-interconvertible with the array itself. There is also no requirement that a std::array have the same address as its data(), or that it not have a stricter required alignment. So, this works on all the most-used compilers, but try it at your own risk. I do, however, put in a static_assert that the address of a std::array<T, 1> is the same as its data(), as a sanity check against the most likely way this could break.

Therefore, this version works on the compiler explorer at sizes where the more-portable ones break down.

#include <array>
#include <cassert>
#include <memory> // addressof, uninitialized_move_n
#include <ranges>
#include <utility>
#include <vector>

using std::size_t;

struct Int{
    int v;
    constexpr Int(int v) noexcept : v{v} {}
};

// C++20 does not guarantee that a std::array is layout-compatible with a built-in array, so:
static constexpr std::array<Int,1> layout_check = {Int(0)};
static_assert(static_cast<const void*>(std::addressof(layout_check)) == static_cast<const void*>(layout_check.data()), "");

constexpr size_t array_len = 500'000;

std::array<Int, array_len> generate_array() noexcept {
    // Generator for the elements of the returned array:
    constexpr auto source = std::ranges::transform_view(
        std::ranges::iota_view(0, static_cast<int>(array_len)),
        [](const int i)constexpr{return Int(11*(i+1));}
    );
    // Buffer of uninitialized memory:
    alignas(Int) std::array<char, sizeof(Int)*array_len> scratch;
 
    std::uninitialized_move_n(
        source.begin(),
        array_len,
        reinterpret_cast<Int*>(scratch.data())
    );
    return std::move(*reinterpret_cast<std::array<Int, array_len>*>(scratch.data()));
}

#include <cstdlib> // EXIT_SUCCESS
#include <iostream>

int main() {
    static const auto xs = generate_array();
    for (size_t i = 0; i < xs.size(); i += 1000) {
        std::cout << xs[i].v << ' ';
    }
    std::cout << '\n';
    return EXIT_SUCCESS;
}

You will likely want to build large arrays on the heap, though:

#include <array>
#include <cassert>
#include <memory> // addressof
#include <ranges>
#include <utility>
#include <vector>

using std::size_t;

struct Int{
    int v;
    constexpr Int(int v) noexcept : v{v} {}
};

constexpr size_t array_len = 500'000;

// C++20 does not guarantee that a std::array is layout-compatible with a built-in array, so:
static constexpr std::array<Int,1> layout_check = {Int(0)};
static_assert(static_cast<const void*>(std::addressof(layout_check)) == static_cast<const void*>(layout_check.data()), "");

std::array<Int, array_len> generate_array() { // Can throw bad_alloc.
    // Generator for the elements of the returned array:
    constexpr auto source = std::ranges::transform_view(
        std::ranges::iota_view(0, static_cast<int>(array_len)),
        [](const int i)constexpr{return Int(11*(i+1));}
    );
    std::vector<Int> scratch(source.begin(), source.end());
    assert(scratch.size() == array_len);

    return std::move(*reinterpret_cast<std::array<Int, array_len>*>(scratch.data()));
}

#include <cstdlib> // EXIT_SUCCESS
#include <iostream>

int main() {
    static const auto xs = generate_array();
    for (size_t i = 0; i < xs.size(); i += 1000) {
        std::cout << xs[i].v << ' ';
    }
    std::cout << '\n';
    return EXIT_SUCCESS;
}

2024-06-27
Davislor