Question
Optimal multiplication of two 3D arrays having a variable dimension
I would like to multiply tensors R = {R_1, R_2, ..., R_M}
and X = {X_1, X_2, ..., X_M}
where R_i
and X_i
are 3×3
and 3×N_i
matrices, respectively. How can I make maximum use of NumPy functionalities during the formation of the R_i × X_i
arrays?
My MWE is the following:
import numpy as np
np.random.seed(0)
M = 5
R = [np.random.rand(3, 3) for _ in range(M)]
X = []
for i in range(M):
N_i = np.random.randint(1, 6)
X_i = np.random.rand(3, N_i)
X.append(X_i)
result = np.zeros((3, 0))
for i in range(M):
R_i = R[i]
X_i = X[i]
result = np.hstack((result, np.dot(R_i, X_i)))
print(result)
Edit #1:
Thanks for everyone who helped me with his valuable comments.
Meanwhile I was thinking about the role of N_i
s in my real problem and came to the conclusion that the number of unique N_i
s is in fact small (1 to 5; 2 is the most common one, but 1 is also very frequent). In this case, would there be a more efficient solution to the treatment of multiplications?
Another aspect which would be important: in practice, I store a 3 × N
matrix X
, not the individual X_i
blocks. The columns of X
are not ordered w.r.t. the R
list. Instead, I store only an index vector p
which provides the correct ordering for the X
columns.
In this case, an einsum
version would be the following (in comparison with the "direct" multiplication):
import numpy as np
M = 30
N = 100
np.random.seed(0)
p = np.random.randint(M, size=N)
R = np.random.rand(M, 3, 3)
X = np.random.rand(3, N)
result_einsum = np.einsum('ijk,ki->ji', R[p], X)
result_direct = np.zeros((3, N))
for i in range(N):
result_direct[:, i] = np.dot(R[p[i]], X[:, i])
print(np.allclose(result_einsum, result_direct))
Edit #2:
It seems that Numba helps quite a lot:
import numpy as np
import numba
from timeit import Timer
M = 30
N = 100
np.random.seed(0)
p = np.random.randint(M, size=N)
R = np.random.rand(M, 3, 3)
X = np.random.rand(3, N)
@numba.njit
def numba_direct(R, p, X, result_direct, N):
for i in range(N):
p_i = p[i]
for j in range(3):
res = 0.0
for k in range(3):
res += R[p_i, j, k] * X[k, i]
result_direct[j, i] = res
result_direct = np.zeros((3, N))
numba_direct(R, p, X, result_direct, N)
result_einsum = np.einsum('ijk,ki->ji', R[p], X)
print(np.allclose(result_einsum, result_direct))
ntimes = 10000
einsum_timer = Timer(lambda: np.einsum('ijk,ki->ji', R[p], X))
einsum_time = einsum_timer.timeit(number=ntimes)
numba_direct_timer = Timer(lambda: numba_direct(R, p, X, result_direct, N))
numba_direct_time = numba_direct_timer.timeit(number=ntimes)
print(f'Einsum runtime: {einsum_time:.4f} seconds')
print(f'Numba direct runtime: {numba_direct_time:.4f} seconds')
The execution times are the following for the above code:
Einsum runtime: 0.0979 seconds
Numba direct runtime: 0.0129 seconds