1) We can solve this without rounding (as rounding may not guarantee that the constraints continue to hold) by defining the following quadratic integer programming problem with linear constraints. Define a least squares optimization problem based on this pseudo-code. The diff constraint ensures that each column is non-decreasing and the last constraint ensures that the first and last rows are not changed. (A variation of would be to use the sum of the absolute differences in the objective instead of the squared differences.)
ABC <- na.approx(data[-(1:2)])
Minimize sum((ABC - X)^2) over X
s.t. X is integer matrix and
rowSums(X) == data$Total and
diff(X) >= 0 and
X[c(1, nr), ] == ABC[c(1, nr), ]
Function opt
below implements this optimization. It has arguments ABC
which is the A, B and C columns of na.approx(data[-(1:2)])
and Total
which is the Total column of data
. Using CVXR (since the problem is convex) define the X
matrix to be solved for, the objective
, the constraints
and the problem
(which is the objective plus the constraints) and then use CVXR's solve
to solve that problem. If you prefer to use absolute differences rather than squared differences in the objective replace (ABC - X)^2
below with abs(ABC - X)
.
library(CVXR)
library(zoo)
opt <- function(ABC, Total) {
nr <- nrow(ABC); nc <- ncol(ABC)
X <- Variable(rows = nr, cols = nc, integer = TRUE)
objective <- Minimize(sum((ABC - X)^2))
constraints <- list(sum_entries(X, 1) == Total, diff(X) >= 0,
X[c(1, nr), ] == ABC[c(1, nr), ])
problem <- Problem(objective, constraints)
res <- solve(problem)
print(res$status)
res$getValue(X)
}
ABC <- na.approx(data[-(1:2)])
replace(data, colnames(ABC), opt(ABC, data$Total))
giving
[1] "optimal"
Year Total A B C
1 2000 50 12 22 16
2 2001 52 13 22 17
3 2002 53 14 22 17
4 2003 57 15 23 19
5 2004 60 16 24 20
6 2005 61 17 24 20
2) We could combine the other two answers with some elements of this one to get this approach. Unlike (1) we are not guaranteed a solution even if a feasible solution exists since the approach does not handle the diff
constraint other than to check afterwards whether it happened to be satisfied.
library(sfsmisc)
library(zoo)
opt2 <- function(ABC, Total, method = "offset-round") {
rowRoundfixS <- function(x) t(apply(x, 1, roundfixS, method = method))
ABC2 <- rowRoundfixS(proportions(ABC, 1) * Total)
nr <- nrow(ABC)
ok <- all(rowSums(ABC2) == Total) && all(diff(ABC2) >= 0) &&
all(ABC2[c(1, nr), ] == ABC[c(1, nr), colnames(ABC)])
print(if (ok) "ok" else "failed")
ABC2
}
ABC <- na.approx(data[-(1:2)])
replace(data, colnames(ABC), opt(ABC, data$Total))
giving
[1] ok
Year Total A B C
1 2000 50 12 22 16
2 2001 52 13 22 17
3 2002 53 14 22 17
4 2003 57 15 23 19
5 2004 60 16 24 20
6 2005 61 17 24 20
which at least for this data gives the same result as (1).
Note
Input
data <- data.frame(
Year = seq(2000, 2005, by = 1),
Total = c(50, 52, 53, 57, 60, 61),
A = c(12, NA, NA, NA, NA, 17),
B = c(22, NA, NA, NA, NA, 24),
C = c(16, NA, NA, NA, NA, 20)
)