R codes for the paper About predictions in spatial autoregressive models: Optimal and almost optimal strategies

This document was generated directly from RStudio using Markdown tool. It presents the R codes used to obtain the simulation results included in the paper ``About predictions in spatial autoregressive models: Optimal and almost optimal strategies’’. The source of this page (with all R codes used) is available here. To cite this work, please use :

Goulard M., Laurent T., and Thomas-Agnan C. (2017). About predictions in spatial autoregressive models: Optimal and almost optimal strategies. Spatial Economic Analysis, 12.

Before starting

Install the required packages:

install.packages(c("RColorBrewer", "rgdal", "spdep"))

Load the packages :

require("rgdal")
require("spdep")
require("RColorBrewer")

Set working directory:

path.dir <- "/media/laurent/DATAPART1/site web/code/GLT"

1. Preparation of the data

Import the Midi-Pyrénées communes boundaries into R :

mapMAP <- readOGR(dsn = "contours", layer = "ADTCAN_region")

We convert the type of the identification units into numeric values:

mapMAP@data$CODE <- as.numeric(as.character(mapMAP@data$CODE))

The number of observations equals to $n$:

n <- nrow(mapMAP)

We consider two spatial weight matrices:

$W_1$ based on the 10-nearest neighbours and row-normalized. $W_1$ is relatively sparse ($96.5\%$ of null values).

coords <- coordinates(mapMAP)
W1.listw <- nb2listw(knn2nb(knearneigh(coords, 10)), 
 style = "W")
W1.mat <- listw2mat(W1.listw)

$W_2$ based on the weights proportionnal to the inverse distance with a distance threshold equal to 125km and row-normalized. $W_2$ is relatively dense ($40\%$ of null values)

W.nb <-  dnearneigh(coords, 0, 125000, longlat = FALSE)
dlist <- nbdists(W.nb, coords)
dlist.c <- lapply(dlist, function(x) 1/x)
W2.listw <- nb2listw(W.nb, glist = dlist.c, style = "W")
W2.mat <- listw2mat(W2.listw)

Vizualisation of the two matrices (not included in the article) :

2. Simulation framework for in-sample predictions

a. Simulation of a SAR process

We simulate three explanatory variables as follows. $X_1$ follows a gaussian distribution $\mathcal{N}(15,3)$, $X_2$ follows (up to a constant) a binomial distribution $\mathcal{B}(100,0.45)/100$ and $X_3$ follows a log-uniform distribution $log(\mathcal{U}_{[0,283]})$.

x1 <- rnorm(n, 15, 3)
x2 <- rbinom(n, 100, 0.45)/100
x3 <- log(round(runif(n, 1, n),0))
x <- cbind(rep(1, n), x1, x2, x3)

We use the following spatial autoregressive LAG regression model to generate the dependent variable

\[ \mathbf{Y}=(\mathbf{I}-\rho \mathbf{W})^{-1}(\beta_0\mathbf{1_n} + \mathbf{X_1} \beta_1+ \mathbf{X_2} \beta_2+ \mathbf{X_3} \beta_3 + \boldsymbol{\epsilon}) ~~\text{where $\boldsymbol{\epsilon} \sim \mathcal{N}(0,\sigma^2 \mathbf{I_n}),$} \]

where $\mathbf{1_n}= (1,\cdots,1)' \in \mathbb{R}^n$, with fixed $\beta=(\beta_0,~\beta_1,~\beta_2, ~\beta_3)'=(5,~0.25,~6,~1)'$,

beta0 <- c(5, 1/4, 6, 1)

The different choices of parameters

We consider the following parameter values in the simulation process :

$\rho=-0.4,~-0.2,~0.05,~0.2,~0.35,~0.5,~0.65,~0.8,~0.9$,

rho <- c(-0.4, -0.2, 0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.9)

$\sigma=1,~3$,

sigma <- c(1, 3)

$W=W_1$ or $W=W_2$.

For example, for $\rho=0.35$, $\sigma=1$ and $W_1$, the simulation of $Y$ is as follows:

n <- nrow(x)
A <- diag(1,n) - 0.35*W1.mat
Y0 <- x%*%beta0 + rnorm(n, 0, 1)
y <- solve(A, Y0)

We have 9 possible values for $\rho$, 2 possible values for $\sigma$ and 2 possible spatial weight matrices $W_1$ or $W_2$, hence we obtain 36 different possible combinations. We save these parameters in the file param-in.RData included in a user-choosen directory and which will be needed later for executing the simulation program:

list.insample.param <- list(rho = rho, 
 W = list(W1.listw, W2.listw), sigma = sigma)
if(!dir.exists(paste(path.dir, "parameters", sep = ""))) 
  dir.create(paste(path.dir, "parameters", sep = ""))
save(list.insample.param, beta0, x, 
  file=paste(path.dir, "parameters/param-in.RData", sep = ""))

For each of the combinations, we draw 1000 samples, estimate a SAR model with the lagsarlm function (to obtain $\hat\rho$, $\hat\beta$ and $\hat\sigma$) and compute the following predictors :

$\mathbf{\check{Y}^{TS}} = \mathbf{X} \boldsymbol{\hat\beta} + \hat\rho \mathbf{W}\mathbf{Y}$
$\mathbf{\check{Y}^{TC}} = (\mathbf{I}-\hat \rho \mathbf{W})^{-1} \mathbf{X} \boldsymbol{\hat\beta}$
$\mathbf{\check{Y}^{BP}} = (\mathbf{I}-\rho \mathbf{W})^{-1}\mathbf{X} \boldsymbol{\hat\beta} -\text{Diag}(\mathbf{\hat{Q}})^{-1}\mathbf{\tilde{\hat{Q}}}(\mathbf{Y}-(\mathbf{I}- \rho \mathbf{W})^{-1}\mathbf{X} \boldsymbol{\hat \beta})$

b. Computing the predictors with spdep

All the prediction formulas cited in our paper are implemented in the releases later than 0.5-92 of the spdep R package (Bivand and Piras, 2015), following the GSoC 2015 project by M. Gubri.

don <- data.frame(y, x1, x2, x3) 
sar.res <- spdep::lagsarlm(y~., data=don, W1.listw, 
 method="eigen") 
y_TC = predict.sarlm(sar.res, listw = W1.listw, 
 pred.type = "TC", all.data = T, zero.policy = F) 
y_TS =  predict.sarlm(sar.res, listw = W1.listw, 
 pred.type = "TS", all.data = T, zero.policy = F) 
y_BP = predict.sarlm(sar.res, listw = W1.listw, 
 pred.type = "BP", all.data=T, zero.policy=F)

c. Results

The in-sample prediction simulation codes are given here and can be executed using parallel processing. We summarize the results and obtain the following table:

comparaison of the MSE depending on $W_1$ or $W_2$ with $\sigma=1$ (Table 2 in the article):

rho	BP (W1)	TS (W1)	TC (W1)	BP (W2)	TS (W2)	TC (W2)
-0.4	0.9556	0.9761	1.0159	0.965	0.9758	0.9872
	(0.0876)	(0.0869)	(0.0919)	(0.0815)	(0.08)	(0.081)
-0.2	0.9711	0.9779	0.9917	0.9703	0.9783	0.9864
	(0.0839)	(0.0839)	(0.0861)	(0.0819)	(0.081)	(0.0817)
0.05	0.979	0.9801	0.9832	0.9738	0.9777	0.9824
	(0.0806)	(0.0804)	(0.0804)	(0.0838)	(0.0832)	(0.0834)
0.2	0.9795	0.9828	0.995	0.9754	0.9781	0.9818
	(0.0842)	(0.0844)	(0.0868)	(0.0833)	(0.083)	(0.0833)
0.35	0.975	0.9852	1.0257	0.9804	0.9823	0.9858
	(0.082)	(0.0824)	(0.09)	(0.0849)	(0.0848)	(0.0851)
0.5	0.9576	0.9801	1.0901	0.9818	0.9843	0.991
	(0.0829)	(0.0827)	(0.1038)	(0.083)	(0.0828)	(0.0839)
0.65	0.9474	0.9839	1.2386	0.9808	0.984	0.9965
	(0.0845)	(0.0849)	(0.1459)	(0.085)	(0.0854)	(0.0889)
0.8	0.9233	0.9805	1.6808	0.98	0.9849	1.0085
	(0.0854)	(0.0864)	(0.3106)	(0.0852)	(0.0855)	(0.0946)
0.9	0.9173	0.9897	2.8831	0.9801	0.9882	1.0363
	(0.0821)	(0.0834)	(0.9097)	(0.0828)	(0.0835)	(0.1133)

comparaison of the ratio of MSE (BP/TS and BP/TC) depending on $W_1$ or $W_2$ with $\sigma=1$ (Table 3 in the article):

rho	BP_TS (W1)	BP_TC (W1)	BP_TS (W2)	BP_TC (W2)
-0.4	0.9789	0.9406	0.9889	0.9775
-0.2	0.9931	0.9792	0.9919	0.9837
0.05	0.9989	0.9958	0.996	0.9912
0.2	0.9967	0.9845	0.9972	0.9934
0.35	0.9896	0.9505	0.9981	0.9945
0.5	0.977	0.8784	0.9975	0.9908
0.65	0.963	0.7649	0.9968	0.9842
0.8	0.9417	0.5493	0.995	0.9717
0.9	0.9268	0.3182	0.9918	0.9457

comparaison of the MSE depending on $W_1$ or $W_2$ with $\sigma=3$ (results not included in the article):

rho	BP (W1)	TS (W1)	TC (W1)	BP (W2)	TS (W2)	TC (W2)
-0.4	8.6065	8.8094	9.2024	8.6877	8.8092	8.9363
	(0.7922)	(0.7785)	(0.8198)	(0.7737)	(0.7516)	(0.76)
-0.2	8.763	8.8365	8.9911	8.7067	8.7907	8.882
	(0.751)	(0.7433)	(0.7557)	(0.7425)	(0.7204)	(0.7197)
0.05	8.7988	8.817	8.8643	8.7324	8.7852	8.8455
	(0.7367)	(0.735)	(0.737)	(0.7705)	(0.7587)	(0.7598)
0.2	8.8408	8.8834	9.0227	8.786	8.8219	8.8689
	(0.7623)	(0.7638)	(0.787)	(0.7799)	(0.7732)	(0.7726)
0.35	8.7581	8.858	9.242	8.7828	8.8112	8.8629
	(0.7838)	(0.7837)	(0.8477)	(0.7512)	(0.7486)	(0.7522)
0.5	8.6435	8.8419	9.7985	8.8552	8.8871	8.9589
	(0.7511)	(0.7499)	(0.9134)	(0.7649)	(0.7628)	(0.7753)
0.65	8.5424	8.8904	11.2388	8.8226	8.8616	8.9881
	(0.7327)	(0.7402)	(1.308)	(0.7667)	(0.7693)	(0.808)
0.8	8.3666	8.9007	15.4022	8.7824	8.8404	9.0922
	(0.7707)	(0.7947)	(2.8512)	(0.7438)	(0.7471)	(0.8504)
0.9	8.2405	8.919	26.7887	8.8314	8.9092	9.3311
	(0.7613)	(0.781)	(8.8693)	(0.7749)	(0.7775)	(0.9564)

comparaison of the MSE ratio BP/TS and BP/TC depending on $W_1$ or $W_2$ with $\sigma=3$ (results not included in the article):

rho	BP_TS (W1)	BP_TC (W1)	BP_TS (W2)	BP_TC (W2)
-0.4	0.977	0.9353	0.9862	0.9722
-0.2	0.9917	0.9746	0.9904	0.9803
0.05	0.9979	0.9926	0.994	0.9872
0.2	0.9952	0.9798	0.9959	0.9906
0.35	0.9887	0.9476	0.9968	0.991
0.5	0.9776	0.8821	0.9964	0.9884
0.65	0.9609	0.7601	0.9956	0.9816
0.8	0.94	0.5432	0.9934	0.9659
0.9	0.9239	0.3076	0.9913	0.9464

3. Simulation framework for out-of-sample predictions

a. Simulation process

We choose at random a given number of sites ($27$ or $54$) which will be declared out-of-sample (in $O$). We predict the $\mathbf{Y}$ variable on the out-of-sample locations based on the sample $S$ constituted by the remaining sites. We consider several situations depending upon the number of out-of-sample units and upon the aggregation level of the out-of-sample units. The corresponding configurations are:

27 out-of-sample units

(R codes)

set.seed(18) 
config1.27 <- sample(1:283, 27)

config2.27 <- c(72, 180, 269, 155, 275, 143, 259) 
for(p in 1:4)
 {set.seed(p)
  config2.27 <- c(config2.27, 
   sample(W1.listw$neighbours[[config2.27[p]]],5))
 }

config3.27 <- c(132, 59, 225)   
for(p in 1:3)
 {set.seed(p+1)
  config3.27 <- c(config3.27, 
   sample(W1.listw$neighbours[[config3.27[p]]],8))
 }

54 out-of-sample units

(R codes)

set.seed(15)
config1.54 <- sample(1:283,54)  

config2.54 <- c(19, 76, 137, 257, 214, 52, 39, 163, 168) 
for(p in 1:9)
 {set.seed(p+1)
  config2.54 <- c(config2.54, 
   sample(W1.listw$neighbours[[config2.54[p]]],5))
 }

config3.54 <- c(6, 145, 278, 229, 26, 158) 
for(p in 1:6)
 {set.seed(p)
  config3.54 <- c(config3.54, 
   sample(W1.listw$neighbours[[config3.54[p]]],8))
}

Graphical representation of the 27 and 54 out-of-sample units

(Figure 3 in the article)

b. Out-of-sample predictors formulae

$\mathbf{\hat Y_O^{BP}} = \mathbf{\hat Y_O^{TC}} - \mathbf{\hat{Q}_{OO}}^{-1}\mathbf{\hat{Q}_{OS}} \times (\mathbf{Y_S}-\mathbf{\hat Y_S^{TC}})$
$\mathbf{\hat Y_O^{TC}} = [(\mathbf{I} - \hat \rho \mathbf{W})^{-1}\mathbf{X} \boldsymbol{\hat\beta}]_O$
$\hat Y_o^{TS^{1}} = \mathbf{X_o}\boldsymbol{\hat\beta} + \hat\rho \mathbf{W_{oS}}\mathbf{Y_S}$
$\mathbf{\hat Y_O^{BP_{W}}}= \mathbf{\hat Y_{O}^{TC}}+\boldsymbol{\hat{\Sigma}_{OS}}\mathbf{W_{OS}'}(\mathbf{W_{OS}}\boldsymbol{\hat{\Sigma}_{SS}} \mathbf{W_{OS}'})^{-1}(\mathbf{W_{OS}}\mathbf{Y_S} - \mathbf{W_{OS}}\mathbf{\hat Y_{S}^{TC}})$
$\mathbf{\hat Y_O^{BP_N}}= \mathbf{\hat Y_O^{TC}}-\mathbf{\hat{Q}_{OO}}^{-1}\mathbf{\hat{Q}_{OJ}} \mathbf{(Y_J - \hat Y^{TC}_J )_J}$ for $J$ set of indices of neighbors of $O$
$\hat Y_o^{TC^1}$, row $o$ of
\[ \{ \mathbf{I_{n_S+1}} - \hat \rho \begin{pmatrix} \mathbf{W_{SS}}& \mathbf{W_{So}} \\ \mathbf{W_{oS}}& \mathbf{W_{oo}} \end{pmatrix}\}^{-1} \begin{pmatrix} \mathbf{X_S} \\ \mathbf{X_o} \end{pmatrix} \boldsymbol{\hat \beta}\]
$\hat Y_o^{BP^1} = \hat Y_o^{TC^1} - \hat{Q}_{oo}^{-1}\mathbf{\hat{Q}_{oS}}(\mathbf{Y_S} - \mathbf{\hat Y_S^{TC^1}})$
$\hat Y_o^{BP_W^1} = \hat Y_{o}^{TC^1}+\mathbf{\hat{\Sigma}_{oS}}\mathbf{W_{oS}'}(\mathbf{W_{oS}}\boldsymbol{\hat{\Sigma}_{SS}} \mathbf{W_{oS}'})^{-1}(\mathbf{W_{oS}}\mathbf{Y_S} - \mathbf{W_{oS}}\mathbf{\hat Y_{S}^{TC^1}})$
$BP_N^1$ & $\hat Y_o^{BP_N^1} = \hat Y_o^{TC}-\hat{Q}_{oo}^{-1}\mathbf{\hat{Q}_{oJ}} (\mathbf{Y_J} - \mathbf{\hat Y^{TC^1}_J} )_J$ for $J$ set of indices of neighbors of $o$

c. Computing the predictions with spdep

We define the indices of the out-of-sample units and build the $S$ and $O$ sample :

outsamp <- config1.54 
S.sample <- don[-outsamp, ]
O.sample <- don[outsamp, ]

We estimate a SAR model on the $S$ sample. For that, we first have to define the $W_S$ matrix:

WS.listw <- subset.listw(W1.listw, !(1:n %in% outsamp))
sar.res <- lagsarlm(y~., data = S.sample, WS.listw, 
 method = "eigen")

We predict the out-of-sample $O$ values of $Y$ by using predict.sarlm() function. Option pred.type= gives the predictors type :

y_BP = as.numeric(predict.sarlm(sar.res, listw = W1.listw, 
 newdata = O.sample, zero.policy = T, 
 pred.type = "BP", power = TRUE))
y_BP1 = as.numeric(predict.sarlm(sar.res, listw = W1.listw, 
 newdata = O.sample, zero.policy = T, 
 pred.type = "BP1", power = TRUE))
y_TC = as.numeric(predict.sarlm(sar.res, listw = W1.listw, 
 newdata = O.sample, zero.policy = T,
 pred.type = "TC", power = TRUE))
y_TS1 = as.numeric(predict.sarlm(sar.res, listw = W1.listw, 
 newdata = O.sample, zero.policy = T,
 pred.type = "TS1", power = TRUE))
y_BPW = as.numeric(predict.sarlm(sar.res, listw = W1.listw, 
 newdata = O.sample, zero.policy = T, 
 pred.type = "BPW", power = TRUE))
y_BPN = as.numeric(predict.sarlm(sar.res, listw = W1.listw, 
 newdata = O.sample, zero.policy = T, 
 pred.type = "BPN", power = TRUE))
y_BPW1 = as.numeric(predict.sarlm(sar.res, listw = W1.listw, 
 newdata = O.sample, zero.policy = T, 
 pred.type = "BPW1", power = TRUE))
y_BPN1 = as.numeric(predict.sarlm(sar.res, listw = W1.listw, 
 newdata = O.sample, zero.policy = T, 
 pred.type = "BPN1", power = TRUE))

d. The different choices of parameters in the simulations

In the simulations, we fix $\rho=0.35$. The different parameters we will test are:

2 possible values for $\sigma$ ($\sigma=1,~3$).
2 possible spatial weight matrices $W_1$ and $W_2$.
6 different configurations for out-of-sample units.
2 different methods for estimating the parameters: one using ML, one using EM.

Thus, we obtain 48 different possible combinations whose parameters are saved in the param-out.RData file which will be used to run the simulations:

list.param <- list(W = list(W1.listw, W2.listw), 
  sigma = sigma, config = list(config1.27, config2.27, 
  config3.27,config1.54, config2.54, config3.54),
  EM = c(FALSE, TRUE))
save(list.param, beta0, x, 
     file = paste(path.dir, "parameters/param-out.RData", 
     sep = ""))

e. EM or not EM: that is the question?

Among the different configurations presented previously, we will first focus on the estimation of the parameters to check if we can notice a strong difference when estimating by EM (see R codes uses here). We do not consider all the cases, but the following ones :

54 missing units: randomly distributed or strongly clustered
$\sigma=1,~3$
$W=W_1,~W_2$

$\sigma=1$, $W_1$

(Table 5 in the article)

	ML (random)	EM (random)	ML (clustered)	EM (clustered)
beta0	6.663	5.732	6.508	5.532
	(2.167)	(2.226)	(2.319)	(2.419)
beta1	0.25	0.251	0.25	0.251
	(0.024)	(0.023)	(0.024)	(0.024)
beta2	5.874	5.958	5.972	5.961
	(1.481)	(1.479)	(1.474)	(1.472)
beta3	1.004	1	1.008	0.999
	(0.062)	(0.062)	(0.07)	(0.069)
rho	0.284	0.32	0.287	0.328
	(0.084)	(0.085)	(0.089)	(0.094)
sigma2	0.991	1.202	0.991	1.104
	(0.097)	(3.238)	(0.095)	(0.33)

$\sigma=1$, $W_2$

(Table 5 in the article)

	ML (random)	EM (random)	ML (clustered)	EM (clustered)
beta0	11.185	10.191	11.113	10.054
	(8.253)	(6.752)	(7.679)	(6.886)
beta1	0.251	0.251	0.25	0.251
	(0.023)	(0.023)	(0.024)	(0.024)
beta2	5.99	5.986	5.973	5.975
	(1.476)	(1.475)	(1.479)	(1.481)
beta3	1	0.999	0.999	0.999
	(0.062)	(0.062)	(0.07)	(0.069)
rho	0.097	0.138	0.1	0.144
	(0.335)	(0.273)	(0.309)	(0.278)
sigma2	0.982	1.791	0.98	1.575
	(0.096)	(6.449)	(0.093)	(3.923)

$\sigma=3$, $W_1$

(Not included in the article)

	ML (random)	EM (random)	ML (clustered)	EM (clustered)
beta0	6.998	6.002	6.901	5.821
	(3.69)	(3.91)	(3.783)	(4.014)
beta1	0.253	0.253	0.252	0.253
	(0.071)	(0.07)	(0.073)	(0.073)
beta2	5.773	5.866	5.885	5.883
	(4.435)	(4.442)	(4.422)	(4.411)
beta3	1.003	0.998	1.005	0.998
	(0.186)	(0.186)	(0.209)	(0.207)
rho	0.271	0.31	0.272	0.317
	(0.113)	(0.125)	(0.115)	(0.129)
sigma2	8.898	9.951	8.88	9.244
	(0.874)	(8.96)	(0.852)	(0.998)

$\sigma=3$, $W_2$

(Not included in the article)

	ML (random)	EM (random)	ML (clustered)	EM (clustered)
beta0	14.736	13.318	14.187	13.346
	(11.906)	(9.947)	(11.155)	(10.41)
beta1	0.253	0.253	0.252	0.252
	(0.07)	(0.07)	(0.073)	(0.073)
beta2	5.905	5.905	5.894	5.894
	(4.412)	(4.422)	(4.431)	(4.439)
beta3	0.998	0.997	0.996	0.996
	(0.187)	(0.187)	(0.208)	(0.208)
rho	-0.047	0.011	-0.024	0.01
	(0.472)	(0.393)	(0.439)	(0.41)
sigma2	8.819	10.909	8.807	9.499
	(0.871)	(17.525)	(0.836)	(8.254)

Comments:

the parameters estimates do not vary a lot between ML and EM algorithm.
estimations of $\beta_1$, $\beta_2$ and $\beta_3$ are good whatever the choice for $W_1$ and $W_2$, $\sigma=1,~3$.
estimations of $\sigma$ is good.
when $W_2$ is dense, estimation of $\rho$ is not good, especially when $\sigma=3$

f. Simulation algorithms and results when $\rho = 0.35$

For each of the 48 combinations, we draw 1000 samples, estimate a SAR model with the lagsarlm (with or without EM algorithm) and we compute the previous predictors. The out-of-sample prediction simulation codes are given here and can be executed using parallel processing. We summarize the results and obtain the following table:

The 6 configurations with $\sigma=1$, $W_1$ and $EM=FALSE$ (Table 6 in the article)

predictors	config 1 (27)	config 2 (27)	config 3 (27)	config 1 (54)	config 2 (54)	config 3 (54)
BP	1.009	1.033	1.034	1.016	1.03	1.036
	(0.276)	(0.291)	(0.277)	(0.194)	(0.196)	(0.202)
BPn	1.009	1.034	1.034	1.017	1.031	1.036
	(0.276)	(0.292)	(0.277)	(0.194)	(0.196)	(0.203)
BP/BPn	100	99.9	100	99.9	99.9	100
BPw	1.011	1.036	1.035	1.017	1.031	1.037
	(0.277)	(0.292)	(0.277)	(0.194)	(0.196)	(0.203)
BP/BPw	99.8	99.7	99.9	99.9	99.9	99.9
BP1	1.011	1.042	1.061	1.019	1.04	1.053
	(0.276)	(0.293)	(0.288)	(0.194)	(0.196)	(0.209)
BP/BP1	99.8	99.1	97.5	99.7	99	98.4
BPn1	1.012	1.046	1.064	1.021	1.041	1.056
	(0.277)	(0.294)	(0.289)	(0.194)	(0.197)	(0.21)
BP/BPn1	99.7	98.8	97.2	99.5	98.9	98.1
BPw1	1.014	1.046	1.065	1.022	1.042	1.056
	(0.277)	(0.294)	(0.289)	(0.194)	(0.197)	(0.209)
BP/BPw1	99.5	98.8	97.1	99.4	98.8	98.1
TS1	1.023	1.048	1.067	1.029	1.046	1.055
	(0.278)	(0.292)	(0.289)	(0.196)	(0.198)	(0.209)
BP/TS1	98.6	98.6	96.9	98.7	98.5	98.2
TC	1.06	1.076	1.067	1.06	1.07	1.07
	(0.286)	(0.301)	(0.284)	(0.203)	(0.204)	(0.212)
BP/TC	95.2	96	96.9	95.8	96.3	96.8
	1.064	1.085	1.104	1.067	1.084	1.088
TC1	(0.286)	(0.3)	(0.3)	(0.204)	(0.206)	(0.217)
BP/TC1	94.8	95.2	93.7	95.2	95	95.2

Comments:

standard deviations are comparable across predictors.
$BPw$ and $BPn$ seem nearly equivalent to $BP$
$BP1$ is slightly less efficient than $BP$ and it seems less good when the predicted sites are clustered. It is always better than $TC$ and $TS$.
$TC$ is less efficient than $BP$, it seems better when the predicted sites are clustered.
$TS$ is less efficient than $BP$ but better than $TC$. It becomes less efficient when the predicted sites are clustered.
$TC1$ is close to $TC$, but a little bit less efficient when predicted sites are clustered.
$BPw1$ and $BPn1$ are slightly better than $TS$.

The 6 configurations with $\sigma=1$, $W_2$ and $EM=FALSE$ (Table 7 in the article)

It seems that when $W$ is denser, all the predictors are close to $BP$.

predictors	config 1 (27)	config 2 (27)	config 3 (27)	config 1 (54)	config 2 (54)	config 3 (54)
BP	1.018	1.033	1.02	1.02	1.026	1.025
	(0.275)	(0.285)	(0.269)	(0.195)	(0.195)	(0.199)
BPn	1.018	1.033	1.02	1.02	1.026	1.025
	(0.275)	(0.285)	(0.269)	(0.195)	(0.195)	(0.199)
BP/BPn	100	100	100	100	100	100
BPw	1.018	1.033	1.02	1.02	1.026	1.025
	(0.275)	(0.286)	(0.269)	(0.195)	(0.195)	(0.199)
BP/BPw	100	100	100	100	100	100
BP1	1.019	1.033	1.02	1.022	1.027	1.025
	(0.275)	(0.286)	(0.269)	(0.195)	(0.194)	(0.199)
BP/BP1	99.9	100	100	99.8	99.9	100
BPn1	1.019	1.033	1.02	1.022	1.027	1.025
	(0.275)	(0.286)	(0.269)	(0.195)	(0.194)	(0.199)
BP/BPn1	99.9	100	100	99.8	99.9	100
BPw1	1.019	1.033	1.02	1.022	1.027	1.025
	(0.275)	(0.286)	(0.269)	(0.195)	(0.194)	(0.199)
BP/BPw1	99.9	100	100	99.8	99.9	100
TS1	1.018	1.031	1.019	1.018	1.024	1.024
	(0.275)	(0.285)	(0.268)	(0.195)	(0.194)	(0.198)
BP/TS1	100	100.2	100.1	100.2	100.2	100.1
TC	1.019	1.033	1.02	1.019	1.026	1.024
	(0.275)	(0.285)	(0.268)	(0.195)	(0.194)	(0.198)
BP/TC	99.9	100	100	100.1	100	100.1
	1.019	1.032	1.02	1.019	1.025	1.024
TC1	(0.275)	(0.284)	(0.267)	(0.195)	(0.194)	(0.198)
BP/TC1	99.9	100.1	100	100.1	100.1	100.1

The 6 configurations with $\sigma=1$, $W_1$ and $EM=TRUE$ (Not included in the article)

It seems that the fact that $\beta$, $\rho$ and $\sigma$ are estimated with the EM algorithm do not affect the prediction. Indeed, the predictors are very close to that of the first situation.

predictors	config 1 (27)	config 2 (27)	config 3 (27)	config 1 (54)	config 2 (54)	config 3 (54)
BP	1.008	1.032	1.033	1.015	1.029	1.035
	(0.276)	(0.291)	(0.277)	(0.194)	(0.195)	(0.201)
BPn	1.009	1.033	1.033	1.015	1.029	1.035
	(0.276)	(0.291)	(0.277)	(0.194)	(0.195)	(0.202)
BP/BPn	99.9	99.9	100	100	100	100
BPw	1.01	1.034	1.034	1.016	1.03	1.036
	(0.277)	(0.292)	(0.277)	(0.194)	(0.196)	(0.202)
BP/BPw	99.8	99.8	99.9	99.9	99.9	99.9
BP1	1.011	1.043	1.066	1.019	1.042	1.06
	(0.277)	(0.293)	(0.29)	(0.194)	(0.197)	(0.21)
BP/BP1	99.7	98.9	96.9	99.6	98.8	97.6
BPn1	1.012	1.047	1.068	1.02	1.044	1.063
	(0.277)	(0.295)	(0.291)	(0.194)	(0.197)	(0.212)
BP/BPn1	99.6	98.6	96.7	99.5	98.6	97.4
BPw1	1.014	1.048	1.07	1.021	1.044	1.063
	(0.278)	(0.294)	(0.291)	(0.194)	(0.198)	(0.211)
BP/BPw1	99.4	98.5	96.5	99.4	98.6	97.4
TS1	1.021	1.047	1.072	1.025	1.045	1.057
	(0.278)	(0.291)	(0.291)	(0.195)	(0.197)	(0.209)
BP/TS1	98.7	98.6	96.4	99	98.5	97.9
TC	1.06	1.073	1.065	1.058	1.067	1.069
	(0.286)	(0.3)	(0.284)	(0.203)	(0.203)	(0.211)
BP/TC	95.1	96.2	97	95.9	96.4	96.8
	1.064	1.086	1.113	1.066	1.086	1.091
TC1	(0.287)	(0.3)	(0.305)	(0.204)	(0.206)	(0.218)
BP/TC1	94.7	95	92.8	95.2	94.8	94.9

The 6 configurations with $\sigma=3$, $W_1$ and $EM=FALSE$ (Not included in the article)

When $\sigma=3$, it does not change the rank either the ratio between $BP$ and other predictors.

predictors	config 1 (27)	config 2 (27)	config 3 (27)	config 1 (54)	config 2 (54)	config 3 (54)
BP	9.09	9.301	9.286	9.148	9.27	9.319
	(2.482)	(2.612)	(2.486)	(1.748)	(1.761)	(1.823)
BPn	9.092	9.308	9.286	9.152	9.273	9.323
	(2.483)	(2.614)	(2.486)	(1.748)	(1.762)	(1.825)
BP/BPn	100	99.9	100	100	100	100
BPw	9.104	9.322	9.293	9.157	9.279	9.331
	(2.491)	(2.617)	(2.483)	(1.747)	(1.764)	(1.827)
BP/BPw	99.8	99.8	99.9	99.9	99.9	99.9
BP1	9.093	9.324	9.352	9.152	9.29	9.382
	(2.485)	(2.619)	(2.502)	(1.743)	(1.759)	(1.852)
BP/BP1	100	99.8	99.3	100	99.8	99.3
BPn1	9.103	9.359	9.362	9.166	9.302	9.407
	(2.489)	(2.632)	(2.504)	(1.742)	(1.761)	(1.86)
BP/BPn1	99.9	99.4	99.2	99.8	99.7	99.1
BPw1	9.115	9.363	9.365	9.17	9.308	9.412
	(2.493)	(2.627)	(2.5)	(1.744)	(1.767)	(1.854)
BP/BPw1	99.7	99.3	99.2	99.8	99.6	99
TS1	9.187	9.363	9.355	9.22	9.322	9.382
	(2.496)	(2.609)	(2.498)	(1.758)	(1.766)	(1.842)
BP/TS1	98.9	99.3	99.3	99.2	99.4	99.3
TC	9.541	9.668	9.58	9.527	9.605	9.618
	(2.573)	(2.69)	(2.55)	(1.824)	(1.829)	(1.899)
BP/TC	95.3	96.2	96.9	96	96.5	96.9
	9.547	9.668	9.607	9.533	9.616	9.635
TC1	(2.573)	(2.677)	(2.57)	(1.826)	(1.827)	(1.905)
BP/TC1	95.2	96.2	96.7	96	96.4	96.7

Conclusion of this section

when $W$ is sparse, $BP_N$, $BP_W$, $BP_N^1$, $BP_W^1$ are very close to $BP$ and much better than $TC$, $TS$, $TC^1$, $TS^1$.
Neither the value of $\sigma$ nor the fact that coefficients are estimated with EM algorithm change the rank of the predictors
When $W$ is dense, all the predictors seem to be very similar.

g. Simulation algorithms and results when $\rho = -0.2$

We only present the results for the 6 configurations with $\sigma=1$, $W_1$ and $EM=FALSE$ (the table is not included in the article).

predictors	config 1 (27)	config 2 (27)	config 3 (27)	config 1 (54)	config 2 (54)	config 3 (54)
BP	1.026	1.034	1.023	1.024	1.026	1.03
	(0.281)	(0.287)	(0.271)	(0.197)	(0.195)	(0.202)
BPn	1.026	1.035	1.023	1.024	1.026	1.03
	(0.281)	(0.287)	(0.271)	(0.197)	(0.195)	(0.202)
BP/BPn	100	99.9	100	100	100	100
BPw	1.027	1.035	1.023	1.025	1.026	1.03
	(0.281)	(0.287)	(0.272)	(0.198)	(0.195)	(0.201)
BP/BPw	99.9	99.9	100	99.9	100	100
BP1	1.028	1.042	1.039	1.029	1.034	1.047
	(0.281)	(0.287)	(0.281)	(0.199)	(0.198)	(0.206)
BP/BP1	99.8	99.2	98.5	99.5	99.2	98.4
BPn1	1.028	1.041	1.039	1.029	1.034	1.047
	(0.281)	(0.287)	(0.281)	(0.199)	(0.198)	(0.206)
BP/BPn1	99.8	99.3	98.5	99.5	99.2	98.4
BPw1	1.029	1.04	1.04	1.03	1.034	1.047
	(0.28)	(0.287)	(0.281)	(0.2)	(0.198)	(0.206)
BP/BPw1	99.7	99.4	98.4	99.4	99.2	98.4
TS1	1.028	1.037	1.035	1.028	1.031	1.042
	(0.281)	(0.287)	(0.278)	(0.199)	(0.196)	(0.204)
BP/TS1	99.8	99.7	98.8	99.6	99.5	98.8
TC	1.033	1.041	1.027	1.031	1.032	1.036
	(0.283)	(0.289)	(0.272)	(0.199)	(0.196)	(0.2)
BP/TC	99.3	99.3	99.6	99.3	99.4	99.4
	1.035	1.044	1.036	1.034	1.036	1.045
TC1	(0.283)	(0.29)	(0.276)	(0.2)	(0.198)	(0.203)
BP/TC1	99.1	99	98.7	99	99	98.6

h. Mispecification of the weight matrix

In this section, we used the same simulation processes except that we choose one of the spatial weight matrix $W_1$ or $W_2$ for simulating $Y$ and we use the other spatial weight matrix for estimating and predicting.

The 6 configurations when $W_2$ is used for simulating and $W_1$ is used for estimating and predicting ($\sigma=1$ and $EM=FALSE$) (Not included in the article)

predictors	config 1 (27)	config 2 (27)	config 3 (27)	config 1 (54)	config 2 (54)	config 3 (54)
BP	1.022	1.04	1.02	1.027	1.023	1.024
	(0.279)	(0.284)	(0.271)	(0.196)	(0.196)	(0.197)
BPn	1.022	1.039	1.019	1.027	1.023	1.024
	(0.279)	(0.284)	(0.271)	(0.196)	(0.195)	(0.197)
BP/BPn	100	100.1	100.1	100	100	100
BPw	1.022	1.04	1.019	1.027	1.023	1.024
	(0.279)	(0.284)	(0.271)	(0.196)	(0.196)	(0.197)
BP/BPw	100	100	100.1	100	100	100
BP1	1.023	1.043	1.025	1.029	1.028	1.03
	(0.28)	(0.285)	(0.273)	(0.196)	(0.196)	(0.199)
BP/BP1	99.9	99.7	99.5	99.8	99.5	99.4
BPn1	1.023	1.043	1.025	1.029	1.028	1.03
	(0.28)	(0.285)	(0.272)	(0.196)	(0.196)	(0.199)
BP/BPn1	99.9	99.7	99.5	99.8	99.5	99.4
BPw1	1.023	1.043	1.025	1.029	1.028	1.03
	(0.279)	(0.285)	(0.272)	(0.196)	(0.196)	(0.199)
BP/BPw1	99.9	99.7	99.5	99.8	99.5	99.4
TS1	1.021	1.04	1.023	1.026	1.024	1.027
	(0.278)	(0.284)	(0.271)	(0.196)	(0.195)	(0.198)
BP/TS1	100.1	100	99.7	100.1	99.9	99.7
TC	1.02	1.038	1.019	1.025	1.021	1.024
	(0.277)	(0.284)	(0.27)	(0.196)	(0.195)	(0.197)
BP/TC	100.2	100.2	100.1	100.2	100.2	100
	1.021	1.039	1.022	1.026	1.023	1.026
TC1	(0.277)	(0.285)	(0.271)	(0.196)	(0.195)	(0.198)
BP/TC1	100.1	100.1	99.8	100.1	100	99.8

Comments:

$BP$, $BPw$, $TS$ and $TC$ are nearly similar and fit perfectly well with $Y$.
$BP1$, $BPw1$, $BPn1$ are nearly similar
When the data have been simulated with $W$ dense, one could use $W$ sparse or dense for predicting and all the predictors will yield a good fit.

The 6 configurations when $W_1$ is used for simulating, $W_2$ is used for estimating and predicting ($\sigma=1$ and $EM=FALSE$) (Not included in the article)

predictors	config 1 (27)	config 2 (27)	config 3 (27)	config 1 (54)	config 2 (54)	config 3 (54)
BP	1.045	1.071	1.063	1.052	1.055	1.064
	(0.285)	(0.299)	(0.287)	(0.204)	(0.204)	(0.209)
BPn	1.045	1.071	1.063	1.052	1.055	1.064
	(0.285)	(0.299)	(0.287)	(0.204)	(0.204)	(0.209)
BP/BPn	100	100	100	100	100	100
BPw	1.045	1.072	1.064	1.052	1.055	1.064
	(0.285)	(0.3)	(0.287)	(0.204)	(0.204)	(0.209)
BP/BPw	100	99.9	99.9	100	100	100
BP1	1.044	1.074	1.064	1.051	1.054	1.064
	(0.285)	(0.301)	(0.288)	(0.203)	(0.204)	(0.209)
BP/BP1	100.1	99.7	99.9	100.1	100.1	100
BPn1	1.043	1.074	1.063	1.05	1.054	1.064
	(0.285)	(0.3)	(0.287)	(0.203)	(0.204)	(0.209)
BP/BPn1	100.2	99.7	100	100.2	100.1	100
BPw1	1.048	1.079	1.066	1.054	1.057	1.067
	(0.286)	(0.302)	(0.287)	(0.204)	(0.204)	(0.209)
BP/BPw1	99.7	99.3	99.7	99.8	99.8	99.7
TS1	1.053	1.084	1.069	1.059	1.062	1.071
	(0.286)	(0.302)	(0.288)	(0.205)	(0.205)	(0.21)
BP/TS1	99.2	98.8	99.4	99.3	99.3	99.3
TC	1.081	1.094	1.079	1.078	1.076	1.081
	(0.292)	(0.302)	(0.29)	(0.21)	(0.21)	(0.212)
BP/TC	96.7	97.9	98.5	97.6	98	98.4
	1.075	1.106	1.082	1.078	1.08	1.085
TC1	(0.291)	(0.308)	(0.291)	(0.21)	(0.21)	(0.214)
BP/TC1	97.2	96.8	98.2	97.6	97.7	98.1

Comments:

When we estimate and predict with $W_2$ (whereas the ``true’’ spatial weight matrix is $W_1$), we can observe that the MSE are larger than the case we estimate and predict with $W_1$ (see table 1 in section Simulation algorithms and results)

If we look at the ranking, we can observe that there are more differences between the predictors:

$BP \simeq BPn$ < $TS$ < $TC$
$BP1 \simeq BPn1$ < $BPw1$

i. Influence of the number of missing values in the neighborhood of $O$ (Table 8 in the article)

We define the structure of the missing values depending on the number of neighbours.

  config.0.missing <- 220
  set.seed(12345)
  samp.neighbourhood <- sample(W1.listw$neighbours[[config.0.missing]])
  config.1.missing <- c(config.0.missing, samp.neighbourhood[1])
  config.2.missing <- c(config.1.missing, samp.neighbourhood[2])
  config.3.missing <- c(config.2.missing, samp.neighbourhood[3])
  config.4.missing <- c(config.3.missing, samp.neighbourhood[4])
  config.5.missing <- c(config.4.missing, samp.neighbourhood[5])
  config.6.missing <- c(config.5.missing, samp.neighbourhood[6])
  config.7.missing <- c(config.6.missing, samp.neighbourhood[7])
  config.8.missing <- c(config.7.missing, samp.neighbourhood[8])
  config.9.missing <- c(config.8.missing, samp.neighbourhood[9])

We plot in yellow the unit we want to predict and in red the neighborhood which will be supposed missing. It means that when using $W_1$ (the sparsest), and when considering 9 neighbours as missing, we will use only one neighbour to predict the yellow unit.

plot(mapMAP)
plot(mapMAP[config.9.missing,], col = "red", add = T)
plot(mapMAP[220,], col = "yellow", add = T)

We look at the prediction depending on the 2 spatial weight matrices $W_1$ and $W_2$

list.param <- list(W=list(W1.listw, W2.listw), sigma=1, 
  config = list(config.0.missing, config.1.missing, 
   config.2.missing, config.3.missing, config.4.missing, 
   config.5.missing, config.6.missing, config.7.missing,
   config.8.missing, config.9.missing),
  EM = FALSE)
save(list.param, beta0, x, 
     file=paste(path.dir, "parameters/param-missing.RData", 
      sep=""))

W	missing	BP	sd	BP/BPn	BP/BPw	BP/BP1	BP/BPn1	BP/BPw1	BP/TS1	BP/TC	BP/TC1
$W_1$	0	0.992	(1.443)	100	99.7	100	100	99.7	98.7	94.4	94.4
$W_1$	1	0.991	(1.445)	100	99.4	99.8	99.7	99.3	98.3	94.4	94
$W_1$	2	0.991	(1.445)	100	99.7	99.3	99.3	99.1	98	94.4	93.4
$W_1$	3	1.01	(1.47)	99.9	99.7	97.1	97.1	97.1	96.5	95.8	92.2
$W_1$	4	1.013	(1.487)	99.9	99.9	93.8	93.8	93.6	93.2	96	89.3
$W_1$	5	1.023	(1.495)	99.9	99.9	92.9	92.8	92.8	93.3	97.1	90.6
$W_1$	6	1.029	(1.5)	99.9	99.8	92.4	92.2	92.2	93.5	97.6	92.5
$W_1$	7	1.026	(1.495)	100	100	91	91	90.9	92.2	97.3	91.4
$W_1$	8	1.037	(1.502)	100.1	100	82.6	82.4	82.4	84.2	98.3	86.3
$W_1$	9	1.037	(1.503)	100	100	82.4	82.4	82.4	84.7	98.3	90
$W_2$	0	0.999	(1.45)	100	100	100	100	100	100	99.7	99.7
$W_2$	1	0.999	(1.449)	100	100	100	100	100	100	99.8	99.7
$W_2$	2	0.999	(1.449)	100	100	100	100	100	100	99.8	99.7
$W_2$	3	1.002	(1.453)	100	100	99.9	99.9	99.9	100	99.9	99.8
$W_2$	4	1.002	(1.457)	100	100	99.9	99.9	99.9	99.9	99.8	99.7
$W_2$	5	1.002	(1.458)	100	100	99.9	99.9	100	100	99.9	99.8
$W_2$	6	1.002	(1.457)	100	100	100	100	100	100	99.9	99.9
$W_2$	7	1.001	(1.455)	100	100.1	100	100	100.1	100	99.7	99.7
$W_2$	8	1.002	(1.455)	100	100	100	100	100.1	100	99.8	99.8
$W_2$	9	1.001	(1.454)	100	100	100	100	100	100	99.7	99.7

R codes for the paper About predictions in spatial autoregressive models: Optimal and almost optimal strategies

Before starting

1. Preparation of the data

2. Simulation framework for in-sample predictions

a. Simulation of a SAR process

The different choices of parameters

b. Computing the predictors with spdep

c. Results

3. Simulation framework for out-of-sample predictions

a. Simulation process

27 out-of-sample units

54 out-of-sample units

Graphical representation of the 27 and 54 out-of-sample units

b. Out-of-sample predictors formulae

c. Computing the predictions with spdep

d. The different choices of parameters in the simulations

e. EM or not EM: that is the question?

\(\sigma=1\), \(W_1\)

\(\sigma=1\), \(W_2\)

\(\sigma=3\), \(W_1\)

\(\sigma=3\), \(W_2\)

f. Simulation algorithms and results when \(\rho = 0.35\)

The 6 configurations with \(\sigma=1\), \(W_1\) and \(EM=FALSE\) (Table 6 in the article)

The 6 configurations with \(\sigma=1\), \(W_2\) and \(EM=FALSE\) (Table 7 in the article)

The 6 configurations with \(\sigma=1\), \(W_1\) and \(EM=TRUE\) (Not included in the article)

The 6 configurations with \(\sigma=3\), \(W_1\) and \(EM=FALSE\) (Not included in the article)

Conclusion of this section

g. Simulation algorithms and results when \(\rho = -0.2\)

h. Mispecification of the weight matrix

The 6 configurations when \(W_2\) is used for simulating and \(W_1\) is used for estimating and predicting (\(\sigma=1\) and \(EM=FALSE\)) (Not included in the article)

The 6 configurations when \(W_1\) is used for simulating, \(W_2\) is used for estimating and predicting (\(\sigma=1\) and \(EM=FALSE\)) (Not included in the article)

i. Influence of the number of missing values in the neighborhood of \(O\) (Table 8 in the article)

W	missing	BP	sd	BP/BPn	BP/BPw	BP/BP1	BP/BPn1	BP/BPw1	BP/TS1	BP/TC	BP/TC1
\(W_1\)	0	0.992	(1.443)	100	99.7	100	100	99.7	98.7	94.4	94.4
\(W_1\)	1	0.991	(1.445)	100	99.4	99.8	99.7	99.3	98.3	94.4	94
\(W_1\)	2	0.991	(1.445)	100	99.7	99.3	99.3	99.1	98	94.4	93.4
\(W_1\)	3	1.01	(1.47)	99.9	99.7	97.1	97.1	97.1	96.5	95.8	92.2
\(W_1\)	4	1.013	(1.487)	99.9	99.9	93.8	93.8	93.6	93.2	96	89.3
\(W_1\)	5	1.023	(1.495)	99.9	99.9	92.9	92.8	92.8	93.3	97.1	90.6
\(W_1\)	6	1.029	(1.5)	99.9	99.8	92.4	92.2	92.2	93.5	97.6	92.5
\(W_1\)	7	1.026	(1.495)	100	100	91	91	90.9	92.2	97.3	91.4
\(W_1\)	8	1.037	(1.502)	100.1	100	82.6	82.4	82.4	84.2	98.3	86.3
\(W_1\)	9	1.037	(1.503)	100	100	82.4	82.4	82.4	84.7	98.3	90
\(W_2\)	0	0.999	(1.45)	100	100	100	100	100	100	99.7	99.7
\(W_2\)	1	0.999	(1.449)	100	100	100	100	100	100	99.8	99.7
\(W_2\)	2	0.999	(1.449)	100	100	100	100	100	100	99.8	99.7
\(W_2\)	3	1.002	(1.453)	100	100	99.9	99.9	99.9	100	99.9	99.8
\(W_2\)	4	1.002	(1.457)	100	100	99.9	99.9	99.9	99.9	99.8	99.7
\(W_2\)	5	1.002	(1.458)	100	100	99.9	99.9	100	100	99.9	99.8
\(W_2\)	6	1.002	(1.457)	100	100	100	100	100	100	99.9	99.9
\(W_2\)	7	1.001	(1.455)	100	100.1	100	100	100.1	100	99.7	99.7
\(W_2\)	8	1.002	(1.455)	100	100	100	100	100.1	100	99.8	99.8
\(W_2\)	9	1.001	(1.454)	100	100	100	100	100	100	99.7	99.7