1 Prerequisites
2 Simulation study
3 Selection between the Gaussian and IT models
- 3.1 Computations of Test statistics
- 3.2 Examples

We provide the data and the R code used in the article “Multivariate Student versus multivariate Gaussian regression models : applications to finance and political economy” so that readers may reproduce all the figures, tables and statistics presented in the article with the R software.

If you use this code, please cite:

Nguyen T.H.A, Ruiz-Gazen, A., Thomas-Agnan C. and T. Laurent (2019). Multivariate Student versus Multivariate Gaussian Regression Models with Application to Finance. Journal of Risk and Financial Management, 12(1), 28.

1 Prerequisites

Required packages:

install.packages(c("compositions", "mvnfast", "quantmod", "plot3D", "sp"))

Loading packages:

require("compositions") # compositional data
require("mvnfast") # multivariate Student distribution 
require("quantmod")  # import financial data
require("plot3D") # plot distribution in 3D
require("sp") # spatial data

Information about the current R session :

sessionInfo()

## R version 3.5.2 (2018-12-20)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
## 
## locale:
##  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
##  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
##  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] sp_1.3-1            plot3D_1.1.1        quantmod_0.4-13    
##  [4] TTR_0.23-4          xts_0.11-2          zoo_1.8-4          
##  [7] mvnfast_0.2.5       compositions_1.40-2 bayesm_3.1-1       
## [10] energy_1.7-5        robustbase_0.93-3   tensorA_0.36.1     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0      knitr_1.21      magrittr_1.5    misc3d_0.8-4   
##  [5] lattice_0.20-38 stringr_1.3.1   tools_3.5.2     grid_3.5.2     
##  [9] xfun_0.4        htmltools_0.3.6 yaml_2.2.0      digest_0.6.18  
## [13] curl_3.2        evaluate_0.12   rmarkdown_1.11  stringi_1.2.4  
## [17] compiler_3.5.2  DEoptimR_1.0-8  boot_1.3-20

2 Simulation study

This section demonstrates how to obtain the results presented in section 3 of the article. We first present our functions which can be adapted to other frameworks different from our simulation process.

2.1 Simulation of \(Y\)

The function simu_multi_y() simulates a multivariate \(Y\) of the form \(Y=X\beta + \epsilon\) where \(\epsilon\) follows either a multivariate Gaussian (method_simulate = “N”), ot the Independent multivariate Student (method_simulate = “IT”) or the Uncorrelated multivariate Student (method_simulate = “UT”) distributions.

Input arguments are :

X, the matrix of explanatory variables of size \(n \times K\),
beta_true, the \(\beta\) matrix of size \(K\times L\) : \[\left(\begin{array}{cc} \beta_{11} & \ldots & \beta_{1L}\\ \beta_{21} & \ldots & \beta_{2L} \\ \vdots & & \vdots \\ \beta_{K1} & \ldots & \beta_{KL} \end{array}\right)\]
method_simulate, the simulation method (a character among “N”, “IT”, “UT”),
Sigma, a matrix of size \(L \times L\),
nu, degrees of freedom, for Student distribution only.

The function returns a matrix of size \(n \times L\). To load the function:

source(file("http://www.thibault.laurent.free.fr/code/jrfm/R/simu_multi_y.R"))

2.1.1 Examples

We consider the example used in the article: for each of the 3 distributions “N”, “IT”, and “UT”, we simulate a multivariate vector \(Y\) for different values of the dimension \(L\).

2.1.1.1 When \(L = 2\)

In our article, we choose \(Y\) of size \(1000 \times 2\) and one explanatory variable (plus the constant).

n_simu <- 1000
p_simu <- 2

L_simu <- 2

The explanatory variable follows a Gaussian distribution \(N(45,10)\):

set.seed(1234)
X_simu <- cbind(1, rnorm(n_simu, 45, 10))

The \(\beta\) matrix is \[\left(\begin{array}{cc} 2 & 4\\ 3 & -3\end{array}\right)\]

beta_true <- matrix(c(2, 4, 3, -3), 
                    nrow = p_simu, ncol = L_simu)

and the \(\Sigma\) matrix is \[\left(\begin{array}{cc} 2 & 0.5\\ 0.5 & 1\end{array}\right)\]

Sigma <- matrix(c(2, 0.5, 0.5, 1),
                nrow = L_simu, ncol = L_simu)

2.1.1.1.1 Simulation of toy data

The following simulated data will be used in section 4 of the article.

Simulation of a “N” distribution

set.seed(123)
y_N <- simu_multi_y(X = X_simu, beta_true = beta_true, method_simulate = "N",
                         Sigma = Sigma)

Simulation of an “IT” distribution with \(\nu = 3\)

y_IT_3 <- simu_multi_y(X = X_simu, beta_true = beta_true, method_simulate = "IT",
                         Sigma = Sigma, nu = 3)

Simulation of an “IT” distribution with \(\nu = 4\)

y_IT_4 <- simu_multi_y(X = X_simu, beta_true = beta_true, method_simulate = "IT",
                         Sigma = Sigma, nu = 4)

Simulation of an “UT” distribution with \(\nu = 3\)

y_UT <- simu_multi_y(X = X_simu, beta_true = beta_true, method_simulate = "UT",
                         Sigma = Sigma, nu = 3)

We plot the four distributions in 2D :

op <- par(mfrow = c(2, 2))
plot(y_N, xlab = "Y_1", ylab = "Y_2", main = "N distribution")
plot(y_IT_3, xlab = "Y_1", ylab = "Y_2", main = "IT distribution (nu = 3)")
plot(y_IT_4, xlab = "Y_1", ylab = "Y_2", main = "IT distribution (nu = 4)")
plot(y_UT, xlab = "Y_1", ylab = "Y_2", main = "UT distribution (nu = 3)")

par(op)

2.1.1.2 When \(L = 3\)

We simulate another sample of dimension \(1000 \times 3\).

L_3 <- 3

We keep the same explanatatory variable \(X\). However, the \(\beta\) matrix is now equal to : \[\left(\begin{array}{ccc} 2 & 4 & 1\\ 3 & -3 & 2\end{array}\right)\]

and the \(\Sigma\) matrix is \[\left(\begin{array}{ccc} 50 & 25 & 25\\ 25 & 50& 25 \\ 25 & 25 & 50\end{array}\right)\]

beta_true_3 <- matrix(c(2, 4, 3, -3, 1, 2), 
                    nrow = p_simu, ncol = L_3)
Sigma_3 <- matrix(0.5, nrow = L_3, ncol = L_3)
diag(Sigma_3) <- c(2, 1, 2)

Simulation of a “N” distribution

y_N_3d <- simu_multi_y(X = X_simu, beta_true = beta_true_3, method_simulate = "N",
                         Sigma = Sigma_3)

Simulation of an “IT” distribution

y_IT_3d <- simu_multi_y(X = X_simu, beta_true = beta_true_3, method_simulate = "IT",
                         Sigma = Sigma_3, nu = 3)

Simulation of an “UT” distribution

y_UT_3d <- simu_multi_y(X = X_simu, beta_true = beta_true_3, method_simulate = "UT",
                         Sigma = Sigma_3, nu = 3)

We plot the three distributions in 3D :

require("plot3D")
op <- par(mfrow = c(1, 3))
scatter3D(y_N_3d[, 1], y_N_3d[, 2], y_N_3d[, 3], colvar = NULL, pch = 1, 
          col = "royalblue", phi = 0, bty ="g")
scatter3D(y_IT_3d[, 1], y_IT_3d[, 2], y_IT_3d[, 3], colvar = NULL, pch = 1, 
          col = "royalblue", phi = 0, bty ="g")
scatter3D(y_UT_3d[, 1], y_UT_3d[, 2], y_UT_3d[, 3], colvar = NULL, pch = 1, 
          col = "royalblue", phi = 0, bty ="g")

par(op)

2.2 Estimation

2.2.1 Estimation of the parameters of an IT model

The function estimate_IT() estimates the coefficients associated to the “IT” model. The algorithm is based on Kelejian and Prucha (1984).

Input arguments are :

Y, a matrix of size \(n \times L\)
X, a matrix of explanatory variables of size \(n \times K\),
nu, a scalar, the degree of freedom of Student distribution,
eps_precision, a scalar, a tolerance threshold used in the iterative algorithm.

The function returns a list with arguments :

hat_beta, the vector of estimated parameters of size \(K\)
hat_sigma, the estimated covariance matrix of size \(L \times L\),
residuals, the matrix of residuals of size \(n \times L\),

To load the function:

source(file("http://www.thibault.laurent.free.fr/code/jrfm/R/estimate_IT.R"))

Examples:

When \(L=2\) :

res_IT <- estimate_IT(y_IT_3, X_simu, nu = 3, eps_precision = 10^(-8))

When \(L=3\) :

res_IT_3d <- estimate_IT(y_IT_3d, X_simu, nu = 3, eps_precision = 10^(-8))

2.3 Simulations

In this section, we present how to reproduce the tables included in the article. Users should first load the function simu_jrfm():

source(file("http://www.thibault.laurent.free.fr/code/jrfm/R/simu_jrfm.R"))

The function simu_jrfm() repeats B times the following procedure

1. simulation of a multivariate sample: 7 cases are treated (method_simulate = “N”, method_simulate = “UT” with nu_D = 3, method_simulate = “UT” with nu_D = 4, method_simulate = “UT” with nu_D = 5, method_simulate = “IT” with nu_D = 3, method_simulate = “IT” with nu_D = 4 and method_simulate = “IT” with nu_D = 5.
1. For each sample, we estimate using models: “N”, “UT” or IT" (with different degrees of freedom).

We have implemented parallel computations for getting a better performance. By using 50 cores, the computation time is equal to 24 hours.

NB: some results presented in this supplementary material slightly differ from the original tables presented in the article because of the random seeds used which were not exactly the same in the article and in the supplement material.

B_per_core <- 286 # number of replications per core 
myfun_par <- function(core) {
  simu_jrfm(core, X_simu, beta_true, Sigma, B_per_core, eps_precision = 10^(-8))
}

library("parallel")
P <- 35 # number of cores
cl <- makeCluster(P) 
clusterExport(cl, c("X_simu", "beta_true", "Sigma", "estimate_IT", 
                    "simu_multi_y", "simu_jrfm", "B_per_core"))
clusterEvalQ(cl, library("mvnfast"))
system.time(res_par <- clusterApply(cl, 1:P, fun = myfun_par)) 
stopCluster(cl)

We first aggregate the simulation results by biais and MSE:

table_mse_beta <- matrix(0, 4 * 6, 7) # intermediate table
table_biais_beta <- matrix(0, 4 * 6, 7) # intermediate table
table_mse_sigma <- matrix(0, 3 * 6, 7) # intermediate table
table_biais_sigma <- matrix(0, 3 * 6, 7) # intermediate table
biais_UT_UT_beta <- numeric(4) # intermediate vector
mse_UT_UT_beta <- numeric(4) # intermediate vector
biais_UT_UT_sigma <- numeric(3) # intermediate vector
mse_UT_UT_sigma <- numeric(3) # intermediate vector
# matrix of beta 
beta_true_matrix <- matrix(as.vector(beta_true), 4 * 6, 7)
# vector and matrix of sigma
true_s1 <- Sigma[1, 1]
true_s2 <- Sigma[2, 2]
true_rho <- Sigma[1, 2]/(sqrt(true_s1 * true_s2))
sigma_true <- c(true_rho, true_s1, true_s2)
sigma_true_matrix <- matrix(as.vector(sigma_true), 3 * 6, 7)
# compute the biais and MSE
for (k in 1:P) {
  # MSE 
  table_mse_beta <- table_mse_beta + res_par[[k]]$table_simu_beta_square - 
    2 * beta_true_matrix * res_par[[k]]$table_simu_beta
  mse_UT_UT_beta <- mse_UT_UT_beta + res_par[[k]]$UT_UT_beta_square - 
    2 * as.vector(beta_true) * res_par[[k]]$UT_UT_beta 
  table_mse_sigma <- table_mse_sigma + res_par[[k]]$table_simu_sigma_square - 
    2 * sigma_true_matrix * res_par[[k]]$table_simu_sigma
  mse_UT_UT_sigma <- mse_UT_UT_sigma + res_par[[k]]$UT_UT_sigma_square - 
    2 * as.vector(sigma_true) * res_par[[k]]$UT_UT_sigma 
  # biais
  table_biais_beta <- table_biais_beta + 
    (res_par[[k]]$table_simu_beta - B_per_core * beta_true_matrix)
  biais_UT_UT_beta <- biais_UT_UT_beta + 
    (res_par[[k]]$UT_UT_beta - B_per_core * as.vector(beta_true)) 
  table_biais_sigma <- table_biais_sigma + 
    (res_par[[k]]$table_simu_sigma - B_per_core * sigma_true_matrix)
  biais_UT_UT_sigma <- biais_UT_UT_sigma + 
    (res_par[[k]]$UT_UT_sigma - B_per_core * as.vector(sigma_true)) 
}
table_biais_beta <- table_biais_beta / (B_per_core * P)
biais_UT_UT_beta <- biais_UT_UT_beta / (B_per_core * P)
table_biais_sigma <- table_biais_sigma / (B_per_core * P)
biais_UT_UT_sigma <- biais_UT_UT_sigma / (B_per_core * P)
table_mse_beta <- table_mse_beta / (B_per_core * P) + beta_true_matrix^2
mse_UT_UT_beta <- mse_UT_UT_beta / (B_per_core * P) + as.vector(beta_true^2)
table_mse_sigma <- table_mse_sigma / (B_per_core * P) + sigma_true_matrix^2
mse_UT_UT_sigma <- mse_UT_UT_sigma / (B_per_core * P) + as.vector(sigma_true^2)

Then, we obtain the different tables and figures presented in the article:

table_2 <- matrix(0, 4 * 2, 6) # table 2 in the article
table_3 <- matrix(0, 4, 6) # table 3 in the article
table_4 <- matrix(0, 4 * 4, 7) # table 4 in the article
table_5 <- matrix(0, 3 * 2, 6) # table 5 in the article  
table_6 <- matrix(0, 3 * 4, 7) # table 6 in the article 
table_7 <- matrix(0, 3 * 4, 7) # table 7 in the article

2.3.1 Table 2

# DGP = "N"
table_2[1:4, 1] <- (100 * table_biais_beta[1:4, 1] / as.vector(beta_true)) 
table_2[1:4, 2] <- sqrt(table_mse_beta[1:4, 1] / table_mse_beta[1:4, 1]) 
table_2[5:8, 1] <- (100 * table_biais_beta[5:8, 1] / as.vector(beta_true))
table_2[5:8, 2] <- sqrt(table_mse_beta[5:8, 1] / table_mse_beta[1:4, 1]) 
# DGP = "UT"
table_2[1:4, 3] <- (100 * table_biais_beta[1:4, 2] / as.vector(beta_true)) 
table_2[1:4, 4] <- sqrt(table_mse_beta[1:4, 2] / table_mse_beta[1:4, 2]) 
table_2[5:8, 3] <- (100 * table_biais_beta[5:8, 2] / as.vector(beta_true))
table_2[5:8, 4] <- sqrt(table_mse_beta[5:8, 2] / table_mse_beta[1:4, 2]) 
# DGP = "IT"
table_2[1:4, 5] <- (100 * table_biais_beta[1:4, 5] / as.vector(beta_true)) 
table_2[1:4, 6] <- sqrt(table_mse_beta[1:4, 5] / table_mse_beta[5:8, 5]) 
table_2[5:8, 5] <- (100 * table_biais_beta[5:8, 5] / as.vector(beta_true))
table_2[5:8, 6] <- sqrt(table_mse_beta[5:8, 5] / table_mse_beta[5:8, 5]) 
round(table_2, 4)

##         [,1]   [,2]    [,3]   [,4]    [,5]   [,6]
## [1,] -0.0694 1.0000 -0.0636 1.0000 -0.0893 1.4765
## [2,]  0.0006 1.0000  0.0004 1.0000  0.0008 1.4803
## [3,] -0.0220 1.0000 -0.0108 1.0000 -0.0670 1.4640
## [4,] -0.0006 1.0000 -0.0001 1.0000 -0.0013 1.4616
## [5,] -0.0854 1.0386 -0.0853 1.0912 -0.0333 1.0000
## [6,]  0.0007 1.0373  0.0007 1.0906  0.0002 1.0000
## [7,] -0.0398 1.0671 -0.0177 1.0769 -0.0339 1.0000
## [8,] -0.0010 1.0670 -0.0002 1.0762 -0.0007 1.0000

xtable::xtable(table_2)

## % latex table generated in R 3.5.2 by xtable 1.8-3 package
## % Wed Feb 13 11:16:32 2019
## \begin{table}[ht]
## \centering
## \begin{tabular}{rrrrrrr}
##   \hline
##  & 1 & 2 & 3 & 4 & 5 & 6 \\ 
##   \hline
## 1 & -0.07 & 1.00 & -0.06 & 1.00 & -0.09 & 1.48 \\ 
##   2 & 0.00 & 1.00 & 0.00 & 1.00 & 0.00 & 1.48 \\ 
##   3 & -0.02 & 1.00 & -0.01 & 1.00 & -0.07 & 1.46 \\ 
##   4 & -0.00 & 1.00 & -0.00 & 1.00 & -0.00 & 1.46 \\ 
##   5 & -0.09 & 1.04 & -0.09 & 1.09 & -0.03 & 1.00 \\ 
##   6 & 0.00 & 1.04 & 0.00 & 1.09 & 0.00 & 1.00 \\ 
##   7 & -0.04 & 1.07 & -0.02 & 1.08 & -0.03 & 1.00 \\ 
##   8 & -0.00 & 1.07 & -0.00 & 1.08 & -0.00 & 1.00 \\ 
##    \hline
## \end{tabular}
## \end{table}

2.3.2 Table 3

table_3[, 1] <- table_biais_beta[1:4, 1]
table_3[, 2] <- table_mse_beta[1:4, 1]
table_3[, 3] <- biais_UT_UT_beta
table_3[, 4] <- mse_UT_UT_beta
table_3[, 5] <- table_biais_beta[5:8, 5]
table_3[, 6] <- table_mse_beta[5:8, 5]
round(table_3, 8)

##             [,1]       [,2]        [,3]       [,4]        [,5]       [,6]
## [1,] -0.00138779 0.04574210 -0.00127195 0.03717886 -0.00066525 0.01992857
## [2,]  0.00002415 0.00002175  0.00001470 0.00001757  0.00000990 0.00000948
## [3,] -0.00066148 0.02156859 -0.00032319 0.02045729 -0.00101639 0.00983604
## [4,]  0.00001865 0.00001023  0.00000394 0.00000962  0.00002137 0.00000471

xtable::xtable(table_3, digits = 7)

## % latex table generated in R 3.5.2 by xtable 1.8-3 package
## % Wed Feb 13 11:16:32 2019
## \begin{table}[ht]
## \centering
## \begin{tabular}{rrrrrrr}
##   \hline
##  & 1 & 2 & 3 & 4 & 5 & 6 \\ 
##   \hline
## 1 & -0.0013878 & 0.0457421 & -0.0012719 & 0.0371789 & -0.0006652 & 0.0199286 \\ 
##   2 & 0.0000241 & 0.0000218 & 0.0000147 & 0.0000176 & 0.0000099 & 0.0000095 \\ 
##   3 & -0.0006615 & 0.0215686 & -0.0003232 & 0.0204573 & -0.0010164 & 0.0098360 \\ 
##   4 & 0.0000187 & 0.0000102 & 0.0000039 & 0.0000096 & 0.0000214 & 0.0000047 \\ 
##    \hline
## \end{tabular}
## \end{table}

2.3.3 Table 4

for (j in 1:7) {
  if (j %in% c(1, 2, 3, 4)) {
    true_denom <- table_mse_beta[1:4, j]
  } else {
    if (j == 5) {
      true_denom <- table_mse_beta[5:8, 5]
    } else {
      if (j == 6) {
        true_denom <- table_mse_beta[9:12, 6]
      } else {
        true_denom <- table_mse_beta[13:16, 7]
      }
    }
  }
  table_4[1:4, j] <- sqrt(table_mse_beta[1:4, j] / true_denom) 
  table_4[5:8, j] <- sqrt(table_mse_beta[5:8, j] / true_denom) 
  table_4[9:12, j] <- sqrt(table_mse_beta[9:12, j] / true_denom) 
  table_4[13:16, j] <- sqrt(table_mse_beta[13:16, j] / true_denom)  
}
round(table_4, 4)

##         [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]
##  [1,] 1.0000 1.0000 1.0000 1.0000 1.4765 1.2235 1.1378
##  [2,] 1.0000 1.0000 1.0000 1.0000 1.4803 1.2252 1.1390
##  [3,] 1.0000 1.0000 1.0000 1.0000 1.4640 1.2196 1.1304
##  [4,] 1.0000 1.0000 1.0000 1.0000 1.4616 1.2180 1.1319
##  [5,] 1.0386 1.0912 1.0879 1.0809 1.0000 1.0039 1.0092
##  [6,] 1.0373 1.0906 1.0880 1.0785 1.0000 1.0042 1.0093
##  [7,] 1.0671 1.0769 1.0973 1.0843 1.0000 1.0038 1.0096
##  [8,] 1.0670 1.0762 1.0934 1.0855 1.0000 1.0041 1.0095
##  [9,] 1.0159 1.0655 1.0637 1.0568 1.0044 1.0000 1.0017
## [10,] 1.0146 1.0648 1.0637 1.0547 1.0042 1.0000 1.0017
## [11,] 1.0429 1.0554 1.0712 1.0608 1.0045 1.0000 1.0017
## [12,] 1.0427 1.0545 1.0680 1.0618 1.0044 1.0000 1.0017
## [13,] 1.0020 1.0499 1.0488 1.0422 1.0120 1.0016 1.0000
## [14,] 1.0007 1.0491 1.0487 1.0403 1.0118 1.0015 1.0000
## [15,] 1.0281 1.0421 1.0549 1.0463 1.0125 1.0017 1.0000
## [16,] 1.0278 1.0411 1.0522 1.0471 1.0122 1.0014 1.0000

xtable::xtable(table_4, digits = 2)

## % latex table generated in R 3.5.2 by xtable 1.8-3 package
## % Wed Feb 13 11:16:32 2019
## \begin{table}[ht]
## \centering
## \begin{tabular}{rrrrrrrr}
##   \hline
##  & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 
##   \hline
## 1 & 1.00 & 1.00 & 1.00 & 1.00 & 1.48 & 1.22 & 1.14 \\ 
##   2 & 1.00 & 1.00 & 1.00 & 1.00 & 1.48 & 1.23 & 1.14 \\ 
##   3 & 1.00 & 1.00 & 1.00 & 1.00 & 1.46 & 1.22 & 1.13 \\ 
##   4 & 1.00 & 1.00 & 1.00 & 1.00 & 1.46 & 1.22 & 1.13 \\ 
##   5 & 1.04 & 1.09 & 1.09 & 1.08 & 1.00 & 1.00 & 1.01 \\ 
##   6 & 1.04 & 1.09 & 1.09 & 1.08 & 1.00 & 1.00 & 1.01 \\ 
##   7 & 1.07 & 1.08 & 1.10 & 1.08 & 1.00 & 1.00 & 1.01 \\ 
##   8 & 1.07 & 1.08 & 1.09 & 1.09 & 1.00 & 1.00 & 1.01 \\ 
##   9 & 1.02 & 1.07 & 1.06 & 1.06 & 1.00 & 1.00 & 1.00 \\ 
##   10 & 1.01 & 1.06 & 1.06 & 1.05 & 1.00 & 1.00 & 1.00 \\ 
##   11 & 1.04 & 1.06 & 1.07 & 1.06 & 1.00 & 1.00 & 1.00 \\ 
##   12 & 1.04 & 1.05 & 1.07 & 1.06 & 1.00 & 1.00 & 1.00 \\ 
##   13 & 1.00 & 1.05 & 1.05 & 1.04 & 1.01 & 1.00 & 1.00 \\ 
##   14 & 1.00 & 1.05 & 1.05 & 1.04 & 1.01 & 1.00 & 1.00 \\ 
##   15 & 1.03 & 1.04 & 1.05 & 1.05 & 1.01 & 1.00 & 1.00 \\ 
##   16 & 1.03 & 1.04 & 1.05 & 1.05 & 1.01 & 1.00 & 1.00 \\ 
##    \hline
## \end{tabular}
## \end{table}

2.3.4 Figure 1

data_plot <- data.frame(
  nu_MLE = c(3, 4, 5, 10, 20),
  N_DGP = sqrt(table_mse_beta[4 + c(4, 8, 12, 16, 20), 1] / 
                 table_mse_beta[4, 1]),
  IT_DGP_3 = sqrt(table_mse_beta[4 + c(4, 8, 12, 16, 20), 5] / 
                    table_mse_beta[8, 5]),
  UT_DGP_3 = sqrt(table_mse_beta[4 + c(4, 8, 12, 16, 20), 2] / 
                    table_mse_beta[4, 2]),
  IT_DGP_4 = sqrt(table_mse_beta[4 + c(4, 8, 12, 16, 20), 6] / 
                    table_mse_beta[8, 6]),
  UT_DGP_4 = sqrt(table_mse_beta[4 + c(4, 8, 12, 16, 20), 3] / 
                    table_mse_beta[4, 3]),
  IT_DGP_5 = sqrt(table_mse_beta[4 + c(4, 8, 12, 16, 20), 7] / 
                    table_mse_beta[8, 7]),
  UT_DGP_5 = sqrt(table_mse_beta[4 + c(4, 8, 12, 16, 20), 4] / 
                    table_mse_beta[4, 4])
)

# pdf("./figures_article/RRMSE_beta12.pdf", width = 11, height = 5)
op <- par(mfrow = c(1, 3), las = 1, oma = c(0, 0, 0, 0), 
          mar = c(4, 3.1, 2, 1), mgp = c(1.5, 0.5, 0))
# nu_DGP = 3
plot(UT_DGP_3 ~ nu_MLE, data = data_plot, type = "l", lty = 1, 
     col = "red", ylim = c(0.99, 1.1), ylab = "RRMSE", 
     xlab = expression(paste(nu,"_"[MLE])), lwd = 2,
     main = expression(paste(nu,"_"[DGP], " = 3")))
lines(IT_DGP_3 ~ nu_MLE, data = data_plot, lwd = 2, lty = 2, col = "green")
lines(N_DGP ~ nu_MLE, data = data_plot, lwd = 2, lty = 3, col = "blue")
legend("topleft", lty = c(1, 2, 3), legend = c("UT DGP", "IT DGP", "N DGP"),
       col = c(2, 3, 4), cex = 1, title = "RRMSE")

# nu_DGP = 4
plot(UT_DGP_4 ~ nu_MLE, data = data_plot, type = "l", lty = 1, 
     col = "red", ylim = c(0.99, 1.1), lwd = 2, ylab = "RRMSE", 
     xlab = expression(paste(nu,"_"[MLE])), 
     main = expression(paste(nu,"_"[DGP], " = 4")))
lines(IT_DGP_4 ~ nu_MLE, data = data_plot, lwd = 2, lty = 2, col = "green")
lines(N_DGP ~ nu_MLE, data = data_plot, lwd = 2, lty = 3, col = "blue")
# nu_DGP = 5
plot(UT_DGP_5 ~ nu_MLE, data = data_plot, type = "l", lty = 1, 
     col = "red", ylim = c(0.99, 1.1), ylab = "RRMSE", lwd = 2,
     xlab = expression(paste(nu,"_"[MLE])), 
     main = expression(paste(nu,"_"[DGP], " = 5")))
lines(IT_DGP_5 ~ nu_MLE, data = data_plot, lwd = 2, lty = 2, col = "green")
lines(N_DGP ~ nu_MLE, data = data_plot, lwd = 2, lty = 3, col = "blue")

par(op)
# dev.off()

2.3.5 Table 5

table_5[1:3, 1] <- table_biais_sigma[1:3, 1]
table_5[1:3, 2] <- table_mse_sigma[1:3, 1]
table_5[1:3, 3] <- table_biais_sigma[1:3, 2]
table_5[1:3, 4] <- table_mse_sigma[1:3, 2]
table_5[1:3, 5] <- table_biais_sigma[1:3, 5]
table_5[1:3, 6] <- table_mse_sigma[1:3, 5]
table_5[4:6, 1] <- table_biais_sigma[4:6, 1]
table_5[4:6, 2] <- table_mse_sigma[4:6, 1]
table_5[4:6, 3] <- table_biais_sigma[4:6, 2]
table_5[4:6, 4] <- table_mse_sigma[4:6, 2]
table_5[4:6, 5] <- table_biais_sigma[4:6, 5]
table_5[4:6, 6] <- table_mse_sigma[4:6, 5]
round(table_5, 8)

##             [,1]       [,2]        [,3]         [,4]        [,5]
## [1,] -0.00048543 0.00094573 -0.00020770   0.00076745 -0.00399463
## [2,] -0.00419233 0.00832959 -0.10454945  58.34201430  0.00694206
## [3,] -0.00175035 0.00201474 -0.05171617  14.92682003 -0.01773836
## [4,] -0.00016997 0.00089412 -0.00021812   0.00090524 -0.00020299
## [5,]  1.99983352 4.05658928  1.80495938 244.87200768 -0.01433658
## [6,]  1.00047541 1.01557162  0.90598453  64.74820318 -0.00729557
##            [,6]
## [1,] 0.01103132
## [2,] 3.16721303
## [3,] 0.28511949
## [4,] 0.00106761
## [5,] 0.01542416
## [6,] 0.00394491

xtable::xtable(table_5, digits = 7)

## % latex table generated in R 3.5.2 by xtable 1.8-3 package
## % Wed Feb 13 11:16:33 2019
## \begin{table}[ht]
## \centering
## \begin{tabular}{rrrrrrr}
##   \hline
##  & 1 & 2 & 3 & 4 & 5 & 6 \\ 
##   \hline
## 1 & -0.0004854 & 0.0009457 & -0.0002077 & 0.0007675 & -0.0039946 & 0.0110313 \\ 
##   2 & -0.0041923 & 0.0083296 & -0.1045495 & 58.3420143 & 0.0069421 & 3.1672130 \\ 
##   3 & -0.0017504 & 0.0020147 & -0.0517162 & 14.9268200 & -0.0177384 & 0.2851195 \\ 
##   4 & -0.0001700 & 0.0008941 & -0.0002181 & 0.0009052 & -0.0002030 & 0.0010676 \\ 
##   5 & 1.9998335 & 4.0565893 & 1.8049594 & 244.8720077 & -0.0143366 & 0.0154242 \\ 
##   6 & 1.0004754 & 1.0155716 & 0.9059845 & 64.7482032 & -0.0072956 & 0.0039449 \\ 
##    \hline
## \end{tabular}
## \end{table}

2.3.6 Table 6

for (j in 1:7) {
  table_6[1:3, j] <- (100 * table_biais_sigma[1:3, j] / 
                        as.vector(sigma_true)) 
  table_6[4:6, j] <- (100 * table_biais_sigma[4:6, j] / 
                        as.vector(sigma_true))
  table_6[7:9, j] <- (100 * table_biais_sigma[7:9, j] / 
                        as.vector(sigma_true)) 
  table_6[10:12, j] <- (100 * table_biais_sigma[10:12, j] / 
                          as.vector(sigma_true))
}
round(table_6, 4)

##           [,1]    [,2]    [,3]    [,4]     [,5]     [,6]    [,7]
##  [1,]  -0.1373 -0.0587 -0.0587 -0.0587  -1.1299  -0.2389  0.0218
##  [2,]  -0.2096 -5.2275 -3.3430 -2.3063   0.3471  -0.0757 -0.1232
##  [3,]  -0.1750 -5.1716 -3.3315 -2.1978  -1.7738  -0.2975 -0.0926
##  [4,]  -0.0481 -0.0617 -0.0619 -0.0620  -0.0574  -0.0391 -0.0170
##  [5,]  99.9917 90.2480 93.8875 95.8022  -0.7168  32.7901 50.1205
##  [6,] 100.0475 90.5985 93.8965 96.0298  -0.7296  32.7921 50.1329
##  [7,]  -0.0546 -0.0616 -0.0626 -0.0627  -0.0571  -0.0377 -0.0088
##  [8,]  42.6194 35.7980 38.3244 39.6797 -24.6570  -0.2374 11.1812
##  [9,]  42.6553 36.0125 38.3360 39.8476 -24.6680  -0.2330 11.1887
## [10,]  -0.0588 -0.0628 -0.0624 -0.0624  -0.0571  -0.0361 -0.0045
## [11,]  24.7110 18.8548 21.0320 22.2335 -31.7525 -10.1290 -0.1443
## [12,]  24.7404 19.0213 21.0417 22.3782 -31.7607 -10.1254 -0.1383

xtable::xtable(table_6, digits = 2)

## % latex table generated in R 3.5.2 by xtable 1.8-3 package
## % Wed Feb 13 11:16:33 2019
## \begin{table}[ht]
## \centering
## \begin{tabular}{rrrrrrrr}
##   \hline
##  & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 
##   \hline
## 1 & -0.14 & -0.06 & -0.06 & -0.06 & -1.13 & -0.24 & 0.02 \\ 
##   2 & -0.21 & -5.23 & -3.34 & -2.31 & 0.35 & -0.08 & -0.12 \\ 
##   3 & -0.18 & -5.17 & -3.33 & -2.20 & -1.77 & -0.30 & -0.09 \\ 
##   4 & -0.05 & -0.06 & -0.06 & -0.06 & -0.06 & -0.04 & -0.02 \\ 
##   5 & 99.99 & 90.25 & 93.89 & 95.80 & -0.72 & 32.79 & 50.12 \\ 
##   6 & 100.05 & 90.60 & 93.90 & 96.03 & -0.73 & 32.79 & 50.13 \\ 
##   7 & -0.05 & -0.06 & -0.06 & -0.06 & -0.06 & -0.04 & -0.01 \\ 
##   8 & 42.62 & 35.80 & 38.32 & 39.68 & -24.66 & -0.24 & 11.18 \\ 
##   9 & 42.66 & 36.01 & 38.34 & 39.85 & -24.67 & -0.23 & 11.19 \\ 
##   10 & -0.06 & -0.06 & -0.06 & -0.06 & -0.06 & -0.04 & -0.00 \\ 
##   11 & 24.71 & 18.85 & 21.03 & 22.23 & -31.75 & -10.13 & -0.14 \\ 
##   12 & 24.74 & 19.02 & 21.04 & 22.38 & -31.76 & -10.13 & -0.14 \\ 
##    \hline
## \end{tabular}
## \end{table}

2.3.7 Table 7

for (j in 1:7) {
  if (j %in% c(1, 2, 3, 4)) {
    true_denom <- table_mse_sigma[1:3, j]
  } else {
    if (j == 5) {
      true_denom <- table_mse_sigma[4:6, 5]
    } else {
      if (j == 6) {
        true_denom <- table_mse_sigma[7:9, 6]
      } else {
        true_denom <- table_mse_sigma[10:12, 7]
      }
    }
  }
  table_7[1:3, j] <- sqrt(table_mse_sigma[1:3, j] / true_denom) 
  table_7[4:6, j] <- sqrt(table_mse_sigma[4:6, j] / true_denom) 
  table_7[7:9, j] <- sqrt(table_mse_sigma[7:9, j] / true_denom) 
  table_7[10:12, j] <- sqrt(table_mse_sigma[10:12, j] / true_denom)  
}
round(table_7, 4)

##          [,1]   [,2]   [,3]   [,4]    [,5]   [,6]   [,7]
##  [1,]  1.0000 1.0000 1.0000 1.0000  3.2145 1.9079 1.4193
##  [2,]  1.0000 1.0000 1.0000 1.0000 14.3297 2.6537 1.6433
##  [3,]  1.0000 1.0000 1.0000 1.0000  8.5015 2.2413 1.7760
##  [4,]  0.9723 1.0861 1.0856 1.0853  1.0000 1.0020 1.0077
##  [5,] 22.0683 2.0487 2.1069 2.1629  1.0000 5.8929 9.1838
##  [6,] 22.4515 2.0827 2.1124 2.1571  1.0000 5.7678 9.1319
##  [7,]  0.9532 1.0645 1.0641 1.0641  1.0053 1.0000 1.0010
##  [8,]  9.4863 1.4561 1.4691 1.4813  4.0413 1.0000 2.3065
##  [9,]  9.6547 1.4752 1.4719 1.4781  3.9989 1.0000 2.2968
## [10,]  0.9406 1.0499 1.0500 1.0500  1.0139 1.0031 1.0000
## [11,]  5.5807 1.2702 1.2739 1.2768  5.1593 1.9850 1.0000
## [12,]  5.6826 1.2838 1.2757 1.2745  5.1033 1.9495 1.0000

xtable::xtable(table_7, digits = 2)

## % latex table generated in R 3.5.2 by xtable 1.8-3 package
## % Wed Feb 13 11:16:33 2019
## \begin{table}[ht]
## \centering
## \begin{tabular}{rrrrrrrr}
##   \hline
##  & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 
##   \hline
## 1 & 1.00 & 1.00 & 1.00 & 1.00 & 3.21 & 1.91 & 1.42 \\ 
##   2 & 1.00 & 1.00 & 1.00 & 1.00 & 14.33 & 2.65 & 1.64 \\ 
##   3 & 1.00 & 1.00 & 1.00 & 1.00 & 8.50 & 2.24 & 1.78 \\ 
##   4 & 0.97 & 1.09 & 1.09 & 1.09 & 1.00 & 1.00 & 1.01 \\ 
##   5 & 22.07 & 2.05 & 2.11 & 2.16 & 1.00 & 5.89 & 9.18 \\ 
##   6 & 22.45 & 2.08 & 2.11 & 2.16 & 1.00 & 5.77 & 9.13 \\ 
##   7 & 0.95 & 1.06 & 1.06 & 1.06 & 1.01 & 1.00 & 1.00 \\ 
##   8 & 9.49 & 1.46 & 1.47 & 1.48 & 4.04 & 1.00 & 2.31 \\ 
##   9 & 9.65 & 1.48 & 1.47 & 1.48 & 4.00 & 1.00 & 2.30 \\ 
##   10 & 0.94 & 1.05 & 1.05 & 1.05 & 1.01 & 1.00 & 1.00 \\ 
##   11 & 5.58 & 1.27 & 1.27 & 1.28 & 5.16 & 1.99 & 1.00 \\ 
##   12 & 5.68 & 1.28 & 1.28 & 1.27 & 5.10 & 1.95 & 1.00 \\ 
##    \hline
## \end{tabular}
## \end{table}

3 Selection between the Gaussian and IT models

This section presents how to obtain the figures and statistics presented in section 4 of the article.

3.1 Computations of Test statistics

The function mahalo_gaussian_student() computes the Mahalanobis distances according to the assumptions made on the model. The input arguments are :

Y, a matrix of size \(n \times L\),
mu (NULL by default), the hypothesized mean, a vector of size \(L\),
Sigma (NULL by default), the hypothesized variance, a square matrix of size \(L\),
type (“Gaussian” by default), a character equal to “Gaussian” or “Student”
nu (NULL by default), the number of degrees of freedom when type is “Student”

The function returns a vector of size \(n\) with the Mahalanobis distance. To load the function:

source(file("http://www.thibault.laurent.free.fr/code/jrfm/R/mahalo_gaussian_student.R"))

3.2 Examples

3.2.1 Application to financial data

We first compute the returns of two indices.

library(quantmod) 
symbols <- c("MSFT", "IBM")
data_env <- new.env()
getSymbols(Symbols = symbols, env = data_env)

## [1] "MSFT" "IBM"

close_data <- do.call(merge, eapply(data_env, Cl))
MSFT <- close_data[, 1]
IBM <- close_data[, 2]
time_length <- length(MSFT)
return_MSFT <- as.numeric(log(MSFT[2:time_length])) - 
  as.numeric(log(MSFT[1:(time_length - 1)]))
return_IBM <- as.numeric(log(IBM[2:time_length])) - 
  as.numeric(log(IBM[1:(time_length - 1)]))
Y_finance <- cbind(return_MSFT, return_IBM)

We plot the data (Figure 2 in the article) :

plot(as.numeric(return_MSFT), 
     as.numeric(return_IBM),
     xlab = "MSFT return",
     ylab = "IBM retrun")

To obtain the result of the 4th column in Table 8, we perform tests under different assumptions.

3.2.1.1 Gaussian assumption

We test the gaussian assumption. To test the Gaussian distribution, we need to compute the squared Mahalanobis distance \[(Y - \mu)' \Sigma^{-1}(Y - \mu).\] When type = “Gaussian”, if \(\mu\) and \(\Sigma\) are unknown, the function mahalo_gaussian_student() will use the vector of empirical means for \(\mu\) and the empirical covariance matrix for \(\Sigma\).

test_N <- mahalo_gaussian_student(Y_finance, type = "Gaussian")
ks.test(test_N, "pchisq", ncol(Y_finance))

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  test_N
## D = 0.2182, p-value < 2.2e-16
## alternative hypothesis: two-sided

3.2.1.2 Student assumption (\(\mu=3\))

Then, we test the Student assumption. To test the Student distribution, we also need to compute the squared Mahalanobis distance \[(Y - \mu)' \Sigma^{-1}(Y - \mu).\] When type = “Student”, if \(\mu\) and \(\Sigma\) are unknown, the program will regress \(Y\) on \(X\) where \(X=1_n\), using the IT algorithm. In that case the matrix \((Y - \mu)\) will be replaced by the matrix of the residuals of the IT model and the matrix \(\Sigma\) by the estimated variance covariance matrix.

nu_3 <- 3
dM_nu_3 <- mahalo_gaussian_student(Y_finance, type = "Student", nu = nu_3)

We perform the corresponding test (see p. 13 in the article):

L_finance <- ncol(Y_finance) 
test_student_3 <- 1/L_finance * nu_3 / (nu_3 - 2) * dM_nu_3 
ks.test(test_student_3, "pf", L_finance, nu_3)

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  test_student_3
## D = 0.0093001, p-value = 0.9547
## alternative hypothesis: two-sided

3.2.1.3 Student assumption (\(\mu=4\))

nu_4 <- 4
dM_nu_4 <- mahalo_gaussian_student(Y_finance, type = "Student", nu = nu_4)
test_student_4 <- 1/L_finance * nu_4 / (nu_4 - 2) * dM_nu_4
ks.test(test_student_4, "pf", L_finance, nu_4)

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  test_student_4
## D = 0.02621, p-value = 0.03036
## alternative hypothesis: two-sided

3.2.1.4 Q-Q plots of the Mahalanobis distances

To obtain Figure 4 in the article:

n_financial <- nrow(Y_finance)

qtheon <- qchisq(seq(1/n_financial, 1 - 1/n_financial, 1/n_financial), df = 2)
qtheot3 <- qf(seq(1/n_financial, 1 - 1/n_financial, 1/n_financial), 2, 3)
qtheot4 <- qf(seq(1/n_financial, 1 - 1/n_financial, 1/n_financial), 2, 4)
# pdf("./figures_article/qqpFinance.pdf", width = 8, height = 3)
op <- par(mfrow = c(1, 3), las = 0, oma = c(0, 0, 0, 0), 
          mar = c(4, 3.1, 2, 1), mgp = c(1.5, 0.5, 0))

plot(sort(test_N)[-n_financial], qtheon,  
     xlab = "Normal empirical quantile", 
     ylab = "NTQ", pch = 21, xaxs = "i", yaxs = "i", 
     ylim = c(0, 200), xlim = c(0, 100))
abline(0, 1)

plot(sort(test_student_3)[-n_financial], qtheot3, 
     ylab = expression(paste("TTQ ( ", nu,"_"[DGP], " = 3)")), 
     xlab = expression(paste("IT (", nu,"_"[MLE], " = 3)", 
                             " empirical quantile")), 
     xaxs = "i", yaxs = "i", ylim = c(0, 200), xlim = c(0, 100))
abline(0, 1)

plot(sort(test_student_4)[-n_financial], qtheot4, 
     xlab = expression(paste("IT (", nu,"_"[MLE], " = 4)", 
                             " empirical quantile")), 
     ylab = expression(paste("TTQ (", nu, "_"[DGP], " = 4)")), 
     xaxs = "i", yaxs = "i", ylim = c(0, 200), xlim = c(0, 100))
abline(0, 1)

par(op)
# dev.off()

3.2.2 Application to Toy data

We compute the Gaussian and the IT estimators fot the three simulated data sets presented in the previous section.

3.2.2.1 Gaussian DGP

# model = "N"
beta_simu_dgp_N_N <- solve(crossprod(X_simu), t(X_simu) %*% y_N)
res_simu_dgp_N_N <- y_N - X_simu %*% beta_simu_dgp_N_N
Sigma_simu_dgp_N_N <- t(res_simu_dgp_N_N) %*% res_simu_dgp_N_N / n_simu
# model = "IT" (nu = 3)
simu_dgp_N_IT_3 <- estimate_IT(y_N, X_simu, nu = nu_3, eps_precision = 10^(-8))
res_simu_dgp_N_IT_3 <- simu_dgp_N_IT_3$residuals
Sigma_simu_dgp_N_IT_3 <- simu_dgp_N_IT_3$hat_Sigma
# model = "IT" (nu = 4)
simu_dgp_N_IT_4 <- estimate_IT(y_N, X_simu, nu = nu_4, eps_precision = 10^(-8))
res_simu_dgp_N_IT_4 <- simu_dgp_N_IT_4$residuals
Sigma_simu_dgp_N_IT_4 <- simu_dgp_N_IT_4$hat_Sigma

3.2.2.2 Student DGP (\(\nu=3\))

# model = "N"
beta_simu_dgp_IT_3_N <- solve(crossprod(X_simu), t(X_simu) %*% y_IT_3)
res_simu_dgp_IT_3_N <- y_IT_3 - X_simu %*% beta_simu_dgp_IT_3_N
Sigma_simu_dgp_IT_3_N <- t(res_simu_dgp_IT_3_N) %*% res_simu_dgp_IT_3_N / n_simu
# model = "IT" (nu = 3)
simu_dgp_IT_3_IT_3 <- estimate_IT(y_IT_3, X_simu, nu = nu_3, 
                                  eps_precision = 10^(-8))
res_simu_dgp_IT_3_IT_3 <- simu_dgp_IT_3_IT_3$residuals
Sigma_simu_dgp_IT_3_IT_3 <- simu_dgp_IT_3_IT_3$hat_Sigma
# model = "IT" (nu = 4)
simu_dgp_IT_3_IT_4 <- estimate_IT(y_IT_3, X_simu, nu = nu_4, 
                                  eps_precision = 10^(-8))
res_simu_dgp_IT_3_IT_4 <- simu_dgp_IT_3_IT_4$residuals
Sigma_simu_dgp_IT_3_IT_4 <- simu_dgp_IT_3_IT_4$hat_Sigma

3.2.2.3 Student DGP (\(\nu=4\))

# model = "N"
beta_simu_dgp_IT_4_N <- solve(crossprod(X_simu), t(X_simu) %*% y_IT_4)
res_simu_dgp_IT_4_N <- y_IT_4 - X_simu %*% beta_simu_dgp_IT_4_N
Sigma_simu_dgp_IT_4_N <- t(res_simu_dgp_IT_4_N) %*% res_simu_dgp_IT_4_N / n_simu
# model = "IT" (nu = 3)
simu_dgp_IT_4_IT_3 <- estimate_IT(y_IT_4, X_simu, nu = nu_3, 
                                  eps_precision = 10^(-8))
res_simu_dgp_IT_4_IT_3 <- simu_dgp_IT_4_IT_3$residuals
Sigma_simu_dgp_IT_4_IT_3 <- simu_dgp_IT_4_IT_3$hat_Sigma
# model = "IT" (nu = 4)
simu_dgp_IT_4_IT_4 <- estimate_IT(y_IT_4, X_simu, nu = nu_4, 
                                  eps_precision = 10^(-8))
res_simu_dgp_IT_4_IT_4 <- simu_dgp_IT_4_IT_4$residuals
Sigma_simu_dgp_IT_4_IT_4 <- simu_dgp_IT_4_IT_4$hat_Sigma

To obtain Figure 3 in the article:

# pdf("./figures_article/residSN.pdf", width = 8, height = 3)
op <- par(mfrow = c(1, 3), las = 1, oma = c(0, 0, 0, 0), 
          mar = c(4, 3.1, 2, 1), mgp = c(1.5, 0.5, 0))
plot(res_simu_dgp_N_N[,1], res_simu_dgp_N_N[, 2], 
     xlab = "Residuals in y1", ylab = "Residuals in y2", 
     xaxs = "i", yaxs = "i", col = "red", 
     main = expression(paste("Gaussian method")))
plot(res_simu_dgp_N_IT_3[,1], res_simu_dgp_N_IT_3[, 2], 
     xlab = "Residuals in y1", ylab = "", 
     xaxs = "i", yaxs = "i", col = "blue", 
     main = expression(paste("IT (", nu,"_"[MLE], " = 3)", " method")))
plot(res_simu_dgp_N_IT_4[,1], res_simu_dgp_N_IT_4[, 2], 
     xlab = "Residuals in y1", ylab = "", 
     xaxs = "i", yaxs = "i", col = "grey", 
     main = expression(paste("IT (", nu,"_"[MLE], " = 4)", " method")))

par(op)
# dev.off()

# pdf("./figures_article/residS3.pdf", width = 8, height = 3)
op <- par(mfrow = c(1, 3), las = 1, oma = c(0, 0, 0, 0), 
          mar = c(4, 3.1, 2, 1), mgp = c(1.5, 0.5, 0))
plot(res_simu_dgp_IT_3_N[, 1], res_simu_dgp_IT_3_N[, 2], 
     xlab = "Residuals in y1", ylab = "Residuals in y2", 
     xaxs = "i", yaxs = "i", col = "red", 
     main = expression(paste("Gaussian method")))
plot(res_simu_dgp_IT_3_IT_3[, 1], res_simu_dgp_IT_3_IT_3[, 2], 
     xlab = "Residuals in y1", ylab = "", 
     xaxs = "i", yaxs = "i", col = "blue", 
     main = expression(paste("IT (", nu,"_"[MLE], " = 3)", " method")))
plot(res_simu_dgp_IT_3_IT_4[, 1], res_simu_dgp_IT_3_IT_4[, 2], 
     xlab = "Residuals in y1", ylab = "", 
     xaxs = "i", yaxs = "i", col = "grey", 
     main = expression(paste("IT (", nu,"_"[MLE], " = 4)", " method")))

par(op)
# dev.off()

# pdf("./figures_article/residS4.pdf", width = 8, height = 3)
op <- par(mfrow = c(1, 3), las = 1, oma = c(0, 0, 0, 0), 
          mar = c(4, 3.1, 2, 1), mgp = c(1.5, 0.5, 0))
plot(res_simu_dgp_IT_4_N[, 1], res_simu_dgp_IT_4_N[, 2], 
     xlab = "Residuals in y1", ylab = "Residuals in y2", 
     xaxs = "i", yaxs = "i", col = "red", 
     main = expression(paste("Gaussian method")))
plot(res_simu_dgp_IT_4_IT_3[, 1], res_simu_dgp_IT_4_IT_3[, 2], 
     xlab = "Residuals in y1", ylab = "", 
     xaxs = "i", yaxs = "i", col = "blue", 
     main = expression(paste("IT (", nu,"_"[MLE], " = 3)", " method")))
plot(res_simu_dgp_IT_4_IT_4[, 1], res_simu_dgp_IT_4_IT_4[, 2], 
     xlab = "Residuals in y1", ylab = "", 
     xaxs = "i", yaxs = "i", col = "grey", 
     main = expression(paste("IT (", nu,"_"[MLE], " = 4)", " method")))

par(op)
# dev.off()

To obtain the result of the 1st row in Table 8, we perform tests under the Gaussian assumptions.

3.2.2.4 Gaussian assumption

3.2.2.4.1 “N” DGP

test_res_simu_dgp_N_N_N <- diag(crossprod(t(res_simu_dgp_N_N), 
                               solve(Sigma_simu_dgp_N_N, 
                                     t(res_simu_dgp_N_N))))
ks.test(test_res_simu_dgp_N_N_N, "pchisq", ncol(res_simu_dgp_N_N))

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  test_res_simu_dgp_N_N_N
## D = 0.021415, p-value = 0.7488
## alternative hypothesis: two-sided

3.2.2.4.2 “IT” DGP (\(\nu = 3\))

test_res_simu_dgp_IT_3_N_N <- diag(crossprod(t(res_simu_dgp_IT_3_N), 
                               solve(Sigma_simu_dgp_IT_3_N, 
                                     t(res_simu_dgp_IT_3_N))))
ks.test(test_res_simu_dgp_IT_3_N_N, "pchisq", ncol(res_simu_dgp_IT_3_N))

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  test_res_simu_dgp_IT_3_N_N
## D = 0.29333, p-value < 2.2e-16
## alternative hypothesis: two-sided

3.2.2.4.3 “IT” DGP (\(\nu = 4\))

test_res_simu_dgp_IT_4_N_N <- diag(crossprod(t(res_simu_dgp_IT_4_N), 
                               solve(Sigma_simu_dgp_IT_4_N, 
                                     t(res_simu_dgp_IT_4_N))))
ks.test(test_res_simu_dgp_IT_4_N_N, "pchisq", ncol(res_simu_dgp_IT_4_N))

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  test_res_simu_dgp_IT_4_N_N
## D = 0.13751, p-value < 2.2e-16
## alternative hypothesis: two-sided

To obtain the result of the 2nd row in Table 8, we perform tests under the Student assumptions with \(\nu = 3\).

3.2.2.5 Student assumption (\(\mu=3\))

3.2.2.5.1 “N” DGP

dM_res_simu_dgp_N_IT_3 <- crossprod(t(res_simu_dgp_N_IT_3), 
                               solve(Sigma_simu_dgp_N_IT_3, 
                                     t(res_simu_dgp_N_IT_3)))
test_res_simu_dgp_N_IT_3 <- 1/L_simu * nu_3 / (nu_3 - 2) * 
  diag(dM_res_simu_dgp_N_IT_3) 
ks.test(test_res_simu_dgp_N_IT_3, "pf", L_simu, nu_3)

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  test_res_simu_dgp_N_IT_3
## D = 0.080203, p-value = 5.174e-06
## alternative hypothesis: two-sided

3.2.2.5.2 “IT” DGP (\(\nu = 3\))

dM_res_simu_dgp_IT_3_IT_3 <- crossprod(t(res_simu_dgp_IT_3_IT_3), 
                               solve(Sigma_simu_dgp_IT_3_IT_3, 
                                     t(res_simu_dgp_IT_3_IT_3)))
test_res_simu_dgp_IT_3_IT_3 <- 1/L_simu * nu_3 / (nu_3 - 2) * 
  diag(dM_res_simu_dgp_IT_3_IT_3) 
ks.test(test_res_simu_dgp_IT_3_IT_3, "pf", L_simu, nu_3)

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  test_res_simu_dgp_IT_3_IT_3
## D = 0.012481, p-value = 0.9977
## alternative hypothesis: two-sided

3.2.2.5.3 “IT” DGP (\(\nu = 4\))

dM_res_simu_dgp_IT_4_IT_3 <- crossprod(t(res_simu_dgp_IT_4_IT_3), 
                               solve(Sigma_simu_dgp_IT_4_IT_3, 
                                     t(res_simu_dgp_IT_4_IT_3)))
test_res_simu_dgp_IT_4_IT_3 <- 1/L_simu * nu_3 / (nu_3 - 2) * 
  diag(dM_res_simu_dgp_IT_4_IT_3) 
ks.test(test_res_simu_dgp_IT_4_IT_3, "pf", L_simu, nu_3)

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  test_res_simu_dgp_IT_4_IT_3
## D = 0.028679, p-value = 0.3832
## alternative hypothesis: two-sided

To obtain the result of the 3rd row in Table 8, we perform tests under the Student assumptions with \(\nu = 4\).

3.2.2.6 Student assumption (\(\mu=4\))

3.2.2.6.1 “N” DGP

dM_res_simu_dgp_N_IT_4 <- crossprod(t(res_simu_dgp_N_IT_4), 
                               solve(Sigma_simu_dgp_N_IT_4, 
                                     t(res_simu_dgp_N_IT_4)))
test_res_simu_dgp_N_IT_4 <- 1/L_simu * nu_4 / (nu_4 - 2) * 
  diag(dM_res_simu_dgp_N_IT_4) 
ks.test(test_res_simu_dgp_N_IT_4, "pf", L_simu, nu_4)

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  test_res_simu_dgp_N_IT_4
## D = 0.060522, p-value = 0.001317
## alternative hypothesis: two-sided

3.2.2.6.2 “IT” DGP (\(\nu = 3\))

dM_res_simu_dgp_IT_3_IT_4 <- crossprod(t(res_simu_dgp_IT_3_IT_4), 
                               solve(Sigma_simu_dgp_IT_3_IT_4, 
                                     t(res_simu_dgp_IT_3_IT_4)))
test_res_simu_dgp_IT_3_IT_4 <- 1/L_simu * nu_4 / (nu_4 - 2) * 
  diag(dM_res_simu_dgp_IT_3_IT_4) 
ks.test(test_res_simu_dgp_IT_3_IT_4, "pf", L_simu, nu_4)

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  test_res_simu_dgp_IT_3_IT_4
## D = 0.03102, p-value = 0.291
## alternative hypothesis: two-sided

3.2.2.6.3 “IT” DGP (\(\nu = 4\))

dM_res_simu_dgp_IT_4_IT_4 <- crossprod(t(res_simu_dgp_IT_4_IT_4), 
                               solve(Sigma_simu_dgp_IT_4_IT_4, 
                                     t(res_simu_dgp_IT_4_IT_4)))
test_res_simu_dgp_IT_4_IT_4 <- 1/L_simu * nu_4 / (nu_4 - 2) * 
  diag(dM_res_simu_dgp_IT_4_IT_4) 
ks.test(test_res_simu_dgp_IT_4_IT_4, "pf", L_simu, nu_4)

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  test_res_simu_dgp_IT_4_IT_4
## D = 0.017758, p-value = 0.9107
## alternative hypothesis: two-sided

3.2.2.7 Q-Q plots of the Mahalanobis distances

To obtain Figure 5 in the article:

qtheon <- qchisq(seq(1/n_simu, 1 - 1/n_simu, 1/n_simu), df = 2)
qtheot3 <- qf(seq(1/n_simu, 1 - 1/n_simu, 1/n_simu), 2, 3)
qtheot4 <- qf(seq(1/n_simu, 1 - 1/n_simu, 1/n_simu), 2, 4)

# pdf("./figures_article/qqSimul.pdf", width = 9, height = 7)
op <- par(mfrow = c(3, 3), las = 0, oma = c(0, 0, 0, 0), 
          mar = c(4, 3.1, 2, 1), mgp = c(1.5, 0.5, 0))

plot(sort(test_res_simu_dgp_N_N_N)[-n_simu], qtheon,  
     xlab = "Normal empirical quantile", 
     ylab = "NTQ", pch = 21, xaxs = "i", yaxs = "i",
     ylim = c(0, 15))
abline(0,1)
plot(sort(test_res_simu_dgp_IT_3_N_N)[-n_simu], qtheon, ylab = "NTQ", 
     xlab = expression(paste("IT (", nu,"_"[MLE], " = 3)", 
                             " empirical quantile")),
     xaxs = "i", yaxs = "i", ylim = c(0, 15))
abline(0, 1)
plot(sort(test_res_simu_dgp_IT_4_N_N)[-n_simu], qtheon, ylab = "NTQ", 
     xlab = expression(paste("IT (", nu,"_"[MLE], " = 4)", 
                             " empirical quantile")), 
      xaxs = "i", yaxs = "i", ylim = c(0, 15))
abline(0, 1)

plot(sort(test_res_simu_dgp_N_IT_3)[-n_simu], qtheot3, 
     ylab = expression(paste("TTQ ( ", nu,"_"[DGP], " = 3)")), 
     pch = 21, xlab = "Normal empirical quantile",
     xaxs = "i", yaxs = "i", ylim = c(0, 150))
abline(0, 1)
plot(sort(test_res_simu_dgp_IT_3_IT_3)[-n_simu], qtheot3, 
     ylab = expression(paste("TTQ ( ", nu,"_"[DGP], " = 3)")), 
     xlab = expression(paste("IT (", nu,"_"[MLE], " = 3)", 
                             " empirical quantile")), 
     xaxs = "i", yaxs = "i", ylim = c(0, 150))
abline(0, 1)
plot(sort(test_res_simu_dgp_IT_4_IT_3)[-n_simu], qtheot3, 
     ylab = expression(paste("TTQ (", nu,"_"[DGP], " = 3)")), 
     xlab = expression(paste("IT (", nu,"_"[MLE], " = 4)", 
                             " empirical quantile")), 
      xaxs = "i", yaxs = "i", ylim = c(0, 150))
abline(0, 1)

plot(sort(test_res_simu_dgp_N_IT_4)[-n_simu], qtheot4,  
     xlab = "Normal empirical quantile", 
     ylab = expression(paste("TTQ (", nu,"_"[DGP], " = 4)")), 
     pch = 21, xaxs = "i", yaxs = "i", ylim = c(0, 65))
abline(0,1)
plot(sort(test_res_simu_dgp_IT_3_IT_4)[-n_simu], qtheot4, 
     xlab = expression(paste("IT (", nu,"_"[MLE], " = 3)",
                             " empirical quantile")), 
     ylab = expression(paste("TTQ (", nu,"_"[DGP], " = 4)")), 
      xaxs = "i", yaxs = "i", ylim = c(0, 65))
abline(0,1)
plot(sort(test_res_simu_dgp_IT_4_IT_4)[-n_simu], qtheot4, 
     xlab = expression(paste("IT (", nu,"_"[MLE], " = 4)", 
                             " empirical quantile")), 
     ylab = expression(paste("TTQ (", nu,"_"[DGP], " = 4)")), 
     xaxs = "i", yaxs = "i", ylim = c(0, 65))
abline(0,1)

par(op)
# dev.off()

Multivariate Student versus Multivariate Gaussian Regression Models with Application to Finance

Supplemental material

Thi Huong An Nguyen, Anne Ruiz-Gazen, Christine Thomas-Agnan, Thibault Laurent