1 Prerequisites
2 Preparation of the data
3 Descriptive statistics
4 CoDa regression
5 Comparaison between CoDa and Dirichlet regression
6 Predictions
- 6.1 Representing the predicted values with respect to a numeric variable: application with unemployment rate
- 6.2 Representing the predicted values with respect to a categorical variable: application with unemployment rate
7 Bibliography

We provide the data and the R code used in the article “Analyzing the impacts of socio-economic factors on French departmental elections with CoDa methods” so that readers may reproduce all the figures, tables and statistics presented in the article with the R software. The pdf version of the supplementary variable is available here.

If you use this code, please cite:

Nguyen T.H.A, Thomas-Agnan C., Laurent T. and A. Ruiz-Gazen (2020). Analyzing the impacts of socio-economic factors on French departmental elections with CoDa methods. \(\color{red}{\text{TSE WP}}\).

1 Prerequisites

Required packages:

install.packages(c("classInt", "compositions", "DirichletReg", "ggplot2",
                   "ggtern", "gridExtra", "RColorBrewer", "sp"))

Loading packages:

require("classInt") # transform numeric variable to categorical
require("compositions") # compositional data
require("DirichletReg") # Dirichlet regression model
require("ggplot2") # graph a la ggplot2
require("gridExtra") # to include several plot in one figure
require("RColorBrewer") # palette of colors
require("sp") # spatial data
require("ggtern") # ternary diagram with ggplot2

Information about the current R session :

sessionInfo()

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
## LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
## 
## locale:
##  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
##  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
##  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggtern_3.3.0       sp_1.4-2           RColorBrewer_1.1-2 gridExtra_2.3     
## [5] ggplot2_3.3.2      DirichletReg_0.7-0 Formula_1.2-3      compositions_2.0-0
## [9] classInt_0.4-3    
## 
## loaded via a namespace (and not attached):
##  [1] zoo_1.8-8          tidyselect_1.1.0   xfun_0.15          purrr_0.3.4       
##  [5] lattice_0.20-41    latex2exp_0.4.0    colorspace_1.4-1   vctrs_0.3.2       
##  [9] generics_0.0.2     htmltools_0.5.0    yaml_2.2.1         rlang_0.4.7       
## [13] e1071_1.7-3        pillar_1.4.6       glue_1.4.1         withr_2.2.0       
## [17] lifecycle_0.2.0    plyr_1.8.6         robustbase_0.93-6  stringr_1.4.0     
## [21] munsell_0.5.0      gtable_0.3.0       evaluate_0.14      knitr_1.29        
## [25] miscTools_0.6-26   class_7.3-17       DEoptimR_1.0-8     proto_1.0.0       
## [29] Rcpp_1.0.5         KernSmooth_2.23-17 scales_1.1.1       maxLik_1.4-4      
## [33] tensorA_0.36.1     digest_0.6.25      stringi_1.4.6      dplyr_1.0.0       
## [37] grid_4.0.3         tools_4.0.3        sandwich_2.5-1     magrittr_1.5      
## [41] tibble_3.0.3       crayon_1.3.4       pkgconfig_2.0.3    MASS_7.3-53       
## [45] ellipsis_0.3.1     bayesm_3.1-4       rmarkdown_2.3      R6_2.4.1          
## [49] compiler_4.0.3

2 Preparation of the data

We load the data basis:

load(file("http://www.thibault.laurent.free.fr/code/jrfm/R/BdD_final_dep.RData"))

The data is presented as a Spatial object (see Pebesma and Bivand, 2005, for more informations). The vote shares correspond to the variables left, right, others. We print the first 5 rows of the data:

head(dep.2015.spdf@data[, c("NOM_DEPT", "left", "right", "others")], 5)

##          NOM_DEPT      left     right    others
## 40           JURA 0.3376844 0.4035081 0.2588076
## 43          LOIRE 0.3369003 0.3593964 0.3037033
## 77 SEINE-MARITIME 0.4236827 0.3119438 0.2643735
## 90          YONNE 0.2852518 0.3853767 0.3293715
## 69      HAUT-RHIN 0.1777443 0.5126041 0.3096516

We drop the departement of Paris which is not concerned by the election

dep.2015.spdf <- dep.2015.spdf[dep.2015.spdf@data$CODE_DEPT != "75", ]

We use the acomp class of object from the compositions package which provides the means to analyse compositions in the framework of the Aitchison Simplex. We do it for :

the dependent variable (the vote shares)

voteshare <- compositions::acomp(
  cbind(dep.2015.spdf@data[, c("left", "right", "others")]))

the compositional explanatory variables Age, Diploma and Employment :

# Age 
Age <- compositions::acomp(data.frame(
  age_1839 = dep.2015.spdf@data[, "age_1824"] + dep.2015.spdf@data[, "age_2540"],
  age_4064 = dep.2015.spdf@data[, "age_4055"] + dep.2015.spdf@data[, "age_5564"],
  age_65 = dep.2015.spdf@data[, "age_65"]))
# Diploma
Diploma <- compositions::acomp(data.frame(
  No_CAPBEP = dep.2015.spdf@data[, "no_diplom"] + dep.2015.spdf@data[, "capbep"],
  bac = dep.2015.spdf@data$bac,
  diplom_sup = dep.2015.spdf@data$diplom_sup))
# Employment
Employment <- compositions::acomp(dep.2015.spdf@data[, c("AZ", "BE", "FZ", "GU", "OQ")])

Then we use the isometric log ratio transform thanks to the ilr() function. We do it for:

the dependent variable (the vote shares). With our choice of contrast matrix, the first ilr coordinate opposes the Left wing to the geometric mean of the Right wing and the Extreme Right party and the second opposes the Right wing to the Extreme Right party:

V_votes <- rbind(c(2/sqrt(6), -1/sqrt(6), -1/sqrt(6)),
               c(0, 1/sqrt(2), -1/sqrt(2)))
y_ilr <- compositions::ilr(dep.2015.spdf@data[, c("left" , "right", "others")],
             V = t(V_votes))

the compositional explanatory variables Age (with our choice of contrast matrix, the first ilr coordinate opposes age_1839 to age_4064 and the second opposes age_65 to the geometric mean of age_1839 and age_4064), Diploma (with our choice of contrast matrix, the first ilr coordinate opposes the geometric mean of No_CAPBEP and bac to diplom_sup and the second opposes No_CAPBEP to bac) and Employment (with our choice of contrast matrix, the first ilr coordinate opposes OQ to the geometric mean of all other components, the second opposes GU to the geometric mean of AZ, BE, FZ, the third opposes FZ to the geometric mean of AZ and BE, the fourth opposes AZ to BE).

ilr_Age <- compositions::ilr(Age)
V_dip <- rbind(c(1/sqrt(6), 1/sqrt(6), -2/sqrt(6)),
               c(1/sqrt(2), -1/sqrt(2),  0))
ilr_Diploma <- compositions::ilr(Diploma, V = t(V_dip))
V_employ <- rbind(c(-1/2/sqrt(5), -1/2/sqrt(5), -1/2/sqrt(5), -1/2/sqrt(5), 2/sqrt(5)),
               c(-sqrt(3)/6, -sqrt(3)/6, -sqrt(3)/6, sqrt(3/4), 0),
               c(-sqrt(2/3)/2, -sqrt(2/3)/2,  sqrt(2/3), 0, 0),
               c(sqrt(1/2), -sqrt(1/2), 0, 0, 0))
ilr_Employment <- compositions::ilr(Employment, V = t(V_employ))

Finally, we create the data.frame that includes all the explanatory variables:

the ilr of the compositions Age, Diploma and Employment
the numeric variables unemp_rate, employ_evol, owner_rate, income_rate, foreign,

my_explan <- data.frame(ilr_Age, ilr_Diploma, ilr_Employment,
           as.matrix(dep.2015.spdf@data[, c("unemp_rate", "employ_evol", 
                                            "owner_rate", "income_rate", 
                                            "foreign")]))
colnames(my_explan) <- c("Age_ilr1", "Age_ilr2",  
                  "Diploma_ilr1", "Diploma_ilr2",
                  "Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4", 
                  "unemp_rate", "employ_evol", "owner_rate", "income_rate", "foreign")

3 Descriptive statistics

3.1 Table 2

The following codes yields Table 2 in the article.

3.1.1 Descriptive statistics on the compositional variables

round(apply(voteshare, 2, mean), 3)

##   left  right others 
##  0.368  0.384  0.248

round(apply(Age, 2, mean), 3)

## age_1839 age_4064   age_65 
##    0.314    0.429    0.257

round(apply(Diploma, 2, mean), 3)

##  No_CAPBEP        bac diplom_sup 
##      0.589      0.169      0.242

round(apply(Employment, 2, mean), 3)

##    AZ    BE    FZ    GU    OQ 
## 0.044 0.100 0.048 0.432 0.375

3.1.2 Descriptive statistics on the numeric variables

sapply(my_explan[, c("unemp_rate", "employ_evol", "owner_rate", "income_rate", 
                     "foreign")], function(x) round(c(mean = mean(x), sd = sd(x)), 3))

##      unemp_rate employ_evol owner_rate income_rate foreign
## mean      0.117      -0.145      0.616       0.552   0.050
## sd        0.017       0.578      0.062       0.057   0.031

3.2 Figure 1

To obtain Figure 1 in the article, we use the following code:

names(voteshare) <- c("left", "right", "xr")
ariege <- as.numeric(voteshare[dep.2015.spdf$NOM_DEPT == "ARIEGE", ]) # Ariège, max Left
cantal <- as.numeric(voteshare[dep.2015.spdf$NOM_DEPT == "CANTAL", ]) # Cantal, max Right
bas_rhin <- as.numeric(voteshare[dep.2015.spdf$NOM_DEPT == "BAS-RHIN", ]) # Bas-Rhin, min Left
# ternary diagram ggplot2 style
p3 <- ggtern(data = as.data.frame(voteshare), aes(x = left, z = right, y = xr)) + 
  geom_point()  
g <- p3 + theme(legend.position = c(0, 1), legend.justification = c(1, 1)) + 
  theme_rgbw() +  
  labs(x = "LEFT", z = "RIGHT", y = "XR") + 
  geom_point(aes(x = ariege[1], z = ariege[2], y = ariege[3]), 
               shape = 22, color = "red", size = 2) +
  geom_point(aes(x = cantal[1], z = cantal[2], y = cantal[3] ), 
             shape = 21, color = "purple", size = 2) +
  geom_point(aes(x = bas_rhin[1], z = bas_rhin[2], y = bas_rhin[3]), 
             shape = 24, color = "blue", size = 2)
g
# dev.off()
# ggsave("./Figures/voteshares_3dep.eps", g, width = 6, height = 6)

3.3 Figure 2

To obtain Figure 2 in the article, we use the following code:

# calculate the geometric mean of diploma
Diploma_mean_ge  <-  as.numeric(mean(Diploma))
# ternary diagram ggplot2 style
p3 <- ggtern(data = as.data.frame(Diploma), aes(x = No_CAPBEP, y = diplom_sup, z = bac)) + 
  geom_point() 
p3 <- p3 + theme(legend.position = c(0, 1),
                 legend.justification = c(1, 1)) + 
  theme_rgbw() +  
  labs(x = "<BAC", y = "SUP", z = "BAC") + 
  geom_point(aes(x = Diploma_mean_ge[1]  , y = Diploma_mean_ge[3] , z = Diploma_mean_ge[2]), 
             shape = 17, color = c("red"), size = 2)

# calculate the geometric mean of Age
Age_mean_ge <- as.numeric(mean(Age))
p4 <- ggtern(data = as.data.frame(Age), aes(x = age_1839, y = age_65, z = age_4064)) + 
  geom_point() 
p4 <- p4 + theme(legend.position = c(0, 1), legend.justification = c(1, 1)) + 
  theme_rgbw() +  
  labs(x = "18-39", y = ">65", z = "40-64") + 
  geom_point(aes(x = Age_mean_ge[1]  , y = Age_mean_ge[3] , z = Age_mean_ge[2]), 
             shape = 17, color = c("red"), size = 2)

g <- grid.arrange(p3, p4, ncol = 2)
# ggsave("./Figures/comp_X.eps", g, width = 10, height = 5)

4 CoDa regression

We now fit a CoDa regression model describing the impacts of socio-economic factors on vote shares in the 2015 French departmental election. We include all explanatory variables from our data set in the regression model.

4.1 Modeling in the ilr coordinates space

model_complete <- lm(y_ilr ~ Age_ilr1 + Age_ilr2 + 
                       Diploma_ilr1 + Diploma_ilr2 + 
                       Employ_ilr1 + Employ_ilr2 + Employ_ilr3 + Employ_ilr4 + 
                       income_rate + unemp_rate + employ_evol + 
                       owner_rate + foreign, data = my_explan)

We obtain estimated parameters and their standard deviations (in brackets) presented in Table 3:

mod_ilr_1 <- lm(y_ilr[, 1] ~ Age_ilr1 + Age_ilr2 + 
                       Diploma_ilr1 + Diploma_ilr2 + 
                       Employ_ilr1 + Employ_ilr2 + Employ_ilr3 + Employ_ilr4 + 
                       income_rate + unemp_rate + employ_evol + owner_rate + foreign, 
                data = my_explan)
mod_ilr_2 <- lm(y_ilr[, 2] ~ Age_ilr1 + Age_ilr2 + 
                       Diploma_ilr1 + Diploma_ilr2 + 
                       Employ_ilr1 + Employ_ilr2 + Employ_ilr3 + Employ_ilr4 + 
                       income_rate + unemp_rate + employ_evol + owner_rate + foreign, 
                data = my_explan)
stargazer::stargazer(mod_ilr_1, 
                     mod_ilr_2,
                     coef = list(mod_ilr_1$coefficients, mod_ilr_2$coefficients),
                     p = list(coef(summary(mod_ilr_1))[, 4], coef(summary(mod_ilr_2))[, 4]),
                     digits = 3,
                     type = "html")


	Dependent variable:

	y_ilr[, 1]	y_ilr[, 2]
	(1)	(2)

Age_ilr1	0.577	0.903
	(1.050)	(0.729)

Age_ilr2	0.478	0.143
	(0.542)	(0.376)

Diploma_ilr1	0.091	1.124^***
	(0.573)	(0.398)

Diploma_ilr2	-2.605^***	-1.124
	(0.981)	(0.681)

Employ_ilr1	0.436	0.079
	(0.378)	(0.262)

Employ_ilr2	0.718^*	0.633^**
	(0.412)	(0.286)

Employ_ilr3	-0.859^**	-0.239
	(0.377)	(0.261)

Employ_ilr4	-0.085	0.106
	(0.141)	(0.098)

income_rate	-3.788^**	-0.916
	(1.486)	(1.031)

unemp_rate	-2.379	-20.664^***
	(3.344)	(2.321)

employ_evol	0.148	0.027
	(0.112)	(0.078)

owner_rate	1.670	-3.021^***
	(1.310)	(0.909)

foreign	2.696	-0.452
	(2.267)	(1.573)

Constant	1.638	4.523^***
	(1.744)	(1.211)


Observations	95	95
R²	0.396	0.699
Adjusted R²	0.300	0.651
Residual Std. Error (df = 81)	0.315	0.219
F Statistic (df = 13; 81)	4.093^***	14.467^***

Note:	p<0.1; p<0.05; p<0.01

4.2 Transforming the results back to the simplex

We now transform the parameters from the ilr space to the simplex (these results are not presented in the article). For classical (numeric) variables, it can be done in one step by applying the ilr-inv transformation (formula(3), line 1 in the article) :

coeff_simplex_num <- compositions::ilrInv(
  coefficients(model_complete)[c("(Intercept)", "income_rate", 
                                 "unemp_rate", "employ_evol", 
                                 "owner_rate", "foreign"), ], V = t(V_votes))

For compositional variables, we apply formula(3), line 2 in the article:

coeff_simplex_Age <- t(V_votes) %*% t(coefficients(model_complete)[c("Age_ilr1", "Age_ilr2"),]) %*% 
  t(ilrBase(D = 3))
coeff_simplex_Diploma <- t(V_votes) %*% t(coefficients(model_complete)[c("Diploma_ilr1", "Diploma_ilr2"),]) %*% 
  V_dip
coeff_simplex_Employ <- t(V_votes) %*% t(coefficients(model_complete)[c("Employ_ilr1", "Employ_ilr2",
                "Employ_ilr3", "Employ_ilr4"),]) %*% V_employ

coeff_simplex <- rbind(as.vector(coeff_simplex_num["(Intercept)", ]),
      t(coeff_simplex_Age),
      t(coeff_simplex_Diploma),     
      t(coeff_simplex_Employ), 
      as.matrix(coeff_simplex_num[c("income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign"), ])
      )

Finally, the estimates of the parameters in the simplex are:

dimnames(coeff_simplex)[[1]] <- c("(Intercept)", 
                          "Age18-39", "Age40-64", "Age>65",
                          "Dip<BAC", "DipBAC", "DipSUP",
                          "EmpAZ", "EmpBE", "EmpFZ", "EmpGU", "EmpOQ",
                          "income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign" 
                          )
dimnames(coeff_simplex)[[2]] <- c("Left", "Right", "XR") 
coeff_simplex

##                      Left         Right           XR
## (Intercept)  2.326682e-01  7.660536e-01  0.001278181
## Age18-39    -4.928355e-01 -2.462211e-01  0.739056578
## Age40-64     1.739983e-01  3.232336e-01 -0.497231909
## Age>65       3.188372e-01 -7.701254e-02 -0.241824669
## Dip<BAC     -1.473621e+00  4.993120e-01  0.974309151
## DipBAC       1.534263e+00  1.195657e-01 -1.653828839
## DipSUP      -6.064201e-02 -6.188777e-01  0.679519688
## EmpAZ       -1.184365e-02 -1.392683e-02  0.025770486
## EmpBE        8.645126e-02 -1.688948e-01  0.082443522
## EmpFZ       -8.215997e-01  1.312813e-01  0.690318339
## EmpGU        4.282503e-01  1.609525e-01 -0.589202881
## EmpOQ        3.187417e-01 -1.094122e-01 -0.209329466
## income_rate  3.955136e-03  2.141970e-01  0.781847877
## unemp_rate   2.447467e-08  2.034124e-13  0.999999976
## employ_evol  3.747540e-01  3.185137e-01  0.306732278
## owner_rate   4.740510e-01  7.239864e-03  0.518709138
## foreign      9.281809e-01  2.480719e-02  0.047011883

4.3 Compute the significance in the simplex

4.3.1 Bootstrap procedure

We apply a bootstrap procedure:

For \(b\) from \(1\) to \(B\) (we choose \(B=1000\) here)

We sample with replacement the original data set
We make the estimations in the simplex
We go back to the simplex to obtain the parameters in the simplex

We do inference on the bootstrapped parameters obtained for the numeric variable

B <- 1000
coeff_simplex_B <- data.frame(b = character(0),
                              var = character(0),
                              Left = numeric(0), 
                              Right = numeric(0), 
                              XR = numeric(0)) 
for (b in 1:B) {
  set.seed(b)
  id_b <- sample(1:nrow(my_explan), replace = T)
  model_complete_b <- lm(y_ilr[id_b, ] ~ Age_ilr1 + Age_ilr2 + 
                       Diploma_ilr1 + Diploma_ilr2 + 
                       Employ_ilr1 + Employ_ilr2 + Employ_ilr3 + Employ_ilr4 + 
                       income_rate + unemp_rate + employ_evol + 
                       owner_rate + foreign, data = my_explan[id_b, ])
  
  coeff_simplex_b <- compositions::ilrInv(coefficients(model_complete_b), V = t(V_votes))
  coeff_simplex_b <- rbind(coeff_simplex_b["(Intercept)", ],
      cbind(compositions::ilrInv(coeff_simplex_b[c("Age_ilr1", "Age_ilr2"), 1]),
            compositions::ilrInv(coeff_simplex_b[c("Age_ilr1", "Age_ilr2"), 2]),
            compositions::ilrInv(coeff_simplex_b[c("Age_ilr1", "Age_ilr2"), 3])),
       cbind(compositions::ilrInv(coeff_simplex_b[c("Diploma_ilr1", "Diploma_ilr2"), 1], V = t(V_dip)),
            compositions::ilrInv(coeff_simplex_b[c("Diploma_ilr1", "Diploma_ilr2"), 2], V = t(V_dip)),
            compositions::ilrInv(coeff_simplex_b[c("Diploma_ilr1", "Diploma_ilr2"), 3], V = t(V_dip))),     
       cbind(compositions::ilrInv(coeff_simplex_b[c("Employ_ilr1", "Employ_ilr2", 
                                    "Employ_ilr3", "Employ_ilr4"), 1], V = t(V_employ)),
            compositions::ilrInv(coeff_simplex_b[c("Employ_ilr1", "Employ_ilr2", 
                                   "Employ_ilr3", "Employ_ilr4"), 2], V = t(V_employ)),
            compositions::ilrInv(coeff_simplex_b[c("Employ_ilr1", "Employ_ilr2", 
                                   "Employ_ilr3", "Employ_ilr4"), 3], V = t(V_employ))), 
      coeff_simplex_b[c("income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign"), ]
      )
dimnames(coeff_simplex_b)[[2]] <- c("Left", "Right", "XR")
coeff_simplex_b <- as.data.frame(coeff_simplex_b)
coeff_simplex_b$b <- as.character(b)
coeff_simplex_b$var <- c("(Intercept)", 
                          "Age18-39", "Age40-64", "Age>65",
                          "Dip<BAC", "DipBAC", "DipSUP",
                          "EmpAZ", "EmpBE", "EmpFZ", "EmpGU", "EmpOQ",
                          "income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign" 
                          )
coeff_simplex_B <- rbind(coeff_simplex_B, coeff_simplex_b)
}
# we tidy the data
coeff_simplex_B_longer <- tidyr::pivot_longer(coeff_simplex_B,
                    cols = c("Left", "Right", "XR"),
                    names_to = "parti",
                    values_to = "coeff")

We plot the distribution fot the variables employ_evol, income_rate and unemp_rate:

income_rate

true_coeff <- data.frame(parti = c("Left", "Right", "XR"),
                         y = 0,
                         coeff = as.numeric(coeff_simplex["income_rate", ]))
ggplot2::ggplot(dplyr::filter(coeff_simplex_B_longer, var == "income_rate"), aes(x = coeff)) +
  geom_histogram() +
   geom_vline(aes(xintercept = 1/3), color = "blue", linetype = "dashed", size = 1) +
  facet_wrap(vars(parti), ncol = 3) + 
  geom_point(data = true_coeff, aes(x = coeff, y = y), col = "red", size = 4, shape = 4)

unemp_rate

employ_evol

We can also plot the ternary diagram of the boostrapped parameters for these three variables (figure 3 in the article):

We compute the \(p\)-values for each numeric variable

p_values_intercept <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_intercept <- 2 * pmin(p_values_intercept, 1 - p_values_intercept)
p_values_income_rate <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "Right"),
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_income_rate <- 2 * pmin(p_values_income_rate, 1 - p_values_income_rate)
p_values_unemp_rate <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_unemp_rate <- 2 * pmin(p_values_unemp_rate, 1 - p_values_unemp_rate)
p_values_employ_evol <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_employ_evol <- 2 * pmin(p_values_employ_evol, 1 - p_values_employ_evol)
p_values_owner_rate <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "Right"),
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_owner_rate <- 2 * pmin(p_values_owner_rate, 1 - p_values_owner_rate)
p_values_foreign <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_foreign <- 2 * pmin(p_values_foreign, 1 - p_values_foreign)
p_values <- rbind(p_values_intercept, p_values_income_rate, p_values_unemp_rate,
                  p_values_employ_evol, p_values_owner_rate, p_values_foreign)
signf <- ifelse(p_values < 0.01, "(***)",
       ifelse(p_values < 0.05, "(**)",
              ifelse(p_values < 0.1, "(*)", "")))
p_values <- matrix(paste0("(", p_values, ")"), nrow(p_values), ncol(p_values))

We finally print the parameters in the simplex and their significance (Table 4 in the article):

coefficients_numeric <- cbind(
  paste0(round(coeff_simplex[c("(Intercept)", "income_rate", "unemp_rate", 
                               "employ_evol", "owner_rate", "foreign"), 1], 4), signf[, 1]),
  paste0(round(coeff_simplex[c("(Intercept)", "income_rate", "unemp_rate", 
                               "employ_evol", "owner_rate", "foreign"), 2], 4), signf[, 2]),
  paste0(round(coeff_simplex[c("(Intercept)", "income_rate", "unemp_rate", 
                               "employ_evol", "owner_rate", "foreign"), 3], 4), signf[, 3])
)
colnames(coefficients_numeric) <- c("Left", "Right", "XR")
rownames(coefficients_numeric) <- c("(Intercept)", "income_rate", "unemp_rate", 
                               "employ_evol", "owner_rate", "foreign")
coefficients_numeric

##             Left        Right         XR           
## (Intercept) "0.2327"    "0.7661"      "0.0013(***)"
## income_rate "0.004(**)" "0.2142"      "0.7818"     
## unemp_rate  "0(***)"    "0(***)"      "1(***)"     
## employ_evol "0.3748"    "0.3185"      "0.3067"     
## owner_rate  "0.4741"    "0.0072(***)" "0.5187"     
## foreign     "0.9282"    "0.0248"      "0.047"

4.3.2 Simulate the beta in the ilr coordinates space

To get the pseudo-significance, another approach is to simulate the beta-star parameters in the ilr space with respect to their distribution. Then we go back back to the simplex to obtain their distribution in the simplex.

mean_beta_ilr_1 <- coefficients(mod_ilr_1)
var_beta_ilr_1 <- summary(mod_ilr_1)$sigma ^ 2 * solve(crossprod(cbind(1, 
          as.matrix(my_explan[, c("Age_ilr1", "Age_ilr2", 
                                "Diploma_ilr1", "Diploma_ilr2", 
                                "Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4", 
                                "income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")]))))
mean_beta_ilr_2 <- coefficients(mod_ilr_2)
var_beta_ilr_2 <- summary(mod_ilr_2)$sigma ^ 2 * solve(crossprod(cbind(1, 
          as.matrix(my_explan[, c("Age_ilr1", "Age_ilr2", 
                                "Diploma_ilr1", "Diploma_ilr2", 
                                "Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4", 
                                "income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")]))))
set.seed(1)
coeff_simplex_b <- compositions::ilrInv(
  cbind(
  as.vector(t(mvtnorm::rmvnorm(1000, mean_beta_ilr_1, var_beta_ilr_1))),
  as.vector(t(mvtnorm::rmvnorm(1000, mean_beta_ilr_2, var_beta_ilr_2)))
), V = t(V_votes))
coeff_simplex_B <- as.data.frame(coeff_simplex_b)
names(coeff_simplex_B) = c("Left", "Right", "XR")
coeff_simplex_B$var <- rep(c("(Intercept)", "Age_ilr1", "Age_ilr2", 
                                "Diploma_ilr1", "Diploma_ilr2", 
                                "Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4", 
                                "income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign"),
                           times = 1000)
coeff_simplex_B_longer <- tidyr::pivot_longer(coeff_simplex_B,
                                              cols = 1:3,
                                              names_to = "parti",
                                              values_to = "coeff")

We plot the distribution for the variables employ_evol, income_rate and unemp_rate:

income_rate

true_coeff <- data.frame(parti = c("Left", "Right", "XR"),
                         y = 0,
                         coeff = as.numeric(coeff_simplex["income_rate", ]))
ggplot2::ggplot(dplyr::filter(coeff_simplex_B_longer, var == "income_rate"), aes(x = coeff)) +
  geom_histogram() +
   geom_vline(aes(xintercept = 1/3), color = "blue", linetype = "dashed", size = 1) +
  facet_wrap(vars(parti), ncol = 3) + 
  geom_point(data = true_coeff, aes(x = coeff, y = y), col = "red", size = 4, shape = 4)

unemp_rate

employ_evol

We can also plot the ternary diagram of the boostrapped parameters for these three variables:

We compute the \(p\)-values for each numeric variable

p_values_intercept <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_intercept <- 2 * pmin(p_values_intercept, 1 - p_values_intercept)
p_values_income_rate <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "Right"),
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_income_rate <- 2 * pmin(p_values_income_rate, 1 - p_values_income_rate)
p_values_unemp_rate <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_unemp_rate <- 2 * pmin(p_values_unemp_rate, 1 - p_values_unemp_rate)
p_values_employ_evol <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_employ_evol <- 2 * pmin(p_values_employ_evol, 1 - p_values_employ_evol)
p_values_owner_rate <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "Right"),
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_owner_rate <- 2 * pmin(p_values_owner_rate, 1 - p_values_owner_rate)
p_values_foreign <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_foreign <- 2 * pmin(p_values_foreign, 1 - p_values_foreign)
p_values <- rbind(p_values_intercept, p_values_income_rate, p_values_unemp_rate,
                  p_values_employ_evol, p_values_owner_rate, p_values_foreign)
signf <- ifelse(p_values < 0.01, "(***)",
       ifelse(p_values < 0.05, "(**)",
              ifelse(p_values < 0.1, "(*)", "")))
p_values <- matrix(paste0("(", p_values, ")"), nrow(p_values), ncol(p_values))

We finally print the parameters in the simplex and their significance:

coefficients_numeric <- cbind(
  paste0(round(coeff_simplex[c("(Intercept)", "income_rate", "unemp_rate", 
                               "employ_evol", "owner_rate", "foreign"), 1], 4), signf[, 1]),
  paste0(round(coeff_simplex[c("(Intercept)", "income_rate", "unemp_rate", 
                               "employ_evol", "owner_rate", "foreign"), 2], 4), signf[, 2]),
  paste0(round(coeff_simplex[c("(Intercept)", "income_rate", "unemp_rate", 
                               "employ_evol", "owner_rate", "foreign"), 3], 4), signf[, 3])
)
colnames(coefficients_numeric) <- c("Left", "Right", "XR")
rownames(coefficients_numeric) <- c("(Intercept)", "income_rate", "unemp_rate", 
                               "employ_evol", "owner_rate", "foreign")
coefficients_numeric

##             Left        Right         XR           
## (Intercept) "0.2327"    "0.7661"      "0.0013(***)"
## income_rate "0.004(**)" "0.2142"      "0.7818"     
## unemp_rate  "0(***)"    "0(***)"      "1(***)"     
## employ_evol "0.3748"    "0.3185"      "0.3067"     
## owner_rate  "0.4741"    "0.0072(***)" "0.5187"     
## foreign     "0.9282"    "0.0248"      "0.047"

4.4 Computing the semi-elasticities

To compute the approximate semi-elasticities with respect to the numeric variables, we proceed as follows:

we compute the fitted values in the ilr coordinates space
we increase the numeric variable by \(1\) unit (to be defined depending on the variable), and we compute the corresponding predicted value in the ilr coordinates space
we go back to the simplex and compute the differences between the predictions and the initial fitted values.

For the explanatory variables which are compositional, at step 2, when we increase a component of a composition, we remove a constant to all other components keeping the same total.

# fitted values
fitted_ilr <- predict(model_complete)
# go back to the simplex
fitted_simplex <- ilrInv(fitted_ilr, V = t(V_votes))
# we store all the elasticities
elasticities <- data.frame(left = numeric(0), 
                                   right = numeric(0),
                                   xr = numeric(0))
# Numeric variables 
for (k in c("income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")) {
  # prediction when increasing by one 1% the numeric variable
  my_explan_predict <- my_explan
  my_explan_predict[, k] <- my_explan_predict[, k] + .01
  predicted_ilr <- predict(model_complete, newdata = my_explan_predict)
  predicted_simplex <- ilrInv(predicted_ilr, V = t(V_votes))
  # differences between the shares
  diff_simplex_left <- predicted_simplex[, 1] - fitted_simplex[, 1]
  diff_simplex_right <- predicted_simplex[, 2] - fitted_simplex[, 2]
  diff_simplex_xr <- predicted_simplex[, 3] - fitted_simplex[, 3]
  elasticities <- rbind(elasticities,
    data.frame(variable = k,
               dep = row.names(my_explan),
               left = diff_simplex_left, 
               right = diff_simplex_right, 
               xr = diff_simplex_xr))
}

# CoDa explanatory variables 
# for Age
for (k in c("age_1839", "age_4064", "age_65")) {
  my_explan_predict <- my_explan
  Age_predict <- Age
  Age_predict[, k] <- Age_predict[, k] + 0.01
  Age_predict[, !(c("age_1839", "age_4064", "age_65") %in% k)] <- 
    Age_predict[, !(c("age_1839", "age_4064", "age_65") %in% k)] - 0.01 / 2
  
  my_explan_predict[, c("Age_ilr1", "Age_ilr2")] <- ilr(Age_predict)
    
  predicted_ilr <- predict(model_complete, newdata = my_explan_predict)
  predicted_simplex <- ilrInv(predicted_ilr, V = t(V_votes))
  # differences between the shares
  diff_simplex_left <- predicted_simplex[, 1] - fitted_simplex[, 1]
  diff_simplex_right <- predicted_simplex[, 2] - fitted_simplex[, 2]
  diff_simplex_xr <- predicted_simplex[, 3] - fitted_simplex[, 3]
  elasticities <- rbind(elasticities,
                            data.frame(
                             variable = k,
                             dep = row.names(my_explan),
                             left = diff_simplex_left, 
                             right = diff_simplex_right, 
                             xr = diff_simplex_xr))
}

# for Diploma
for (k in c("No_CAPBEP", "bac", "diplom_sup")) {
  my_explan_predict <- my_explan
  Diploma_predict <- Diploma
  Diploma_predict[, k] <- Diploma_predict[, k] + 0.01
  Diploma_predict[, !(c("No_CAPBEP", "bac", "diplom_sup") %in% k)] <- 
    Diploma_predict[, !(c("No_CAPBEP", "bac", "diplom_sup") %in% k)] - 0.01 / 2
  
  my_explan_predict[, c("Diploma_ilr1", "Diploma_ilr2")] <- 
    ilr(Diploma_predict, V = t(V_dip))
    
  predicted_ilr <- predict(model_complete, newdata = my_explan_predict)
  predicted_simplex <- ilrInv(predicted_ilr, V = t(V_votes))
  # differences between the shares
  diff_simplex_left <- predicted_simplex[, 1] - fitted_simplex[, 1]
  diff_simplex_right <- predicted_simplex[, 2] - fitted_simplex[, 2]
  diff_simplex_xr <- predicted_simplex[, 3] - fitted_simplex[, 3]
  elasticities <- rbind(elasticities,
                            data.frame(
                             variable = k,
                             dep = row.names(my_explan),
                             left = diff_simplex_left, 
                             right = diff_simplex_right, 
                             xr = diff_simplex_xr))
}

# for Employ
for (k in c("AZ", "BE", "FZ", "GU", "OQ")) {
  my_explan_predict <- my_explan
  Employment_predict <- Employment
  Employment_predict[, k] <- Employment_predict[, k] + 0.01
  Employment_predict[, !(c("AZ", "BE", "FZ", "GU", "OQ") %in% k)] <- 
    Employment_predict[, !(c("AZ", "BE", "FZ", "GU", "OQ") %in% k)] - 0.01 / 2
  
  my_explan_predict[, c("Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4")] <- 
    ilr(Employment_predict, V = t(V_employ))
    
  predicted_ilr <- predict(model_complete, newdata = my_explan_predict)
  predicted_simplex <- ilrInv(predicted_ilr, V = t(V_votes))
  # differences between the shares
  diff_simplex_left <- predicted_simplex[, 1] - fitted_simplex[, 1]
  diff_simplex_right <- predicted_simplex[, 2] - fitted_simplex[, 2]
  diff_simplex_xr <- predicted_simplex[, 3] - fitted_simplex[, 3]
  elasticities <- rbind(elasticities,
                            data.frame(
                             variable = k,
                             dep = row.names(my_explan),
                             left = diff_simplex_left, 
                             right = diff_simplex_right, 
                             xr = diff_simplex_xr))
}
# tidy the data
elasticities <- tidyr::pivot_longer(elasticities,
                                           cols = 3:5, names_to = "party", values_to = "score")

For one variable (for example the unemployment rate), we represent parallel boxplots of the semi-elasticities for each party.

par(las = 1)
boxplot(score ~ party,
  data = dplyr::filter(elasticities, variable == "unemp_rate"))

For each party, we can also plot the semi-elasticities on a map

dep.2015.spdf@data[, c("se_left", "se_right", "se_others")] <-
  tidyr::pivot_wider(dplyr::filter(elasticities, variable == "unemp_rate"), id_cols = 2,
                     names_from = "party", values_from = "score")[, c("left", "right", "xr")]
dep.2015.sf <- sf::st_as_sf(dep.2015.spdf)
dep.2015.sf <- tibble::as_tibble(dep.2015.sf)

library(sf)
library(ggspatial)     # spatial figures
ggplot(data = dep.2015.sf, aes(geometry = geometry)) +
  geom_sf(aes(fill = se_left), linetype = 2, size = 0.5) +
  coord_sf(crs = st_crs(2154)) +
  annotation_scale(location = "bl", width_hint = 0.5) +
  scale_fill_gradient2('Semi-elasticities \n on the left', 
                       low='#007acc', mid = '#ffffff', 
                       high = '#ff1a1a', midpoint = 0, 
                       limits = c(-0.06, 0.06)) +
  annotation_north_arrow(location = "bl", which_north = "true", 
                         pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"),
                         style = north_arrow_fancy_orienteering)

ggplot(data = dep.2015.sf, aes(geometry = geometry)) +
  geom_sf(aes(fill = se_right), linetype = 2, size = 0.5) +
  coord_sf(crs = st_crs(2154)) +
  annotation_scale(location = "bl", width_hint = 0.5) +
  scale_fill_gradient2('Semi-elasticities \n on the right', 
                       low='#007acc', mid = '#ffffff', 
                       high = '#ff1a1a', midpoint = 0, 
                       limits = c(-0.06, 0.06)) +
  annotation_north_arrow(location = "bl", which_north = "true", 
                         pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"),
                         style = north_arrow_fancy_orienteering)

ggplot(data = dep.2015.sf, aes(geometry = geometry)) +
  geom_sf(aes(fill = se_others), linetype = 2, size = 0.5) +
  coord_sf(crs = st_crs(2154)) +
  annotation_scale(location = "bl", width_hint = 0.5) +
  scale_fill_gradient2('Semi-elasticities \n on the XR', 
                       low='#007acc', mid = '#ffffff', 
                       high = '#ff1a1a', midpoint = 0, 
                       limits = c(-0.06, 0.06)) +
  annotation_north_arrow(location = "bl", which_north = "true", 
                         pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"),
                         style = north_arrow_fancy_orienteering)

5 Comparaison between CoDa and Dirichlet regression

We compare the fitted values of our CoDa regression model with the fitted values of a Dirichlet model.

We compute the Dirichlet regression model by using DirichletReg package (see Maier, 2020). We then compute the fitted values in the simplex space:

my_explan_Dir <- my_explan
my_explan_Dir$Y <- DR_data(voteshare)
dirichlet_model <- DirichReg(Y ~ Age_ilr1 + Age_ilr2 + 
                       Diploma_ilr1 + Diploma_ilr2 + 
                       Employ_ilr1 + Employ_ilr2 + Employ_ilr3 + Employ_ilr4 + 
                       income_rate + unemp_rate + employ_evol + owner_rate + foreign, 
                       data = my_explan_Dir)
fitted_dirichlet <- predict(dirichlet_model)

To get the fitted values of the CoDa regression model, we first compute the fitted values in the ilr coordinates space and then go back to the simplex:

fitted_coda <- ilrInv(fitted_ilr, V = t(V_votes))

Finally, we plot the scatter plot of the fitted values for each party (Figure 4 in the article):

postscript(file = "./Figures/fitted.eps", width = 12, height = 4.1, paper = "special", 
           horizontal = FALSE)
op <- par(mfrow = c(1, 3), las = 1, oma = c(0, 0, 0, 0), 
          mar = c(4, 4.1, 2, 1.2), mgp = c(2.8, 1, 0), cex.axis = 1.5,
          cex.lab = 1.5, cex.main = 1.6)
plot(fitted_dirichlet[, 1], fitted_coda[, 1], xlab = "Dirichlet fitted values", 
     ylab = "CoDa fitted values", main = "Left", xlim = c(0.05, 0.6), ylim = c(0.05, 0.6))
abline(a = 0, b = 1)
plot(fitted_dirichlet[, 2], fitted_coda[, 2], xlab = "Dirichlet fitted values", 
     ylab = "CoDa fitted values", main = "Right", xlim = c(0.05, 0.6), ylim = c(0.05, 0.6))
abline(a = 0, b = 1)
plot(fitted_dirichlet[, 3], fitted_coda[, 3], xlab = "Dirichlet fitted values", 
     ylab = "CoDa fitted values", main = "XR", xlim = c(0.05, 0.6), ylim = c(0.05, 0.6))
abline(a = 0, b = 1)
dev.off()

## png 
##   2

Comment: the fitted values of the two models appear to be very close.

We now compute the MSE of the CoDa regression model:

sum(colMeans(residuals(model_complete) ^ 2))

## [1] 0.1254843

We transform the fitted values of the Dirichlet model (in the simplex) to their counterpart in the ilr space and we compute the MSE:

sum(colMeans((ilr(fitted_dirichlet, V = t(V_votes)) - y_ilr) ^ 2))

## [1] 0.1812526

Comment: It appears that the MSE of the two models are comparable. It is slightly better for the CoDa model.

6 Predictions

6.1 Representing the predicted values with respect to a numeric variable: application with unemployment rate

We focus on the three departements: Ariege (“09”), Cantal (“15”) and Bas-Rhin (“67”). For each of these three departments, we fix all their characteristics except unemp_rate and we create a grid of fictive unemp_rate from 0 to \(25\%\) by increments of \(1\%\).

To compute the predictions for the fictive points, we proceed as follows:

we compute the predicted values in the ilr coordinates space corresponding to the fictive points
we go back to the simplex and compute the differences between the predictions and the initial fitted values for this particular departement.

covariate_fix_1 <- my_explan[
  dep.2015.spdf$NOM_DEPT == "ARIEGE",
  c("Age_ilr1", "Age_ilr2", 
  "Diploma_ilr1", "Diploma_ilr2",
  "Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
  "income_rate", "employ_evol", "owner_rate", "foreign")]
covariate_fix_2 <- my_explan[
  dep.2015.spdf$NOM_DEPT == "CANTAL",
  c("Age_ilr1", "Age_ilr2", 
  "Diploma_ilr1", "Diploma_ilr2",
  "Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
  "income_rate", "employ_evol", "owner_rate", "foreign")]
covariate_fix_3 <- my_explan[
  dep.2015.spdf$NOM_DEPT == "BAS-RHIN",
  c("Age_ilr1", "Age_ilr2", 
  "Diploma_ilr1", "Diploma_ilr2",
  "Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
  "income_rate", "employ_evol", "owner_rate", "foreign")]

unemp_rate <- seq(0.00, 0.25, length.out = 95)
new_data1 = cbind(covariate_fix_1, unemp_rate)
new_data2 = cbind(covariate_fix_2, unemp_rate)
new_data3 = cbind(covariate_fix_3, unemp_rate)
# prediction
pred_y_unemp_rate_1  <- predict(model_complete, new = new_data1)
pred_y_unemp_rate_2  <- predict(model_complete, new = new_data2)
pred_y_unemp_rate_3  <- predict(model_complete, new = new_data3)

# #convert to the simplex
compPred_unemp_rate_s1 <- ilrInv(pred_y_unemp_rate_1, V = t(V_votes))
compPred_unemp_rate_s2 <- ilrInv(pred_y_unemp_rate_2, V = t(V_votes))
compPred_unemp_rate_s3 <- ilrInv(pred_y_unemp_rate_3, V = t(V_votes))

To get Figure 5 in the article, we use the following code:

# postscript("./Figures/image.eps", width = 12, height = 5, paper = "special", 
#           horizontal = FALSE)
op <- par(mfrow = c(1, 3), las = 1, oma = c(0, 0, 0, 0), 
          mar = c(4, 4.1, 2, 1.2), mgp = c(2.4, 1, 0), cex.axis = 1.5,
          cex.lab = 1.5, cex.main = 1.6)
plot(unemp_rate, compPred_unemp_rate_s1[, 1], 
     ylab = as.expression(bquote(hat(Y))), 
     xlab = "Unemployment rate", 
     pch = 21, xlim = c(0, 0.25), xaxs = "i", bg = 2, 
     ylim = c(0, 1), yaxs = "i", col = "red", type = "l", lwd = 4)
legend("topleft", legend = c("Right", "Left", "XR"), cex = 1.2, 
       fill = c("royalblue", "red", "violet"))
title("Ariège")
lines(unemp_rate, compPred_unemp_rate_s1[,2], 
      col = "royalblue", type = "l", lty = 2, lwd = 4)
lines(unemp_rate, compPred_unemp_rate_s1[,3], 
      col = "violet", type = "l", lty = 3, lwd = 4)
abline(v = min(my_explan$unemp_rate), col = "grey", lty = 2)
abline(v = max(my_explan$unemp_rate), col = "grey", lty = 2)

plot(unemp_rate, compPred_unemp_rate_s2[,1], ylab = "", xlab = "Unemployment rate", pch = 21,
     xlim = c(0, 0.25), xaxs = "i", bg = 2, ylim = c(0, 1), 
     yaxs = "i", col = "red", type = "l", lwd = 4)
title("Cantal")
lines(unemp_rate, compPred_unemp_rate_s2[,2], col = "royalblue", 
      type = "l", lty = 2, lwd = 4)
lines(unemp_rate, compPred_unemp_rate_s2[,3], col = "violet", 
      type = "l", lty = 3, lwd = 4)
abline(v = min(my_explan$unemp_rate), col = "grey", lty = 2)
abline(v = max(my_explan$unemp_rate), col = "grey", lty = 2)

plot(unemp_rate, compPred_unemp_rate_s3[,1], ylab = "", xlab = "Unemployment rate", pch = 21,
     xlim = c(0, 0.25), xaxs = "i", bg = 2, ylim = c(0, 1), 
     yaxs = "i", col = "red", type = "l", lwd = 4)
title("Bas-Rhin")
lines(unemp_rate, compPred_unemp_rate_s3[,2], col = "royalblue", type = "l", lty = 2, lwd = 4)
lines(unemp_rate, compPred_unemp_rate_s3[,3], col = "violet", type = "l", lty = 3, lwd = 4)
abline(v = min(my_explan$unemp_rate), col = "grey", lty = 2)
abline(v = max(my_explan$unemp_rate), col = "grey", lty = 2)

par(op)
# dev.off()

To get Figure 6 in the article, we first need to compile the following local function which permits to represent one particular composition on a ternary diagram :

source("http://www.thibault.laurent.free.fr/code/election_coda/plot_1_compo.R")

Then, we execute the following code:

colnames(compPred_unemp_rate_s1) <- c("Left", "Right", "XR")
colnames(compPred_unemp_rate_s2) <- c("Left", "Right", "XR")
colnames(compPred_unemp_rate_s3) <- c("Left", "Right", "XR")
# we prepare the nodes of the ternary diagram
A <- c(0, 0)
B <- c(1, 0)
C <- c(0.5, sqrt(3)/2)
# transform the predictions in categorical variables
my.palette <- brewer.pal(n = 9, name = "OrRd")
breaks_ilr1 <- classIntervals(new_data1$unemp_rate, n = 9, 
                 style = "quantile", intervalClosure = "left")
# postscript("./Figures/figure_4.eps", width = 12, height = 5, paper = "special", 
#           horizontal = FALSE)
op <- par(mfrow = c(1, 3), oma = c(0, 1, 2, 1), 
          mar = c(0, 0.5, 1.5, 0.5), cex.main = 1.6)
### Ternary diagram for Ariege
# transform the compositions in the cartesian coordinates
ariege_simplex_x <- compPred_unemp_rate_s1[, 1] * A[1] + 
  compPred_unemp_rate_s1[, 2] * B[1] + compPred_unemp_rate_s1[, 3] * C[1] 
ariege_simplex_y <- compPred_unemp_rate_s1[, 1] * A[2] +
  compPred_unemp_rate_s1[, 2] * B[2] + compPred_unemp_rate_s1[, 3] * C[2] 
# plot the ternary diagram
plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F, 
     type = "n", xlab = "", ylab = "", asp = 1, main = "Ariège", 
     ylim = c(-0.1, sqrt(3)/2), cex.main = 2)
legend("topleft", legend = paste0("[", round(breaks_ilr1$brks[1:9], 2), 
                                  ";", round(breaks_ilr1$brks[2:10], 2), "["), 
       cex = 1.4, fill = my.palette, title = "Unemployment rate", bty = "n")
legend("topright", legend = c("Observed", "Predicted"), 
       cex = 1.8, pch = c(15, 18), col = "green")
text(c(0, 1, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05), 
     c("Left", "Right", "XR"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")
points(ariege_simplex_x, ariege_simplex_y, pch = 16, 
       col = my.palette[findCols(breaks_ilr1)], cex = 1)
plot_1_compo(voteshare[22,], col = "green", bg="green", cex = 2, pch = 15)
plot_1_compo(compPred_unemp_rate_s1[53, ], col = "green", cex = 2.5, pch = 18)
lines(c(0.5, 0.5), c(0, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.25, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.75, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")

### Ternary diagram for Cantal
cantal_simplex_x <- compPred_unemp_rate_s2[, 1] * A[1] + 
  compPred_unemp_rate_s2[, 2] * B[1] + compPred_unemp_rate_s2[, 3] * C[1] 
cantal_simplex_y <- compPred_unemp_rate_s2[, 1] * A[2] + 
  compPred_unemp_rate_s2[, 2] * B[2] + compPred_unemp_rate_s2[, 3] * C[2] 

plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F, 
     type = "n", xlab = "", ylab = "", asp = 1, main = "Cantal", 
     ylim = c(-0.1, sqrt(3)/2), cex.main = 2)

text(c(0, 1, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05), 
     c("Left", "Right", "XR"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")

points(cantal_simplex_x, cantal_simplex_y, pch = 16, 
       col = my.palette[findCols(breaks_ilr1)], cex = 1)

plot_1_compo(voteshare[16,], col = "green", bg="green", cex = 2, pch = 15)
plot_1_compo(compPred_unemp_rate_s2[32, ], col = "green", bg="green", cex = 2.5, pch = 18)

lines(c(0.5, 0.5), c(0, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.25, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.75, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")

### Ternary diagram for Bas-Rhin

basrhin_simplex_x <- compPred_unemp_rate_s3[, 1] * A[1] + 
  compPred_unemp_rate_s3[, 2] * B[1] + compPred_unemp_rate_s3[, 3] * C[1] 
basrhin_simplex_y <- compPred_unemp_rate_s3[, 1] * A[2] + 
  compPred_unemp_rate_s3[, 2] * B[2] + compPred_unemp_rate_s3[, 3] * C[2] 

plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F, 
     type = "n", xlab = "", ylab = "", asp = 1, main = "Bas-Rhin", 
     ylim = c(-0.1, sqrt(3)/2), cex.main = 2)

text(c(0, 1, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05), 
     c("Left", "Right", "XR"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")

points(basrhin_simplex_x, basrhin_simplex_y, pch = 16, 
       col = my.palette[findCols(breaks_ilr1)], cex = 1)

plot_1_compo(voteshare[10,], col = "green", bg="green", cex = 2, pch = 15)
plot_1_compo(compPred_unemp_rate_s3[62, ], col = "green", bg="green", cex = 2.5, pch = 18)

lines(c(0.5, 0.5), c(0, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.25, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.75, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")

# dev.off()

6.2 Representing the predicted values with respect to a categorical variable: application with unemployment rate

All explanatory variables are fixed to the value of the given department except Diploma. We create of grid of fictive points in the Diploma triangle and compute the predicted shares at each point of this grid. However since it is impossible to plot a function from the simplex to the simplex, we choose to summarize the predicted shares by the winning party (corresponding to the highest share) and color the triangle in the Diploma space according to the winning party color.

# we fix the covariates
covariate_fix_1c <- my_explan[
  dep.2015.spdf$NOM_DEPT == "ARIEGE",
  c("Age_ilr1", "Age_ilr2", 
    "Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
    "income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")]
covariate_fix_2c <- my_explan[
  dep.2015.spdf$NOM_DEPT == "CANTAL",
  c("Age_ilr1", "Age_ilr2", 
    "Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
    "income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")]
covariate_fix_3c <- my_explan[
  dep.2015.spdf$NOM_DEPT == "BAS-RHIN",
  c("Age_ilr1", "Age_ilr2", 
    "Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
    "income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")]
# create a grid of shares for Diploma
x_seq <- seq(0.01, 0.99, 0.01)
x_Diploma <- data.frame(No_CAPBEP = rep(x_seq, 101), 
                     bac = rep(x_seq, each = 101))
x_Diploma$diplom_sup <- 1 - (x_Diploma$bac + x_Diploma$No_CAPBEP)
x_Diploma <- x_Diploma[x_Diploma$diplom_sup > 0, ]
# transform the grid in the ilr
x_Diploma_ilr <- ilr(x_Diploma, V = t(V_dip))
colnames(x_Diploma_ilr) = c("Diploma_ilr1", "Diploma_ilr2")
new_data1c <- cbind(covariate_fix_1c, x_Diploma_ilr)
new_data2c <- cbind(covariate_fix_2c, x_Diploma_ilr)
new_data3c <- cbind(covariate_fix_3c, x_Diploma_ilr)
# prediction
pred_y_dip_1  <- predict(model_complete, new = new_data1c)
pred_y_dip_2  <- predict(model_complete, new = new_data2c)
pred_y_dip_3  <- predict(model_complete, new = new_data3c)

# convert vers le simplex
compPred_dip_s1 <- ilrInv(pred_y_dip_1, V = t(V_votes))
compPred_dip_s2 <- ilrInv(pred_y_dip_2, V = t(V_votes))
compPred_dip_s3 <- ilrInv(pred_y_dip_3, V = t(V_votes))

To get Figure 7, we use the following code:

Diploma_simplex_x <- x_Diploma[, 1] * A[1] + x_Diploma[, 2] * B[1] + x_Diploma[, 3] * C[1] 
Diploma_simplex_y <- x_Diploma[, 1] * A[2] + x_Diploma[, 2] * B[2] + x_Diploma[, 3] * C[2] 
Diploma_depart_x <- Diploma[, 1] * A[1] + Diploma[, 2] * B[1] + Diploma[, 3] * C[1] 
Diploma_depart_y <- Diploma[, 1] * A[2] + Diploma[, 2] * B[2] + Diploma[, 3] * C[2] 

# postscript("./Figures/test.eps", width = 12, height = 5, paper = "special", 
#           horizontal = FALSE)
# Ternary diagram for Ariège
op <- par(mfrow = c(1, 3), oma = c(0, 1.5, 2, 1), mar = c(0, 0.7, 1.5, 0.7))
plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F, 
     type = "n", xlab = "", ylab = "", asp = 1, main = "Ariège", 
     ylim = c(-0.1, sqrt(3)/2), cex.main = 2)
legend("topleft", legend = c("Right", "Left", "XR"), cex = 1.5, 
       fill = c("royalblue", "red", "violet"))
text(c(0.05, 0.95, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05), 
     c("<BAC", "BAC", "SUP"), pos = 3, 
     cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")
points(Diploma_simplex_x, Diploma_simplex_y, pch = 16, 
       col = c("red", "royalblue", "violet")[apply(compPred_dip_s1, 1, which.max)], 
       cex = 1)
points(Diploma_depart_x, Diploma_depart_y, pch = 16, cex = 1.2)
# Ternary diagram for Cantal
plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F, 
     type = "n", xlab = "", ylab = "", asp = 1, main = "Cantal", ylim = c(-0.1, sqrt(3)/2), cex.main = 2)
text(c(0.05, 0.95, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05), c("<BAC", "BAC", "SUP"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")
points(Diploma_simplex_x, Diploma_simplex_y, pch = 16, 
       col = c("red", "royalblue", "violet")[apply(compPred_dip_s2, 1, which.max)], 
       cex = 1)
points(Diploma_depart_x, Diploma_depart_y, pch = 16, cex = 1.2)
# Ternary diagram for Bas-Rhin
plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F, 
     type = "n", xlab = "", ylab = "", asp = 1, main = "Bas-Rhin", ylim = c(-0.1, sqrt(3)/2), cex.main = 2)
text(c(0.05, 0.95, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05), c("<BAC", "BAC", "SUP"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")
points(Diploma_simplex_x, Diploma_simplex_y, pch = 16, 
       col = c("red", "royalblue", "violet")[apply(compPred_dip_s3, 1, which.max)], 
       cex = 1)
points(Diploma_depart_x, Diploma_depart_y, pch = 16, cex = 1.2)

# dev.off()

7 Bibliography

K. Gerald van den Boogaart, Raimon Tolosana-Delgado and Matevz Bren (2020). compositions: Compositional Data Analysis. R package version 2.0-0. https://CRAN.R-project.org/package=compositions
Maier, M. J. (2020). DirichletReg: Dirichlet Regression in R. R package version 0.7-0. http://dirichletreg.r-forge.r-project.org/
Pebesma, E.J., R.S. Bivand, 2005. Classes and methods for spatial data in R. R News 5 (2)*, https://cran.r-project.org/doc/Rnews/.

Analyzing the impacts of socio-economic factors on French departmental elections with CoDa methods (Supplementary material)

Thi Huong An Nguyen, Christine Thomas-Agnan, Thibault Laurent, Anne Ruiz-Gazen

Last update: 2020-10-19