We provide the data and the R code used in the article “Analyzing the impacts of socio-economic factors on French departmental elections with CoDa methods” so that readers may reproduce all the figures, tables and statistics presented in the article with the R software. The pdf version of the supplementary variable is available here.

If you use this code, please cite:

Nguyen T.H.A, Thomas-Agnan C., Laurent T. and A. Ruiz-Gazen (2020). Analyzing the impacts of socio-economic factors on French departmental elections with CoDa methods. \(\color{red}{\text{TSE WP}}\).

1 Prerequisites

Required packages:

Loading packages:

Information about the current R session :

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
## LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
## 
## locale:
##  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
##  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
##  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggtern_3.3.0       sp_1.4-2           RColorBrewer_1.1-2 gridExtra_2.3     
## [5] ggplot2_3.3.2      DirichletReg_0.7-0 Formula_1.2-3      compositions_2.0-0
## [9] classInt_0.4-3    
## 
## loaded via a namespace (and not attached):
##  [1] zoo_1.8-8          tidyselect_1.1.0   xfun_0.15          purrr_0.3.4       
##  [5] lattice_0.20-41    latex2exp_0.4.0    colorspace_1.4-1   vctrs_0.3.2       
##  [9] generics_0.0.2     htmltools_0.5.0    yaml_2.2.1         rlang_0.4.7       
## [13] e1071_1.7-3        pillar_1.4.6       glue_1.4.1         withr_2.2.0       
## [17] lifecycle_0.2.0    plyr_1.8.6         robustbase_0.93-6  stringr_1.4.0     
## [21] munsell_0.5.0      gtable_0.3.0       evaluate_0.14      knitr_1.29        
## [25] miscTools_0.6-26   class_7.3-17       DEoptimR_1.0-8     proto_1.0.0       
## [29] Rcpp_1.0.5         KernSmooth_2.23-17 scales_1.1.1       maxLik_1.4-4      
## [33] tensorA_0.36.1     digest_0.6.25      stringi_1.4.6      dplyr_1.0.0       
## [37] grid_4.0.3         tools_4.0.3        sandwich_2.5-1     magrittr_1.5      
## [41] tibble_3.0.3       crayon_1.3.4       pkgconfig_2.0.3    MASS_7.3-53       
## [45] ellipsis_0.3.1     bayesm_3.1-4       rmarkdown_2.3      R6_2.4.1          
## [49] compiler_4.0.3

2 Preparation of the data

We load the data basis:

The data is presented as a Spatial object (see Pebesma and Bivand, 2005, for more informations). The vote shares correspond to the variables left, right, others. We print the first 5 rows of the data:

##          NOM_DEPT      left     right    others
## 40           JURA 0.3376844 0.4035081 0.2588076
## 43          LOIRE 0.3369003 0.3593964 0.3037033
## 77 SEINE-MARITIME 0.4236827 0.3119438 0.2643735
## 90          YONNE 0.2852518 0.3853767 0.3293715
## 69      HAUT-RHIN 0.1777443 0.5126041 0.3096516

We drop the departement of Paris which is not concerned by the election

We use the acomp class of object from the compositions package which provides the means to analyse compositions in the framework of the Aitchison Simplex. We do it for :

Then we use the isometric log ratio transform thanks to the ilr() function. We do it for:

Finally, we create the data.frame that includes all the explanatory variables:

3 Descriptive statistics

3.1 Table 2

The following codes yields Table 2 in the article.

3.1.1 Descriptive statistics on the compositional variables

##   left  right others 
##  0.368  0.384  0.248
## age_1839 age_4064   age_65 
##    0.314    0.429    0.257
##  No_CAPBEP        bac diplom_sup 
##      0.589      0.169      0.242
##    AZ    BE    FZ    GU    OQ 
## 0.044 0.100 0.048 0.432 0.375

3.1.2 Descriptive statistics on the numeric variables

##      unemp_rate employ_evol owner_rate income_rate foreign
## mean      0.117      -0.145      0.616       0.552   0.050
## sd        0.017       0.578      0.062       0.057   0.031

4 CoDa regression

We now fit a CoDa regression model describing the impacts of socio-economic factors on vote shares in the 2015 French departmental election. We include all explanatory variables from our data set in the regression model.

4.1 Modeling in the ilr coordinates space

We obtain estimated parameters and their standard deviations (in brackets) presented in Table 3:

Dependent variable:
y_ilr[, 1] y_ilr[, 2]
(1) (2)
Age_ilr1 0.577 0.903
(1.050) (0.729)
Age_ilr2 0.478 0.143
(0.542) (0.376)
Diploma_ilr1 0.091 1.124***
(0.573) (0.398)
Diploma_ilr2 -2.605*** -1.124
(0.981) (0.681)
Employ_ilr1 0.436 0.079
(0.378) (0.262)
Employ_ilr2 0.718* 0.633**
(0.412) (0.286)
Employ_ilr3 -0.859** -0.239
(0.377) (0.261)
Employ_ilr4 -0.085 0.106
(0.141) (0.098)
income_rate -3.788** -0.916
(1.486) (1.031)
unemp_rate -2.379 -20.664***
(3.344) (2.321)
employ_evol 0.148 0.027
(0.112) (0.078)
owner_rate 1.670 -3.021***
(1.310) (0.909)
foreign 2.696 -0.452
(2.267) (1.573)
Constant 1.638 4.523***
(1.744) (1.211)
Observations 95 95
R2 0.396 0.699
Adjusted R2 0.300 0.651
Residual Std. Error (df = 81) 0.315 0.219
F Statistic (df = 13; 81) 4.093*** 14.467***
Note: p<0.1; p<0.05; p<0.01

4.2 Transforming the results back to the simplex

We now transform the parameters from the ilr space to the simplex (these results are not presented in the article). For classical (numeric) variables, it can be done in one step by applying the ilr-inv transformation (formula(3), line 1 in the article) :

For compositional variables, we apply formula(3), line 2 in the article:

Finally, the estimates of the parameters in the simplex are:

##                      Left         Right           XR
## (Intercept)  2.326682e-01  7.660536e-01  0.001278181
## Age18-39    -4.928355e-01 -2.462211e-01  0.739056578
## Age40-64     1.739983e-01  3.232336e-01 -0.497231909
## Age>65       3.188372e-01 -7.701254e-02 -0.241824669
## Dip<BAC     -1.473621e+00  4.993120e-01  0.974309151
## DipBAC       1.534263e+00  1.195657e-01 -1.653828839
## DipSUP      -6.064201e-02 -6.188777e-01  0.679519688
## EmpAZ       -1.184365e-02 -1.392683e-02  0.025770486
## EmpBE        8.645126e-02 -1.688948e-01  0.082443522
## EmpFZ       -8.215997e-01  1.312813e-01  0.690318339
## EmpGU        4.282503e-01  1.609525e-01 -0.589202881
## EmpOQ        3.187417e-01 -1.094122e-01 -0.209329466
## income_rate  3.955136e-03  2.141970e-01  0.781847877
## unemp_rate   2.447467e-08  2.034124e-13  0.999999976
## employ_evol  3.747540e-01  3.185137e-01  0.306732278
## owner_rate   4.740510e-01  7.239864e-03  0.518709138
## foreign      9.281809e-01  2.480719e-02  0.047011883

4.3 Compute the significance in the simplex

4.3.1 Bootstrap procedure

We apply a bootstrap procedure:

For \(b\) from \(1\) to \(B\) (we choose \(B=1000\) here)

  1. We sample with replacement the original data set

  2. We make the estimations in the simplex

  3. We go back to the simplex to obtain the parameters in the simplex

We do inference on the bootstrapped parameters obtained for the numeric variable

B <- 1000
coeff_simplex_B <- data.frame(b = character(0),
                              var = character(0),
                              Left = numeric(0), 
                              Right = numeric(0), 
                              XR = numeric(0)) 
for (b in 1:B) {
  set.seed(b)
  id_b <- sample(1:nrow(my_explan), replace = T)
  model_complete_b <- lm(y_ilr[id_b, ] ~ Age_ilr1 + Age_ilr2 + 
                       Diploma_ilr1 + Diploma_ilr2 + 
                       Employ_ilr1 + Employ_ilr2 + Employ_ilr3 + Employ_ilr4 + 
                       income_rate + unemp_rate + employ_evol + 
                       owner_rate + foreign, data = my_explan[id_b, ])
  
  coeff_simplex_b <- compositions::ilrInv(coefficients(model_complete_b), V = t(V_votes))
  coeff_simplex_b <- rbind(coeff_simplex_b["(Intercept)", ],
      cbind(compositions::ilrInv(coeff_simplex_b[c("Age_ilr1", "Age_ilr2"), 1]),
            compositions::ilrInv(coeff_simplex_b[c("Age_ilr1", "Age_ilr2"), 2]),
            compositions::ilrInv(coeff_simplex_b[c("Age_ilr1", "Age_ilr2"), 3])),
       cbind(compositions::ilrInv(coeff_simplex_b[c("Diploma_ilr1", "Diploma_ilr2"), 1], V = t(V_dip)),
            compositions::ilrInv(coeff_simplex_b[c("Diploma_ilr1", "Diploma_ilr2"), 2], V = t(V_dip)),
            compositions::ilrInv(coeff_simplex_b[c("Diploma_ilr1", "Diploma_ilr2"), 3], V = t(V_dip))),     
       cbind(compositions::ilrInv(coeff_simplex_b[c("Employ_ilr1", "Employ_ilr2", 
                                    "Employ_ilr3", "Employ_ilr4"), 1], V = t(V_employ)),
            compositions::ilrInv(coeff_simplex_b[c("Employ_ilr1", "Employ_ilr2", 
                                   "Employ_ilr3", "Employ_ilr4"), 2], V = t(V_employ)),
            compositions::ilrInv(coeff_simplex_b[c("Employ_ilr1", "Employ_ilr2", 
                                   "Employ_ilr3", "Employ_ilr4"), 3], V = t(V_employ))), 
      coeff_simplex_b[c("income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign"), ]
      )
dimnames(coeff_simplex_b)[[2]] <- c("Left", "Right", "XR")
coeff_simplex_b <- as.data.frame(coeff_simplex_b)
coeff_simplex_b$b <- as.character(b)
coeff_simplex_b$var <- c("(Intercept)", 
                          "Age18-39", "Age40-64", "Age>65",
                          "Dip<BAC", "DipBAC", "DipSUP",
                          "EmpAZ", "EmpBE", "EmpFZ", "EmpGU", "EmpOQ",
                          "income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign" 
                          )
coeff_simplex_B <- rbind(coeff_simplex_B, coeff_simplex_b)
}
# we tidy the data
coeff_simplex_B_longer <- tidyr::pivot_longer(coeff_simplex_B,
                    cols = c("Left", "Right", "XR"),
                    names_to = "parti",
                    values_to = "coeff")

We plot the distribution fot the variables employ_evol, income_rate and unemp_rate:

  • income_rate

  • unemp_rate

  • employ_evol

We can also plot the ternary diagram of the boostrapped parameters for these three variables (figure 3 in the article):

We compute the \(p\)-values for each numeric variable

p_values_intercept <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_intercept <- 2 * pmin(p_values_intercept, 1 - p_values_intercept)
p_values_income_rate <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "Right"),
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_income_rate <- 2 * pmin(p_values_income_rate, 1 - p_values_income_rate)
p_values_unemp_rate <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_unemp_rate <- 2 * pmin(p_values_unemp_rate, 1 - p_values_unemp_rate)
p_values_employ_evol <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_employ_evol <- 2 * pmin(p_values_employ_evol, 1 - p_values_employ_evol)
p_values_owner_rate <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "Right"),
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_owner_rate <- 2 * pmin(p_values_owner_rate, 1 - p_values_owner_rate)
p_values_foreign <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_foreign <- 2 * pmin(p_values_foreign, 1 - p_values_foreign)
p_values <- rbind(p_values_intercept, p_values_income_rate, p_values_unemp_rate,
                  p_values_employ_evol, p_values_owner_rate, p_values_foreign)
signf <- ifelse(p_values < 0.01, "(***)",
       ifelse(p_values < 0.05, "(**)",
              ifelse(p_values < 0.1, "(*)", "")))
p_values <- matrix(paste0("(", p_values, ")"), nrow(p_values), ncol(p_values))

We finally print the parameters in the simplex and their significance (Table 4 in the article):

##             Left        Right         XR           
## (Intercept) "0.2327"    "0.7661"      "0.0013(***)"
## income_rate "0.004(**)" "0.2142"      "0.7818"     
## unemp_rate  "0(***)"    "0(***)"      "1(***)"     
## employ_evol "0.3748"    "0.3185"      "0.3067"     
## owner_rate  "0.4741"    "0.0072(***)" "0.5187"     
## foreign     "0.9282"    "0.0248"      "0.047"

4.3.2 Simulate the beta in the ilr coordinates space

To get the pseudo-significance, another approach is to simulate the beta-star parameters in the ilr space with respect to their distribution. Then we go back back to the simplex to obtain their distribution in the simplex.

We plot the distribution for the variables employ_evol, income_rate and unemp_rate:

  • income_rate

  • unemp_rate

  • employ_evol

We can also plot the ternary diagram of the boostrapped parameters for these three variables:

We compute the \(p\)-values for each numeric variable

p_values_intercept <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_intercept <- 2 * pmin(p_values_intercept, 1 - p_values_intercept)
p_values_income_rate <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "Right"),
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_income_rate <- 2 * pmin(p_values_income_rate, 1 - p_values_income_rate)
p_values_unemp_rate <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_unemp_rate <- 2 * pmin(p_values_unemp_rate, 1 - p_values_unemp_rate)
p_values_employ_evol <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_employ_evol <- 2 * pmin(p_values_employ_evol, 1 - p_values_employ_evol)
p_values_owner_rate <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "Right"),
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_owner_rate <- 2 * pmin(p_values_owner_rate, 1 - p_values_owner_rate)
p_values_foreign <- c(length(which((coeff_simplex_B_longer[
  (coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "Left"), 
  "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "Right"), 
    "coeff"] > 1/3))) / B,
  length(which((coeff_simplex_B_longer[
    (coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "XR"), 
    "coeff"] > 1/3))) / B)
p_values_foreign <- 2 * pmin(p_values_foreign, 1 - p_values_foreign)
p_values <- rbind(p_values_intercept, p_values_income_rate, p_values_unemp_rate,
                  p_values_employ_evol, p_values_owner_rate, p_values_foreign)
signf <- ifelse(p_values < 0.01, "(***)",
       ifelse(p_values < 0.05, "(**)",
              ifelse(p_values < 0.1, "(*)", "")))
p_values <- matrix(paste0("(", p_values, ")"), nrow(p_values), ncol(p_values))

We finally print the parameters in the simplex and their significance:

##             Left        Right         XR           
## (Intercept) "0.2327"    "0.7661"      "0.0013(***)"
## income_rate "0.004(**)" "0.2142"      "0.7818"     
## unemp_rate  "0(***)"    "0(***)"      "1(***)"     
## employ_evol "0.3748"    "0.3185"      "0.3067"     
## owner_rate  "0.4741"    "0.0072(***)" "0.5187"     
## foreign     "0.9282"    "0.0248"      "0.047"

4.4 Computing the semi-elasticities

To compute the approximate semi-elasticities with respect to the numeric variables, we proceed as follows:

  1. we compute the fitted values in the ilr coordinates space

  2. we increase the numeric variable by \(1\) unit (to be defined depending on the variable), and we compute the corresponding predicted value in the ilr coordinates space

  3. we go back to the simplex and compute the differences between the predictions and the initial fitted values.

For the explanatory variables which are compositional, at step 2, when we increase a component of a composition, we remove a constant to all other components keeping the same total.

# fitted values
fitted_ilr <- predict(model_complete)
# go back to the simplex
fitted_simplex <- ilrInv(fitted_ilr, V = t(V_votes))
# we store all the elasticities
elasticities <- data.frame(left = numeric(0), 
                                   right = numeric(0),
                                   xr = numeric(0))
# Numeric variables 
for (k in c("income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")) {
  # prediction when increasing by one 1% the numeric variable
  my_explan_predict <- my_explan
  my_explan_predict[, k] <- my_explan_predict[, k] + .01
  predicted_ilr <- predict(model_complete, newdata = my_explan_predict)
  predicted_simplex <- ilrInv(predicted_ilr, V = t(V_votes))
  # differences between the shares
  diff_simplex_left <- predicted_simplex[, 1] - fitted_simplex[, 1]
  diff_simplex_right <- predicted_simplex[, 2] - fitted_simplex[, 2]
  diff_simplex_xr <- predicted_simplex[, 3] - fitted_simplex[, 3]
  elasticities <- rbind(elasticities,
    data.frame(variable = k,
               dep = row.names(my_explan),
               left = diff_simplex_left, 
               right = diff_simplex_right, 
               xr = diff_simplex_xr))
}

# CoDa explanatory variables 
# for Age
for (k in c("age_1839", "age_4064", "age_65")) {
  my_explan_predict <- my_explan
  Age_predict <- Age
  Age_predict[, k] <- Age_predict[, k] + 0.01
  Age_predict[, !(c("age_1839", "age_4064", "age_65") %in% k)] <- 
    Age_predict[, !(c("age_1839", "age_4064", "age_65") %in% k)] - 0.01 / 2
  
  my_explan_predict[, c("Age_ilr1", "Age_ilr2")] <- ilr(Age_predict)
    
  predicted_ilr <- predict(model_complete, newdata = my_explan_predict)
  predicted_simplex <- ilrInv(predicted_ilr, V = t(V_votes))
  # differences between the shares
  diff_simplex_left <- predicted_simplex[, 1] - fitted_simplex[, 1]
  diff_simplex_right <- predicted_simplex[, 2] - fitted_simplex[, 2]
  diff_simplex_xr <- predicted_simplex[, 3] - fitted_simplex[, 3]
  elasticities <- rbind(elasticities,
                            data.frame(
                             variable = k,
                             dep = row.names(my_explan),
                             left = diff_simplex_left, 
                             right = diff_simplex_right, 
                             xr = diff_simplex_xr))
}

# for Diploma
for (k in c("No_CAPBEP", "bac", "diplom_sup")) {
  my_explan_predict <- my_explan
  Diploma_predict <- Diploma
  Diploma_predict[, k] <- Diploma_predict[, k] + 0.01
  Diploma_predict[, !(c("No_CAPBEP", "bac", "diplom_sup") %in% k)] <- 
    Diploma_predict[, !(c("No_CAPBEP", "bac", "diplom_sup") %in% k)] - 0.01 / 2
  
  my_explan_predict[, c("Diploma_ilr1", "Diploma_ilr2")] <- 
    ilr(Diploma_predict, V = t(V_dip))
    
  predicted_ilr <- predict(model_complete, newdata = my_explan_predict)
  predicted_simplex <- ilrInv(predicted_ilr, V = t(V_votes))
  # differences between the shares
  diff_simplex_left <- predicted_simplex[, 1] - fitted_simplex[, 1]
  diff_simplex_right <- predicted_simplex[, 2] - fitted_simplex[, 2]
  diff_simplex_xr <- predicted_simplex[, 3] - fitted_simplex[, 3]
  elasticities <- rbind(elasticities,
                            data.frame(
                             variable = k,
                             dep = row.names(my_explan),
                             left = diff_simplex_left, 
                             right = diff_simplex_right, 
                             xr = diff_simplex_xr))
}

# for Employ
for (k in c("AZ", "BE", "FZ", "GU", "OQ")) {
  my_explan_predict <- my_explan
  Employment_predict <- Employment
  Employment_predict[, k] <- Employment_predict[, k] + 0.01
  Employment_predict[, !(c("AZ", "BE", "FZ", "GU", "OQ") %in% k)] <- 
    Employment_predict[, !(c("AZ", "BE", "FZ", "GU", "OQ") %in% k)] - 0.01 / 2
  
  my_explan_predict[, c("Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4")] <- 
    ilr(Employment_predict, V = t(V_employ))
    
  predicted_ilr <- predict(model_complete, newdata = my_explan_predict)
  predicted_simplex <- ilrInv(predicted_ilr, V = t(V_votes))
  # differences between the shares
  diff_simplex_left <- predicted_simplex[, 1] - fitted_simplex[, 1]
  diff_simplex_right <- predicted_simplex[, 2] - fitted_simplex[, 2]
  diff_simplex_xr <- predicted_simplex[, 3] - fitted_simplex[, 3]
  elasticities <- rbind(elasticities,
                            data.frame(
                             variable = k,
                             dep = row.names(my_explan),
                             left = diff_simplex_left, 
                             right = diff_simplex_right, 
                             xr = diff_simplex_xr))
}
# tidy the data
elasticities <- tidyr::pivot_longer(elasticities,
                                           cols = 3:5, names_to = "party", values_to = "score")

For one variable (for example the unemployment rate), we represent parallel boxplots of the semi-elasticities for each party.

For each party, we can also plot the semi-elasticities on a map

5 Comparaison between CoDa and Dirichlet regression

We compare the fitted values of our CoDa regression model with the fitted values of a Dirichlet model.

We compute the Dirichlet regression model by using DirichletReg package (see Maier, 2020). We then compute the fitted values in the simplex space:

To get the fitted values of the CoDa regression model, we first compute the fitted values in the ilr coordinates space and then go back to the simplex:

Finally, we plot the scatter plot of the fitted values for each party (Figure 4 in the article):

## png 
##   2

Comment: the fitted values of the two models appear to be very close.

We now compute the MSE of the CoDa regression model:

## [1] 0.1254843

We transform the fitted values of the Dirichlet model (in the simplex) to their counterpart in the ilr space and we compute the MSE:

## [1] 0.1812526

Comment: It appears that the MSE of the two models are comparable. It is slightly better for the CoDa model.

6 Predictions

6.1 Representing the predicted values with respect to a numeric variable: application with unemployment rate

We focus on the three departements: Ariege (“09”), Cantal (“15”) and Bas-Rhin (“67”). For each of these three departments, we fix all their characteristics except unemp_rate and we create a grid of fictive unemp_rate from 0 to \(25\%\) by increments of \(1\%\).

To compute the predictions for the fictive points, we proceed as follows:

  1. we compute the predicted values in the ilr coordinates space corresponding to the fictive points

  2. we go back to the simplex and compute the differences between the predictions and the initial fitted values for this particular departement.

To get Figure 5 in the article, we use the following code:

# postscript("./Figures/image.eps", width = 12, height = 5, paper = "special", 
#           horizontal = FALSE)
op <- par(mfrow = c(1, 3), las = 1, oma = c(0, 0, 0, 0), 
          mar = c(4, 4.1, 2, 1.2), mgp = c(2.4, 1, 0), cex.axis = 1.5,
          cex.lab = 1.5, cex.main = 1.6)
plot(unemp_rate, compPred_unemp_rate_s1[, 1], 
     ylab = as.expression(bquote(hat(Y))), 
     xlab = "Unemployment rate", 
     pch = 21, xlim = c(0, 0.25), xaxs = "i", bg = 2, 
     ylim = c(0, 1), yaxs = "i", col = "red", type = "l", lwd = 4)
legend("topleft", legend = c("Right", "Left", "XR"), cex = 1.2, 
       fill = c("royalblue", "red", "violet"))
title("Ariège")
lines(unemp_rate, compPred_unemp_rate_s1[,2], 
      col = "royalblue", type = "l", lty = 2, lwd = 4)
lines(unemp_rate, compPred_unemp_rate_s1[,3], 
      col = "violet", type = "l", lty = 3, lwd = 4)
abline(v = min(my_explan$unemp_rate), col = "grey", lty = 2)
abline(v = max(my_explan$unemp_rate), col = "grey", lty = 2)

plot(unemp_rate, compPred_unemp_rate_s2[,1], ylab = "", xlab = "Unemployment rate", pch = 21,
     xlim = c(0, 0.25), xaxs = "i", bg = 2, ylim = c(0, 1), 
     yaxs = "i", col = "red", type = "l", lwd = 4)
title("Cantal")
lines(unemp_rate, compPred_unemp_rate_s2[,2], col = "royalblue", 
      type = "l", lty = 2, lwd = 4)
lines(unemp_rate, compPred_unemp_rate_s2[,3], col = "violet", 
      type = "l", lty = 3, lwd = 4)
abline(v = min(my_explan$unemp_rate), col = "grey", lty = 2)
abline(v = max(my_explan$unemp_rate), col = "grey", lty = 2)

plot(unemp_rate, compPred_unemp_rate_s3[,1], ylab = "", xlab = "Unemployment rate", pch = 21,
     xlim = c(0, 0.25), xaxs = "i", bg = 2, ylim = c(0, 1), 
     yaxs = "i", col = "red", type = "l", lwd = 4)
title("Bas-Rhin")
lines(unemp_rate, compPred_unemp_rate_s3[,2], col = "royalblue", type = "l", lty = 2, lwd = 4)
lines(unemp_rate, compPred_unemp_rate_s3[,3], col = "violet", type = "l", lty = 3, lwd = 4)
abline(v = min(my_explan$unemp_rate), col = "grey", lty = 2)
abline(v = max(my_explan$unemp_rate), col = "grey", lty = 2)

To get Figure 6 in the article, we first need to compile the following local function which permits to represent one particular composition on a ternary diagram :

Then, we execute the following code:

colnames(compPred_unemp_rate_s1) <- c("Left", "Right", "XR")
colnames(compPred_unemp_rate_s2) <- c("Left", "Right", "XR")
colnames(compPred_unemp_rate_s3) <- c("Left", "Right", "XR")
# we prepare the nodes of the ternary diagram
A <- c(0, 0)
B <- c(1, 0)
C <- c(0.5, sqrt(3)/2)
# transform the predictions in categorical variables
my.palette <- brewer.pal(n = 9, name = "OrRd")
breaks_ilr1 <- classIntervals(new_data1$unemp_rate, n = 9, 
                 style = "quantile", intervalClosure = "left")
# postscript("./Figures/figure_4.eps", width = 12, height = 5, paper = "special", 
#           horizontal = FALSE)
op <- par(mfrow = c(1, 3), oma = c(0, 1, 2, 1), 
          mar = c(0, 0.5, 1.5, 0.5), cex.main = 1.6)
### Ternary diagram for Ariege
# transform the compositions in the cartesian coordinates
ariege_simplex_x <- compPred_unemp_rate_s1[, 1] * A[1] + 
  compPred_unemp_rate_s1[, 2] * B[1] + compPred_unemp_rate_s1[, 3] * C[1] 
ariege_simplex_y <- compPred_unemp_rate_s1[, 1] * A[2] +
  compPred_unemp_rate_s1[, 2] * B[2] + compPred_unemp_rate_s1[, 3] * C[2] 
# plot the ternary diagram
plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F, 
     type = "n", xlab = "", ylab = "", asp = 1, main = "Ariège", 
     ylim = c(-0.1, sqrt(3)/2), cex.main = 2)
legend("topleft", legend = paste0("[", round(breaks_ilr1$brks[1:9], 2), 
                                  ";", round(breaks_ilr1$brks[2:10], 2), "["), 
       cex = 1.4, fill = my.palette, title = "Unemployment rate", bty = "n")
legend("topright", legend = c("Observed", "Predicted"), 
       cex = 1.8, pch = c(15, 18), col = "green")
text(c(0, 1, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05), 
     c("Left", "Right", "XR"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")
points(ariege_simplex_x, ariege_simplex_y, pch = 16, 
       col = my.palette[findCols(breaks_ilr1)], cex = 1)
plot_1_compo(voteshare[22,], col = "green", bg="green", cex = 2, pch = 15)
plot_1_compo(compPred_unemp_rate_s1[53, ], col = "green", cex = 2.5, pch = 18)
lines(c(0.5, 0.5), c(0, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.25, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.75, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")

### Ternary diagram for Cantal
cantal_simplex_x <- compPred_unemp_rate_s2[, 1] * A[1] + 
  compPred_unemp_rate_s2[, 2] * B[1] + compPred_unemp_rate_s2[, 3] * C[1] 
cantal_simplex_y <- compPred_unemp_rate_s2[, 1] * A[2] + 
  compPred_unemp_rate_s2[, 2] * B[2] + compPred_unemp_rate_s2[, 3] * C[2] 

plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F, 
     type = "n", xlab = "", ylab = "", asp = 1, main = "Cantal", 
     ylim = c(-0.1, sqrt(3)/2), cex.main = 2)

text(c(0, 1, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05), 
     c("Left", "Right", "XR"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")

points(cantal_simplex_x, cantal_simplex_y, pch = 16, 
       col = my.palette[findCols(breaks_ilr1)], cex = 1)

plot_1_compo(voteshare[16,], col = "green", bg="green", cex = 2, pch = 15)
plot_1_compo(compPred_unemp_rate_s2[32, ], col = "green", bg="green", cex = 2.5, pch = 18)

lines(c(0.5, 0.5), c(0, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.25, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.75, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")

### Ternary diagram for Bas-Rhin

basrhin_simplex_x <- compPred_unemp_rate_s3[, 1] * A[1] + 
  compPred_unemp_rate_s3[, 2] * B[1] + compPred_unemp_rate_s3[, 3] * C[1] 
basrhin_simplex_y <- compPred_unemp_rate_s3[, 1] * A[2] + 
  compPred_unemp_rate_s3[, 2] * B[2] + compPred_unemp_rate_s3[, 3] * C[2] 

plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F, 
     type = "n", xlab = "", ylab = "", asp = 1, main = "Bas-Rhin", 
     ylim = c(-0.1, sqrt(3)/2), cex.main = 2)

text(c(0, 1, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05), 
     c("Left", "Right", "XR"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")

points(basrhin_simplex_x, basrhin_simplex_y, pch = 16, 
       col = my.palette[findCols(breaks_ilr1)], cex = 1)

plot_1_compo(voteshare[10,], col = "green", bg="green", cex = 2, pch = 15)
plot_1_compo(compPred_unemp_rate_s3[62, ], col = "green", bg="green", cex = 2.5, pch = 18)

lines(c(0.5, 0.5), c(0, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.25, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.75, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")

6.2 Representing the predicted values with respect to a categorical variable: application with unemployment rate

All explanatory variables are fixed to the value of the given department except Diploma. We create of grid of fictive points in the Diploma triangle and compute the predicted shares at each point of this grid. However since it is impossible to plot a function from the simplex to the simplex, we choose to summarize the predicted shares by the winning party (corresponding to the highest share) and color the triangle in the Diploma space according to the winning party color.

# we fix the covariates
covariate_fix_1c <- my_explan[
  dep.2015.spdf$NOM_DEPT == "ARIEGE",
  c("Age_ilr1", "Age_ilr2", 
    "Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
    "income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")]
covariate_fix_2c <- my_explan[
  dep.2015.spdf$NOM_DEPT == "CANTAL",
  c("Age_ilr1", "Age_ilr2", 
    "Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
    "income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")]
covariate_fix_3c <- my_explan[
  dep.2015.spdf$NOM_DEPT == "BAS-RHIN",
  c("Age_ilr1", "Age_ilr2", 
    "Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
    "income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")]
# create a grid of shares for Diploma
x_seq <- seq(0.01, 0.99, 0.01)
x_Diploma <- data.frame(No_CAPBEP = rep(x_seq, 101), 
                     bac = rep(x_seq, each = 101))
x_Diploma$diplom_sup <- 1 - (x_Diploma$bac + x_Diploma$No_CAPBEP)
x_Diploma <- x_Diploma[x_Diploma$diplom_sup > 0, ]
# transform the grid in the ilr
x_Diploma_ilr <- ilr(x_Diploma, V = t(V_dip))
colnames(x_Diploma_ilr) = c("Diploma_ilr1", "Diploma_ilr2")
new_data1c <- cbind(covariate_fix_1c, x_Diploma_ilr)
new_data2c <- cbind(covariate_fix_2c, x_Diploma_ilr)
new_data3c <- cbind(covariate_fix_3c, x_Diploma_ilr)
# prediction
pred_y_dip_1  <- predict(model_complete, new = new_data1c)
pred_y_dip_2  <- predict(model_complete, new = new_data2c)
pred_y_dip_3  <- predict(model_complete, new = new_data3c)

# convert vers le simplex
compPred_dip_s1 <- ilrInv(pred_y_dip_1, V = t(V_votes))
compPred_dip_s2 <- ilrInv(pred_y_dip_2, V = t(V_votes))
compPred_dip_s3 <- ilrInv(pred_y_dip_3, V = t(V_votes))

To get Figure 7, we use the following code:

Diploma_simplex_x <- x_Diploma[, 1] * A[1] + x_Diploma[, 2] * B[1] + x_Diploma[, 3] * C[1] 
Diploma_simplex_y <- x_Diploma[, 1] * A[2] + x_Diploma[, 2] * B[2] + x_Diploma[, 3] * C[2] 
Diploma_depart_x <- Diploma[, 1] * A[1] + Diploma[, 2] * B[1] + Diploma[, 3] * C[1] 
Diploma_depart_y <- Diploma[, 1] * A[2] + Diploma[, 2] * B[2] + Diploma[, 3] * C[2] 

# postscript("./Figures/test.eps", width = 12, height = 5, paper = "special", 
#           horizontal = FALSE)
# Ternary diagram for Ariège
op <- par(mfrow = c(1, 3), oma = c(0, 1.5, 2, 1), mar = c(0, 0.7, 1.5, 0.7))
plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F, 
     type = "n", xlab = "", ylab = "", asp = 1, main = "Ariège", 
     ylim = c(-0.1, sqrt(3)/2), cex.main = 2)
legend("topleft", legend = c("Right", "Left", "XR"), cex = 1.5, 
       fill = c("royalblue", "red", "violet"))
text(c(0.05, 0.95, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05), 
     c("<BAC", "BAC", "SUP"), pos = 3, 
     cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")
points(Diploma_simplex_x, Diploma_simplex_y, pch = 16, 
       col = c("red", "royalblue", "violet")[apply(compPred_dip_s1, 1, which.max)], 
       cex = 1)
points(Diploma_depart_x, Diploma_depart_y, pch = 16, cex = 1.2)
# Ternary diagram for Cantal
plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F, 
     type = "n", xlab = "", ylab = "", asp = 1, main = "Cantal", ylim = c(-0.1, sqrt(3)/2), cex.main = 2)
text(c(0.05, 0.95, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05), c("<BAC", "BAC", "SUP"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")
points(Diploma_simplex_x, Diploma_simplex_y, pch = 16, 
       col = c("red", "royalblue", "violet")[apply(compPred_dip_s2, 1, which.max)], 
       cex = 1)
points(Diploma_depart_x, Diploma_depart_y, pch = 16, cex = 1.2)
# Ternary diagram for Bas-Rhin
plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F, 
     type = "n", xlab = "", ylab = "", asp = 1, main = "Bas-Rhin", ylim = c(-0.1, sqrt(3)/2), cex.main = 2)
text(c(0.05, 0.95, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05), c("<BAC", "BAC", "SUP"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")
points(Diploma_simplex_x, Diploma_simplex_y, pch = 16, 
       col = c("red", "royalblue", "violet")[apply(compPred_dip_s3, 1, which.max)], 
       cex = 1)
points(Diploma_depart_x, Diploma_depart_y, pch = 16, cex = 1.2)

7 Bibliography