We provide the data and the R code used in the article “Analyzing the impacts of socio-economic factors on French departmental elections with CoDa methods” so that readers may reproduce all the figures, tables and statistics presented in the article with the R software. The pdf version of the supplementary variable is available here.
If you use this code, please cite:
Nguyen T.H.A, Thomas-Agnan C., Laurent T. and A. Ruiz-Gazen (2020). Analyzing the impacts of socio-economic factors on French departmental elections with CoDa methods. \(\color{red}{\text{TSE WP}}\).
Required packages:
install.packages(c("classInt", "compositions", "DirichletReg", "ggplot2",
"ggtern", "gridExtra", "RColorBrewer", "sp"))
Loading packages:
require("classInt") # transform numeric variable to categorical
require("compositions") # compositional data
require("DirichletReg") # Dirichlet regression model
require("ggplot2") # graph a la ggplot2
require("gridExtra") # to include several plot in one figure
require("RColorBrewer") # palette of colors
require("sp") # spatial data
require("ggtern") # ternary diagram with ggplot2
Information about the current R session :
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
## LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
##
## locale:
## [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
## [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
## [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggtern_3.3.0 sp_1.4-2 RColorBrewer_1.1-2 gridExtra_2.3
## [5] ggplot2_3.3.2 DirichletReg_0.7-0 Formula_1.2-3 compositions_2.0-0
## [9] classInt_0.4-3
##
## loaded via a namespace (and not attached):
## [1] zoo_1.8-8 tidyselect_1.1.0 xfun_0.15 purrr_0.3.4
## [5] lattice_0.20-41 latex2exp_0.4.0 colorspace_1.4-1 vctrs_0.3.2
## [9] generics_0.0.2 htmltools_0.5.0 yaml_2.2.1 rlang_0.4.7
## [13] e1071_1.7-3 pillar_1.4.6 glue_1.4.1 withr_2.2.0
## [17] lifecycle_0.2.0 plyr_1.8.6 robustbase_0.93-6 stringr_1.4.0
## [21] munsell_0.5.0 gtable_0.3.0 evaluate_0.14 knitr_1.29
## [25] miscTools_0.6-26 class_7.3-17 DEoptimR_1.0-8 proto_1.0.0
## [29] Rcpp_1.0.5 KernSmooth_2.23-17 scales_1.1.1 maxLik_1.4-4
## [33] tensorA_0.36.1 digest_0.6.25 stringi_1.4.6 dplyr_1.0.0
## [37] grid_4.0.3 tools_4.0.3 sandwich_2.5-1 magrittr_1.5
## [41] tibble_3.0.3 crayon_1.3.4 pkgconfig_2.0.3 MASS_7.3-53
## [45] ellipsis_0.3.1 bayesm_3.1-4 rmarkdown_2.3 R6_2.4.1
## [49] compiler_4.0.3
We load the data basis:
The data is presented as a Spatial object (see Pebesma and Bivand, 2005, for more informations). The vote shares correspond to the variables left, right, others. We print the first 5 rows of the data:
## NOM_DEPT left right others
## 40 JURA 0.3376844 0.4035081 0.2588076
## 43 LOIRE 0.3369003 0.3593964 0.3037033
## 77 SEINE-MARITIME 0.4236827 0.3119438 0.2643735
## 90 YONNE 0.2852518 0.3853767 0.3293715
## 69 HAUT-RHIN 0.1777443 0.5126041 0.3096516
We drop the departement of Paris which is not concerned by the election
We use the acomp class of object from the compositions package which provides the means to analyse compositions in the framework of the Aitchison Simplex. We do it for :
# Age
Age <- compositions::acomp(data.frame(
age_1839 = dep.2015.spdf@data[, "age_1824"] + dep.2015.spdf@data[, "age_2540"],
age_4064 = dep.2015.spdf@data[, "age_4055"] + dep.2015.spdf@data[, "age_5564"],
age_65 = dep.2015.spdf@data[, "age_65"]))
# Diploma
Diploma <- compositions::acomp(data.frame(
No_CAPBEP = dep.2015.spdf@data[, "no_diplom"] + dep.2015.spdf@data[, "capbep"],
bac = dep.2015.spdf@data$bac,
diplom_sup = dep.2015.spdf@data$diplom_sup))
# Employment
Employment <- compositions::acomp(dep.2015.spdf@data[, c("AZ", "BE", "FZ", "GU", "OQ")])
Then we use the isometric log ratio transform thanks to the ilr() function. We do it for:
V_votes <- rbind(c(2/sqrt(6), -1/sqrt(6), -1/sqrt(6)),
c(0, 1/sqrt(2), -1/sqrt(2)))
y_ilr <- compositions::ilr(dep.2015.spdf@data[, c("left" , "right", "others")],
V = t(V_votes))
ilr_Age <- compositions::ilr(Age)
V_dip <- rbind(c(1/sqrt(6), 1/sqrt(6), -2/sqrt(6)),
c(1/sqrt(2), -1/sqrt(2), 0))
ilr_Diploma <- compositions::ilr(Diploma, V = t(V_dip))
V_employ <- rbind(c(-1/2/sqrt(5), -1/2/sqrt(5), -1/2/sqrt(5), -1/2/sqrt(5), 2/sqrt(5)),
c(-sqrt(3)/6, -sqrt(3)/6, -sqrt(3)/6, sqrt(3/4), 0),
c(-sqrt(2/3)/2, -sqrt(2/3)/2, sqrt(2/3), 0, 0),
c(sqrt(1/2), -sqrt(1/2), 0, 0, 0))
ilr_Employment <- compositions::ilr(Employment, V = t(V_employ))
Finally, we create the data.frame that includes all the explanatory variables:
the ilr of the compositions Age, Diploma and Employment
the numeric variables unemp_rate, employ_evol, owner_rate, income_rate, foreign,
my_explan <- data.frame(ilr_Age, ilr_Diploma, ilr_Employment,
as.matrix(dep.2015.spdf@data[, c("unemp_rate", "employ_evol",
"owner_rate", "income_rate",
"foreign")]))
colnames(my_explan) <- c("Age_ilr1", "Age_ilr2",
"Diploma_ilr1", "Diploma_ilr2",
"Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
"unemp_rate", "employ_evol", "owner_rate", "income_rate", "foreign")
The following codes yields Table 2 in the article.
## left right others
## 0.368 0.384 0.248
## age_1839 age_4064 age_65
## 0.314 0.429 0.257
## No_CAPBEP bac diplom_sup
## 0.589 0.169 0.242
## AZ BE FZ GU OQ
## 0.044 0.100 0.048 0.432 0.375
sapply(my_explan[, c("unemp_rate", "employ_evol", "owner_rate", "income_rate",
"foreign")], function(x) round(c(mean = mean(x), sd = sd(x)), 3))
## unemp_rate employ_evol owner_rate income_rate foreign
## mean 0.117 -0.145 0.616 0.552 0.050
## sd 0.017 0.578 0.062 0.057 0.031
To obtain Figure 1 in the article, we use the following code:
names(voteshare) <- c("left", "right", "xr")
ariege <- as.numeric(voteshare[dep.2015.spdf$NOM_DEPT == "ARIEGE", ]) # Ariège, max Left
cantal <- as.numeric(voteshare[dep.2015.spdf$NOM_DEPT == "CANTAL", ]) # Cantal, max Right
bas_rhin <- as.numeric(voteshare[dep.2015.spdf$NOM_DEPT == "BAS-RHIN", ]) # Bas-Rhin, min Left
# ternary diagram ggplot2 style
p3 <- ggtern(data = as.data.frame(voteshare), aes(x = left, z = right, y = xr)) +
geom_point()
g <- p3 + theme(legend.position = c(0, 1), legend.justification = c(1, 1)) +
theme_rgbw() +
labs(x = "LEFT", z = "RIGHT", y = "XR") +
geom_point(aes(x = ariege[1], z = ariege[2], y = ariege[3]),
shape = 22, color = "red", size = 2) +
geom_point(aes(x = cantal[1], z = cantal[2], y = cantal[3] ),
shape = 21, color = "purple", size = 2) +
geom_point(aes(x = bas_rhin[1], z = bas_rhin[2], y = bas_rhin[3]),
shape = 24, color = "blue", size = 2)
g
# dev.off()
# ggsave("./Figures/voteshares_3dep.eps", g, width = 6, height = 6)
To obtain Figure 2 in the article, we use the following code:
# calculate the geometric mean of diploma
Diploma_mean_ge <- as.numeric(mean(Diploma))
# ternary diagram ggplot2 style
p3 <- ggtern(data = as.data.frame(Diploma), aes(x = No_CAPBEP, y = diplom_sup, z = bac)) +
geom_point()
p3 <- p3 + theme(legend.position = c(0, 1),
legend.justification = c(1, 1)) +
theme_rgbw() +
labs(x = "<BAC", y = "SUP", z = "BAC") +
geom_point(aes(x = Diploma_mean_ge[1] , y = Diploma_mean_ge[3] , z = Diploma_mean_ge[2]),
shape = 17, color = c("red"), size = 2)
# calculate the geometric mean of Age
Age_mean_ge <- as.numeric(mean(Age))
p4 <- ggtern(data = as.data.frame(Age), aes(x = age_1839, y = age_65, z = age_4064)) +
geom_point()
p4 <- p4 + theme(legend.position = c(0, 1), legend.justification = c(1, 1)) +
theme_rgbw() +
labs(x = "18-39", y = ">65", z = "40-64") +
geom_point(aes(x = Age_mean_ge[1] , y = Age_mean_ge[3] , z = Age_mean_ge[2]),
shape = 17, color = c("red"), size = 2)
g <- grid.arrange(p3, p4, ncol = 2)
# ggsave("./Figures/comp_X.eps", g, width = 10, height = 5)
We now fit a CoDa regression model describing the impacts of socio-economic factors on vote shares in the 2015 French departmental election. We include all explanatory variables from our data set in the regression model.
model_complete <- lm(y_ilr ~ Age_ilr1 + Age_ilr2 +
Diploma_ilr1 + Diploma_ilr2 +
Employ_ilr1 + Employ_ilr2 + Employ_ilr3 + Employ_ilr4 +
income_rate + unemp_rate + employ_evol +
owner_rate + foreign, data = my_explan)
We obtain estimated parameters and their standard deviations (in brackets) presented in Table 3:
mod_ilr_1 <- lm(y_ilr[, 1] ~ Age_ilr1 + Age_ilr2 +
Diploma_ilr1 + Diploma_ilr2 +
Employ_ilr1 + Employ_ilr2 + Employ_ilr3 + Employ_ilr4 +
income_rate + unemp_rate + employ_evol + owner_rate + foreign,
data = my_explan)
mod_ilr_2 <- lm(y_ilr[, 2] ~ Age_ilr1 + Age_ilr2 +
Diploma_ilr1 + Diploma_ilr2 +
Employ_ilr1 + Employ_ilr2 + Employ_ilr3 + Employ_ilr4 +
income_rate + unemp_rate + employ_evol + owner_rate + foreign,
data = my_explan)
stargazer::stargazer(mod_ilr_1,
mod_ilr_2,
coef = list(mod_ilr_1$coefficients, mod_ilr_2$coefficients),
p = list(coef(summary(mod_ilr_1))[, 4], coef(summary(mod_ilr_2))[, 4]),
digits = 3,
type = "html")
Dependent variable: | ||
y_ilr[, 1] | y_ilr[, 2] | |
(1) | (2) | |
Age_ilr1 | 0.577 | 0.903 |
(1.050) | (0.729) | |
Age_ilr2 | 0.478 | 0.143 |
(0.542) | (0.376) | |
Diploma_ilr1 | 0.091 | 1.124*** |
(0.573) | (0.398) | |
Diploma_ilr2 | -2.605*** | -1.124 |
(0.981) | (0.681) | |
Employ_ilr1 | 0.436 | 0.079 |
(0.378) | (0.262) | |
Employ_ilr2 | 0.718* | 0.633** |
(0.412) | (0.286) | |
Employ_ilr3 | -0.859** | -0.239 |
(0.377) | (0.261) | |
Employ_ilr4 | -0.085 | 0.106 |
(0.141) | (0.098) | |
income_rate | -3.788** | -0.916 |
(1.486) | (1.031) | |
unemp_rate | -2.379 | -20.664*** |
(3.344) | (2.321) | |
employ_evol | 0.148 | 0.027 |
(0.112) | (0.078) | |
owner_rate | 1.670 | -3.021*** |
(1.310) | (0.909) | |
foreign | 2.696 | -0.452 |
(2.267) | (1.573) | |
Constant | 1.638 | 4.523*** |
(1.744) | (1.211) | |
Observations | 95 | 95 |
R2 | 0.396 | 0.699 |
Adjusted R2 | 0.300 | 0.651 |
Residual Std. Error (df = 81) | 0.315 | 0.219 |
F Statistic (df = 13; 81) | 4.093*** | 14.467*** |
Note: | p<0.1; p<0.05; p<0.01 |
We now transform the parameters from the ilr space to the simplex (these results are not presented in the article). For classical (numeric) variables, it can be done in one step by applying the ilr-inv transformation (formula(3), line 1 in the article) :
coeff_simplex_num <- compositions::ilrInv(
coefficients(model_complete)[c("(Intercept)", "income_rate",
"unemp_rate", "employ_evol",
"owner_rate", "foreign"), ], V = t(V_votes))
For compositional variables, we apply formula(3), line 2 in the article:
coeff_simplex_Age <- t(V_votes) %*% t(coefficients(model_complete)[c("Age_ilr1", "Age_ilr2"),]) %*%
t(ilrBase(D = 3))
coeff_simplex_Diploma <- t(V_votes) %*% t(coefficients(model_complete)[c("Diploma_ilr1", "Diploma_ilr2"),]) %*%
V_dip
coeff_simplex_Employ <- t(V_votes) %*% t(coefficients(model_complete)[c("Employ_ilr1", "Employ_ilr2",
"Employ_ilr3", "Employ_ilr4"),]) %*% V_employ
coeff_simplex <- rbind(as.vector(coeff_simplex_num["(Intercept)", ]),
t(coeff_simplex_Age),
t(coeff_simplex_Diploma),
t(coeff_simplex_Employ),
as.matrix(coeff_simplex_num[c("income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign"), ])
)
Finally, the estimates of the parameters in the simplex are:
dimnames(coeff_simplex)[[1]] <- c("(Intercept)",
"Age18-39", "Age40-64", "Age>65",
"Dip<BAC", "DipBAC", "DipSUP",
"EmpAZ", "EmpBE", "EmpFZ", "EmpGU", "EmpOQ",
"income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign"
)
dimnames(coeff_simplex)[[2]] <- c("Left", "Right", "XR")
coeff_simplex
## Left Right XR
## (Intercept) 2.326682e-01 7.660536e-01 0.001278181
## Age18-39 -4.928355e-01 -2.462211e-01 0.739056578
## Age40-64 1.739983e-01 3.232336e-01 -0.497231909
## Age>65 3.188372e-01 -7.701254e-02 -0.241824669
## Dip<BAC -1.473621e+00 4.993120e-01 0.974309151
## DipBAC 1.534263e+00 1.195657e-01 -1.653828839
## DipSUP -6.064201e-02 -6.188777e-01 0.679519688
## EmpAZ -1.184365e-02 -1.392683e-02 0.025770486
## EmpBE 8.645126e-02 -1.688948e-01 0.082443522
## EmpFZ -8.215997e-01 1.312813e-01 0.690318339
## EmpGU 4.282503e-01 1.609525e-01 -0.589202881
## EmpOQ 3.187417e-01 -1.094122e-01 -0.209329466
## income_rate 3.955136e-03 2.141970e-01 0.781847877
## unemp_rate 2.447467e-08 2.034124e-13 0.999999976
## employ_evol 3.747540e-01 3.185137e-01 0.306732278
## owner_rate 4.740510e-01 7.239864e-03 0.518709138
## foreign 9.281809e-01 2.480719e-02 0.047011883
We apply a bootstrap procedure:
For \(b\) from \(1\) to \(B\) (we choose \(B=1000\) here)
We sample with replacement the original data set
We make the estimations in the simplex
We go back to the simplex to obtain the parameters in the simplex
We do inference on the bootstrapped parameters obtained for the numeric variable
B <- 1000
coeff_simplex_B <- data.frame(b = character(0),
var = character(0),
Left = numeric(0),
Right = numeric(0),
XR = numeric(0))
for (b in 1:B) {
set.seed(b)
id_b <- sample(1:nrow(my_explan), replace = T)
model_complete_b <- lm(y_ilr[id_b, ] ~ Age_ilr1 + Age_ilr2 +
Diploma_ilr1 + Diploma_ilr2 +
Employ_ilr1 + Employ_ilr2 + Employ_ilr3 + Employ_ilr4 +
income_rate + unemp_rate + employ_evol +
owner_rate + foreign, data = my_explan[id_b, ])
coeff_simplex_b <- compositions::ilrInv(coefficients(model_complete_b), V = t(V_votes))
coeff_simplex_b <- rbind(coeff_simplex_b["(Intercept)", ],
cbind(compositions::ilrInv(coeff_simplex_b[c("Age_ilr1", "Age_ilr2"), 1]),
compositions::ilrInv(coeff_simplex_b[c("Age_ilr1", "Age_ilr2"), 2]),
compositions::ilrInv(coeff_simplex_b[c("Age_ilr1", "Age_ilr2"), 3])),
cbind(compositions::ilrInv(coeff_simplex_b[c("Diploma_ilr1", "Diploma_ilr2"), 1], V = t(V_dip)),
compositions::ilrInv(coeff_simplex_b[c("Diploma_ilr1", "Diploma_ilr2"), 2], V = t(V_dip)),
compositions::ilrInv(coeff_simplex_b[c("Diploma_ilr1", "Diploma_ilr2"), 3], V = t(V_dip))),
cbind(compositions::ilrInv(coeff_simplex_b[c("Employ_ilr1", "Employ_ilr2",
"Employ_ilr3", "Employ_ilr4"), 1], V = t(V_employ)),
compositions::ilrInv(coeff_simplex_b[c("Employ_ilr1", "Employ_ilr2",
"Employ_ilr3", "Employ_ilr4"), 2], V = t(V_employ)),
compositions::ilrInv(coeff_simplex_b[c("Employ_ilr1", "Employ_ilr2",
"Employ_ilr3", "Employ_ilr4"), 3], V = t(V_employ))),
coeff_simplex_b[c("income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign"), ]
)
dimnames(coeff_simplex_b)[[2]] <- c("Left", "Right", "XR")
coeff_simplex_b <- as.data.frame(coeff_simplex_b)
coeff_simplex_b$b <- as.character(b)
coeff_simplex_b$var <- c("(Intercept)",
"Age18-39", "Age40-64", "Age>65",
"Dip<BAC", "DipBAC", "DipSUP",
"EmpAZ", "EmpBE", "EmpFZ", "EmpGU", "EmpOQ",
"income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign"
)
coeff_simplex_B <- rbind(coeff_simplex_B, coeff_simplex_b)
}
# we tidy the data
coeff_simplex_B_longer <- tidyr::pivot_longer(coeff_simplex_B,
cols = c("Left", "Right", "XR"),
names_to = "parti",
values_to = "coeff")
We plot the distribution fot the variables employ_evol, income_rate and unemp_rate:
true_coeff <- data.frame(parti = c("Left", "Right", "XR"),
y = 0,
coeff = as.numeric(coeff_simplex["income_rate", ]))
ggplot2::ggplot(dplyr::filter(coeff_simplex_B_longer, var == "income_rate"), aes(x = coeff)) +
geom_histogram() +
geom_vline(aes(xintercept = 1/3), color = "blue", linetype = "dashed", size = 1) +
facet_wrap(vars(parti), ncol = 3) +
geom_point(data = true_coeff, aes(x = coeff, y = y), col = "red", size = 4, shape = 4)
We can also plot the ternary diagram of the boostrapped parameters for these three variables (figure 3 in the article):
We compute the \(p\)-values for each numeric variable
p_values_intercept <- c(length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "Left"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "Right"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "XR"),
"coeff"] > 1/3))) / B)
p_values_intercept <- 2 * pmin(p_values_intercept, 1 - p_values_intercept)
p_values_income_rate <- c(length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "Left"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "Right"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "XR"),
"coeff"] > 1/3))) / B)
p_values_income_rate <- 2 * pmin(p_values_income_rate, 1 - p_values_income_rate)
p_values_unemp_rate <- c(length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "Left"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "Right"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "XR"),
"coeff"] > 1/3))) / B)
p_values_unemp_rate <- 2 * pmin(p_values_unemp_rate, 1 - p_values_unemp_rate)
p_values_employ_evol <- c(length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "Left"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "Right"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "XR"),
"coeff"] > 1/3))) / B)
p_values_employ_evol <- 2 * pmin(p_values_employ_evol, 1 - p_values_employ_evol)
p_values_owner_rate <- c(length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "Left"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "Right"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "XR"),
"coeff"] > 1/3))) / B)
p_values_owner_rate <- 2 * pmin(p_values_owner_rate, 1 - p_values_owner_rate)
p_values_foreign <- c(length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "Left"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "Right"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "XR"),
"coeff"] > 1/3))) / B)
p_values_foreign <- 2 * pmin(p_values_foreign, 1 - p_values_foreign)
p_values <- rbind(p_values_intercept, p_values_income_rate, p_values_unemp_rate,
p_values_employ_evol, p_values_owner_rate, p_values_foreign)
signf <- ifelse(p_values < 0.01, "(***)",
ifelse(p_values < 0.05, "(**)",
ifelse(p_values < 0.1, "(*)", "")))
p_values <- matrix(paste0("(", p_values, ")"), nrow(p_values), ncol(p_values))
We finally print the parameters in the simplex and their significance (Table 4 in the article):
coefficients_numeric <- cbind(
paste0(round(coeff_simplex[c("(Intercept)", "income_rate", "unemp_rate",
"employ_evol", "owner_rate", "foreign"), 1], 4), signf[, 1]),
paste0(round(coeff_simplex[c("(Intercept)", "income_rate", "unemp_rate",
"employ_evol", "owner_rate", "foreign"), 2], 4), signf[, 2]),
paste0(round(coeff_simplex[c("(Intercept)", "income_rate", "unemp_rate",
"employ_evol", "owner_rate", "foreign"), 3], 4), signf[, 3])
)
colnames(coefficients_numeric) <- c("Left", "Right", "XR")
rownames(coefficients_numeric) <- c("(Intercept)", "income_rate", "unemp_rate",
"employ_evol", "owner_rate", "foreign")
coefficients_numeric
## Left Right XR
## (Intercept) "0.2327" "0.7661" "0.0013(***)"
## income_rate "0.004(**)" "0.2142" "0.7818"
## unemp_rate "0(***)" "0(***)" "1(***)"
## employ_evol "0.3748" "0.3185" "0.3067"
## owner_rate "0.4741" "0.0072(***)" "0.5187"
## foreign "0.9282" "0.0248" "0.047"
To get the pseudo-significance, another approach is to simulate the beta-star parameters in the ilr space with respect to their distribution. Then we go back back to the simplex to obtain their distribution in the simplex.
mean_beta_ilr_1 <- coefficients(mod_ilr_1)
var_beta_ilr_1 <- summary(mod_ilr_1)$sigma ^ 2 * solve(crossprod(cbind(1,
as.matrix(my_explan[, c("Age_ilr1", "Age_ilr2",
"Diploma_ilr1", "Diploma_ilr2",
"Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
"income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")]))))
mean_beta_ilr_2 <- coefficients(mod_ilr_2)
var_beta_ilr_2 <- summary(mod_ilr_2)$sigma ^ 2 * solve(crossprod(cbind(1,
as.matrix(my_explan[, c("Age_ilr1", "Age_ilr2",
"Diploma_ilr1", "Diploma_ilr2",
"Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
"income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")]))))
set.seed(1)
coeff_simplex_b <- compositions::ilrInv(
cbind(
as.vector(t(mvtnorm::rmvnorm(1000, mean_beta_ilr_1, var_beta_ilr_1))),
as.vector(t(mvtnorm::rmvnorm(1000, mean_beta_ilr_2, var_beta_ilr_2)))
), V = t(V_votes))
coeff_simplex_B <- as.data.frame(coeff_simplex_b)
names(coeff_simplex_B) = c("Left", "Right", "XR")
coeff_simplex_B$var <- rep(c("(Intercept)", "Age_ilr1", "Age_ilr2",
"Diploma_ilr1", "Diploma_ilr2",
"Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
"income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign"),
times = 1000)
coeff_simplex_B_longer <- tidyr::pivot_longer(coeff_simplex_B,
cols = 1:3,
names_to = "parti",
values_to = "coeff")
We plot the distribution for the variables employ_evol, income_rate and unemp_rate:
true_coeff <- data.frame(parti = c("Left", "Right", "XR"),
y = 0,
coeff = as.numeric(coeff_simplex["income_rate", ]))
ggplot2::ggplot(dplyr::filter(coeff_simplex_B_longer, var == "income_rate"), aes(x = coeff)) +
geom_histogram() +
geom_vline(aes(xintercept = 1/3), color = "blue", linetype = "dashed", size = 1) +
facet_wrap(vars(parti), ncol = 3) +
geom_point(data = true_coeff, aes(x = coeff, y = y), col = "red", size = 4, shape = 4)
We can also plot the ternary diagram of the boostrapped parameters for these three variables:
We compute the \(p\)-values for each numeric variable
p_values_intercept <- c(length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "Left"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "Right"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "(Intercept)") & (coeff_simplex_B_longer$parti == "XR"),
"coeff"] > 1/3))) / B)
p_values_intercept <- 2 * pmin(p_values_intercept, 1 - p_values_intercept)
p_values_income_rate <- c(length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "Left"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "Right"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "income_rate") & (coeff_simplex_B_longer$parti == "XR"),
"coeff"] > 1/3))) / B)
p_values_income_rate <- 2 * pmin(p_values_income_rate, 1 - p_values_income_rate)
p_values_unemp_rate <- c(length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "Left"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "Right"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "unemp_rate") & (coeff_simplex_B_longer$parti == "XR"),
"coeff"] > 1/3))) / B)
p_values_unemp_rate <- 2 * pmin(p_values_unemp_rate, 1 - p_values_unemp_rate)
p_values_employ_evol <- c(length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "Left"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "Right"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "employ_evol") & (coeff_simplex_B_longer$parti == "XR"),
"coeff"] > 1/3))) / B)
p_values_employ_evol <- 2 * pmin(p_values_employ_evol, 1 - p_values_employ_evol)
p_values_owner_rate <- c(length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "Left"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "Right"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "owner_rate") & (coeff_simplex_B_longer$parti == "XR"),
"coeff"] > 1/3))) / B)
p_values_owner_rate <- 2 * pmin(p_values_owner_rate, 1 - p_values_owner_rate)
p_values_foreign <- c(length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "Left"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "Right"),
"coeff"] > 1/3))) / B,
length(which((coeff_simplex_B_longer[
(coeff_simplex_B_longer$var == "foreign") & (coeff_simplex_B_longer$parti == "XR"),
"coeff"] > 1/3))) / B)
p_values_foreign <- 2 * pmin(p_values_foreign, 1 - p_values_foreign)
p_values <- rbind(p_values_intercept, p_values_income_rate, p_values_unemp_rate,
p_values_employ_evol, p_values_owner_rate, p_values_foreign)
signf <- ifelse(p_values < 0.01, "(***)",
ifelse(p_values < 0.05, "(**)",
ifelse(p_values < 0.1, "(*)", "")))
p_values <- matrix(paste0("(", p_values, ")"), nrow(p_values), ncol(p_values))
We finally print the parameters in the simplex and their significance:
coefficients_numeric <- cbind(
paste0(round(coeff_simplex[c("(Intercept)", "income_rate", "unemp_rate",
"employ_evol", "owner_rate", "foreign"), 1], 4), signf[, 1]),
paste0(round(coeff_simplex[c("(Intercept)", "income_rate", "unemp_rate",
"employ_evol", "owner_rate", "foreign"), 2], 4), signf[, 2]),
paste0(round(coeff_simplex[c("(Intercept)", "income_rate", "unemp_rate",
"employ_evol", "owner_rate", "foreign"), 3], 4), signf[, 3])
)
colnames(coefficients_numeric) <- c("Left", "Right", "XR")
rownames(coefficients_numeric) <- c("(Intercept)", "income_rate", "unemp_rate",
"employ_evol", "owner_rate", "foreign")
coefficients_numeric
## Left Right XR
## (Intercept) "0.2327" "0.7661" "0.0013(***)"
## income_rate "0.004(**)" "0.2142" "0.7818"
## unemp_rate "0(***)" "0(***)" "1(***)"
## employ_evol "0.3748" "0.3185" "0.3067"
## owner_rate "0.4741" "0.0072(***)" "0.5187"
## foreign "0.9282" "0.0248" "0.047"
To compute the approximate semi-elasticities with respect to the numeric variables, we proceed as follows:
we compute the fitted values in the ilr coordinates space
we increase the numeric variable by \(1\) unit (to be defined depending on the variable), and we compute the corresponding predicted value in the ilr coordinates space
we go back to the simplex and compute the differences between the predictions and the initial fitted values.
For the explanatory variables which are compositional, at step 2, when we increase a component of a composition, we remove a constant to all other components keeping the same total.
# fitted values
fitted_ilr <- predict(model_complete)
# go back to the simplex
fitted_simplex <- ilrInv(fitted_ilr, V = t(V_votes))
# we store all the elasticities
elasticities <- data.frame(left = numeric(0),
right = numeric(0),
xr = numeric(0))
# Numeric variables
for (k in c("income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")) {
# prediction when increasing by one 1% the numeric variable
my_explan_predict <- my_explan
my_explan_predict[, k] <- my_explan_predict[, k] + .01
predicted_ilr <- predict(model_complete, newdata = my_explan_predict)
predicted_simplex <- ilrInv(predicted_ilr, V = t(V_votes))
# differences between the shares
diff_simplex_left <- predicted_simplex[, 1] - fitted_simplex[, 1]
diff_simplex_right <- predicted_simplex[, 2] - fitted_simplex[, 2]
diff_simplex_xr <- predicted_simplex[, 3] - fitted_simplex[, 3]
elasticities <- rbind(elasticities,
data.frame(variable = k,
dep = row.names(my_explan),
left = diff_simplex_left,
right = diff_simplex_right,
xr = diff_simplex_xr))
}
# CoDa explanatory variables
# for Age
for (k in c("age_1839", "age_4064", "age_65")) {
my_explan_predict <- my_explan
Age_predict <- Age
Age_predict[, k] <- Age_predict[, k] + 0.01
Age_predict[, !(c("age_1839", "age_4064", "age_65") %in% k)] <-
Age_predict[, !(c("age_1839", "age_4064", "age_65") %in% k)] - 0.01 / 2
my_explan_predict[, c("Age_ilr1", "Age_ilr2")] <- ilr(Age_predict)
predicted_ilr <- predict(model_complete, newdata = my_explan_predict)
predicted_simplex <- ilrInv(predicted_ilr, V = t(V_votes))
# differences between the shares
diff_simplex_left <- predicted_simplex[, 1] - fitted_simplex[, 1]
diff_simplex_right <- predicted_simplex[, 2] - fitted_simplex[, 2]
diff_simplex_xr <- predicted_simplex[, 3] - fitted_simplex[, 3]
elasticities <- rbind(elasticities,
data.frame(
variable = k,
dep = row.names(my_explan),
left = diff_simplex_left,
right = diff_simplex_right,
xr = diff_simplex_xr))
}
# for Diploma
for (k in c("No_CAPBEP", "bac", "diplom_sup")) {
my_explan_predict <- my_explan
Diploma_predict <- Diploma
Diploma_predict[, k] <- Diploma_predict[, k] + 0.01
Diploma_predict[, !(c("No_CAPBEP", "bac", "diplom_sup") %in% k)] <-
Diploma_predict[, !(c("No_CAPBEP", "bac", "diplom_sup") %in% k)] - 0.01 / 2
my_explan_predict[, c("Diploma_ilr1", "Diploma_ilr2")] <-
ilr(Diploma_predict, V = t(V_dip))
predicted_ilr <- predict(model_complete, newdata = my_explan_predict)
predicted_simplex <- ilrInv(predicted_ilr, V = t(V_votes))
# differences between the shares
diff_simplex_left <- predicted_simplex[, 1] - fitted_simplex[, 1]
diff_simplex_right <- predicted_simplex[, 2] - fitted_simplex[, 2]
diff_simplex_xr <- predicted_simplex[, 3] - fitted_simplex[, 3]
elasticities <- rbind(elasticities,
data.frame(
variable = k,
dep = row.names(my_explan),
left = diff_simplex_left,
right = diff_simplex_right,
xr = diff_simplex_xr))
}
# for Employ
for (k in c("AZ", "BE", "FZ", "GU", "OQ")) {
my_explan_predict <- my_explan
Employment_predict <- Employment
Employment_predict[, k] <- Employment_predict[, k] + 0.01
Employment_predict[, !(c("AZ", "BE", "FZ", "GU", "OQ") %in% k)] <-
Employment_predict[, !(c("AZ", "BE", "FZ", "GU", "OQ") %in% k)] - 0.01 / 2
my_explan_predict[, c("Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4")] <-
ilr(Employment_predict, V = t(V_employ))
predicted_ilr <- predict(model_complete, newdata = my_explan_predict)
predicted_simplex <- ilrInv(predicted_ilr, V = t(V_votes))
# differences between the shares
diff_simplex_left <- predicted_simplex[, 1] - fitted_simplex[, 1]
diff_simplex_right <- predicted_simplex[, 2] - fitted_simplex[, 2]
diff_simplex_xr <- predicted_simplex[, 3] - fitted_simplex[, 3]
elasticities <- rbind(elasticities,
data.frame(
variable = k,
dep = row.names(my_explan),
left = diff_simplex_left,
right = diff_simplex_right,
xr = diff_simplex_xr))
}
# tidy the data
elasticities <- tidyr::pivot_longer(elasticities,
cols = 3:5, names_to = "party", values_to = "score")
For one variable (for example the unemployment rate), we represent parallel boxplots of the semi-elasticities for each party.
For each party, we can also plot the semi-elasticities on a map
dep.2015.spdf@data[, c("se_left", "se_right", "se_others")] <-
tidyr::pivot_wider(dplyr::filter(elasticities, variable == "unemp_rate"), id_cols = 2,
names_from = "party", values_from = "score")[, c("left", "right", "xr")]
dep.2015.sf <- sf::st_as_sf(dep.2015.spdf)
dep.2015.sf <- tibble::as_tibble(dep.2015.sf)
library(sf)
library(ggspatial) # spatial figures
ggplot(data = dep.2015.sf, aes(geometry = geometry)) +
geom_sf(aes(fill = se_left), linetype = 2, size = 0.5) +
coord_sf(crs = st_crs(2154)) +
annotation_scale(location = "bl", width_hint = 0.5) +
scale_fill_gradient2('Semi-elasticities \n on the left',
low='#007acc', mid = '#ffffff',
high = '#ff1a1a', midpoint = 0,
limits = c(-0.06, 0.06)) +
annotation_north_arrow(location = "bl", which_north = "true",
pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"),
style = north_arrow_fancy_orienteering)
ggplot(data = dep.2015.sf, aes(geometry = geometry)) +
geom_sf(aes(fill = se_right), linetype = 2, size = 0.5) +
coord_sf(crs = st_crs(2154)) +
annotation_scale(location = "bl", width_hint = 0.5) +
scale_fill_gradient2('Semi-elasticities \n on the right',
low='#007acc', mid = '#ffffff',
high = '#ff1a1a', midpoint = 0,
limits = c(-0.06, 0.06)) +
annotation_north_arrow(location = "bl", which_north = "true",
pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"),
style = north_arrow_fancy_orienteering)
ggplot(data = dep.2015.sf, aes(geometry = geometry)) +
geom_sf(aes(fill = se_others), linetype = 2, size = 0.5) +
coord_sf(crs = st_crs(2154)) +
annotation_scale(location = "bl", width_hint = 0.5) +
scale_fill_gradient2('Semi-elasticities \n on the XR',
low='#007acc', mid = '#ffffff',
high = '#ff1a1a', midpoint = 0,
limits = c(-0.06, 0.06)) +
annotation_north_arrow(location = "bl", which_north = "true",
pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"),
style = north_arrow_fancy_orienteering)
We compare the fitted values of our CoDa regression model with the fitted values of a Dirichlet model.
We compute the Dirichlet regression model by using DirichletReg package (see Maier, 2020). We then compute the fitted values in the simplex space:
my_explan_Dir <- my_explan
my_explan_Dir$Y <- DR_data(voteshare)
dirichlet_model <- DirichReg(Y ~ Age_ilr1 + Age_ilr2 +
Diploma_ilr1 + Diploma_ilr2 +
Employ_ilr1 + Employ_ilr2 + Employ_ilr3 + Employ_ilr4 +
income_rate + unemp_rate + employ_evol + owner_rate + foreign,
data = my_explan_Dir)
fitted_dirichlet <- predict(dirichlet_model)
To get the fitted values of the CoDa regression model, we first compute the fitted values in the ilr coordinates space and then go back to the simplex:
Finally, we plot the scatter plot of the fitted values for each party (Figure 4 in the article):
postscript(file = "./Figures/fitted.eps", width = 12, height = 4.1, paper = "special",
horizontal = FALSE)
op <- par(mfrow = c(1, 3), las = 1, oma = c(0, 0, 0, 0),
mar = c(4, 4.1, 2, 1.2), mgp = c(2.8, 1, 0), cex.axis = 1.5,
cex.lab = 1.5, cex.main = 1.6)
plot(fitted_dirichlet[, 1], fitted_coda[, 1], xlab = "Dirichlet fitted values",
ylab = "CoDa fitted values", main = "Left", xlim = c(0.05, 0.6), ylim = c(0.05, 0.6))
abline(a = 0, b = 1)
plot(fitted_dirichlet[, 2], fitted_coda[, 2], xlab = "Dirichlet fitted values",
ylab = "CoDa fitted values", main = "Right", xlim = c(0.05, 0.6), ylim = c(0.05, 0.6))
abline(a = 0, b = 1)
plot(fitted_dirichlet[, 3], fitted_coda[, 3], xlab = "Dirichlet fitted values",
ylab = "CoDa fitted values", main = "XR", xlim = c(0.05, 0.6), ylim = c(0.05, 0.6))
abline(a = 0, b = 1)
dev.off()
## png
## 2
Comment: the fitted values of the two models appear to be very close.
We now compute the MSE of the CoDa regression model:
## [1] 0.1254843
We transform the fitted values of the Dirichlet model (in the simplex) to their counterpart in the ilr space and we compute the MSE:
## [1] 0.1812526
Comment: It appears that the MSE of the two models are comparable. It is slightly better for the CoDa model.
We focus on the three departements: Ariege (“09”), Cantal (“15”) and Bas-Rhin (“67”). For each of these three departments, we fix all their characteristics except unemp_rate and we create a grid of fictive unemp_rate from 0 to \(25\%\) by increments of \(1\%\).
To compute the predictions for the fictive points, we proceed as follows:
we compute the predicted values in the ilr coordinates space corresponding to the fictive points
we go back to the simplex and compute the differences between the predictions and the initial fitted values for this particular departement.
covariate_fix_1 <- my_explan[
dep.2015.spdf$NOM_DEPT == "ARIEGE",
c("Age_ilr1", "Age_ilr2",
"Diploma_ilr1", "Diploma_ilr2",
"Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
"income_rate", "employ_evol", "owner_rate", "foreign")]
covariate_fix_2 <- my_explan[
dep.2015.spdf$NOM_DEPT == "CANTAL",
c("Age_ilr1", "Age_ilr2",
"Diploma_ilr1", "Diploma_ilr2",
"Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
"income_rate", "employ_evol", "owner_rate", "foreign")]
covariate_fix_3 <- my_explan[
dep.2015.spdf$NOM_DEPT == "BAS-RHIN",
c("Age_ilr1", "Age_ilr2",
"Diploma_ilr1", "Diploma_ilr2",
"Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
"income_rate", "employ_evol", "owner_rate", "foreign")]
unemp_rate <- seq(0.00, 0.25, length.out = 95)
new_data1 = cbind(covariate_fix_1, unemp_rate)
new_data2 = cbind(covariate_fix_2, unemp_rate)
new_data3 = cbind(covariate_fix_3, unemp_rate)
# prediction
pred_y_unemp_rate_1 <- predict(model_complete, new = new_data1)
pred_y_unemp_rate_2 <- predict(model_complete, new = new_data2)
pred_y_unemp_rate_3 <- predict(model_complete, new = new_data3)
# #convert to the simplex
compPred_unemp_rate_s1 <- ilrInv(pred_y_unemp_rate_1, V = t(V_votes))
compPred_unemp_rate_s2 <- ilrInv(pred_y_unemp_rate_2, V = t(V_votes))
compPred_unemp_rate_s3 <- ilrInv(pred_y_unemp_rate_3, V = t(V_votes))
To get Figure 5 in the article, we use the following code:
# postscript("./Figures/image.eps", width = 12, height = 5, paper = "special",
# horizontal = FALSE)
op <- par(mfrow = c(1, 3), las = 1, oma = c(0, 0, 0, 0),
mar = c(4, 4.1, 2, 1.2), mgp = c(2.4, 1, 0), cex.axis = 1.5,
cex.lab = 1.5, cex.main = 1.6)
plot(unemp_rate, compPred_unemp_rate_s1[, 1],
ylab = as.expression(bquote(hat(Y))),
xlab = "Unemployment rate",
pch = 21, xlim = c(0, 0.25), xaxs = "i", bg = 2,
ylim = c(0, 1), yaxs = "i", col = "red", type = "l", lwd = 4)
legend("topleft", legend = c("Right", "Left", "XR"), cex = 1.2,
fill = c("royalblue", "red", "violet"))
title("Ariège")
lines(unemp_rate, compPred_unemp_rate_s1[,2],
col = "royalblue", type = "l", lty = 2, lwd = 4)
lines(unemp_rate, compPred_unemp_rate_s1[,3],
col = "violet", type = "l", lty = 3, lwd = 4)
abline(v = min(my_explan$unemp_rate), col = "grey", lty = 2)
abline(v = max(my_explan$unemp_rate), col = "grey", lty = 2)
plot(unemp_rate, compPred_unemp_rate_s2[,1], ylab = "", xlab = "Unemployment rate", pch = 21,
xlim = c(0, 0.25), xaxs = "i", bg = 2, ylim = c(0, 1),
yaxs = "i", col = "red", type = "l", lwd = 4)
title("Cantal")
lines(unemp_rate, compPred_unemp_rate_s2[,2], col = "royalblue",
type = "l", lty = 2, lwd = 4)
lines(unemp_rate, compPred_unemp_rate_s2[,3], col = "violet",
type = "l", lty = 3, lwd = 4)
abline(v = min(my_explan$unemp_rate), col = "grey", lty = 2)
abline(v = max(my_explan$unemp_rate), col = "grey", lty = 2)
plot(unemp_rate, compPred_unemp_rate_s3[,1], ylab = "", xlab = "Unemployment rate", pch = 21,
xlim = c(0, 0.25), xaxs = "i", bg = 2, ylim = c(0, 1),
yaxs = "i", col = "red", type = "l", lwd = 4)
title("Bas-Rhin")
lines(unemp_rate, compPred_unemp_rate_s3[,2], col = "royalblue", type = "l", lty = 2, lwd = 4)
lines(unemp_rate, compPred_unemp_rate_s3[,3], col = "violet", type = "l", lty = 3, lwd = 4)
abline(v = min(my_explan$unemp_rate), col = "grey", lty = 2)
abline(v = max(my_explan$unemp_rate), col = "grey", lty = 2)
To get Figure 6 in the article, we first need to compile the following local function which permits to represent one particular composition on a ternary diagram :
Then, we execute the following code:
colnames(compPred_unemp_rate_s1) <- c("Left", "Right", "XR")
colnames(compPred_unemp_rate_s2) <- c("Left", "Right", "XR")
colnames(compPred_unemp_rate_s3) <- c("Left", "Right", "XR")
# we prepare the nodes of the ternary diagram
A <- c(0, 0)
B <- c(1, 0)
C <- c(0.5, sqrt(3)/2)
# transform the predictions in categorical variables
my.palette <- brewer.pal(n = 9, name = "OrRd")
breaks_ilr1 <- classIntervals(new_data1$unemp_rate, n = 9,
style = "quantile", intervalClosure = "left")
# postscript("./Figures/figure_4.eps", width = 12, height = 5, paper = "special",
# horizontal = FALSE)
op <- par(mfrow = c(1, 3), oma = c(0, 1, 2, 1),
mar = c(0, 0.5, 1.5, 0.5), cex.main = 1.6)
### Ternary diagram for Ariege
# transform the compositions in the cartesian coordinates
ariege_simplex_x <- compPred_unemp_rate_s1[, 1] * A[1] +
compPred_unemp_rate_s1[, 2] * B[1] + compPred_unemp_rate_s1[, 3] * C[1]
ariege_simplex_y <- compPred_unemp_rate_s1[, 1] * A[2] +
compPred_unemp_rate_s1[, 2] * B[2] + compPred_unemp_rate_s1[, 3] * C[2]
# plot the ternary diagram
plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F,
type = "n", xlab = "", ylab = "", asp = 1, main = "Ariège",
ylim = c(-0.1, sqrt(3)/2), cex.main = 2)
legend("topleft", legend = paste0("[", round(breaks_ilr1$brks[1:9], 2),
";", round(breaks_ilr1$brks[2:10], 2), "["),
cex = 1.4, fill = my.palette, title = "Unemployment rate", bty = "n")
legend("topright", legend = c("Observed", "Predicted"),
cex = 1.8, pch = c(15, 18), col = "green")
text(c(0, 1, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05),
c("Left", "Right", "XR"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")
points(ariege_simplex_x, ariege_simplex_y, pch = 16,
col = my.palette[findCols(breaks_ilr1)], cex = 1)
plot_1_compo(voteshare[22,], col = "green", bg="green", cex = 2, pch = 15)
plot_1_compo(compPred_unemp_rate_s1[53, ], col = "green", cex = 2.5, pch = 18)
lines(c(0.5, 0.5), c(0, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.25, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.75, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")
### Ternary diagram for Cantal
cantal_simplex_x <- compPred_unemp_rate_s2[, 1] * A[1] +
compPred_unemp_rate_s2[, 2] * B[1] + compPred_unemp_rate_s2[, 3] * C[1]
cantal_simplex_y <- compPred_unemp_rate_s2[, 1] * A[2] +
compPred_unemp_rate_s2[, 2] * B[2] + compPred_unemp_rate_s2[, 3] * C[2]
plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F,
type = "n", xlab = "", ylab = "", asp = 1, main = "Cantal",
ylim = c(-0.1, sqrt(3)/2), cex.main = 2)
text(c(0, 1, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05),
c("Left", "Right", "XR"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")
points(cantal_simplex_x, cantal_simplex_y, pch = 16,
col = my.palette[findCols(breaks_ilr1)], cex = 1)
plot_1_compo(voteshare[16,], col = "green", bg="green", cex = 2, pch = 15)
plot_1_compo(compPred_unemp_rate_s2[32, ], col = "green", bg="green", cex = 2.5, pch = 18)
lines(c(0.5, 0.5), c(0, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.25, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.75, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")
### Ternary diagram for Bas-Rhin
basrhin_simplex_x <- compPred_unemp_rate_s3[, 1] * A[1] +
compPred_unemp_rate_s3[, 2] * B[1] + compPred_unemp_rate_s3[, 3] * C[1]
basrhin_simplex_y <- compPred_unemp_rate_s3[, 1] * A[2] +
compPred_unemp_rate_s3[, 2] * B[2] + compPred_unemp_rate_s3[, 3] * C[2]
plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F,
type = "n", xlab = "", ylab = "", asp = 1, main = "Bas-Rhin",
ylim = c(-0.1, sqrt(3)/2), cex.main = 2)
text(c(0, 1, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05),
c("Left", "Right", "XR"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")
points(basrhin_simplex_x, basrhin_simplex_y, pch = 16,
col = my.palette[findCols(breaks_ilr1)], cex = 1)
plot_1_compo(voteshare[10,], col = "green", bg="green", cex = 2, pch = 15)
plot_1_compo(compPred_unemp_rate_s3[62, ], col = "green", bg="green", cex = 2.5, pch = 18)
lines(c(0.5, 0.5), c(0, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.25, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")
lines(c(0.75, 0.5), c(0.4330127, 0.2886751), lty = 2, col = "darkgrey")
All explanatory variables are fixed to the value of the given department except Diploma. We create of grid of fictive points in the Diploma triangle and compute the predicted shares at each point of this grid. However since it is impossible to plot a function from the simplex to the simplex, we choose to summarize the predicted shares by the winning party (corresponding to the highest share) and color the triangle in the Diploma space according to the winning party color.
# we fix the covariates
covariate_fix_1c <- my_explan[
dep.2015.spdf$NOM_DEPT == "ARIEGE",
c("Age_ilr1", "Age_ilr2",
"Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
"income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")]
covariate_fix_2c <- my_explan[
dep.2015.spdf$NOM_DEPT == "CANTAL",
c("Age_ilr1", "Age_ilr2",
"Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
"income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")]
covariate_fix_3c <- my_explan[
dep.2015.spdf$NOM_DEPT == "BAS-RHIN",
c("Age_ilr1", "Age_ilr2",
"Employ_ilr1", "Employ_ilr2", "Employ_ilr3", "Employ_ilr4",
"income_rate", "unemp_rate", "employ_evol", "owner_rate", "foreign")]
# create a grid of shares for Diploma
x_seq <- seq(0.01, 0.99, 0.01)
x_Diploma <- data.frame(No_CAPBEP = rep(x_seq, 101),
bac = rep(x_seq, each = 101))
x_Diploma$diplom_sup <- 1 - (x_Diploma$bac + x_Diploma$No_CAPBEP)
x_Diploma <- x_Diploma[x_Diploma$diplom_sup > 0, ]
# transform the grid in the ilr
x_Diploma_ilr <- ilr(x_Diploma, V = t(V_dip))
colnames(x_Diploma_ilr) = c("Diploma_ilr1", "Diploma_ilr2")
new_data1c <- cbind(covariate_fix_1c, x_Diploma_ilr)
new_data2c <- cbind(covariate_fix_2c, x_Diploma_ilr)
new_data3c <- cbind(covariate_fix_3c, x_Diploma_ilr)
# prediction
pred_y_dip_1 <- predict(model_complete, new = new_data1c)
pred_y_dip_2 <- predict(model_complete, new = new_data2c)
pred_y_dip_3 <- predict(model_complete, new = new_data3c)
# convert vers le simplex
compPred_dip_s1 <- ilrInv(pred_y_dip_1, V = t(V_votes))
compPred_dip_s2 <- ilrInv(pred_y_dip_2, V = t(V_votes))
compPred_dip_s3 <- ilrInv(pred_y_dip_3, V = t(V_votes))
To get Figure 7, we use the following code:
Diploma_simplex_x <- x_Diploma[, 1] * A[1] + x_Diploma[, 2] * B[1] + x_Diploma[, 3] * C[1]
Diploma_simplex_y <- x_Diploma[, 1] * A[2] + x_Diploma[, 2] * B[2] + x_Diploma[, 3] * C[2]
Diploma_depart_x <- Diploma[, 1] * A[1] + Diploma[, 2] * B[1] + Diploma[, 3] * C[1]
Diploma_depart_y <- Diploma[, 1] * A[2] + Diploma[, 2] * B[2] + Diploma[, 3] * C[2]
# postscript("./Figures/test.eps", width = 12, height = 5, paper = "special",
# horizontal = FALSE)
# Ternary diagram for Ariège
op <- par(mfrow = c(1, 3), oma = c(0, 1.5, 2, 1), mar = c(0, 0.7, 1.5, 0.7))
plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F,
type = "n", xlab = "", ylab = "", asp = 1, main = "Ariège",
ylim = c(-0.1, sqrt(3)/2), cex.main = 2)
legend("topleft", legend = c("Right", "Left", "XR"), cex = 1.5,
fill = c("royalblue", "red", "violet"))
text(c(0.05, 0.95, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05),
c("<BAC", "BAC", "SUP"), pos = 3,
cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")
points(Diploma_simplex_x, Diploma_simplex_y, pch = 16,
col = c("red", "royalblue", "violet")[apply(compPred_dip_s1, 1, which.max)],
cex = 1)
points(Diploma_depart_x, Diploma_depart_y, pch = 16, cex = 1.2)
# Ternary diagram for Cantal
plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F,
type = "n", xlab = "", ylab = "", asp = 1, main = "Cantal", ylim = c(-0.1, sqrt(3)/2), cex.main = 2)
text(c(0.05, 0.95, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05), c("<BAC", "BAC", "SUP"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")
points(Diploma_simplex_x, Diploma_simplex_y, pch = 16,
col = c("red", "royalblue", "violet")[apply(compPred_dip_s2, 1, which.max)],
cex = 1)
points(Diploma_depart_x, Diploma_depart_y, pch = 16, cex = 1.2)
# Ternary diagram for Bas-Rhin
plot(rbind(A, B, C), xaxt = "n", yaxt = "n", frame = F,
type = "n", xlab = "", ylab = "", asp = 1, main = "Bas-Rhin", ylim = c(-0.1, sqrt(3)/2), cex.main = 2)
text(c(0.05, 0.95, 1/2+0.07), c(0-0.06, -0.06, sqrt(3)/2-0.05), c("<BAC", "BAC", "SUP"), pos = 3, cex = 1.5)
lines(c(0, 1), c(0, 0), col = "black")
lines(c(0, 0.5), c(0, sqrt(3)/2), col = "black")
lines(c(1, 0.5), c(0, sqrt(3)/2), col = "black")
points(Diploma_simplex_x, Diploma_simplex_y, pch = 16,
col = c("red", "royalblue", "violet")[apply(compPred_dip_s3, 1, which.max)],
cex = 1)
points(Diploma_depart_x, Diploma_depart_y, pch = 16, cex = 1.2)
K. Gerald van den Boogaart, Raimon Tolosana-Delgado and Matevz Bren (2020). compositions: Compositional Data Analysis. R package version 2.0-0. https://CRAN.R-project.org/package=compositions
Maier, M. J. (2020). DirichletReg: Dirichlet Regression in R. R package version 0.7-0. http://dirichletreg.r-forge.r-project.org/
Pebesma, E.J., R.S. Bivand, 2005. Classes and methods for spatial data in R. R News 5 (2)*, https://cran.r-project.org/doc/Rnews/.