You can install this package with
devtools::install_github("rmnppt/simdr")You can load in the 2016 simd domain data as follows:
data("simd16_domains")Or you might want to view the more granular indicator data:
data("simd16_indicators")If you want to analyse the data in the way that the SIMD team does you can start by:
- Selecting the indicator variables belonging to a domain
- Transform them to be normally distributed
- Replace any missing values
Here is an example using education:
library(dplyr)
normalised_education <- simd16_indicators %>% # start with the raw data
select(Attendance, Attainment, Noquals, NEET, HESA) %>% # select relevant columns
mutate(Attendance = normalScores(Attendance, forwards = FALSE)) %>% # replace each column
mutate(Attainment = normalScores(Attainment, forwards = FALSE)) %>%
mutate(Noquals = normalScores(Noquals, forwards = TRUE)) %>%
mutate(NEET = normalScores(NEET, forwards = TRUE)) %>%
mutate(HESA = normalScores(HESA, forwards = FALSE)) %>%
mutate_all(funs(replaceMissing)) # replace missing valuesYou will notice that the above gives a warning, there is some missing data. You may want to fill in the missing values, so we include a utility (replaceMissing) to replace missing and infinite values with 0, the center of the new normal distribution.
Notice that when we call normalScores we can decide whether a high value indicates deprivation or not, see ?normalScores for more detail.
When combining the indicators to give a domain score, we need to apply a different weight to each. The weights are derived through factor analysis of the normalised indicator scores, and the proportional loadings on factor 1 serve as the weightings. We extract the loadings using the getFAWeights function as follows:
education_weights <- getFAWeights(normalised_education)Now that we have the normalised indicator scores and weights, we can combine them with the utility function combineWeightsAndNorms. Each normalised indicator variable is multiplied by its weight derived from factor analysis, as follows:
education_score <- combineWeightsAndNorms(education_weights, normalised_education)Finally we rank these weighted scores to generate the domain rank (1 = most deprived).
education_rank <- rank(-education_score)Find more information about the openSIMD project here:
blog: https://blogs.gov.scot/statistics/2017/05/23/opensimd/
documentation and examples: http://www.gov.scot/Topics/Statistics/SIMD/analysis/openSIMD
original repo: https://github.com/TheDataLabScotland/openSIMD