This document contains reproducible R- and indirectly Stata-code to replay steps taken of data preparation and analysis for the manuscript. The code comprises custom functions, data wrangling, analysis, and the creation of tables and figures. However, three limitations may make the reproduction difficult. Users need:

Access to the georeferenced survey data
Installation of the proprietary statistical software Stata
A license for the geocoder of the Federal Agency of Cartography and Geodesy (BKG) linked to one’s own IP range

A minor limitation is that the tables for submission were manually created as Excel spreadsheets based on the HTML-tables created below.

In the following, I will briefly describe each procedure in the separated code chunks to let users know why an individual step was taken.

SESSION AND ENVIRONMENT SETUP

The default buffer zone size in the manuscript is 500 meters. This size is defined here.

default_buffer_size <- 500

Packages

Almost any modern R code requires additional R packages. I have chosen the easypackages package to request packages and install them, if necessary.

# load the easypackages package
if (!require(easypackages)) { install.packages("easypackages") } 
library(easypackages)

easypackages::packages(
  "car",
  "dplyr",
  "furrr",
  "ggplot2",
  "haven",
  "htmlTable",
  "httr",
  "huxtable",
  "magrittr",
  "osmdata",
  "patchwork",
  "plyr",
  "raster",
  "readr",
  "RStata",
  "sf",
  "sp",
  "StefanJuenger/sdcR",
  "tibble",
  prompt = FALSE
)

cache_files <- 
  list.files(
    "./revise_and_resubmit_2/Land_Use_Disadvantages_Main_Analysis_cache/html/",
    full.names = TRUE) %>% 
  sub('\\.RData$', '', .) %>% 
  sub('\\.rdb$', '', .) %>% 
  sub('\\.rdx$', '', .) %>% 
  unique() %>% 
  .[-1]

for (i in cache_files) {
  lazyLoad(i)
}

The code below uses, amongst others,Stata’s mtable and mlincom functions to calculate and test average marginal effects. As Stata is accessed through R, we need to define where Stata and which version is installed.

options("RStata.StataPath" = "/path/to/stata/Stata-SE-15/StataSE-64")
options("RStata.StataVersion" = 15) 

stata_data_path = "./data_stata/"

FUNCTIONS

A large proportion of this document comprises custom functions. I divided them into ‘helper functions,’ which provide minor features and that other, more extensive functions use, and more elaborated ones, that offer spatial operations or interfaces to some Stata functions.

Helper Functions

In Figure 1, several layers of geospatial data are displayed with different color gradients. The following function creates these gradients on a layer-by-layer basis.

color_gradient <- function (x, palette) {
  
  get_palette <- suppressWarnings(
    colorRampPalette(RColorBrewer::brewer.pal(20, 
                                              palette))
  ) 
  
  get_palette(length(x))[order(order(x))]
}

For data protection reasons, I cannot give full information on the minimum and maximum values of the variables derived from the geospatial data. At least 20 observations should share a joined value in the lower and higher end of each distribution. These values were computed with the following function.

censor_variable <- function (data, variable, descending = FALSE) {
  
  variable <- rlang::enquo(variable)
  
  if (isFALSE(descending)) {
    data %>% 
      dplyr::arrange(!! variable) %>% 
      dplyr::slice(20) %>%
      dplyr::select(!! variable) %>%
      trunc(0)
  } else {
    data %>% 
      dplyr::arrange(dplyr::desc(!! variable)) %>% 
      dplyr::slice(20) %>%
      dplyr::select(!! variable) %>%
      trunc(0)
  }
}

Below are functions interfacing Stata within R. As these functions need a translation from common R formulas to Stata ones, the following function was required. Please note that it also provides a toggle to translate equations in the gsem format.

r_to_stata_formula <- function (r_formula, data_to_check, gsem = FALSE) {
  
  # stata_variable_type_checker
  stata_variable_type_checker <- function (variable) {
    if (is.numeric(data_to_check[[variable]])) {
      paste0("c.", variable)
    } else if (is.factor(data_to_check[[variable]])) {
      paste0("i.", variable)
    }
  }
  
  # basic formatting
  stata_formula <-
    r_formula %>% 
    gsub("[~]", "", .) %>% 
    gsub("[+]", "", .) %>% 
    gsub("[:]", "#", .) %>% 
    
    # seperate elements for further inspection
    strsplit(., split = "  ") %>% 
    unlist()
  
  # convert quadratic terms
  stata_formula <- 
    sapply(stata_formula, function (term) {
      if (stringr::str_detect(term, "[I(]")) {
        sub(
          "([I][(])(.+)([\\^][2][)])(.*)",
          "\\2#\\2\\4",
          term
        )
      } else {
        term
      }
    }, 
    USE.NAMES = FALSE
    )
  
  # define variable types based on data_to_check
  stata_formula <- 
    stata_formula[1] %>% # exclude dependent variable
    c(.,
      sapply(stata_formula[-1], function (term) {
        if (!stringr::str_detect(term, "#")) {
          stata_variable_type_checker(term)
        } else {
          terms <-
            strsplit(term, split = "#") %>%
            unlist()
          
          sapply(terms, function (term) {
            stata_variable_type_checker(term)
          }, 
          USE.NAMES = FALSE) %>%
            paste(collapse = "#")
        }
      }, 
      USE.NAMES = FALSE)
    )  %>%
    paste(collapse = " ")
  
  # add <- for gsem
  if (isTRUE(gsem)) {
    stata_formula <- 
      glue::glue(
        "{stringr::word(stata_formula, 1)} <- ", 
        "{stringr::word(stata_formula, 2, -1)}"
      )
  }
  
  return(stata_formula)
}

I created the tables in the manuscript with the huxtable package as it also provides nicely formatted regression tables of tidy data frames. Another convenient feature is that we can input custom glance information, as part of the broom package. To facilitate creating tidy data frames from my own Stata-to-R data format, I defined the following functions for the regression estimates…

create_tidy_override <- function (estimations, glance_add = NULL) {
  hux_obj <- 
    estimations[["estimation_results"]] %>% 
    huxtable::tidy_override(
      .,
      glance = 
        append(
          list(
            nobs = .$N[1],
            r.squared = .$r2[1]
          ),
          glance_add
        ),
      extend = TRUE
    ) 
  
  hux_obj$tidy_cols <- hux_obj$model
  
  hux_obj
}

…and the results from the linear combininations of the mlincom results.

create_tidy_override_mlincom <- function (estimations, glance_add = NULL) {
  hux_obj <- 
    estimations[["mlincom_results"]] %>% 
    huxtable::tidy_override(
      .,
      glance =  
        append(
          list(),
          glance_add
        ),
      extend = TRUE
    ) 
  
  hux_obj$tidy_cols <- hux_obj$model
  
  hux_obj
}

Estimates and corresponding p-values vary during the process of data analysis. To extract this information automatically for the tables, the function below accesses the Stata-to-R data objects.

extract_coefficient <- function (estimations, row_number, p_stars = TRUE) {
  coefficient <- estimations$estimate[row_number]
  p_value <- estimations$p.value[row_number]
  
  if (isTRUE(p_stars)) {
    stars <- dplyr::case_when(
      p_value <= .001 ~ " ***",
      p_value > .001 &  p_value <= .01 ~ " **",
      p_value > .01 &  p_value <= .05 ~ " *",
      p_value > .05 &  p_value <= .1 ~ " +"
    )
  } else {
    stars <- NULL
  }
  
  paste0(coefficient, stars)
}

Spatial Methods

Figure 1 makes use of the modern interface to simple features in R. Each geometry column in the simple feature data frames can be rotated, which is done with the following function.

rotate_data <- function(data, x_add = 0, y_add = 0) {
  
  shear_matrix <- function (x) { 
    matrix(c(2, 1.2, 0, 1), 2, 2) 
  }
  
  rotate_matrix <- function(x) { 
    matrix(c(cos(x), sin(x), -sin(x), cos(x)), 2, 2) 
  }
  
  data %>% 
    dplyr::mutate(
      geometry = 
        .$geometry * shear_matrix() * rotate_matrix(pi/20) + c(x_add, y_add)
    )
}

The measure of centrality in the manuscript is derived from the centroid of the ZIP code area in which municipal administrations are located. Information about ZIP codes is part of the list of municipalities of the Federal Statistical Office of Germany (destatis). I used the BKG geocoding service to retrieve the centroid geocoordinates for each municipality in the data using the function as defined here.

bkg_geocode_single_address <- function (
  street,
  house_number,
  zip_code,
  place,
  epsg = "3035"
) {
  
  # create POST string for WFS service
  POST_request_string <- paste0(
    '<?xml version="1.0" encoding="UTF-8" ?>
<wfs:GetFeature version="1.1.0" service="WFS" maxFeatures="1"
        xmlns:wfs="http://www.opengis.net/wfs"
        xmlns:ogc="http://www.opengis.net/ogc"
        xmlns:gdz="http://www.geodatenzentrum.de/ortsangabe"
        xmlns:gml="http://www.opengis.net/gml"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <wfs:Query typeName="gdz:Ortsangabe" srsName="EPSG:', epsg, '">
    <ogc:Filter>
      <ogc:And>
        <ogc:PropertyIsLike escapeChar="\" wildCard="*" singleChar="?">
          <ogc:PropertyName>strasse</ogc:PropertyName>
          <ogc:Literal>', street, '</ogc:Literal>
        </ogc:PropertyIsLike>
        <ogc:PropertyIsLike escapeChar="\" wildCard="*" singleChar="?">
          <ogc:PropertyName>haus</ogc:PropertyName>
          <ogc:Literal>', house_number, '</ogc:Literal>
        </ogc:PropertyIsLike>
        <ogc:PropertyIsLike escapeChar="\" wildCard="*" singleChar="?">
          <ogc:PropertyName>plz</ogc:PropertyName>
          <ogc:Literal>', zip_code, '</ogc:Literal>
        </ogc:PropertyIsLike>
        <ogc:PropertyIsLike escapeChar="\" wildCard="*" singleChar="?">
          <ogc:PropertyName>ort</ogc:PropertyName>
          <ogc:Literal>', place, '</ogc:Literal>
        </ogc:PropertyIsLike>
      </ogc:And>
    </ogc:Filter>
  </wfs:Query>
</wfs:GetFeature>'
  )
  
  # conduct actual request
  POST_request <- httr::POST(
    'https://sg.geodatenzentrum.de/wfs_geokodierung_bund?outputformat=json',
    body = POST_request_string
  )
  POST_sf <-
    sf::read_sf(POST_request) %>%
    dplyr::select(-bbox)
  
  POST_sf$gc_x <-
    POST_sf %>%
    sf::st_coordinates() %>%
    .[1]
  
  POST_sf$gc_y <-
    POST_sf %>%
    sf::st_coordinates() %>%
    .[2]
  
  POST_sf <-
    POST_sf %>%
    dplyr::mutate_at(dplyr::vars(-geometry), as.character)
  
  # return object
  return(POST_sf)
}

bkg_geocode <- function (
  data = NULL,
  street,
  house_number,
  zip_code,
  place,
  epsg = "3035"
) {
  
  if (is.null(data)) {
    bkg_geocode_single_address(
      street = street,
      house_number = house_number,
      zip_code = zip_code,
      place = place,
      epsg = epsg
    )
  } else {
    lapply(1:nrow(data), function (i) {
      
      message(paste0("Turning to row ", i, " from ", nrow(data)))
      
      bkg_geocode_single_address(
        street = data[[street]][i],
        house_number = data[[house_number]][i],
        zip_code = data[[zip_code]][i],
        place = data[[place]][i],
        epsg = epsg
      )
      
    }) %>%
      dplyr::bind_rows(.) %>%
      dplyr::bind_cols(
        data,
        .
      ) %>%
      sf::st_sf(crs = eval(parse(text = epsg)), sf_column_name = "geometry")
  }
}

In Figure 1, I locate the geocoordinate of a fictional survey respondent as the centroid of a 1 hectare (INSPIRE) grid cell. The following function can be used to convert any geocoordinate in EPSG:3035 to either a 1 hectare or 1 km² INSPIRE ID.

spt_create_inspire_ids <- function(
  data,
  type = c("1km", "100m"),
  column_name = "Gitter_ID_",
  combine = FALSE
) {
  
  if (sf::st_crs(data)$epsg != 3035) {
    data <- data %>% sf::st_transform(3035)
  }
  
  coordinate_pairs <-
    data %>%
    sf::st_coordinates() %>%
    tibble::as_tibble()
  
  id_name <-
    glue::glue("{column_name}{type}")
  
  loop_to_evaluate <-
    dplyr::case_when(
      type == "1km" ~
        glue::glue(
          "1kmN{substr(coordinate_pairs$Y %>% as.character(), 1, 4)}",
          "E{substr(coordinate_pairs$X %>% as.character(), 1, 4)}"
        ) %>%
        as.character,
      type == "100m" ~
        glue::glue(
          "100mN{substr(coordinate_pairs$Y %>% as.character(), 1, 5)}",
          "E{substr(coordinate_pairs$X %>% as.character(), 1, 5)}"
        ) %>%
        as.character()
    )
  
  expression_to_evaluate <-
    rlang::expr(!!rlang::sym(id_name) <- loop_to_evaluate)
  
  eval(expression_to_evaluate)
  
  if (isTRUE(combine)) {
    return(
      dplyr::bind_cols(data, !!id_name := get(id_name))
    )
  } else {
    return(get(id_name))
  }
}

The INSPIRE ID is a text string, which can, in turn, be used to create a centroid geocoordinate as displayed by the following function. We also need this function to retrieve geocoordinates from the georeferenced GGSS as it also only provides text strings for INSPIRE IDs.

spt_extract_inspire_x <- function (inspire_ids) {
  
  if (stringr::str_detect(inspire_ids[1], "1km")) {
    inspire_x <-
      substr(inspire_ids, 10, 13) %>%
      paste0(., "500") %>%
      as.numeric()
  }
  
  if (stringr::str_detect(inspire_ids[1], "100m")) {
    inspire_x <-
      substr(inspire_ids, 12, 16) %>%
      paste0(., "50") %>%
      as.numeric()
  }
  
  return(inspire_x)
}


spt_extract_inspire_y <- function (inspire_ids) {
  
  if (stringr::str_detect(inspire_ids[1], "1km")) {
    inspire_y <-
      substr(inspire_ids, 5, 8) %>%
      paste0(., "500") %>%
      as.numeric()
  }
  
  if (stringr::str_detect(inspire_ids[1], "100m")) {
    inspire_y <-
      substr(inspire_ids, 6, 10) %>%
      paste0(., "50") %>%
      as.numeric()
  }
  
  return(inspire_y)
}

We use two spatial linking methods: simple spatial joins or spatial overlays, and more elaborated GIS procedures such as the calculation of buffer zones. As they are applied multiple times, the following to functions provide easy access to these methods.

spl_simple_join <- function(data, shape_file, attribute) {
  data %>%
    sf::st_join(., shape_file, join = st_intersects) %>%
    sf::st_drop_geometry() %>%
    .[attribute]
}

spl_buffer <-
  function(data,
           raster_file,
           buffer_size) {
    
    message(
      paste0(
        "Extracting buffer values of size ", 
        buffer_size, 
        " from ", 
        names(raster_file)
      )
    )
    
    # target_value <-
    raster::extract(
      raster_file,
      data,
      fun = mean,
      buffer = buffer_size,
      na.rm = TRUE,
      small = FALSE
    )
    
    # return value
    # return(target_value)
  }

Stata Regressions

To test the slopes of the predictions and marginal effects (AMEs) in the manuscript, access to Stata was necessary. To my knowledge, R does not yet provide a test of second differences as described in Mize, 2019. Therefore, to apply post-estimation methods in Stata also the simple regressions have to be estimated in Stata. The following function provides an interface to the standard regress functions as well as the gsem function which is initially designed for Structural Equation Modeling but is used to compare AMEs across models.

stata_regress_robust <- function (
  dependent_variable,
  controls, 
  interaction_effects = NULL,
  data,
  clustervar = NULL,
  stata_names = NULL,
  vif = TRUE,
  stata_path = "./data_stata",
  echo = FALSE,
  stata_save = c("estimates", "coefficients")
) {
  
  if (length(controls) == 1) {
    variables <- controls[[1]]
    
    # specify possible interaction effect
    if (!is.null(interaction_effects)) {
      variables <- c(variables, interaction_effects %>% unlist())
    }
    
    # create model formula
    model_formula <- 
      as.formula(
        paste(
          dependent_variable, 
          paste(variables, collapse = " + "), 
          sep = " ~ "
        )
      )
    
    # run the model
    
    vif_call <-
      ifelse(isTRUE(vif), " \n vif", "")
    
    regression_call <- paste0(
      "regress ",
      r_to_stata_formula(model_formula, data_to_check = data),
      ", robust cluster(", clustervar, ")", 
      vif_call
    )
    
    savings <- 
      if(!is.null(stata_save)) {
        sapply(stata_save, function (to_save) {
          if (to_save == "estimates") {
            paste0(
              "estimates save ", stata_data_path, "estimates_", 
              stata_names, 
              ", replace \n"
            )
          } else if (to_save == "coefficients") {
            paste0(
              "regsave using ", stata_data_path, "coefficients_", 
              stata_names, 
              ", tstat pval ci replace"
            )
          }
        },
        USE.NAMES = FALSE)
      } else {
        ""
      }
    
    stata_src <- c(
      regression_call,
      savings
    )
    
  } else {
    
    dependent_variables_call <- 
      lapply(1:length(controls), function (i) {
        paste0(
          "clonevar ", dependent_variable, i, " = ", dependent_variable
        )
      })
    
    dependent_variables <- 
      lapply(1:length(controls), function (i) {
        glue::glue(
          "{dependent_variable}{i}"
        )
      })
    
    
    model_formulas <- 
      lapply(1:length(controls), function (i) {
        
        variables <- controls[[i]]
        
        # specify possible interaction effect
        if (!is.null(interaction_effects)) {
          variables <- c(variables, paste0(interaction_effects[[i]]))
        }
        
        # create model formula
        model_formula <-
          as.formula(
            paste(
              dependent_variables[[i]],
              paste(variables, collapse = " + "),
              sep = " ~ "
            )
          )
      })
    
    regression_call <- 
      lapply(1:length(controls), function (i) {
        glue::glue(
          "({r_to_stata_formula(model_formulas[[i]], data, gsem = TRUE)}) ///"
        )
      }) %>% 
      unlist() %>% 
      paste(collapse = "\n") %>% 
      paste0(
        "gsem ",. , "\n, nocapslatent vce(cluster ", clustervar, ")"
      )
    
    savings <- 
      if(!is.null(stata_save)) {
        sapply(stata_save, function (to_save) {
          if (to_save == "estimates") {
            paste0(
              "estimates save ", stata_data_path, "estimates_", 
              stata_names, 
              ", replace \n"
            )
          } else if (to_save == "coefficients") {
            paste0(
              "regsave using ", stata_data_path, "coefficients_", 
              stata_names, 
              ", tstat pval ci replace"
            )
          }
        },
        USE.NAMES = FALSE)
      } else {
        ""
      }
    
    stata_src <- c(
      dependent_variables_call,
      regression_call,
      savings
    ) %>% 
      unlist()
  }
  
  RStata::stata(
    stata_src,
    data.in = data,
    stata.echo = echo
  )
  
  # compile everything
  list(
    dependent_variable =
      dependent_variable,
    stata_paths = 
      c(
        paste0(stata_data_path, "estimates_", stata_names, ".ster"),
        paste0(stata_data_path, "coefficients_", stata_names, ".dta")
      ),
    estimation_results =
      if (length(controls) == 1) {
        haven::read_dta(
          paste0(
            stata_data_path, "coefficients_", stata_names, ".dta"
          )
        ) %>% 
          dplyr::transmute(
            term = var,
            estimate = coef,
            std.error = stderr,
            statistic = tstat,
            p.value = pval,
            conf.low = ci_lower,
            conf.high = ci_upper,
            N,
            r2
          )
      } else {
        haven::read_dta(
          paste0(
            stata_data_path, "coefficients_", stata_names, ".dta"
          )
        ) %>%
          dplyr::transmute(
            term = var,
            estimate = coef,
            std.error = stderr,
            statistic = tstat,
            p.value = pval,
            conf.low = ci_lower,
            conf.high = ci_upper,
            N
          )
      }
  )
}

Stata Margins

At heart of the analysis are post-estimation methods in the form of AMEs. The following function mimics Stata’s mtable function by accessing it through R. mlincom calculations can also be defined here.

stata_margins <- function (
  estimates,
  variables = NULL,
  dydx = NULL,
  at = NULL,
  over = NULL,
  additional_options = NULL,
  mlincom = NULL,
  matsize = 600,
  echo = FALSE,
  stata_names = NULL,
  data
) {
  matsize_call <-
    paste0("set matsize ", matsize)
  
  use_estimates <- paste0(
    "estimates use ", 
    estimates$stata_paths[1]
  )
  
  set_esample <- "estimates esample:"
  
  dydx_call <-
    ifelse(!is.null(dydx), paste0(" dydx(", dydx, ")"), "")
  
  at_call <-
    ifelse(!is.null(at), paste0(" ", at), "")
  
  over_call <-
    ifelse(!is.null(over), paste0(" over(", over, ")"), "")
  
  additional_options_call <-
    ifelse(!is.null(additional_options), paste0(" ", additional_options), "")
  
  mtable <- paste0(
    "mtable,",
    dydx_call, 
    at_call, 
    over_call,
    additional_options_call
    , " stat(est se p ci) post"
  )
  
  regsave_mtable <- 
    paste0(
      'regsave using "', stata_data_path, 
      stata_names, 
      '_mtable", tstat pval ci replace'
    )
  
  mlincom_call <-
    ifelse(
      is.null(mlincom),
      "",
      paste0(
        "tempname hdle\n",
        "postfile `hdle' est se z p lb ub using ", stata_data_path,  
        stata_names, "_mlincom.dta, replace\n",
        lapply(mlincom, function (i) { 
          paste0(
            "mlincom ", i, ", stats(all)\n",
            "post `hdle' (r(est)) (r(se)) (r(z)) (r(p)) (r(lb)) (r(ub))\n"
          )
        }) %>%
          do.call(rbind, .) %>%
          paste(collapse = "\n"),
        "postclose `hdle'" 
      )
    )
  
  stata_src <- paste(
    matsize_call,
    use_estimates, 
    set_esample,
    mtable,
    regsave_mtable,
    mlincom_call,
    sep = " \n "
  )
  
  RStata::stata(stata_src, data.in = data, stata.echo = echo)
  
  # compile everything
  list(
    dydx = dydx,
    over = over,
    at = at,
    stata_path_mtable = 
      regsave_mtable %>% 
      gsub(".*[\"](.+)[\"].*", "\\1", .) %>% 
      paste0(., ".dta"),
    stata_path_mlincom =
      if (is.null(mlincom)) {
        mlincom
      } else {
        paste0(stata_data_path, stata_names, "_mlincom.dta")
      },
    estimation_results = 
      haven::read_dta(
        regsave_mtable %>% 
          gsub(".*[\"](.+)[\"].*", "\\1", .) %>% 
          paste0(., ".dta")
      ) %>% 
      dplyr::transmute(
        term = var,
        estimate = coef,
        std.error = stderr,
        statistic = tstat,
        p.value = pval,
        conf.low = ci_lower,
        conf.high = ci_upper
      ),
    mlincom_results =
      if (is.null(mlincom)) {
        mlincom
      } else {
        haven::read_dta(
          paste0(stata_data_path, stata_names, "_mlincom.dta")
        ) %>% 
          dplyr::transmute(
            term = mlincom,
            estimate = est,
            std.error = se,
            statistic = z,
            p.value = p,
            conf.low = lb,
            conf.high = ub
          )
      }
  )
}

DATA PREPARATION

In this series of code chunks, all steps of data preparation are included. They comprise preparing the geospatial data and the survey data, as well as their spatial linking using GIS methods.

Geospatial and Macro Data

destatis

For the measure of centrality, we require information about the ZIP code of each municipal administration. This information is retrieved from the list of municipalities of destatis, which is added to a shapefile of municipality boundaries. This shapefile is also used to create the German region indicator used in the manuscript.

municipalties <-
  readr::read_csv2(
    "./data/31122017_Auszug_GV_manipulated.csv",
    col_types = cols(
      zip_code = col_character()
    )) %>% 
  dplyr::select(-contains("X")) %>% 
  dplyr::filter(!is.na(gem)) %>% 
  dplyr::mutate(
    RS = paste0(land, rb, kreis, vb, gem)
  )

municipalties_shapes <-
  sf::read_sf("./data/VG250_GEM.shp") %>% 
  sf::st_transform(3035) %>% 
  dplyr::left_join(
    municipalties %>% 
      dplyr::select(RS, zip_code),
    by = "RS"
  ) %>% 
  dplyr::mutate(
    RS_short = stringr::str_sub(RS, 1, 2) %>% as.numeric(),
    east = dplyr::if_else(RS_short <= 10, 0, 1) 
  )

ioer

The two land use indicators–soil sealing and green spaces–are from the IOER Monitor as described in the manuscript. The IOER provides access through a Web Coverage Service, which was used within QGIS. The raster files were exported from there and are imported in R here.

sealing_2015_100m <-
  raster::raster("./data/S40RG_2015_100m.tif") %>% 
  raster::readAll()

green_2016_100m <-
  raster::raster("./data/F01RG_2016_100m.tif") +
  raster::raster("./data/S08RG_2016_100m.tif") %>% 
  raster::readAll()

green_2018_100m <-
  raster::raster("./data/F01RG_2018_100m.tif") +
  raster::raster("./data/S08RG_2018_100m.tif") %>% 
  raster::readAll()

water_2016_100m <-
  raster::raster("./data/F11RG_2016_100m.tif") %>% 
  raster::readAll()

water_2018_100m <-
  raster::raster("./data/F11RG_2018_100m.tif") %>% 
  raster::readAll()

urban_green_2013_100m <- 
  raster::raster("./data/O01RG_2013_100m.tif") %>% 
  raster::readAll()

street_minus_ecotone_2016_100m <-
  raster::raster("./data/V02RG_2016_100m.tif") -
  raster::raster("./data/U30DG_2016_100m.tif")

street_minus_ecotone_2016_100m[street_minus_ecotone_2016_100m < 0] <- 0

street_minus_ecotone_2018_100m <-
  raster::raster("./data/V02RG_2018_100m.tif") -
  raster::raster("./data/U30DG_2018_100m.tif") %>% 
  raster::readAll()

street_minus_ecotone_2018_100m[street_minus_ecotone_2018_100m < 0] <- 0

map data

In Figure 1, data from the IOER, as well as data from the BKG are used. Geocoding of the fictional survey respondent’s address as well as the data cropping are done with the following code.

dresden_shape <-
  sf::read_sf("./data/VG250_GEM.shp") %>% 
  sf::st_transform(3035) %>% 
  dplyr::filter(GEN == "Dresden")

poi <-
  bkg_geocode(
    street = "Palaisplatz",
    house_number = "1",
    zip_code = "01097",
    place = "Dresden"
  ) %>% 
  dplyr::mutate(
    inspire_100m = spt_create_inspire_ids(data = ., type = "100m"),
    inspire_x = spt_extract_inspire_x(inspire_100m),
    inspire_y = spt_extract_inspire_y(inspire_100m)
  ) %>% 
  sf::st_drop_geometry() %>% 
  sf::st_as_sf(
    coords = c("inspire_x", "inspire_y"),
    crs = 3035
  ) %>% 
  dplyr::select(id, geometry)

poi_buffer <-
  poi %>% 
  sf::st_buffer(500)

poi_bbox <-
  sf::st_bbox(
    c(
      xmin = sf::st_coordinates(poi)[1] - 1000,
      xmax = sf::st_coordinates(poi)[1] + 1000,
      ymax = sf::st_coordinates(poi)[2] + 1000,
      ymin = sf::st_coordinates(poi)[2] - 1000
    ),
    crs = sf::st_crs(3035)
  )

poi_roads <-
  osmdata::getbb("Dresden") %>% 
  osmdata::opq(timeout = 25*100) %>%
  osmdata::add_osm_feature(
    "highway", 
    c("trunk", "primary", "secondary", "tertiary")
  )%>% 
  osmdata::osmdata_sf() %>% 
  .$osm_lines %>% 
  dplyr::select(osm_id) %>% 
  sf::st_transform(3035) %>%
  sf::st_crop(poi_bbox)

poi_elbe <-
  osmdata::getbb("Dresden") %>% 
  osmdata::opq(timeout = 25*100) %>%
  osmdata::add_osm_feature("waterway", "river") %>% 
  osmdata::osmdata_sf() %>% 
  .$osm_lines %>% 
  dplyr::select(osm_id) %>% 
  sf::st_transform(3035) %>%
  sf::st_crop(poi_bbox) %>% 
  sf::st_buffer(50) %>% 
  sf::st_crop(poi_bbox) 

poi_buildings <-
  osmdata::getbb("Dresden") %>% 
  osmdata::opq(timeout = 25*100) %>%
  osmdata::add_osm_feature(
    "building", 
    c(
      "apartments", "commcerical", "office", "cathredral", "church", "retail",
      "industrial", "warehouse", "hotel", "house", "civic", "government", 
      "public", "parking", "garages", "carport", "transportation", 
      "semidetached_house", "school",
      "conservatory"
    )
  ) %>% 
  osmdata::osmdata_sf() %>% 
  .$osm_polygons %>%
  dplyr::select(osm_id) %>% 
  sf::st_transform(3035) %>%
  sf::st_crop(poi_bbox)

poi_sealing <-
  raster::raster("./data/S40RG_2015_100m.tif") %>% 
  raster::crop(., dresden_shape) %>% 
  raster::rasterToPolygons() %>%
  as(., "sf") %>% 
  sf::st_transform(3035) %>% 
  sf::st_crop(., poi_bbox) %>% 
  dplyr::mutate(S40RG_2015_100m = ifelse(S40RG_2015_100m > 100, 100, S40RG_2015_100m))

poi_green <-
  (
    raster::raster("./data/F01RG_2018_100m.tif") +
      raster::raster("./data/S08RG_2018_100m.tif")
  ) %>% 
  raster::crop(., dresden_shape) %>% 
  raster::rasterToPolygons() %>%
  as(., "sf") %>% 
  sf::st_transform(3035) %>% 
  sf::st_crop(., poi_bbox) %>% 
  dplyr::mutate(layer = ifelse(layer > 100, 100, layer))

poi_scalebar_data <- 
  (poi_bbox + c(1000, 0, 0, -1950)) %>% 
  sf::st_as_sfc() %>%
  sf::st_sf()

poi_scalebar_data <-
  rbind(
    poi_scalebar_data %>% 
      {sf::st_bbox(.) + c(750, 0, -0, 0)} %>% 
      sf::st_as_sfc() %>%
      sf::st_sf(),
    poi_scalebar_data %>% 
      {sf::st_bbox(.) + c(500, 0, -250, 0)} %>% 
      sf::st_as_sfc() %>%
      sf::st_sf(),
    poi_scalebar_data %>% 
      {sf::st_bbox(.) + c(250, 0, -500, 0)} %>% 
      sf::st_as_sfc() %>%
      sf::st_sf(),
    poi_scalebar_data %>% 
      {sf::st_bbox(.) + c(0, 0, -750, 0)} %>%
      sf::st_as_sfc() %>%
      sf::st_sf()
  )

poi_compass_data <- 
  (poi_bbox + c(2000, 1000, 0, 0)) %>% 
  sf::st_as_sfc() %>%
  sf::st_sf()

Survey data

The georeferenced GGSS comprise two separated data files: the sensitive regional information containing the INSPIRE IDs, and the actual survey data. As they have to processed separately, they are also imported independently.

ggss geodata

Here we load the sensitive regional information and convert the INSPIRE IDs to geocoordinates.

ggss_geodata <-
  haven::read_dta("Y:/allbus_georef_tmp.dta") %>% 
  sjlabelled::remove_all_labels() %>% 
  tibble::as_tibble() %>% 
  dplyr::filter(year == 2016 | year == 2018) %>% 
  dplyr::select(
    respid, year, inspidha
  ) %>% 
  dplyr::mutate(
    x = spt_extract_inspire_x(inspidha),
    y = spt_extract_inspire_y(inspidha)
  ) %>% 
  sf::st_as_sf(., coords = c("x", "y"), crs = 3035)

ggss normal survey data

The following procedure loads the 2016 and 2018 GGSS survey data and stores them in one harmonized file.

common_ggss_variables <- 
  c("respid", "year", "age", "sex", "educ", "hhinc", "dh04")

ggss_2016_2018 <-
  rbind(
    haven::read_sav("./data/ZA5250_v2-1-0.sav") %>% 
      dplyr::mutate(year = 2016, sample_point = paste0(xs11, "_16")) %>% 
      dplyr::select(
        common_ggss_variables,
        dn01 = dn01a, fdm01 = fdm01a, mdm01 = mdm01a,
        sample_point
      ) %>% 
      sjlabelled::remove_all_labels(),
    haven::read_sav("./data/ZA5270_v2-0-0.sav") %>% 
      dplyr::mutate(year = 2018, sample_point = paste0(xs11, "_18")) %>% 
      dplyr::select(
        dplyr::all_of(common_ggss_variables), dn01, fdm01, mdm01, sample_point
      ) %>% 
      sjlabelled::remove_all_labels()
  ) %>% 
  dplyr::transmute(
    respid,
    year,
    age  = age %>%
      as.numeric(),
    gender = 
      sex %>% 
      car::recode('1 = 0; 2 = 1') %>% 
      as.factor(),
    edu  =
      educ %>%
      car::recode('1:2 = 1; 3 = 2; 4:5 = 3; 6:7 = NA') %>%
      as.factor() %>% 
      forcats::fct_recode(low = "1", middle = "2", high = "3"),
    hhsize  = 
      dh04 %>%
      as.numeric(),
    hh_eq_income = (hhinc / sqrt(hhsize)) / 1000,
    sample_point,
    citizenship = dn01,
    country_father = fdm01,
    country_mother = mdm01
  )

Computing the variable for migration background needs multiple steps following boolean requests on the data. This was done the following way.

ggss_2016_2018 <-
  ggss_2016_2018 %>% 
  dplyr::rowwise() %>%
  dplyr::mutate(
    migration_background_indicator =
      any(
        citizenship != 0,
        (country_father != 0 & country_father != 996),
        (country_mother != 0  & country_mother != 996)
      )
  ) %>%
  dplyr::mutate(
    migration_background =
      ifelse(
        isFALSE(migration_background_indicator),
        ifelse(
          citizenship == 0 &
            (country_father == 0 | country_father == 996) &
            (country_mother == 0 | country_mother == 996),
          "German",
          NA
        ),
        "Migrant"
      )
  ) %>% 
  dplyr::select(
    -citizenship, -country_father, -country_mother, 
    -migration_background_indicator
  ) %>% 
  ungroup()

ggss_2016_2018$migration_background <-
  ggss_2016_2018$migration_background %>%
  as.factor()

Spatial Linking

After the survey data were prepared, the sensitive regional information of the GGSS is spatially linked with the information from the geospatial data. The first batch of data is added in the following code chunk.

ggss_geodata_linked <-
  ggss_geodata %>% 
  dplyr::bind_cols(
    RS = { spl_simple_join(., municipalties_shapes, "RS") },
    east = { spl_simple_join(., municipalties_shapes, "east") },
    sealing_100 = { spl_buffer(., sealing_2015_100m, 25) },
    sealing_250 = { spl_buffer(., sealing_2015_100m, 250) },
    sealing_500 = { spl_buffer(., sealing_2015_100m, 500) },
    sealing_750 = { spl_buffer(., sealing_2015_100m, 750) },
    sealing_1000 = { spl_buffer(., sealing_2015_100m, 1000) },
    green_100 = {
      ifelse(
        .$year == "2016",
        spl_buffer(., green_2016_100m, 25),
        spl_buffer(., green_2018_100m, 25)
      )
    },
    green_250 = {
      ifelse(
        .$year == "2016",
        spl_buffer(., green_2016_100m, 250),
        spl_buffer(., green_2018_100m, 250)
      )
    },
    green_500 = {
      ifelse(
        .$year == "2016",
        spl_buffer(., green_2016_100m, 500),
        spl_buffer(., green_2018_100m, 500)
      )
    },
    green_750 = {
      ifelse(
        .$year == "2016",
        spl_buffer(., green_2016_100m, 750),
        spl_buffer(., green_2018_100m, 750)
      )
    },
    green_1000 = {
      ifelse(
        .$year == "2016",
        spl_buffer(., green_2016_100m, 1000),
        spl_buffer(., green_2018_100m, 1000)
      )
    },
    blue_500 = {
      ifelse(
        .$year == "2016",
        spl_buffer(., water_2016_100m, 500),
        spl_buffer(., water_2018_100m, 500)
      )
    },
    street_minus_ecotone_500 = {
      ifelse(
        .$year == "2016",
        spl_buffer(., street_minus_ecotone_2016_100m, 500),
        spl_buffer(., street_minus_ecotone_2018_100m, 500)
      )
    },
    urban_green_500 = { spl_buffer(., urban_green_2013_100m, 500) }
  ) %>% 
  dplyr::left_join(
    municipalties_shapes %>% 
      sf::st_drop_geometry() %>% 
      dplyr::select(RS, EWZ, zip_code),
    by = "RS"
  ) %>%
  
  dplyr::filter(EWZ != 0) %>% 
  
  # complete spatial linking
  tibble::as_tibble()

eval(
  parse(
    text = paste0(
      "ggss_geodata_linked$sealing <- ggss_geodata_linked$sealing_",
      default_buffer_size
    )
  )
)

eval(
  parse(
    text = paste0(
      "ggss_geodata_linked$green <- ggss_geodata_linked$green_",
      default_buffer_size
    )
  )
)

Calculating city centers requires geocoding of the ZIP code information, which is done in the following. After this information was retrieved, it is added to the data file created in the previous code chunk.

# requires access to services of bkg
zip_geocodes <-
  dplyr::left_join(
    ggss_geodata_linked %>% 
      dplyr::select(RS),
    municipalties_shapes %>% 
      sf::st_drop_geometry() %>% 
      dplyr::select(RS, zip_code, GEN),
    by = "RS"
  ) %>% 
  dplyr::distinct() %>% 
  bkg_geocode(
    data = .,
    street = "XXX",
    house_number = "XXX",
    zip_code = "zip_code",
    place = "GEN"
  ) %>% 
  dplyr::select(RS, GEN, geometry)

gem_distances <-
  dplyr::left_join(
    ggss_geodata_linked,
    zip_geocodes %>% 
      dplyr::rename(geometry_zip = geometry),
    by = "RS"
  ) %>% 
  dplyr::mutate(
    gem_distance = 
      sf::st_distance(
        geometry, geometry_zip, by_element = TRUE
      ) %>% 
      as.numeric()
  ) #%>% 
# dplyr::filter(gem_distance < 100000)

ggss_geodata_linked <-
  ggss_geodata_linked %>% 
  dplyr::left_join(
    gem_distances %>% 
      dplyr::select(respid, year, RS, gem_distance),
    by = c("respid", "year", "RS")
  ) %>% 
  dplyr::select(-inspidha, -geometry, -RS, -zip_code)

Finalizing

After the sensitive information is deleted from the spatially linked data file, it can be safely added to the survey data.

ggss_2016_2018_linked <-
  dplyr::left_join(
    ggss_2016_2018,
    ggss_geodata_linked %>% 
      dplyr::select(-urban_green_500), # remove urban green
    by = c("respid", "year")
  )

ggss_2016_2018_linked_urban_green <-
  dplyr::left_join(
    ggss_2016_2018,
    ggss_geodata_linked,
    by = c("respid", "year")
  )

Some last finalizations are conducted in the following. This includes creating the urbanicity indicator, data filtering, and deleting cases with missing data.

dat <-
  ggss_2016_2018_linked %>% 
  dplyr::mutate(
    urban =       
      EWZ %>%
      car::recode(
        '0:99999 = 0; 100000:Inf = 1'
      )
  ) %>% 
  dplyr::select(-respid) %>% 
  na.omit()

dat$year <- as.factor(dat$year)
dat$east <- as.factor(dat$east)
dat$urban <- as.factor(dat$urban)
dat$sample_point <- as.factor(dat$sample_point)

# urban green subsample
dat_urban_green <-
  ggss_2016_2018_linked_urban_green %>% 
  dplyr::mutate(
    urban =       
      EWZ %>%
      car::recode(
        '0:99999 = 0; 100000:Inf = 1'
      )
  ) %>% 
  dplyr::select(-respid) %>% 
  na.omit()

dat_urban_green$year <- as.factor(dat_urban_green$year)
dat_urban_green$east <- as.factor(dat_urban_green$east)
dat_urban_green$urban <- as.factor(dat_urban_green$urban)
dat_urban_green$sample_point <- as.factor(dat_urban_green$sample_point)

rm(
  list = 
    setdiff(
      ls(), 
      c(
        "dat", "dat_urban_green", "create_tidy_override", "r_to_stata_formula",
        "stata_data_path", "stata_margins", "stata_regress_robust"
      )
    )
)

ANALYSIS

In the analysis, several models with similar parameters are estimated. To avoid repetition, we define commonly used parameters here.

dependent_variables <- list(green = "green", sealing = "sealing")

common_control_variables <- 
  c(
    "age", "gender", "edu", "hhsize", "east", "year", "gem_distance", "EWZ",
    "migration_background"
  )

common_interaction_effects <- c("east:year")

control_variables =
  list(
    without_income = 
      common_control_variables,
    with_income =
      c(common_control_variables, "hh_eq_income", "I(hh_eq_income^2)"),
    with_income_interaction = 
      c(common_control_variables, "hh_eq_income", "I(hh_eq_income^2)")
  )

interaction_effects = 
  list(
    without_income =
      common_interaction_effects,
    with_income = 
      common_interaction_effects,
    with_income_interaction =
      c(
        common_interaction_effects,
        "migration_background:hh_eq_income",
        "migration_background:I(hh_eq_income^2)"
      )
  )

cluster_variable = "sample_point"

at_range <- seq(min(dat$hh_eq_income), max(dat$hh_eq_income), .1)

at_income <- 
  glue::glue(
    "at(hh_eq_income=({min(dat$hh_eq_income)}(.1){max(dat$hh_eq_income)}))"
  )

Soil Sealing

The models for soil sealing and green spaces are estimated separately. We start with soil sealing and then turn to green spaces.

Regression Estimation Models

The regression models estimated here are the basis for all post-estimation methods used below. They are also used in Table 2.

estimation_sealing_without_income <- 
  stata_regress_robust(
    dependent_variable = dependent_variables$sealing,
    controls = list(control_variables$without_income),
    interaction_effects = list(interaction_effects$without_income),
    stata_names = "estimation_sealing_without_income",
    clustervar = cluster_variable,
    data = dat
  )

estimation_sealing_with_income <- 
  stata_regress_robust(
    dependent_variable = dependent_variables$sealing,
    controls = list(control_variables$with_income),
    interaction_effects = list(interaction_effects$with_income),
    stata_names = "estimation_sealing_with_income",
    clustervar = cluster_variable,
    data = dat
  )

estimation_sealing_with_income_interaction <- 
  stata_regress_robust(
    dependent_variable = dependent_variables$sealing,
    controls = list(control_variables$with_income_interaction),
    interaction_effects =
      list(interaction_effects$with_income_interaction),
    stata_names = "estimation_sealing_with_income_interaction",
    clustervar = cluster_variable,
    data = dat
  )

estimations_sealing_joined <-
  stata_regress_robust(
    dependent_variable = dependent_variables$sealing,
    controls = control_variables,
    interaction_effects = interaction_effects,
    stata_names = "estimations_sealing_joined",
    clustervar = cluster_variable,
    data = dat,
    echo = FALSE
  )

Post Estimation Models

Simple Predictions

The lower row of Table 2 contains simple AMEs for migration background. They are calculated here.

predictions_sealing_migration_background <-
  dplyr::bind_rows(
    stata_margins(
      estimates = estimation_sealing_without_income,
      over = "migration_background", 
      stata_names = "tmp",
      data = dat
    ) %>% 
      .$estimation_results %>% 
      dplyr::mutate(
        model = "sealing_without_income",
        whom = c("German", "Migrant")
      ),
    stata_margins(
      estimates = estimation_sealing_with_income,
      over = "migration_background", 
      stata_names = "tmp",
      data = dat
    ) %>% 
      .$estimation_results %>% 
      dplyr::mutate(
        model = "sealing_with_income",
        whom = c("German", "Migrant")
      ),
    stata_margins(
      estimates = estimation_sealing_with_income_interaction,
      over = "migration_background", 
      stata_names = "tmp",
      data = dat
    ) %>% 
      .$estimation_results %>% 
      dplyr::mutate(
        model = "sealing_with_income_interaction",
        whom = c("German", "Migrant")
      )
  )

Between Models Margins Tests

Also, in Table 2, a between-models difference test of the AMEs is provided, which was calculated here.

margins_sealing_migration_background <- 
  stata_margins(
    estimates = estimations_sealing_joined, 
    dydx = "migration_background", 
    matsize = 2400,
    stata_names = "tmp",
    mlincom = c("(2 - 1)", "(3 - 2)", "(3 - 1)"),
    data = dat %>% 
      dplyr::mutate(
        sealing1 = sealing, # sealing_without_income
        sealing2 = sealing, # sealing_with_income
        sealing3 = sealing, # sealing_with_income_interaction
      )
  )

Interaction Prediction

To visualize the results, prediction models are used. Those of soil sealing are calculated here. The code chunk also includes preparing the data frame for ggplot to facilitate plotting of the results.

prediction_sealing_with_income_interaction  <-
  stata_margins(
    estimates = estimation_sealing_with_income_interaction,  
    over = "migration_background", 
    at = at_income,
    matsize = 2400,
    stata_names = "prediction_sealing_with_income_interaction",
    data = dat
  ) %>% 
  .$estimation_results %>% 
  dplyr::mutate(
    x = rep(at_range, each = 2),
    whom = rep(c("German", "Migrant"), times = length(at_range))
  ) %>% 
  dplyr::arrange(whom)

# add German data
prediction_sealing_with_income_interaction <-
  dplyr::bind_cols(
    prediction_sealing_with_income_interaction,
    coef_germ =
      c(rep(NA, length(at_range)),
        rep(
          prediction_sealing_with_income_interaction %>%
            dplyr::filter(whom == "German") %>%
            .$estimate, 1)
      ),
    lower_germ =
      c(rep(NA, length(at_range)),
        rep(
          prediction_sealing_with_income_interaction %>%
            dplyr::filter(whom == "German") %>%
            .$conf.low, 1)
      ),
    upper_germ =
      c(rep(NA, length(at_range)),
        rep(
          prediction_sealing_with_income_interaction %>%
            dplyr::filter(whom == "German") %>%
            .$conf.high, 1)
      )
  )

Average Marginal Effect

Another type of visualization is the plotting of the AME of migration background across the range of income. The calculation was straightforward with the following code.

AME_sealing_with_income_interaction <- 
  stata_margins(
    estimates = estimation_sealing_with_income_interaction, 
    dydx = "migration_background", 
    at = at_income,
    matsize = 2400,
    stata_names = "AME_sealing_with_income_interaction",
    data = dat
  ) %>% 
  .$estimation_results %>% 
  dplyr::slice(-c(1:length(at_range))) %>% 
  dplyr::mutate(x = at_range)

Test of Second Differences

An essential test in the manuscript is the test of second differences, as described in Mize, 2019. This test provides statistical significance levels for income slope difference between Germans and migrants.

second_differences_sealing <- 
  stata_margins(
    estimates = estimation_sealing_with_income_interaction, 
    over = "migration_background",
    at =  "at(hh_eq_income=(1)) at(hh_eq_income=(4))",
    mlincom = c("(3 - 1)", "(4 - 2)", "(3 - 1) - (4 - 2)"),
    matsize = 2400,
    stata_names = "second_differences_sealing",
    echo = FALSE,
    data = dat
  )

second_differences_dydx_sealing <- 
  stata_margins(
    estimates = estimation_sealing_with_income_interaction, 
    dydx = "hh_eq_income",
    over = "migration_background",
    mlincom = c("(2 - 1)"),
    matsize = 2400,
    stata_names = "second_differences_sealing",
    echo = FALSE,
    data = dat
  )

Green Spaces

The following code repeats the analysis from before, only with the land use indicator of green spaces. Therefore, it is not commented on.

Regression Estimation Models

estimation_green_without_income <- 
  stata_regress_robust(
    dependent_variable = dependent_variables$green,
    controls = list(control_variables$without_income),
    interaction_effects = list(interaction_effects$without_income),
    stata_names = "estimation_green_without_income",
    clustervar = cluster_variable,
    data = dat
  )

estimation_green_with_income <- 
  stata_regress_robust(
    dependent_variable = dependent_variables$green,
    controls = list(control_variables$with_income),
    interaction_effects = list(interaction_effects$with_income),
    stata_names = "estimation_green_with_income",
    clustervar = cluster_variable,
    data = dat
  )

estimation_green_with_income_interaction <- 
  stata_regress_robust(
    dependent_variable = dependent_variables$green,
    controls = list(control_variables$with_income_interaction),
    interaction_effects =
      list(interaction_effects$with_income_interaction),
    stata_names = "estimation_green_with_income_interaction",
    clustervar = cluster_variable,
    data = dat
  )

estimations_green_joined <-
  stata_regress_robust(
    dependent_variable = dependent_variables$green,
    controls = control_variables,
    interaction_effects = interaction_effects,
    stata_names = "estimations_green_joined",
    clustervar = cluster_variable,
    data = dat,
    echo = FALSE
  )

Post Estimation Models

Simple Predictions

predictions_green_migration_background <-
  dplyr::bind_rows(
    stata_margins(
      estimates = estimation_green_without_income,
      over = "migration_background", 
      stata_names = "tmp",
      data = dat
    ) %>% 
      .$estimation_results %>% 
      dplyr::mutate(
        model = "green_without_income",
        whom = c("German", "Migrant")
      ),
    stata_margins(
      estimates = estimation_green_with_income,
      over = "migration_background", 
      stata_names = "tmp",
      data = dat
    ) %>% 
      .$estimation_results %>% 
      dplyr::mutate(
        model = "green_with_income",
        whom = c("German", "Migrant")
      ),
    stata_margins(
      estimates = estimation_green_with_income_interaction,
      over = "migration_background", 
      stata_names = "tmp",
      data = dat
    ) %>% 
      .$estimation_results %>% 
      dplyr::mutate(
        model = "green_with_income_interaction",
        whom = c("German", "Migrant")
      )
  )

Between Models Margins Tests

margins_green_migration_background <- 
  stata_margins(
    estimates = estimations_green_joined, 
    dydx = "migration_background", 
    matsize = 2400,
    stata_names = "tmp",
    mlincom = c("(2 - 1)", "(3 - 2)", "(3 - 1)"),
    data = dat %>% 
      dplyr::mutate(
        green1 = green, # green_without_income
        green2 = green, # green_with_income
        green3 = green, # green_with_income_interaction
      )
  )

Interaction Prediction

prediction_green_with_income_interaction  <-
  stata_margins(
    estimates = estimation_green_with_income_interaction, 
    over = "migration_background", 
    at = at_income,
    matsize = 2400,
    stata_names = "prediction_green_with_income_interaction",
    data = dat
  ) %>% 
  .$estimation_results %>% 
  dplyr::mutate(
    x = rep(at_range, each = 2),
    whom = rep(c("German", "Migrant"), times = length(at_range))
  ) %>% 
  dplyr::arrange(whom)

# add German data
prediction_green_with_income_interaction <-
  dplyr::bind_cols(
    prediction_green_with_income_interaction,
    coef_germ =
      c(rep(NA, length(at_range)),
        rep(
          prediction_green_with_income_interaction %>%
            dplyr::filter(whom == "German") %>%
            .$estimate, 1)
      ),
    lower_germ =
      c(rep(NA, length(at_range)),
        rep(
          prediction_green_with_income_interaction %>%
            dplyr::filter(whom == "German") %>%
            .$conf.low, 1)
      ),
    upper_germ =
      c(rep(NA, length(at_range)),
        rep(
          prediction_green_with_income_interaction %>%
            dplyr::filter(whom == "German") %>%
            .$conf.high, 1)
      )
  )

Average Marginal Effect

AME_green_with_income_interaction <- 
  stata_margins(
    estimates = estimation_green_with_income_interaction, 
    dydx = "migration_background", 
    at = at_income,
    matsize = 2400,
    stata_names = "AME_green_with_income_interaction",
    echo = FALSE,
    data = dat
  ) %>% 
  .$estimation_results %>% 
  dplyr::slice(-c(1:length(at_range))) %>% 
  dplyr::mutate(x = at_range)

Test of Second Differences

second_differences_green <- 
  stata_margins(
    estimates = estimation_green_with_income_interaction, 
    over = "migration_background",
    at =  "at(hh_eq_income=(1)) at(hh_eq_income=(4))",
    mlincom = c("(3 - 1)", "(4 - 2)", "(3 - 1) - (4 - 2)"),
    matsize = 2400,
    stata_names = "second_differences_green",
    echo = FALSE,
    data = dat
  )

second_differences_dydx_green <- 
  stata_margins(
    estimates = estimation_green_with_income_interaction, 
    dydx = "hh_eq_income",
    over = "migration_background",
    mlincom = c("(2 - 1)"),
    matsize = 2400,
    stata_names = "second_differences_green",
    echo = FALSE,
    data = dat
  )

Urban Interaction

The robustness test in the manuscript seeks to test for differences in the previous findings between urban and non-urban municipalities. The code, therefore, repeats a lot of the procedures from before but adds the interaction with urbanicity.

Soil Sealing

We again start with the analysis of soil sealing…

Regression Estimation Model

…and the estimation of the underlying regression model used for the post-estimation methods.

estimation_urban_sealing <- 
  stata_regress_robust(
    dependent_variable = dependent_variables$sealing,
    controls = list(
      c(
        control_variables$with_income_interaction,
        "urban"
      )
    ),
    interaction_effects = list(
      c(
        "east:year", 
        "urban:migration_background",
        "urban:hh_eq_income",
        "urban:I(hh_eq_income^2)",
        "migration_background:hh_eq_income",
        "migration_background:I(hh_eq_income^2)",
        "urban:migration_background:hh_eq_income",
        "urban:migration_background:I(hh_eq_income^2)"
      )
    ),
    stata_names = "estimation_urban_sealing",
    clustervar = "sample_point",
    echo = FALSE,
    data = dat
  )

estimation_urban_no_interaction_sealing <- 
  stata_regress_robust(
    dependent_variable = dependent_variables$sealing,
    controls = list(
      c(
        control_variables$with_income_interaction,
        "urban"
      )
    ),
    interaction_effects = list(
      c(
        "east:year", 
        "urban:migration_background"
      )
    ),
    stata_names = "estimation_urban_no_interaction_sealing",
    clustervar = "sample_point",
    echo = FALSE,
    data = dat
  )

Prediction

Similar to the analyses before, we create predictions of the land use indicators for the income range across Germans and migrants. However, they are only included in the supplementary material.

prediction_urban_sealing <- 
  stata_margins(
    estimates = estimation_urban_sealing, 
    over = "migration_background", 
    at = 
      glue::glue(
        "at(hh_eq_income=({glue::glue_collapse(at_range, sep = ' ')}) urban=(1 2))"
      ),
    matsize = 4800,
    stata_names = "prediction_urban_sealing",
    echo = FALSE,
    data = dat
  ) %>% 
  .$estimation_results %>% 
  dplyr::mutate(
    x = rep(at_range, each = 4),
    whom = rep(c("German", "Migrant"), times = length(at_range) * 2),
    where = rep(c("Non-Urban", "Urban"), each = 2, times = length(at_range))
  ) %>% 
  dplyr::arrange(whom, where)

# add German data
prediction_urban_sealing <-
  dplyr::bind_cols(
    prediction_urban_sealing,
    coef_germ =
      c(rep(NA, length(at_range) * 2),
        rep(
          prediction_urban_sealing %>%
            dplyr::filter(whom == "German") %>%
            .$estimate, 1)
      ),
    lower_germ =
      c(rep(NA, length(at_range) * 2),
        rep(
          prediction_urban_sealing %>%
            dplyr::filter(whom == "German") %>%
            .$conf.low, 1)
      ),
    upper_germ =
      c(rep(NA, length(at_range) * 2),
        rep(
          prediction_urban_sealing %>%
            dplyr::filter(whom == "German") %>%
            .$conf.high, 1)
      )
  )

Average Marginal Effect

AMEs differentiated between urban and non-urban municipalities are also only part of the supplementary material and calculated here.

AME_urban_sealing <- 
  stata_margins(
    estimates = estimation_urban_sealing, 
    dydx = "migration_background", 
    at = 
      glue::glue(
        "at(hh_eq_income=({glue::glue_collapse(at_range, sep = ' ')}) urban=(1 2))"
      ),
    matsize = 4800,
    stata_names = "estimation_urban_sealing",
    data = dat
  ) %>% 
  .$estimation_results %>% 
  dplyr::slice(-c(1:(length(at_range) * 2))) %>%
  dplyr::mutate(
    x = rep(at_range, each = 2),
    where = rep(c("Non-Urban", "Urban"), times = length(at_range))
  )

AME_urban_no_interaction_sealing <- 
  stata_margins(
    estimates = estimation_urban_no_interaction_sealing, 
    dydx = "migration_background", 
    at = 
      glue::glue(
        "at(urban=(1 2))"
      ),
    matsize = 4800,
    stata_names = "estimation_urban_no_interaction_sealing",
    data = dat
  ) %>% 
  .$estimation_results %>% 
  dplyr::slice(3:4) %>%
  dplyr::mutate(where = c("Non-Urban", "Urban"))

Test of Second Differences

What is reported in Table 4 are tests of second differences between the different AMEs. This calculation is done here.

second_differences_dydx_urban_sealing <- 
  stata_margins(
    estimates = estimation_urban_sealing, 
    # dydx = "migration_background", 
    dydx = "hh_eq_income",
    over = "migration_background",
    at =
      "at(urban=(1 2))",
    mlincom = c("(2 - 1)", "(4 - 3)", "(4 - 3) - (2 - 1)"),
    echo = FALSE,
    matsize = 2400,
    stata_names = "second_differences_urban_sealing",
    data = dat
  )

second_differences_dydx_urban_sealing_default <- 
  stata_margins(
    estimates = estimation_urban_sealing, 
    dydx = "hh_eq_income",
    over = "migration_background",
    mlincom = c("(2 - 1)"),
    matsize = 2400,
    stata_names = "second_differences_sealing_default",
    echo = FALSE,
    data = dat
  )

second_differences_dydx_urban_sealing_no_interaction <- 
  stata_margins(
    estimates = estimation_urban_no_interaction_sealing, 
    dydx = "migration_background",
    # over = "migration_background",
    mlincom = c("(2 - 1)"),
    matsize = 2400,
    stata_names = "second_differences_sealing_no_interaction",
    echo = FALSE,
    data = dat
  )

Green Spaces

Again, the analysis of green spaces only differs in the dependent variable from the previous procedures. Therefore, I did not comment on them here.

Regression Estimation Model

estimation_urban_green <- 
  stata_regress_robust(
    dependent_variable = dependent_variables$green,
    controls = list(
      c(
        control_variables$with_income_interaction,
        "urban"
      )
    ),
    interaction_effects = list(
      c(
        "east:year", 
        "urban:migration_background",
        "urban:hh_eq_income",
        "urban:I(hh_eq_income^2)",
        "migration_background:hh_eq_income",
        "migration_background:I(hh_eq_income^2)",
        "urban:migration_background:hh_eq_income",
        "urban:migration_background:I(hh_eq_income^2)"
      )
    ),
    stata_names = "estimation_urban_green",
    clustervar = "sample_point",
    echo = FALSE,
    data = dat
  )

estimation_urban_no_interaction_green <- 
  stata_regress_robust(
    dependent_variable = dependent_variables$green,
    controls = list(
      c(
        control_variables$with_income_interaction,
        "urban"
      )
    ),
    interaction_effects = list(
      c(
        "east:year", 
        "urban:migration_background"
      )
    ),
    stata_names = "estimation_urban_no_interaction_green",
    clustervar = "sample_point",
    echo = FALSE,
    data = dat
  )

Prediction

prediction_urban_green <- 
  stata_margins(
    estimates = estimation_urban_green, 
    over = "migration_background", 
    at = 
      glue::glue(
        "at(hh_eq_income=({glue::glue_collapse(at_range, sep = ' ')}) urban=(1 2))"
      ),
    matsize = 4800,
    stata_names = "prediction_urban_green",
    data = dat
  ) %>% 
  .$estimation_results %>% 
  dplyr::mutate(
    x = rep(at_range, each = 4),
    whom = rep(c("German", "Migrant"), times = length(at_range) * 2),
    where = rep(c("Non-Urban", "Urban"), each = 2, times = length(at_range))
  ) %>% 
  dplyr::arrange(whom, where)

# add German data
prediction_urban_green <-
  dplyr::bind_cols(
    prediction_urban_green,
    coef_germ =
      c(rep(NA, length(at_range) * 2),
        rep(
          prediction_urban_green %>%
            dplyr::filter(whom == "German") %>%
            .$estimate, 1)
      ),
    lower_germ =
      c(rep(NA, length(at_range) * 2),
        rep(
          prediction_urban_green %>%
            dplyr::filter(whom == "German") %>%
            .$conf.low, 1)
      ),
    upper_germ =
      c(rep(NA, length(at_range) * 2),
        rep(
          prediction_urban_green %>%
            dplyr::filter(whom == "German") %>%
            .$conf.high, 1)
      )
  )

Average Marginal Effect

AME_urban_green <- 
  stata_margins(
    estimates = estimation_urban_green, 
    dydx = "migration_background", 
    at = 
      glue::glue(
        "at(hh_eq_income=({glue::glue_collapse(at_range, sep = ' ')}) urban=(1 2))"
      ),
    matsize = 4800,
    echo = FALSE,
    stata_names = "estimation_urban_green",
    data = dat
  ) %>% 
  .$estimation_results %>% 
  dplyr::slice(-c(1:(length(at_range) * 2))) %>% 
  dplyr::mutate(
    x = rep(at_range, each = 2),
    where = rep(c("Non-Urban", "Urban"), times = length(at_range))
  )

AME_urban_no_interaction_green <- 
  stata_margins(
    estimates = estimation_urban_no_interaction_green, 
    dydx = "migration_background", 
    at = 
      glue::glue(
        "at(urban=(1 2))"
      ),
    matsize = 4800,
    stata_names = "estimation_urban_no_interaction_green",
    data = dat
  ) %>% 
  .$estimation_results %>% 
  dplyr::slice(3:4) %>%
  dplyr::mutate(where = c("Non-Urban", "Urban"))

Test of Second Differences

second_differences_dydx_urban_green <- 
  stata_margins(
    estimates = estimation_urban_green, 
    dydx = "hh_eq_income", 
    over = "migration_background",
    at =
      "at(urban=(1 2))",
    mlincom = c("(2 - 1)", "(4 - 3)", "(4 - 3) - (2 - 1)"),
    echo = FALSE,
    matsize = 2400,
    stata_names = "second_differences_urban_green",
    data = dat
  )

second_differences_dydx_urban_green_default <- 
  stata_margins(
    estimates = estimation_urban_green, 
    dydx = "hh_eq_income",
    over = "migration_background",
    mlincom = c("(2 - 1)"),
    matsize = 2400,
    stata_names = "second_differences_green_default",
    echo = FALSE,
    data = dat
  )

second_differences_dydx_urban_green_no_interaction <- 
  stata_margins(
    estimates = estimation_urban_no_interaction_green, 
    dydx = "migration_background",
    # over = "migration_background",
    mlincom = c("(2 - 1)"),
    matsize = 2400,
    stata_names = "second_differences_green_no_interaction",
    echo = FALSE,
    data = dat
  )

TABLES

The creation of tables and figures of the manuscript may not need any comments, as it should be clear from the text.

Table 1: Descriptives

dat_german <-
  dat %>% 
  dplyr::filter(migration_background == "German")

dat_migrant <-
  dat %>% 
  dplyr::filter(migration_background == "Migrant")

descriptives <- 
  rbind(
    c(
      mean(dat_german$sealing), 
      sd(dat_german$sealing), 
      censor_variable(dat_german, sealing), 
      censor_variable(dat_german, sealing, descending = TRUE)
    ),
    c(
      mean(dat_german$green), 
      sd(dat_german$green), 
      censor_variable(dat_german, green), 
      censor_variable(dat_german, green, descending = TRUE)
    ),
    c(mean(dat_german[["hh_eq_income"]]), 
      sd(dat_german[["hh_eq_income"]]), 
      min(dat_german[["hh_eq_income"]]), 
      max(dat_german[["hh_eq_income"]])),
    c(mean(dat_german[["age"]], na.rm = TRUE), 
      sd(dat_german[["age"]], na.rm = TRUE),
      min(dat_german[["age"]], na.rm = TRUE), 
      max(dat_german[["age"]], na.rm = TRUE)),
    c(
      mean(dat_german$gender %>% as.numeric()) - 1,
      sd(dat_german$gender %>% as.numeric()), 
      min(dat_german$gender %>% as.numeric()) - 1, 
      max(dat_german$gender %>% as.numeric()) - 1
    ),
    rep(NA, 4),
    cbind(
      prop.table(table(dat_german$edu) %>% 
                   as.vector(.)) * 100, 
      c(rep(NA, 3)), 
      c(rep(NA, 3)),
      c(rep(NA, 3))
    ),
    c(mean(dat_german[["hhsize"]], na.rm = TRUE), 
      sd(dat_german[["hhsize"]], na.rm = TRUE),
      min(dat_german[["hhsize"]], na.rm = TRUE), 
      max(dat_german[["hhsize"]], na.rm = TRUE)),
    c(
      mean(dat_german$east %>% as.numeric()) - 1,
      sd(dat_german$east %>% as.numeric()), 
      min(dat_german$east %>% as.numeric()) - 1, 
      max(dat_german$east %>% as.numeric()) - 1
    ),
    rep(NA, 4),
    cbind(
      prop.table(table(dat_german$year) %>% 
                   as.vector(.)) * 100, 
      c(rep(NA, 2)), 
      c(rep(NA, 2)), 
      c(rep(NA, 2))),
    c(
      mean(dat_german$EWZ), 
      sd(dat_german$EWZ), 
      1000, 
      500000
    ),
    c(
      mean(dat_german$urban %>% as.numeric()) - 1,
      sd(dat_german$urban %>% as.numeric()), 
      min(dat_german$urban %>% as.numeric()) - 1, 
      max(dat_german$urban %>% as.numeric()) - 1
    ),
    c(
      mean(dat_german[["gem_distance"]], na.rm = TRUE), 
      sd(dat_german[["gem_distance"]], na.rm = TRUE),
      censor_variable(dat_german, gem_distance), 
      10000
    ),
    c(nrow(dat_german), NA, NA, NA)
  ) %>% 
  cbind(
    rbind(
      c(
        mean(dat_migrant$sealing), 
        sd(dat_migrant$sealing), 
        censor_variable(dat_migrant, sealing), 
        censor_variable(dat_migrant, sealing, descending = TRUE)
      ),
      c(
        mean(dat_migrant$green), 
        sd(dat_migrant$green), 
        censor_variable(dat_migrant, green), 
        censor_variable(dat_migrant, green, descending = TRUE)
      ),
      c(mean(dat_migrant[["hh_eq_income"]]), 
        sd(dat_migrant[["hh_eq_income"]]), 
        min(dat_migrant[["hh_eq_income"]]), 
        max(dat_migrant[["hh_eq_income"]])),
      c(mean(dat_migrant[["age"]], na.rm = TRUE), 
        sd(dat_migrant[["age"]], na.rm = TRUE),
        min(dat_migrant[["age"]], na.rm = TRUE), 
        max(dat_migrant[["age"]], na.rm = TRUE)),
      c(
        mean(dat_migrant$gender %>% as.numeric()) - 1,
        sd(dat_migrant$gender %>% as.numeric()), 
        min(dat_migrant$gender %>% as.numeric()) - 1, 
        max(dat_migrant$gender %>% as.numeric()) - 1
      ),
      rep(NA, 4),
      cbind(
        prop.table(table(dat_migrant$edu) %>% 
                     as.vector(.)) * 100, 
        c(rep(NA, 3)), 
        c(rep(NA, 3)),
        c(rep(NA, 3))
      ),
      c(mean(dat_migrant[["hhsize"]], na.rm = TRUE), 
        sd(dat_migrant[["hhsize"]], na.rm = TRUE),
        min(dat_migrant[["hhsize"]], na.rm = TRUE), 
        max(dat_migrant[["hhsize"]], na.rm = TRUE)),
      c(
        mean(dat_migrant$east %>% as.numeric()) - 1,
        sd(dat_migrant$east %>% as.numeric()), 
        min(dat_migrant$east %>% as.numeric()) - 1, 
        max(dat_migrant$east %>% as.numeric()) - 1
      ),
      rep(NA, 4),
      cbind(
        prop.table(table(dat_migrant$year) %>% 
                     as.vector(.)) * 100, 
        c(rep(NA, 2)), 
        c(rep(NA, 2)), 
        c(rep(NA, 2))),
      c(
        mean(dat_migrant$EWZ), 
        sd(dat_migrant$EWZ), 
        1000, 
        500000
      ),
      c(
        mean(dat_migrant$urban %>% as.numeric()) - 1,
        sd(dat_migrant$urban %>% as.numeric()), 
        min(dat_migrant$urban %>% as.numeric()) - 1, 
        max(dat_migrant$urban %>% as.numeric()) - 1
      ),
      c(
        mean(dat_migrant[["gem_distance"]], na.rm = TRUE), 
        sd(dat_migrant[["gem_distance"]], na.rm = TRUE),
        censor_variable(dat_migrant, gem_distance), 
        10000
      ),
      c(nrow(dat_migrant), NA, NA, NA)
    )
  ) %>% 
  magrittr::set_rownames(
    c(
      "Soil Sealing (500m Buffer)*",
      "Green Spaces (500m Buffer)*",
      "Income",
      "Age",
      "Gender (Female)",
      "Education:",
      "Low",
      "Medium",
      "High",
      "Household Size",
      "Region (Eastern Germany)",
      "Survey Year:",
      "2014",
      "2016",
      "Municipality Inhabitants*",
      "Urbanity (Urban)",
      "Distance City Center*",
      "Number of Respondents"
    )
  ) %>%
  magrittr::set_colnames(rep(c("Mean / N / %", "SD", "Minimum", "Maximum"), 2))

descriptives_html <-
  htmlTable::htmlTable(
    htmlTable::txtRound(descriptives, 3),
    tfoot="* Minimum and maximum values censored due to data protection",
    cgroup = c("Germans", "Migrants"),
    n.cgroup = c(4, 4),
    rgroup = c("Main Variables", "Individual Controls", "Inter-Individual Controls"),
    n.rgroup = c(3, 7),
    total = TRUE
  )

htmltools::save_html(
  descriptives_html,
  "./revise_and_resubmit_2/tables/Table 1.html"
)

Table 2: Regression Estimation Models

tab_regressions <- 
  huxtable::huxreg(
    "Without Income" = 
      create_tidy_override(
        estimation_sealing_without_income,
        glance_add = c(
          "AME" = 
            extract_coefficient(
              margins_sealing_migration_background$estimation_results, 
              4
            ),
          "AME_diff" = ""
        ) 
      ),
    "With Income" = 
      create_tidy_override(
        estimation_sealing_with_income,
        glance_add = c(
          "AME" = 
            extract_coefficient(
              margins_sealing_migration_background$estimation_results, 
              5
            ),
          "AME_diff" = 
            extract_coefficient(
              margins_sealing_migration_background$mlincom_results, 
              1
            )
        )
      ),
    "Income Interaction" = 
      create_tidy_override(
        estimation_sealing_with_income_interaction,
        glance_add = c(
          "AME" = 
            extract_coefficient(
              margins_sealing_migration_background$estimation_results, 
              6
            ),
          "AME_diff" = 
            extract_coefficient(
              margins_sealing_migration_background$mlincom_results, 
              2
            )
        )
      ),
    "Without Income" = 
      create_tidy_override(
        estimation_green_without_income,
        glance_add = c(
          "AME" = 
            extract_coefficient(
              margins_green_migration_background$estimation_results, 
              4
            ),
          "AME_diff" = ""
        )
      ),
    "With Income" = 
      create_tidy_override(
        estimation_green_with_income,
        glance_add = c(
          "AME" = 
            extract_coefficient(
              margins_green_migration_background$estimation_results, 
              5
            ),
          "AME_diff" = 
            extract_coefficient(
              margins_green_migration_background$mlincom_results, 
              1
            )
        )
      ),
    "Income Interaction" = 
      create_tidy_override(
        estimation_green_with_income_interaction,
        glance_add = c(
          "AME" = 
            extract_coefficient(
              margins_green_migration_background$estimation_results, 
              6
            ),
          "AME_diff" = 
            extract_coefficient(
              margins_green_migration_background$mlincom_results, 
              2
            )
        )
      ),
    statistics = c(
      "AME Migration Background" = "AME", 
      "Difference Previous Model" = "AME_diff", 
      "N" = "nobs", 
      "R²" = "r.squared"
    ),
    stars = c(`***` = 0.001, `**` =  0.01, `*` = 0.05, `+` = 0.1),
    coefs =
      c(
        "Migration Background" = "2.migration_background",
        "Income" = "hh_eq_income",
        "Income²" = "c.hh_eq_income#c.hh_eq_income",
        "Migration Background X Income" =
          "2.migration_background#c.hh_eq_income",
        "Migration Background X Income²" =
          "2.migration_background#c.hh_eq_income#c.hh_eq_income"
      ),
    note = "{stars}; Data source: GGSS 2014 & 2016 (ZA5250, ZA5270) and IOER Monitor; unstandardized estimates are based on cluster robust standard errors (sample point); all models control for age, gender, education, househould size, german region and survey year interaction, inhabitant size of municipality, and distance to municipality administration"
  ) %>% 
  rbind(c("", "Soil Sealing", "", "", "Green Spaces", "", ""), .) %>% 
  huxtable::merge_cells(1, 2:4) %>% 
  huxtable::merge_cells(1, 5:7)


huxtable::quick_html(tab_regressions, file = "./revise_and_resubmit_2/tables/Table 2.html")

Table 3: Second Differences Interaction Model

tab_second_differences_interaction_model <- 
  huxtable::huxreg(
    "Soil Sealing" =
      create_tidy_override_mlincom(
        second_differences_sealing,
        glance_add = c(
          "AME_all" = 
            extract_coefficient(
              second_differences_dydx_sealing$mlincom_results, 1)
        )
      ),
    "Green Spaces" =
      create_tidy_override_mlincom(
        second_differences_green,
        glance_add = c(
          "AME_all" = 
            extract_coefficient(
              second_differences_dydx_green$mlincom_results, 1)
        )
      ),
    stars = c(`***` = 0.001, `**` =  0.01, `*` = 0.05, `+` = 0.1),
    statistics = c("Migrant-German Income AME" = "AME_all"),
    coefs =
      c(
        "German 1000 -> 4000 EUR" = "(3 - 1)",
        "Migrant 1000 -> 4000 EUR" = "(4 - 2)",
        "German-Migrant 1000 -> 4000 EUR" = "(3 - 1) - (4 - 2)"
      ),
    note = "{stars}; Data source: GGSS 2014 & 2016 (ZA5250, ZA5270) and IOER Monitor; estimates are based on cluster robust standard errors (sample point); all models control for age, gender, education, househould size, german region and survey year interaction, inhabitant size of municipality, and distance to municipality administration"
  ) 


huxtable::quick_html(tab_second_differences_interaction_model, file = "./revise_and_resubmit_2/tables/Table 3.html")

Table 4: Second Differences Urban

sealing_tab_4 <-
  create_tidy_override_mlincom(
    second_differences_dydx_urban_sealing
  )

sealing_tab_4$model <- 
  rbind(
    c("interaction", rep(NA, 6)),
    sealing_tab_4$model, 
    second_differences_dydx_urban_sealing_default$mlincom_results %>% 
      dplyr::mutate(term = "overall"),
    c("no_interaction", rep(NA, 6)),
    AME_urban_no_interaction_sealing %>% dplyr::select(-where),
    second_differences_dydx_urban_sealing_no_interaction$estimation_results[2,] %>% 
      dplyr::mutate(term = "diff")
  )

sealing_tab_4$tidy_cols <- 
  rbind(
    c("interaction", rep(NA, 6)),
    sealing_tab_4$tidy_cols, 
    second_differences_dydx_urban_sealing_default$mlincom_results %>% 
      dplyr::mutate(term = "overall"),
    c("no_interaction", rep(NA, 6)),
    AME_urban_no_interaction_sealing %>% dplyr::select(-where),
    second_differences_dydx_urban_sealing_no_interaction$estimation_results[2,] %>% 
      dplyr::mutate(term = "diff")
  )

green_tab_4 <-
  create_tidy_override_mlincom(
    second_differences_dydx_urban_green
  )

green_tab_4$model <- 
  rbind(
    c("interaction", rep(NA, 6)),
    green_tab_4$model, 
    second_differences_dydx_urban_green_default$mlincom_results %>% 
      dplyr::mutate(term = "overall"),
    c("no_interaction", rep(NA, 6)),
    AME_urban_no_interaction_green %>% dplyr::select(-where),
    second_differences_dydx_urban_green_no_interaction$estimation_results[2,] %>% 
      dplyr::mutate(term = "diff")
  )

green_tab_4$tidy_cols <- 
  rbind(
    c("interaction", rep(NA, 6)),
    green_tab_4$tidy_cols, 
    second_differences_dydx_urban_green_default$mlincom_results %>% 
      dplyr::mutate(term = "overall"),
    c("no_interaction", rep(NA, 6)),
    AME_urban_no_interaction_green %>% dplyr::select(-where),
    second_differences_dydx_urban_green_no_interaction$estimation_results[2,] %>% 
      dplyr::mutate(term = "diff")
  )




tab_second_differences_urban <-
  huxtable::huxreg(
    "Soil Sealing" = sealing_tab_4,
    "Green Spaces" = green_tab_4,
    statistics = c("Migrant-German Income AME" = "AME_all"),
    stars = c(`***` = 0.001, `**` =  0.01, `*` = 0.05, `+` = 0.1),
    coefs = c(
      "Income Interaction Model" = "interaction",
      "Migrant - German Income AME (Non-Urban)" = "(2 - 1)",
      "Migrant - German Income AME (Urban)" = "(4 - 3)",
      "Non-Urban - Urban Income" = "(4 - 3) - (2 - 1)",
      "Migrant - German Income AME (Overall)" = "overall",
      "No Income Interaction Model" = "no_interaction",
      "Migrant - German AME (Non-Urban)" = 
        "2.migration_background:1bn._at",
      "Migrant - German AME (Urban)" = 
        "2.migration_background:2._at",
      "Non-Urban - Urban" = "diff"
    ),
    note = "{stars}; Data source: GGSS 2014 & 2016 (ZA5250, ZA5270) and IOER Monitor; estimates are based on cluster robust standard errors (sample point); all models control for age, gender, education, househould size, german region and survey year interaction, inhabitant size of municipality, and distance to municipality administration"
  ) %>% 
  huxtable::set_bold(c(2, 12), 1)


huxtable::quick_html(
  tab_second_differences_urban, 
  file = "./revise_and_resubmit_2/tables/Table 4.html"
)

FIGURES

green_limits <- ylim(20, 60)
sealing_limits <- ylim(20, 60)
green_minus_blue_limits <- ylim(20, 60)
blue_limits <- ylim(0, 5)
streets_minus_ecotone_limits <- ylim(0, 25)
urban_green_limits <- ylim(0, 100)
income_limits <- xlim(0, 10)
AME_limits_sealing <- ylim(-15, 75)
AME_limits_green <- ylim(-75, 15)
AME_limits_green_minus_blue <- ylim(-75, 15)
AME_limits_blue <- ylim(-5, 2.5)
AME_limits_streets_minus_ecotone <- ylim(-5, 25)
AME_limits_urban_green <- ylim(-20, 20)

# labels 
income_label <- "Equivalized Household Income in 1000 €"

caption_label <- 
  paste0(
    "Data source: GGSS 2016 & 2018; ",
    "N = ", nrow(dat), "; ",
    "95% confidence intervals based on cluster-robust ",
    "standard errors (sample point); \n",
    "all models control for age, gender, education, ",
    "househould size, german region and survey year interaction, ",
    "inhabitant size of \nmunicipality, ",
    "and distance to municipality administration; ",
    "display of predictions is censored after an income of 10\n",
    "(i.e., 10,000 €), but underlying estimates are based on full income range."
  )

Figure 1

fig_3d <- 
  ggplot() + 
  
  # river
  geom_sf(
    data = poi_elbe %>% rotate_data(),
    fill = "skyblue", 
    color = "skyblue"
  ) +
  
  # roads
  geom_sf(data = poi_roads %>% rotate_data(),
          aes(fill = code), fill = "gray", color = "gray") +
  
  # buildings
  geom_sf(data = poi_buildings %>% rotate_data(),
          aes(fill = code), fill = "grey", color = "grey") +
  
  # buffer
  geom_sf(
    data = poi_buffer %>% rotate_data(y_add = 0),
    color = "black",
    fill = "grey",
    linetype = "dashed",
    alpha = .5
  ) +
  
  # point
  geom_sf(data = poi %>% rotate_data(),
          color = "black", size = 3) +
  
  # frame
  geom_sf(
    data = poi_bbox %>% sf::st_as_sfc() %>% sf::st_sf() %>% rotate_data(),
    color = "black",
    fill = NA
  ) +
  
  # label
  annotate("text", x = 13226000, y = 1056500, 
           label = "Respondent's\nLocation") +
  
  # scalebar
  geom_sf(
    data = poi_scalebar_data %>% rotate_data(x_add = -150, y_add = -100),
    color = "black",
    fill = rep(c("black", "white"), 2)
  ) +
  
  annotate("text", 
           x = poi_scalebar_data %>% 
             rotate_data(x_add = -150, y_add = -100) %>% 
             sf::st_bbox() %>% 
             .$xmin, 
           y = poi_scalebar_data %>% 
             rotate_data(x_add = -150, y_add = -100) %>% 
             sf::st_bbox() %>% 
             .$ymax - 150,
           label = "0 km",
           angle = -12,
           size = 3
  ) +
  
  annotate("text", 
           x = poi_scalebar_data %>% 
             rotate_data(x_add = -150, y_add = -100) %>% 
             sf::st_bbox() %>% 
             .$xmax - 200, 
           y = poi_scalebar_data %>% 
             rotate_data(x_add = -150, y_add = -100) %>% 
             sf::st_bbox() %>% 
             .$ymin - 130,
           label = "1 km",
           angle = -12,
           size = 3
  ) +
  
  # compass
  geom_segment(
    data = poi_compass_data %>% 
      rotate_data(x_add = -250, y_add = -300) %>% 
      sf::st_bbox() %>% 
      {data.frame(xmin = .$xmin, xmax = .$xmax, ymin = .$ymin, ymax = .$ymax)},
    aes(x = xmin, y = ymin, xend = xmax, yend = ymax),
    size = 1.2,
    arrow = arrow(length = unit(0.2, "cm"), type = "closed")
  ) +
  
  annotate("text", 
           x = poi_compass_data %>% 
             rotate_data(x_add = -250, y_add = -300) %>% 
             sf::st_bbox() %>% 
             .$xmax + 190, 
           y = poi_scalebar_data %>% 
             rotate_data(x_add = -250, y_add = -300) %>% 
             sf::st_bbox() %>% 
             .$ymax + 1400,
           label = "N",
           angle = -50,
           size = 3
  ) +
  
  # soil sealing
  geom_sf(
    data = poi_sealing %>% rotate_data(y_add = 2000),
    aes(fill = S40RG_2015_100m), 
    color = NA,
    fill = color_gradient(poi_sealing$S40RG_2015_100m, "Reds")
  ) +
  
  # buffer
  geom_sf(
    data = poi_buffer %>% rotate_data(y_add = 2000),
    color = "black",
    fill = "grey",
    linetype = "dashed",
    alpha = .5
  ) +
  
  # point
  geom_sf(data = poi %>% rotate_data(y_add = 2000),
          color = "black", size = 3) +
  
  # label
  annotate("text", x = 13226000, y = 1058500, 
           label = "Soil Sealing") +
  
  # green spaces
  geom_sf(
    data = poi_green %>% rotate_data(y_add = 4000),
    aes(fill = layer),
    color = NA,
    fill = color_gradient(poi_green$layer, "Greens")
  ) +
  
  # buffer
  geom_sf(
    data = poi_buffer %>% rotate_data(y_add = 4000),
    color = "black",
    fill = "grey",
    linetype = "dashed",
    alpha = .5
  ) +
  
  # point
  geom_sf(data = poi %>% rotate_data(y_add = 4000),
          color = "black", size = 3) +
  
  # label
  annotate("text", x = 13226000, y = 1060500, 
           label = "Green Spaces") +
  
  ggsn::blank() +
  
  labs(
    caption = 
      paste0(
        "Data Source: OpenStreetMap, ",
        "Federal Agency for Cartography and Geodesy (BKG), \n",
        "Leibniz Institute for Ecological and Regional Development (IOER)"
      )
  ) +
  theme(plot.caption = element_text(hjust = 0))


ggsave(
  "./revise_and_resubmit_2/figures/FIGURE 1.png",
  fig_3d,
  dpi = 600
)

Figure 2

fig_prediction_sealing_with_income_interaction <-
  ggplot(
    prediction_sealing_with_income_interaction,
    aes(x = x, y = estimate, group = whom)
  ) +
  geom_line(aes(colour = "Migrant")) +
  geom_line(aes(y = conf.high), linetype = "dashed") +
  geom_line(aes(y = conf.low), linetype = "dashed") +
  
  geom_line(aes(y = coef_germ, colour = "German")) +
  geom_line(aes(y = lower_germ), linetype = "dashed", color = "gray") +
  geom_line(aes(y = upper_germ), linetype = "dashed", color = "gray") +
  scale_linetype_manual("",
                        values = c("German" = "solid",
                                   "Migrant" = "solid")) +
  scale_colour_manual("",
                      labels = c("German", "Migrant"),
                      values = c("German" = "gray",
                                 "Migrant" = "black")) +
  sealing_limits +
  income_limits +
  labs(x = income_label,
       y = "Soil Sealing in %") +
  theme_bw() +
  theme(plot.caption = element_text(hjust = 0)) +
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        legend.position = c(0.35, 0.78),
        legend.title.align = 0)

fig_AME_sealing_with_income_interaction <-
  ggplot(AME_sealing_with_income_interaction, aes(x = x, y = estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_line() +
  geom_line(aes(y = conf.high), linetype = "dashed") +
  geom_line(aes(y = conf.low), linetype = "dashed") +
  
  geom_hline(yintercept = 0, linetype = "dashed") +
  AME_limits_sealing +
  income_limits +
  labs(x = income_label,
       y = "Average Marginal Effect:\nMigrant - German") +
  theme_bw() +
  theme(plot.caption = element_text(hjust = 0)) +
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        legend.title.align = 0)

fig_prediction_green_with_income_interaction <-
  ggplot(
    prediction_green_with_income_interaction,
    aes(x = x, y = estimate, group = whom)
  ) +
  geom_line(aes(colour = "Migrant")) +
  geom_line(aes(y = conf.high), linetype = "dashed") +
  geom_line(aes(y = conf.low), linetype = "dashed") +
  
  geom_line(aes(y = coef_germ, colour = "German")) +
  geom_line(aes(y = lower_germ), linetype = "dashed", color = "gray") +
  geom_line(aes(y = upper_germ), linetype = "dashed", color = "gray") +
  scale_linetype_manual("",
                        values = c("German" = "solid",
                                   "Migrant" = "solid")) +
  scale_colour_manual("",
                      labels = c("German", "Migrant"),
                      values = c("German" = "gray",
                                 "Migrant" = "black")) +
  green_limits +
  income_limits +
  labs(x = income_label,
       y = "Green Spaces in %") +
  theme_bw() +
  theme(plot.caption = element_text(hjust = 0)) +
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        legend.position = c(0.4, 0.2),
        legend.title.align = 0)

fig_AME_green_with_income_interaction <-
  ggplot(AME_green_with_income_interaction, aes(x = x, y = estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_line() +
  geom_line(aes(y = conf.high), linetype = "dashed") +
  geom_line(aes(y = conf.low), linetype = "dashed") +
  
  geom_hline(yintercept = 0, linetype = "dashed") +
  AME_limits_green +
  income_limits +
  labs(x = income_label,
       y = "Average Marginal Effect:\nMigrant - German") +
  theme_bw() +
  theme(plot.caption = element_text(hjust = 0)) +
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        legend.title.align = 0)

figure_2_new <-
  (fig_prediction_sealing_with_income_interaction + 
     ggtitle("A.1 Soil Sealing") +
     fig_AME_sealing_with_income_interaction + 
     ggtitle("A.2 Soil Sealing")) / 
  ( fig_prediction_green_with_income_interaction + 
      ggtitle("B.1 Green Spaces") +
      fig_AME_green_with_income_interaction  +
      ggtitle("B.2 Green Spaces"))  +
  patchwork::plot_annotation(
    caption = 
      paste0("Data source: GGSS 2016 & 2018; ",
             "N = ", nrow(dat), "; ",
             "95% confidence intervals based on cluster-robust ",
             "standard errors (sample point); \n",
             "all models control for age, gender, education, ",
             "househould size, german region and survey year interaction, ",
             "inhabitant size of \nmunicipality, ",
             "and distance to municipality administration"
      ),
    theme = theme(plot.caption = element_text(hjust = 0))
  )

ggsave(
  "./revise_and_resubmit_2/figures/FIGURE 2.png",
  figure_2_new,
  width = 12,
  height = 12,
  dpi = 600
)

Land Use Disadvantages in Germany: A Matter of Ethnic Income Inequalities - Main Analysis (Code)

Anonymous

2021-03-22

SESSION AND ENVIRONMENT SETUP

Packages

FUNCTIONS

Helper Functions

Spatial Methods

Stata Regressions

Stata Margins

DATA PREPARATION

Geospatial and Macro Data

destatis

ioer

map data

Survey data

ggss geodata

ggss normal survey data

Spatial Linking

Finalizing

ANALYSIS

Soil Sealing

Regression Estimation Models

Post Estimation Models

Simple Predictions

Between Models Margins Tests

Interaction Prediction

Average Marginal Effect

Test of Second Differences

Green Spaces

Regression Estimation Models

Post Estimation Models

Simple Predictions

Between Models Margins Tests

Interaction Prediction

Average Marginal Effect

Test of Second Differences

Urban Interaction

Soil Sealing

Regression Estimation Model

Prediction

Average Marginal Effect

Test of Second Differences

Green Spaces

Regression Estimation Model

Prediction

Average Marginal Effect

Test of Second Differences

TABLES

Table 1: Descriptives

Table 2: Regression Estimation Models

Table 3: Second Differences Interaction Model

Table 4: Second Differences Urban

FIGURES

Figure 1

Figure 2