--- title: "Customize ExPanD" author: "Joachim Gassen" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Customize ExPanD} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` By default, ExPanD offers a set of sample modification and data exploration/visualization components that support a typical EDA workflow. In many situations however, your data does not need all of these components or you might want to center the user experience on a sub-set of these components. In addition, you might want to change the ordering of components, add additional information or even allow the user to define additional variables interactively. You can customize ExPanD along the following dimensions: - Select components to include and their order - Add informative variable definitions - Customize components so that they display certain variables and specifications - Allow users to generate additional variables - Add html blocks to support the flow of your app This vignette will guide you through each of these opportunities by step-by-step developing a customized ExPanD app. The code in the vignette is additive, meaning that you can source each code block independently from the others to see how the 'ExPanD' app is changed by the customization step-by-step. ## Select components to include and their order When you start ExPanD in default mode, e.g., by ```{r intro, eval = FALSE} library(ExPanDaR) library(gapminder) ExPanD(df = gapminder, cs_id = "country", ts_id = "year") ``` ExPanD will include all available components in their default order: -------------------------------------------------------------------------------- Name Description ------------------------ ------------------------------------------------------- `sample_selection` A drop down menu to select the sample to be used and whether it should be balanced `subset_factor` A drop down menu to select a factor on which to limit the sample and an additional menu to select the value of that factor on which you want to focus the analysis `grouping` A drop down menu that gives you the option to name a grouping factor. The grouping factor allows the user to focus certain components on a sub-sample defined by a value of that factor. Also, the grouping component contains the dialog four outlier treatments `bar_chart` A component displaying a bar chart reporting the observations by time period `missing_values` A visual displaying the frequency of missing values for each variable and time period `udvars` A component that allows users to define additional variables (see below) `descriptive_table` A descriptive table for all numerical and logical variables contained in the sample `histogram` A histogram to display the distribution of a chosen numerical or logical variable `ext_obs` A list displaying the 5 most extreme observations for a chosen variable, overall or by time period `by_group_bar_graph` A bar graph that visualizes a chosen descriptive statistic by a chosen grouping factor `by_group_violin_graph` A by group violin plot, where users can choose the variable and the grouping factor `trend_graph` A graph that visualizes the development of up to three variables over time `quantile_trend_graph` A graph that visualizes the distribution of one chosen variable over time `by_group_trend_graph` A grouped time trend graph that visualizes the development of one variable across groups over time `corrplot` A visual representation of Spearman and Pearson correlations of the numerical and logical variables in the sample. Exact correlations are displayed when you hover above the respective cell with your mouse `scatter_plot` A scatter plot, where you can present up to 4 dimensions of your data (x, y, size, color). You can choose whether you want a LOESS line to be displayed and whether you want your sample to be limited to 1,000 observations `regression` A regression component where you can estimate a linear or logit regression, with up to two fixed effect levels and standard error clustering -------------------------------------------------------------------------------- To select the components that you want to display, you have two options: First, you can simply indicate the components that you want to _omit_. The remaining components stay in the original order. As an example, the 'gapminder' data set does not have any missing values and is a balanced sample. This mean that we can safely omit the sample selection component as well as the display of missing values. ``` {r omit_components, eval = FALSE} library(ExPanDaR) library(gapminder) data(gapminder) ExPanD(df = gapminder, cs_id = "country", ts_id = "year", components = c(sample_selection = FALSE, missing_values = FALSE)) ``` Alternatively, you can choose the components that you want to _include_. This also gives you the option to sort the components in your preferred order. Let us assume that we want our app to focus on the Preston curve association of GDP per capita and life expectancy in the 'gapminder' data. So maybe we want to include the following components: ``` {r select_components, eval = FALSE} library(ExPanDaR) library(gapminder) data(gapminder) ExPanD(df = gapminder, cs_id = "country", ts_id = "year", components = c(descriptive_table = TRUE, by_group_violin_graph = TRUE, scatter_plot = TRUE, regression = TRUE)) ``` OK. While we are at it, we can also include an informative title and a short intro sentence. ``` {r include_intro, eval = FALSE} library(ExPanDaR) library(gapminder) data(gapminder) ExPanD(df = gapminder, cs_id = "country", ts_id = "year", title = "Explore the Preston Curve", abstract = paste("This interactive display uses 'gapminder' data to", "let you explore the Preston Curve. Scroll down and enjoy!"), components = c(descriptive_table = TRUE, by_group_violin_graph = TRUE, scatter_plot = TRUE, regression = TRUE)) ``` ## Add variable definitions When you look at the descriptive table you see that it does not offer self-explanatory variable names. Also, no tool-tips are displayed when you hover over the variable names with your mouse. To add the tool-tip functionality, we need to specify a `df_def` data frame that describes the variables. The upside of this is that we no longer need to specify the cross-sectional and time series identifiers as these are also defined in the data frame ``` {r include_df_def, eval = FALSE} library(ExPanDaR) library(gapminder) data(gapminder) df_def <- data.frame( var_name = names(gapminder), var_def = c("Name of the country", "Continent where country is located", "Year of data", "Life expectancy in years at birth", "Population in million", "Gross Domestic Product (GDP) per capita"), type = c("cs_id", "factor", "ts_id", rep("numeric", 3)) ) gapminder$pop <- gapminder$pop / 1e6 ExPanD(df = gapminder, title = "Explore the Preston Curve", abstract = paste("This interactive display uses 'gapminder' data to", "let you explore the Preston Curve. Scroll down and enjoy!"), components = c(descriptive_table = TRUE, by_group_violin_graph = TRUE, quantile_trend_graph = TRUE, scatter_plot = TRUE, regression = TRUE), df_def = df_def) ``` Now tool-tips for the descriptive table inform about our variables. Neat. ## Customize components so that they display certain variables and specifications When you scroll down the components you will see that the scatter plot is not showing the typical Preston curve by default. We want to change that so that the scatter plot is based on the full sample and includes a LOESS line. It should show GDP per capita as X and life expectancy as Y, with population defining the size of the data points and the continent their color. Also, we want to estimate a panel model of life expectancy with country and year firm fixed effects, two-way clustered standard errors and GDP per capita as explanatory variable (Population is time constant in the 'gapminder' data set and as such subsumed by country fixed effects). There are two ways to achieve that, a simple one and a hard one. First, the simple one: Make the desired changes in the app, save the configuration of the app to a file that you can remember (button towards the bottom of the page), read that file with `readRDS()` into your environment and provide the resulting list as the parameter `config_list` to the `ExPanD()` function call. Below, I assume that you saved the configuration of the app as `my_config.RDS` to the working directory. ``` {r include_dl_clist, eval = FALSE} library(ExPanDaR) library(gapminder) data(gapminder) df_def <- data.frame( var_name = names(gapminder), var_def = c("Name of the country", "Continent where country is located", "Year of data", "Life expectancy in years at birth", "Population in million", "Gross Domestic Product (GDP) per capita"), type = c("cs_id", "factor", "ts_id", rep("numeric", 3)) ) gapminder$pop <- gapminder$pop / 1e6 clist <- readRDS("my_config.RDS") ExPanD(df = gapminder, title = "Explore the Preston Curve", abstract = paste("This interactive display uses 'gapminder' data to", "let you explore the Preston Curve. Scroll down and enjoy!"), components = c(descriptive_table = TRUE, by_group_violin_graph = TRUE, scatter_plot = TRUE, regression = TRUE), df_def = df_def, config_list = clist) ``` Now the hard one. You can specify the changes that you want to apply to the default configuration directly in the code. Take a look at `clist` to understand how the data is structured. I hope that most list member names are somewhat self explanatory. ``` {r include_manual_clist, eval = FALSE} library(ExPanDaR) library(gapminder) data(gapminder) df_def <- data.frame( var_name = names(gapminder), var_def = c("Name of the country", "Continent where country is located", "Year of data", "Life expectancy at birth, in years", "Population in million", "Gross Domestic Product (GDP) per capita in US-$, inflation-adjusted"), type = c("cs_id", "factor", "ts_id", rep("numeric", 3)), stringsAsFactors = FALSE ) gapminder$pop <- gapminder$pop / 1e6 clist <- list( scatter_x = "gdpPercap", scatter_y = "lifeExp", scatter_size = "pop", scatter_color = "continent", scatter_loess = TRUE, scatter_sample = FALSE, reg_y = "lifeExp", reg_x = "gdpPercap", reg_fe1 = "country", reg_fe2 = "year", cluster = "4" # Now this is hard to guess # 1: none, 2: first FE, 3: second FE, 4: both FE ) ExPanD(df = gapminder, title = "Explore the Preston Curve", abstract = paste("This interactive display uses 'gapminder' data to", "let you explore the Preston Curve. Scroll down and enjoy!"), components = c(descriptive_table = TRUE, by_group_violin_graph = TRUE, scatter_plot = TRUE, regression = TRUE), df_def = df_def, config_list = clist) ``` Now we have our customized components well documented in code. The scatter plot exhibits the Preston curve. The association is non-linear, potentially explaining the negative coefficient in the panel model. Maybe we want to allow the user to address this issue on-the-fly by defining additional variables? ## Allow users to generate additional variables This can be achieved by adding a component `udvars`. ``` {r include_udvs, eval = FALSE} library(ExPanDaR) library(gapminder) data(gapminder) df_def <- data.frame( var_name = names(gapminder), var_def = c("Name of the country", "Continent where country is located", "Year of data", "Life expectancy in years at birth", "Population in million", "Gross Domestic Product (GDP) per capita"), type = c("cs_id", "factor", "ts_id", rep("numeric", 3)), stringsAsFactors = FALSE ) gapminder$pop <- gapminder$pop / 1e6 clist <- list( scatter_x = "gdpPercap", scatter_y = "lifeExp", scatter_size = "pop", scatter_color = "continent", scatter_loess = TRUE, scatter_sample = FALSE, reg_y = "lifeExp", reg_x = "gdpPercap", reg_fe1 = "country", reg_fe2 = "year", cluster = "4" # No this is hard to guess 1: none, 2: first FE, 3: second FE, 4: both FE ) ExPanD(df = gapminder, title = "Explore the Preston Curve", abstract = paste("This interactive display uses 'gapminder' data to", "let you explore the Preston Curve. Scroll down and enjoy!"), components = c(descriptive_table = TRUE, by_group_violin_graph = TRUE, scatter_plot = TRUE, udvars = TRUE, regression = TRUE), df_def = df_def, config_list = clist) ``` This generates another component in your app allowing the user to define additional variables. It is positioned just below the scatter plot. For fun and jiggles, you can define `logGpdPercap <- log(gdpPercap)` and verify using the scatter plot that using a log-transformed explanatory variable makes the association more linear, reflecting also in a now positive coefficient in the regression model. If you worry about security, the code that is passed to the server via the user variable definition component is sandboxed in a sense that is only includes the variables of the sample and the functions listed in the help text of the component. Time to move to the finishing touches. ## Add html blocks to support the flow of your app From time to time it might be useful to include some additional text or other info in between components. For that, you have the option to include an arbitrary number of HTML blocks as additional components. Each html block will be wrapped in a fluid row, so you can use columns similar to what you would do in shiny. ``` {r include_html, eval = FALSE} library(ExPanDaR) library(gapminder) data(gapminder) df_def <- data.frame( var_name = names(gapminder), var_def = c("Name of the country", "Continent where country is located", "Year of data", "Life expectancy in years at birth", "Population in million", "Gross Domestic Product (GDP) per capita"), type = c("cs_id", "factor", "ts_id", rep("numeric", 3)), stringsAsFactors = FALSE ) gapminder$pop <- gapminder$pop / 1e6 clist <- list( scatter_x = "gdpPercap", scatter_y = "lifeExp", scatter_size = "pop", scatter_color = "continent", scatter_loess = TRUE, scatter_sample = FALSE, reg_y = "lifeExp", reg_x = "gdpPercap", reg_fe1 = "country", reg_fe2 = "year", cluster = "4" # No this is hard to guess 1: none, 2: first FE, 3: second FE, 4: both FE ) html_blocks <- c( paste('
As you see below, life expectancy varies widely", "across countries and continents. One potential reason for this", "variation is the difference in income levels across countries.", "This association is visualized by the", "", "Preston Curve that you also find below.", "