Real Estate ProspectR

In this project we will examine how to use data provided by public data sources for analysis of home values, rental rates, and movements in home price over a 17 year time period from 2004 to 2020. Analysis of this nature can be used for investment research, home value analysis, appraisal, and comparative value analysis.

Introduction & Setup

This report uses data from current, Zillow Research real estate information database (zhvi), and additional archival data, also from Zillow, which was retrieved from Kaggle. The additional data was uploaded in 2017 and includes some data metrics that do not currently appear on the Zillow Research site. The datasets include median home price estimates and rental estimates for geographical areas between the zip code and state level.

Set State, Metro, and Zip codes to Explore

Set the state, metro (MSA) and zip code you are interested in researching. Some charts that deal with more specific subjects or geographic regions will use these as filters to limit the scope of analysis. Not all charts will use all parameters.

explore_statename <- "California"
explore_state <- "CA"
explore_city <- "Santa Rosa"
explore_metro <- "Santa Rosa"
explore_county <- "Sonoma"
explore_zip <- "95409"
explore_multi_metro <- c('Santa Rosa','San Francisco','San Jose')

Environment Setup

Load packages used by this analysis project and set the default to standard notation.

# Note: this process could take a couple of minutes
if(!require(dplyr)) install.packages("dplyr", repos = "http://cran.us.r-project.org")
if(!require(ggrepel)) install.packages("ggrepel", repos = "http://cran.us.r-project.org")
if(!require(tidyverse)) install.packages("tidyverse", repos = "http://cran.us.r-project.org")
if(!require(tidycensus)) install.packages("tidycensus", repos = "http://cran.us.r-project.org")
if(!require(caret)) install.packages("caret", repos = "http://cran.us.r-project.org")
if(!require(dslabs)) install.packages("dslabs", repos = "http://cran.us.r-project.org")
if(!require(rpart.plot)) install.packages("rpart.plot", repos = "http://cran.us.r-project.org")
if(!require(rgbif)) install.packages("rgbif", repos = "http://cran.us.r-project.org")
if(!require(corrr)) install.packages("corrr", repos = "http://cran.us.r-project.org")
if(!require(splitstackshape)) install.packages("splitstackshape", repos = "http://cran.us.r-project.org")
if(!require(mice)) install.packages("mice", repos = "http://cran.us.r-project.org")
if(!require(randomForest)) install.packages("randomForest", repos = "http://cran.us.r-project.org")
if(!require(data.table)) install.packages("data.table", repos = "http://cran.us.r-project.org")
if(!require(lubridate)) install.packages("lubridate", repos = "http://cran.us.r-project.org")
if(!require("rnoaa")) install.packages("rnoaa", repos = "http://cran.us.r-project.org")

library(randomForest)
library(mice)
library(tidyverse)
library(dslabs)
library(caret)
library(dplyr)
library(rpart.plot)
library("rgbif")
library(corrr)
library(splitstackshape)
library(data.table)
library(ggrepel)
library(tidycensus)
library(lubridate)
library('rnoaa')

options(scipen = 999)

Load Data Set

In this section we will conduct Extract, Transform & Load (ETL) processes on several data files. The data has been saved as a comma separated values (.csv) file in the /data/ directory relative to this Rmd file.

#set reload to false if you wish to rerun analysis without reloading the files.
reload=TRUE

#set teh working directory where the project is loaded. Data folder should be in this directory
setwd("D:/RProjects/RealEstateProspector/")

#the file name for the county level time series file
sourcefilename <- "./data/County_time_series.csv"
if(reload){
  counties <- read.csv(sourcefilename, header=TRUE, fill = TRUE, stringsAsFactors = FALSE, fileEncoding="UTF-8-BOM")
}
totalrows <- nrow(counties)

#the file name for the county level cross walk file
sourcefilename2 <- "./data/CountyCrossWalk_Zillow.csv"
if(reload){
  crosswalk <- read.csv(sourcefilename2, header=TRUE, fill = TRUE, stringsAsFactors = FALSE, fileEncoding="UTF-8-BOM")
}
totalrows2 <- nrow(crosswalk)

profiles <- counties %>% left_join(crosswalk, by=c("RegionName" = "FIPS"))

#the file name for the ZHVI source by Metro
sourcefilename3 <- "./data/Metro_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_mon.csv"
if(reload){
  zhvi <- read.csv(sourcefilename3, header=TRUE, fill = TRUE, stringsAsFactors = FALSE, fileEncoding="UTF-8-BOM")
}
totalrows3 <- nrow(zhvi)

#the file name for the ZORI source by Metro
sourcefilename4 <- "./data/Metro_ZORI_AllHomesPlusMultifamily_SSA.csv"
if(reload){
  zori <- read.csv(sourcefilename4, header=TRUE, fill = TRUE, stringsAsFactors = FALSE, fileEncoding="UTF-8-BOM")
}
totalrows4 <- nrow(zori)


#the file name for the ZHVI source by Zip
sourcefilename5 <- "./data/Zip_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_mon.csv"
if(reload){
  zhvi_zip <- read.csv(sourcefilename5, 
                       header=TRUE, 
                       fill = TRUE, 
                       stringsAsFactors = FALSE, 
                       fileEncoding="UTF-8-BOM")
}
totalrows5 <- nrow(zhvi_zip)

#Join ZHVI and ZORI Metro tables on RegionID
values <- zhvi %>% left_join(zori, by=c("RegionID" = "RegionID"))

We will now conduct some initial exploratory data analysis (EDA).

#These calls help with initial EDA - Commented out
#glimpse(zhvi)
#glimpse(zori)
#summary(zhvi)
#summary(zori)

The Data includes many characteristics of real estate data related to home values, inventories, time on market, rent estimates, and other factors.

Data Transformations

The Zillow ZHVI and ZORI Data will use end of year values for charts in this section. The ZHVI is a proprietary smoothed estimate of “typical home values” provided by zillow. It is not the exact median or average. Details about how the ZHVI is calculated is available on the Zillow research site: https://www.zillow.com/research/zhvi-user-guide/

Year End Home Values by Region

Year end home value data is available, grouped by metro.

#explore the ZHVI data on home values at the end of each year BY METRO
#2020 uses the october data that was latest available at time of this analysis
yearend_homevalues <- zhvi %>%
  select( RegionName, StateName,
                eoy_2004=X2004.12.31, 
                eoy_2005=X2005.12.31,
                eoy_2006=X2006.12.31,
                eoy_2007=X2007.12.31,
                eoy_2008=X2008.12.31,
                eoy_2009=X2009.12.31,
                eoy_2010=X2010.12.31,
                eoy_2011=X2011.12.31,
                eoy_2012=X2012.12.31,
                eoy_2013=X2013.12.31,
                eoy_2014=X2014.12.31,
                eoy_2015=X2015.12.31,
                eoy_2016=X2016.12.31,
                eoy_2017=X2017.12.31,
                eoy_2018=X2018.12.31,
                eoy_2019=X2019.12.31,
                eoy_2020=X2020.10.31) %>%
  pivot_longer(c('eoy_2004', 
                 'eoy_2005', 
                 'eoy_2006', 
                 'eoy_2007', 
                 'eoy_2008', 
                 'eoy_2009', 
                 'eoy_2010', 
                 'eoy_2011', 
                 'eoy_2012', 
                 'eoy_2013', 
                 'eoy_2014', 
                 'eoy_2015', 
                 'eoy_2016', 
                 'eoy_2017', 
                 'eoy_2018', 
                 'eoy_2019', 
                 'eoy_2020'
                 ), 
               names_to = "year", 
               values_to = "homevalue") %>% 
 mutate(varz='home_value') 

Year End Home Values by Zip Code

The year end home data is also available grouped by zip code. This data will be sued for “dril down” into specific cities since thre are generally too many zip codes in a state to be effectively visualized.

#explore the ZHVI data on home values at the end of each year BY ZIP
#2020 uses the october data that was latest available at time of this analysis

yearend_homevalues_byzip <- zhvi_zip %>%
  select( RegionID, RegionName, CountyName, City, StateName, Metro,
                eoy_2004=X2004.12.31, 
                eoy_2005=X2005.12.31,
                eoy_2006=X2006.12.31,
                eoy_2007=X2007.12.31,
                eoy_2008=X2008.12.31,
                eoy_2009=X2009.12.31,
                eoy_2010=X2010.12.31,
                eoy_2011=X2011.12.31,
                eoy_2012=X2012.12.31,
                eoy_2013=X2013.12.31,
                eoy_2014=X2014.12.31,
                eoy_2015=X2015.12.31,
                eoy_2016=X2016.12.31,
                eoy_2017=X2017.12.31,
                eoy_2018=X2018.12.31,
                eoy_2019=X2019.12.31,
                eoy_2020=X2020.10.31) %>%
  pivot_longer(c('eoy_2004', 
                 'eoy_2005', 
                 'eoy_2006', 
                 'eoy_2007', 
                 'eoy_2008', 
                 'eoy_2009', 
                 'eoy_2010', 
                 'eoy_2011', 
                 'eoy_2012', 
                 'eoy_2013', 
                 'eoy_2014', 
                 'eoy_2015', 
                 'eoy_2016', 
                 'eoy_2017', 
                 'eoy_2018', 
                 'eoy_2019', 
                 'eoy_2020'
                 ), 
               names_to = "year", 
               values_to = "homevalue") %>% 
 mutate(varz='home_value') 

Year-End Rent Estimate Data by Region

Rent data is more limited in the length of time the data was recorded, and is available for a limited number of metros only.

#Explore ZORI values - 2014 to present (Shorter time period)
yearend_rents <- zori %>% select(RegionName, 
                eoy_2014=X2014.12,
                eoy_2015=X2015.12,
                eoy_2016=X2016.12,
                eoy_2017=X2017.12,
                eoy_2018=X2018.12,
                eoy_2019=X2019.12,
                eoy_2020=X2020.10
                )  %>%
  pivot_longer(c('eoy_2014', 
                 'eoy_2015', 
                 'eoy_2016', 
                 'eoy_2017', 
                 'eoy_2018', 
                 'eoy_2019', 
                 'eoy_2020'
                 ), 
               names_to = "year", 
               values_to = "medianrent") %>% 
 mutate(varz='median_rent') 

Loading the USCB American Community Survey

In this section we will pull in some data from the US Census Bureau’s American Community Survey using the tidycensus package. An API Key acquired from the US Census web site.

If census_api_key(’’) is blank - you will need to go fetch a key of your own, and enter it inside the emoty quotes in this method call.

Unfortunately - Zillow does not use the GEOID assigned by the USCB - so the best we can do is try to use string comparisons and a state abbreviation index to find matching geography data in USCB data sources. Additional work should be done here to create a proper index table, but the county name will be sufficient for rough matching.

The following data set are using the 5-Year ACS which is recommended for use where precision is importrant - the year ACS covers all geographies and provides the largest sample sizes, while the 1 and 3 year ACS are only available for geographies with greater populations.

Many additional subjects are available for analysis as outlined in the link below. This analysis will be expanded to include additional subjects in future versions.

https://www.census.gov/programs-surveys/acs/guidance/subjects.html

#before running this code, you must plug in your own census API KEY:
#census_api_key("")

#For reduction of key usage, this data should be loaded only once per session or as needed 
#change this to TRUE to load the census data again. 
#This must be set to true before knitting or the knit may fail.
reloadcensus=TRUE

library(datasets)
states_index <- data.frame(STATEABB=as.character(state.abb),
                           STATENAME=as.character(state.name),
                           STATEREGION=as.character(state.region))

if(reloadcensus==TRUE){
  
joinsto <- "county"

population <- get_acs(geography = joinsto, variables = "B01001_001E")
income <- get_acs(geography = joinsto, variables = "B06011_001E")
medianage <- get_acs(geography = joinsto, variables = "B01002_001E")
renters <- get_acs(geography = joinsto, variables = "B25129_038E")

population <- population %>% separate(NAME, c("COUNTY","STATE"),sep=",") %>% mutate(population_est = estimate) %>% select(-estimate)
income <- income %>% separate(NAME, c("COUNTY","STATE"),sep=",") %>% mutate(income_est = estimate) %>% select(-estimate,-moe)
medianage <- medianage %>% separate(NAME, c("COUNTY","STATE"),sep=",") %>% mutate(medianage_est = estimate) %>% select(-estimate,-moe)
renters <- renters %>% separate(NAME, c("COUNTY","STATE"),sep=",") %>% mutate(renters_est = estimate) %>% select(-estimate,-moe)

population$STATE <- trimws(population$STATE)
income$STATE <- trimws(income$STATE)
medianage$STATE <- trimws(medianage$STATE)
renters$STATE <- trimws(renters$STATE)

population <- population %>% 
  mutate(STATE = as.character(STATE)) %>%
  left_join(states_index, by = c("STATE"="STATENAME"))

income <- income %>% 
  mutate(STATE = as.character(STATE)) %>%
  left_join(states_index, by = c("STATE"="STATENAME"))

medianage <- medianage %>% 
  mutate(STATE = as.character(STATE)) %>%
  left_join(states_index, by = c("STATE"="STATENAME"))

renters <- renters %>% 
  mutate(STATE = as.character(STATE)) %>%
  left_join(states_index, by = c("STATE"="STATENAME"))


joinsto <- "zcta"

population_zip <- get_acs(geography = joinsto, variables = "B01001_001E")
income_zip <- get_acs(geography = joinsto, variables = "B06011_001E")
medianage_zip <- get_acs(geography = joinsto, variables = "B01002_001E")
renters_zip <- get_acs(geography = joinsto, variables = "B25129_038E")

population_zip <- population_zip %>% mutate(population_est = estimate) %>% select(-estimate)
income_zip <- income_zip %>% mutate(income_est = estimate) %>% select(-estimate)
medianage_zip <- medianage_zip %>% mutate(medianage_est = estimate) %>% select(-estimate)
renters_zip <- renters_zip %>% mutate(renters_est = estimate) %>% select(-estimate)


}



population %>% slice(1:10)
## # A tibble: 10 x 8
##    GEOID COUNTY       STATE  variable    moe population_est STATEABB STATEREGION
##    <chr> <chr>        <chr>  <chr>     <dbl>          <dbl> <chr>    <chr>      
##  1 01001 Autauga Cou~ Alaba~ B01001_0~    NA          55380 AL       South      
##  2 01003 Baldwin Cou~ Alaba~ B01001_0~    NA         212830 AL       South      
##  3 01005 Barbour Cou~ Alaba~ B01001_0~    NA          25361 AL       South      
##  4 01007 Bibb County  Alaba~ B01001_0~    NA          22493 AL       South      
##  5 01009 Blount Coun~ Alaba~ B01001_0~    NA          57681 AL       South      
##  6 01011 Bullock Cou~ Alaba~ B01001_0~    NA          10248 AL       South      
##  7 01013 Butler Coun~ Alaba~ B01001_0~    NA          19828 AL       South      
##  8 01015 Calhoun Cou~ Alaba~ B01001_0~    NA         114618 AL       South      
##  9 01017 Chambers Co~ Alaba~ B01001_0~    NA          33660 AL       South      
## 10 01019 Cherokee Co~ Alaba~ B01001_0~    NA          25903 AL       South
income %>% slice(1:10)
## # A tibble: 10 x 7
##    GEOID COUNTY          STATE   variable   income_est STATEABB STATEREGION
##    <chr> <chr>           <chr>   <chr>           <dbl> <chr>    <chr>      
##  1 01001 Autauga County  Alabama B06011_001      29725 AL       South      
##  2 01003 Baldwin County  Alabama B06011_001      29802 AL       South      
##  3 01005 Barbour County  Alabama B06011_001      17963 AL       South      
##  4 01007 Bibb County     Alabama B06011_001      21958 AL       South      
##  5 01009 Blount County   Alabama B06011_001      26976 AL       South      
##  6 01011 Bullock County  Alabama B06011_001      21571 AL       South      
##  7 01013 Butler County   Alabama B06011_001      19685 AL       South      
##  8 01015 Calhoun County  Alabama B06011_001      24069 AL       South      
##  9 01017 Chambers County Alabama B06011_001      22754 AL       South      
## 10 01019 Cherokee County Alabama B06011_001      23121 AL       South
medianage %>% slice(1:10)
## # A tibble: 10 x 7
##    GEOID COUNTY          STATE   variable   medianage_est STATEABB STATEREGION
##    <chr> <chr>           <chr>   <chr>              <dbl> <chr>    <chr>      
##  1 01001 Autauga County  Alabama B01002_001          38.2 AL       South      
##  2 01003 Baldwin County  Alabama B01002_001          43   AL       South      
##  3 01005 Barbour County  Alabama B01002_001          40.4 AL       South      
##  4 01007 Bibb County     Alabama B01002_001          40.9 AL       South      
##  5 01009 Blount County   Alabama B01002_001          40.7 AL       South      
##  6 01011 Bullock County  Alabama B01002_001          40.2 AL       South      
##  7 01013 Butler County   Alabama B01002_001          40.8 AL       South      
##  8 01015 Calhoun County  Alabama B01002_001          39.6 AL       South      
##  9 01017 Chambers County Alabama B01002_001          42   AL       South      
## 10 01019 Cherokee County Alabama B01002_001          46.5 AL       South
renters %>% slice(1:10)
## # A tibble: 10 x 7
##    GEOID COUNTY          STATE   variable   renters_est STATEABB STATEREGION
##    <chr> <chr>           <chr>   <chr>            <dbl> <chr>    <chr>      
##  1 01001 Autauga County  Alabama B25129_038        5715 AL       South      
##  2 01003 Baldwin County  Alabama B25129_038       20034 AL       South      
##  3 01005 Barbour County  Alabama B25129_038        3654 AL       South      
##  4 01007 Bibb County     Alabama B25129_038        1763 AL       South      
##  5 01009 Blount County   Alabama B25129_038        4424 AL       South      
##  6 01011 Bullock County  Alabama B25129_038        1017 AL       South      
##  7 01013 Butler County   Alabama B25129_038        1955 AL       South      
##  8 01015 Calhoun County  Alabama B25129_038       13351 AL       South      
##  9 01017 Chambers County Alabama B25129_038        4376 AL       South      
## 10 01019 Cherokee County Alabama B25129_038        2433 AL       South

Analysis

In this section we will explore elements of data from the Zillow and ACS data sets in various charts. The charts will often examine a subject at increasingly specific levels of geographic hierarchy, starting at a national or state level and drilling down to the metro, city, or zip code.

Boxplot of Median Home Values by State

A box plot of median values by state may reveal which states have the highest median home values, and what the distribution of median values is in each state. The red, blue and green horizontal lines represent the maximium, median, and minimum (respectively) of all observations in the chart. The state you are exploring will be compared to the national median.

#get a list of states
statelist <- yearend_homevalues_byzip %>% 
  filter(!is.na(homevalue) & !is.na(StateName)) %>% 
  group_by(StateName) %>% 
  summarize(minptr = median(homevalue)) %>% 
  arrange(minptr) 
 

#extract top zips by state
topzips <- yearend_homevalues_byzip %>% 
  filter(City != 'Hollandale') %>% #the value in this county appears to be an anomaly/outlier
  filter(year == 'eoy_2020') %>% 
  group_by(StateName) %>% 
  arrange(-homevalue) %>% 
  slice(1:1)

topzip_instate <- topzips %>% filter(StateName == explore_state)

#extract top zips by state
botzips <- yearend_homevalues_byzip %>% 
  filter(City != 'Hollandale') %>% #the value in this county appears to be an anomaly/outlier
  filter(year == 'eoy_2020') %>% 
  group_by(StateName) %>% 
  arrange(homevalue) %>% 
  slice(1:1)

botzip_instate <- botzips %>% filter(StateName == explore_state)

#determine the highest, lowest and average median value
#these will be added as horizontal lines on the box plot
lowestmedian <- min(statelist$minptr)
meanmedian <- mean(statelist$minptr)
maxmedian <- max(statelist$minptr)

state_regions <- states_index %>% mutate(StateName = STATEABB, StateRegion = STATEREGION) %>% select(StateName,StateRegion)

yearend_homevalues_byzip <- yearend_homevalues_byzip %>% left_join(state_regions, by="StateName")

top_median_comp <- round((topzip_instate$homevalue / meanmedian) * 100,1)
bot_median_comp <- round((botzip_instate$homevalue / meanmedian) * 100,1)

#create some log10 transforms
yearend_homevalues_byzip %>% 
  filter(City != 'Hollandale') %>% 
  filter(year == 'eoy_2020') %>% 
  mutate(homevalue=log10(homevalue)) %>%
  select(StateName,homevalue) %>%
  gather(varz, length, 
  homevalue) %>%
  ggplot(aes(StateName, 
             length, 
             fill = StateName)) +
  geom_boxplot() +
  geom_text_repel(data=topzips,
            aes(x=StateName, 
             y=log10(homevalue),
             label=paste(City,StateName)),color="darkgreen") +
  geom_text_repel(data=botzips,
            aes(x=StateName, 
             y=log10(homevalue),
             label=paste(City,StateName)),color="red") +
  geom_hline(yintercept=log10(lowestmedian),color="red") +
  geom_hline(yintercept=log10(meanmedian),color="blue") +
  geom_hline(yintercept=log10(maxmedian),color="darkgreen") +
  #facet_wrap(~StateRegion, scales = "free", ncol=2) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position="bottom")  + 
  labs(title = "Home Values by Zip Code - Grouped by State", 
       subtitle = "Top city in each state is noted", 
       caption = paste("Data source:",sourcefilename)
       )

top_comp <- paste("The highest priced zip code in ", topzip_instate$StateName, " is ", topzip_instate$RegionName," in the City of ",topzip_instate$City," in ", topzip_instate$CountyName,". The median home sales price there is ", topzip_instate$homevalue,", which is ", top_median_comp,"% of the national median ", round(meanmedian,0) ,". ")

bot_comp <- paste("The highest priced zip code in ", botzip_instate$StateName, " is ", botzip_instate$RegionName," in the City of ",botzip_instate$City," in ", botzip_instate$CountyName,". The median home sales price there is ", botzip_instate$homevalue,", which is ", bot_median_comp,"% of the national median ", round(meanmedian,0) ,". ")

top_comp
## [1] "The highest priced zip code in  CA  is  94027  in the City of  Atherton  in  San Mateo County . The median home sales price there is  6541400 , which is  3597.5 % of the national median  181831 . "
bot_comp
## [1] "The highest priced zip code in  CA  is  92389  in the City of  Shoshone  in  Inyo County . The median home sales price there is  26669 , which is  14.7 % of the national median  181831 . "

The highest priced zip code in CA is 94027 in the City of Atherton in San Mateo County . The median home sales price there is 6541400 , which is 3597.5 % of the national median 181831 .

The highest priced zip code in CA is 92389 in the City of Shoshone in Inyo County . The median home sales price there is 26669 , which is 14.7 % of the national median 181831 .

Boxplot of Median Home Values by State in a Selected Region

This is the same home value data as above but zoomed in to a specific state. The locations with the highest home values are stamped for each metro.

selectregion <- 'West'
selectmetro <- explore_metro
year <- 2020
  
yearmarker <- paste('eoy_',year,sep='')

allvalues <- yearend_homevalues_byzip %>% 
  filter(StateRegion ==selectregion) %>% 
  filter(year == yearmarker)%>% 
  filter(!is.na(homevalue)) %>% pull(homevalue)

minmedian <- min(allvalues)
midmedian <- median(allvalues)
maxmedian <- max(allvalues)

toplocation <- yearend_homevalues_byzip %>% 
  filter(StateRegion ==selectregion) %>% 
  filter(year == yearmarker) %>% 
  group_by(StateName) %>% 
  arrange(-homevalue) %>% 
 slice(1:1)

botlocation <- yearend_homevalues_byzip %>% 
  filter(StateRegion ==selectregion) %>% 
  filter(year == yearmarker) %>% 
  group_by(StateName) %>% 
  arrange(homevalue) %>% 
 slice(1:1)

yearend_homevalues_byzip %>% 
  filter(StateRegion ==selectregion) %>% 
  filter(year == yearmarker)%>% 
  ggplot(aes(as.character(StateName), 
             log10(homevalue), 
             fill = StateName)) +
  geom_boxplot() +
  geom_text_repel(data=toplocation,color="darkgreen",
            aes(x=as.character(StateName), 
             y=log10(homevalue),
             label=paste(City,StateName))) +
  geom_text_repel(data=botlocation,color="red",
            aes(x=as.character(StateName), 
             y=log10(homevalue),
             label=paste(City,StateName))) +
  geom_hline(yintercept=log10(minmedian),color="red") +
  geom_hline(yintercept=log10(midmedian),color="blue") +
  geom_hline(yintercept=log10(maxmedian),color="darkgreen") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position="none")  + 
  labs(title = paste(year,selectregion,"Range of Median Home Values in Each Zipcode, by State"), 
       subtitle = "Taller bars indicate larger range of values", 
       caption = paste("Data source:",sourcefilename3))