In this tutorial, you will learn how to import Pennsylvania Spatial Data Access (PASDA) features into R.


Step 1. Explore PASDA to identify the feature you’d like to import.


In this case I want to import Pennsylvania Department of Environmental Protection (DEP) data, so I’ll search by provider from the PASDA homepage. Alternatively, you can search by keywords. After you’ve entered your search, click on the magnifying glass icon to the right of the search (rather than clicking on Browse All Files, which will return all results).


I’d like to import DEP regions (polygons). Scroll to find the data. Click on the object.


Now click on the REST link


Then click on the Layers object DEP Regions 2011_04 (7). Copy the URL (https://mapservices.pasda.psu.edu/server/rest/services/pasda/DEP2/MapServer/7), this will be the basis of your R query.



Step 2. Use R to import PASDA data


We will rely on two R packages to import spatial features: sf and httr. Before proceeding, make sure these packages are installed.


If you would like to import a feature in its entirety, simply combine the hyperlink for the feature identified above https://mapservices.pasda.psu.edu/server/rest/services/pasda/DEP2/MapServer/7 with the following chunk, which tells R to return geometry, return all features, all fields, in geojson format /query?returnGeometry=true&where=1=1&outFields=*&f=geojson.


dep_regions <- sf::st_read('https://mapservices.pasda.psu.edu/server/rest/services/pasda/DEP2/MapServer/7/query?returnGeometry=true&where=1=1&outFields=*&f=geojson')


This returns the PA DEP Regions polygon feature. The coordinate reference system can be checked using sf::st_crs(dep_regions), showing that this feature uses WGS 84.


dep_regions
## Simple feature collection with 6 features and 8 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -80.5189 ymin: 39.71993 xmax: -74.68878 ymax: 42.26948
## Geodetic CRS:  WGS 84
##   OBJECTID DEPREGION SNAME                         LNAME         LOCATION
## 1        1         1  SERO    South East Regional Office Philadelphia, PA
## 2        2         2  NERO    North East Regional Office Wilkes Barre, PA
## 3        3         3  SCRO South Central Regional Office   Harrisburg, PA
## 4        4         4  NCRO North Central Regional Office Williamsport, PA
## 5        5         5  SWRO    South West Regional Office   Pittsburgh, PA
## 6        6         6  NWRO    North West Regional Office    Meadville, PA
##   SHAPE_Leng Shape_Length  Shape_Area                       geometry
## 1   381859.3     381859.3  5704789655 POLYGON ((-75.57921 39.8389...
## 2   678139.4     678139.4 16678209517 POLYGON ((-76.28239 41.3765...
## 3   790236.5     790236.5 25274132935 POLYGON ((-77.68086 40.7297...
## 4   803369.0     803369.0 26904669350 POLYGON ((-76.22085 41.0046...
## 5   745971.5     745971.5 20043766268 POLYGON ((-80.51851 40.4778...
## 6   900688.4     900688.4 22728264748 POLYGON ((-78.91836 41.9987...


We can quickly plot the feature to examine visually using plot()

plot(dep_regions$geometry)


If you would like to filter a feature before importing, your code will have to be a bit more advanced using the httr package. You will submit a query on specific columns of the feature before importing. This process requires an understanding of the feature’s tabular structure. You can do this by exploring the metadata of the feature on the PASDA website. This process is recommended when the features you want to import are large! Filtering a feature before importing will reduce query times.


A good example is the PA DEP 2024 Integrated List streams feature. This feature includes attaining stream segments in an integrated format for the Clean Water Act reporting and listing. This feature is very large, consisting of >257,000 statewide line segments of >85,000 miles of streams and rivers. If this feature was imported without filtering, it would take more than 10 minutes.


Here is an example of filtering the feature to retrieve only the Juniata River segments. This import takes less than one second.

url <- httr::parse_url("https://mapservices.pasda.psu.edu/server/rest/services/pasda/DEP/MapServer") # main directory
url$path <- paste(url$path, "16/query", sep = "/") # feature-specific directory - is added onto the url created above

url$query <- list(where = "GNIS_NAME = 'Juniata River'", # specify your query! uses SQL language
                  outFields = "*",
                  returnGeometry = "true",
                  f = "geojson")
request <- httr::build_url(url)

juniata_r <- sf::st_read(request)
## Reading layer `OGRGeoJSON' from data source 
##   `https://mapservices.pasda.psu.edu/server/rest/services/pasda/DEP/MapServer/16/query?where=GNIS_NAME%20%3D%20%27Juniata%20River%27&outFields=%2A&returnGeometry=true&f=geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 218 features and 18 fields
## Geometry type: LINESTRING
## Dimension:     XYZ
## Bounding box:  xmin: -78.06832 ymin: 40.36058 xmax: -77.01158 ymax: 40.61012
## z_range:       zmin: 0 zmax: 0
## Geodetic CRS:  WGS 84


A check of the coordinate reference system sf::st_crs(juniata_r) shows that this feature also uses WGS 84, consistent with the PA DEP region feature. So these features can be plotted together.


plot(dep_regions$geometry)
plot(juniata_r$geometry, add = T, col = 'blue')

The where argument in the Juniata River example above uses SQL language and is very versatile. For instance, you can use a wildcard % to grab all elements of the feature where GNIS_Name begins with ‘Rock’:


url <- httr::parse_url("https://mapservices.pasda.psu.edu/server/rest/services/pasda/DEP/MapServer") # main directory
url$path <- paste(url$path, "16/query", sep = "/") # feature-specific directory

url$query <- list(where = "GNIS_NAME LIKE 'Rock%'", # add a wildcard (%) after Rock
                  outFields = "*",
                  returnGeometry = "true",
                  f = "geojson")
request <- httr::build_url(url)

rock_wildcard <- sf::st_read(request)


This query retrieved 762 stream segments, where the GNIS_NAME begins with ‘Rock’. Below are all the unique entries in GNIS_NAME that were returned. Although this query resulted in hundreds of features, it only took 2.6 seconds with a high speed internet connection.

sort(unique(rock_wildcard$GNIS_NAME))
##  [1] "Rock Bottom Creek"    "Rock Cabin Run"       "Rock Creek"          
##  [4] "Rock Hill Creek"      "Rock Hollow Run"      "Rock Lick Run"       
##  [7] "Rock Port Creek"      "Rock Run"             "Rock Spring Run"     
## [10] "Rockdale Creek"       "Rockey Run"           "Rocklick Creek"      
## [13] "Rocklick Run"         "Rockwell Creek"       "Rocky Forest Creek"  
## [16] "Rocky Fork"           "Rocky Gap Run"        "Rocky Mountain Creek"
## [19] "Rocky Run"


We can then simply plot the features to see the statewide distribution of streams that begin with ‘Rock’.

plot(dep_regions$geometry)
plot(rock_wildcard$geometry, add = T, col = 'blue')


There is a lot of help available online. Search for additional ways to expand your spatial query capabilities. The myriad of spatial data on PASDA, coupled with these tools allow tons of functionality for geospatial analysis in R!




This tutorial was prepared by Matt Shank in January 2024