By Geoff Cunfer
This exercise has two goals. First, it will introduce students to basic GIS methods and concepts and provide initial exposure to ArcGIS software. Second, it will introduce students to HGIS methods and concepts by exploring the spatial distribution of population in the Great Plains in the past.
The spatial data and attribute data employed here are publicly available from the National Historical GIS project based at the University of Minnesota. That project created map layers at various scales (nation, state, county, census tract, etc.) to correspond with the decennial U.S. censuses for years ending in 0 from 1790 until 2000. You can download census statistics and corresponding maps, designed for use in a GIS or in statistical software, from http://www.nhgis.org.
This project will walk students through the process of:
- loading GIS data
- joining attributes to maps
- symbolizing maps based on those attribute data.
It will enable students to explore historical questions about the spatial distribution of people of different ethnic backgrounds in the past. Students will learn how to create printable maps that answer descriptive historical questions.
Download ArcLesson1.zip, save it in your Documents\ArcGIS folder, and unzip it. The final file path should be YOURUSERNAME\Documents\ArcGIS\ArcLesson1.
From your computer desktop or Start Menu open the ArcMap software (note that the ESRI GIS package includes a number of programs, such as ArcCatalog and ArcGlobe, but you will mostly use ArcMap). Begin by saving a new project file: click File, then Save (or press Ctrl+S), navigate to ArcLesson1\ProjectFiles, and name the file 1930pop1.mxd.
Key concept: In GIS work it is important to pay close attention to where on the computer you save files. The program you see on your screen does not read just a single file, like a Microsoft Word document or a PowerPoint presentation does. Instead, it refers to multiple underlying files, sometimes dozens or even hundreds of files. The software has to know where to find those files by means of their path. Moving files after the fact or changing the names of directories can cripple a GIS project. Set up a directory structure before you start a GIS project, then stick with it. For this exercise, a directory structure is already in place for you.
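The dependence on file paths can be illustrated with a minimal Python sketch. It is not part of ArcMap: the function name `missing_sources` and the list of references are assumptions made for illustration, using the folder names that appear in this exercise. The point is that a stored reference only resolves while the files stay where they were.

```python
from pathlib import Path


def missing_sources(project_root, relative_paths):
    """Return the referenced paths that cannot be found under project_root."""
    root = Path(project_root)
    return [rel for rel in relative_paths if not (root / rel).exists()]


# Illustrative references, based on the folder names used in this exercise.
references = ["GIS_Demo1.gdb", "ProjectFiles", "OutputMaps"]

# Adjust the root to wherever you unzipped the lesson folder.
lesson_root = Path.home() / "Documents" / "ArcGIS" / "ArcLesson1"
print(missing_sources(lesson_root, references))
```

An empty list means every reference was found. Anything listed has been moved, renamed, or never unzipped, which is the situation that breaks a GIS project.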
Click the Save button to save your work about every 5 minutes throughout the exercise. GIS software crashes now and again and you will save yourself a lot of frustration if you get into the habit of clicking save.
Load map data
Key concept: Spatial data represent geographic features in a similar way to familiar paper maps. We will start with vector map data, which come in three different forms: points, lines, and polygons. This exercise makes use of polygon map data in a vector format. You could use graphics software to draw maps with lines, points and polygons. What distinguishes vector features in a GIS is that they represent places on the earth with known locations. Each point is defined by a latitude and longitude. Lines and polygons are created by stringing together points, so they are also grounded with latitudes and longitudes. This makes it possible to automatically layer vector data and have it all line up to create maps.
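As a rough illustration of the three vector forms, the Python sketch below stores a point, a line, and a polygon as plain coordinate pairs. All coordinates are invented, and the helper names `bounding_box` and `in_box` are hypothetical. This is not how ArcMap stores features internally, only a way to see why shared coordinates let layers line up.

```python
# Hypothetical (longitude, latitude) pairs, invented for illustration.
point = (-105.2, 38.1)                                    # e.g. a weather station
line = [(-105.4, 38.0), (-105.2, 38.1), (-105.0, 38.3)]   # e.g. a road
polygon = [                                               # e.g. a county outline
    (-105.5, 37.9), (-104.9, 37.9), (-104.9, 38.4),
    (-105.5, 38.4), (-105.5, 37.9),                       # closes on the first vertex
]


def bounding_box(coords):
    """Return (min_lon, min_lat, max_lon, max_lat) for a list of vertices."""
    lons = [lon for lon, lat in coords]
    lats = [lat for lon, lat in coords]
    return (min(lons), min(lats), max(lons), max(lats))


def in_box(pt, box):
    """True if a point falls inside (or on the edge of) a bounding box."""
    lon, lat = pt
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


county_box = bounding_box(polygon)
print(county_box)
print(in_box(point, county_box))
print(all(in_box(vertex, county_box) for vertex in line))
```

Because the point, the line, and the polygon all use the same coordinate system, a simple comparison of numbers is enough to tell whether one feature sits inside the extent of another.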
- Click the Add Data button, navigate to ArcLesson1, and double click on the geodatabase called GIS_Demo1.gdb.
- Select the map layers called GP_county_1930 and GP_states_1930. Hold down the control key to select both layers at the same time, then click the Add button.
On the left side of your screen in the Table Of Contents two new layers appear with check marks next to them. One is GP_county_1930 and the other is GP_states_1930. Each has a coloured box below it. On the right side of your screen appears a corresponding map of all of the counties in the 12 Great Plains states as they existed in 1930.
You can’t see the map of the Great Plains states because the county layer is covering it up. Uncheck the box next to GP_county_1930. Now you can see the GP_states_1930 map layer.
Symbolize and arrange map layers
Check and uncheck the boxes next to the two layers to turn them on and off. Check both to turn both layers on. Click the icon in the top left corner of the Table of Contents to List By Drawing Order. Then click and hold on the GP_states_1930 layer and drag it above the GP_county_1930 layer. When you release the mouse button it moves to the top of the list.
Now symbolize the GP_states_1930 layer so that it complements the GP_county_1930 layer, rather than obscuring it. Click on the coloured box beneath GP_states_1930. The Symbol Selector dialog box appears. Click the Hollow box. Make the Outline Color black and the Outline Width 2. Click OK. Notice how the symbol box has changed.
Now you can see both map layers at the same time. GP_states_1930 appears as an outline only, without any fill colour, while GP_county_1930 retains its fill colour. The state boundaries make it easier to distinguish which counties are within which state.
Load attribute data
Key concept: Attribute data are digitized descriptions of geographic places. Examples of attributes include things like the total population of a county (polygon), the speed limit along a segment of road (line), or the temperature at a given time at a weather station (point). Attribute data can be numeric or text. The list of possible attributes about the world is infinite. In this exercise we employ attribute data taken from the U.S. Census of Population about all counties in the Great Plains states.
Click the Add Data button again. Inside GIS_Demo1.gdb select the dataset called GP1930pop. Click the Add button. On the left side of your screen a new data table appears, but this time there is no map corresponding to it.
Right click on the dataset called GP1930pop and select Open. A Table called GP1930pop opens. It looks much like a spreadsheet. Each row represents one county in the 12 Great Plains states. Each column represents one type of attribute information about those counties.
Explore the data table by scrolling up and down and left and right. The State and County attributes indicate the location that each row represents. The column headings across the top indicate what information about those places is available in this dataset. Individual cells contain numerical or textual data. This selection of data from the 1930 US Census includes information about:
- total population
- rural or urban residence
- tenure of farms
- country of birth for the foreign-born population
This data table will help us characterize the population of each county and understand the demographic structures of different places in the Great Plains.
Join map data and attribute data
At the moment the ArcMap program represents the location of all of the counties in the Great Plains states. And the program contains a table of information (attributes) about all of those counties. But the computer does not yet know how to connect those two sets of information.
Key concept: One significant power of GIS stems from the ability to connect map data and attribute data. This join process is what separates GIS as an analytical tool from database and graphics software. A database program can manipulate attribute data in sophisticated ways. A graphics program can make beautiful maps. But only a GIS can bring together the maps with the attributes so that researchers can analyze the spatial relationships of attribute data.
The map layers do have some attributes already built in. Right click on the GP_county_1930 map layer and select Open Attribute Table. A similar data table called GP_county_1930 appears in the Table window. This table looks a lot like the other one, with several columns of variable data and a row for every county. There are 17 attribute columns, all with different kinds of identification information. Scroll right and left through the columns; this table doesn’t contain much information of substance about the characteristics of each county.
The crucial difference between the two tables is that GP_county_1930 is directly connected to the map while the other one is not.
- Resize and drag the Table window so that it doesn’t obscure the map. In GP_county_1930 click on one of the small gray boxes to the far left of the table window. Doing so highlights a row in the table that represents a single county and highlights a single corresponding county feature on the map. If you want to know where any given county is, click on the adjacent gray box in the table to select it.
Before going to the next step, click on the Clear Selection button located below the Add Data button. Note that you can toggle between the two tables you have open by clicking on the tabs in the bottom left of the Table window.
Notice that one column is identical in both tables: GISJOIN. This variable presents a unique ID code for each county. Custer County, Colorado has the code 800270 in both tables, for example. These unique IDs will allow ArcMap to join the data table with census information to the map layer attribute table so that the program can connect locations on earth with attributes describing those locations.
In the Table of Contents, right click on GP_county_1930 and select Joins and Relates, then select Join. The Join Data dialog box appears, which will allow you to connect the Census data to the map. Make the following selections in the dialog box:
Click OK. The table called GP1930pop has not changed. Close it. Inspect the table called GP_county_1930. All of the 17 original ID columns are still there. Scroll to the right in the table. All of the census data are now also present in the map attribute table. Now when you click on a gray box on the left side of the table, selecting both the row and a county on the map, you can determine how many people lived in that county in 1930, how many were male and female, what their age and racial structures were, etc. The join procedure has connected the map data with attribute data about the map features, in this case counties.
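The logic of a key-based join can be sketched in a few lines of Python. This is an illustration of the idea only, not ArcMap's implementation: the function name `join_by_key`, the `NAME` column, the second county row, and the population figure are all made up. Only the GISJOIN field, the Population field, and the Custer County code 800270 come from the exercise.

```python
def join_by_key(layer_rows, table_rows, key="GISJOIN"):
    """Append the table's columns to each layer row that shares the same key."""
    lookup = {row[key]: row for row in table_rows}
    joined = []
    for row in layer_rows:
        merged = dict(row)                       # keep the original columns
        merged.update(lookup.get(row[key], {}))  # add matching census columns
        joined.append(merged)
    return joined


# Map attribute rows (hypothetical NAME column; second row is invented).
layer_rows = [
    {"GISJOIN": "800270", "NAME": "Custer"},
    {"GISJOIN": "999999", "NAME": "Example"},
]

# Census rows (the population value is a placeholder, not a census figure).
table_rows = [{"GISJOIN": "800270", "Population": 2000}]

for row in join_by_key(layer_rows, table_rows):
    print(row)
```

In this sketch a map row with no matching census row simply keeps its original columns. The important design point is that the match depends entirely on the two tables sharing identical ID values.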
To make this join permanent, export the map layer with a new name. In the Table of Contents right click on the GP_county_1930 layer and select Data then Export Data. In the Export Data dialog box select the following settings:
For Output feature class:
- Click on the Folder icon and navigate to ArcLesson1\GIS_Demo1.gdb.
- Name the new map file GP_county_1930_join
Click Save, then OK.
Click Yes to add the exported data to the map as a layer.
Symbolize attribute data
You have completed the basic data set-up procedures for a GIS. The map data and the attribute data are related to one another. Because the data represent conditions more than 75 years ago, this is an HGIS, a Historical Geographic Information System. Now we can ask historical questions of the GIS.
- Where in the 12 Great Plains states could we find the most people of Norwegian birth in 1930?
Key concept: One of the fun parts of GIS analysis is the process of data exploration. Data exploration refers to the iterative mapping and remapping of attribute information to look for interesting spatial patterns. Some of the most productive and creative analysis in GIS work comes from repeatedly making maps, then changing a parameter and remaking the map, as a way to learn about the nature of the attribute data.
In this exercise we will begin the data exploration process.
Clear any selected features and close the Table window so that only the map is visible. The data are still there, connected to the map, but they need not always be visible. In the Table of Contents right click on the GP_county_1930_join layer and select Properties to bring up the Layer Properties dialog box. Click on the Symbology tab. On the left side of the dialog box select Quantities and then Graduated colors. Here you can instruct the program to represent one of the census attributes on the map.
In the Fields dropdown menu set Value to Population. This instructs the software to map the total population information from the attribute table. From the Color Ramp dropdown select a red option. Change the number of classes to 4 and click the Classify button. Set Classification Method to Manual. On the right side of the Classification dialog box set the Break Values to the following:
Click OK. You have just instructed the program to categorize counties into 4 groups based on total population size: those under 10,000, those between 10,000 and 25,000, those between 25,000 and 80,000, and those over 80,000. The more people in a county, the darker red the fill symbol will be.
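For readers who want to see the classification logic spelled out, here is a minimal Python sketch that sorts values into the four groups described above. The break values come from the text; the sample county totals are made up, and the treatment of a value that falls exactly on a break (kept in the lower class here) is an assumption of the sketch.

```python
from bisect import bisect_left

BREAKS = [10_000, 25_000, 80_000]   # class upper bounds, from the text above
LABELS = ["under 10,000", "10,000 to 25,000", "25,000 to 80,000", "over 80,000"]


def classify(value, breaks=BREAKS):
    """Return the 0-based class index; a value equal to a break stays in the lower class."""
    return bisect_left(breaks, value)


for population in (4_500, 10_000, 30_000, 250_000):   # made-up county totals
    print(population, "->", LABELS[classify(population)])
```

Each class index would correspond to one shade in the colour ramp, with higher indexes drawn in darker red.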
Inspect the Layer Properties box to confirm that the legend seems appropriate. Move the Layer Properties box off of the map and click the Apply button to see which parts of the plains had more people and which had fewer in 1930.
During this data exploration phase you can change parameters on the fly to ask different questions and to refine your answers. Maybe you want to change the 4 population ranges or the number of ranges represented. You can change the colour choices and even the variable mapped until you hit on information that is useful and interesting.
- Which Great Plains states had many counties with high Norwegian-born populations in 1930?
- Which Great Plains states had some counties with high Norwegian-born population in 1930?
We got our answer, but what do we really mean by “high Norwegian-born populations”? High compared to what? What we have mapped here is high Norwegian-born populations compared to all counties in the 12 states. We don’t really know what percentage of the population in these redder counties was Norwegian-born. To discover that we need to Normalize.
Key concept: Normalization is the process of expressing a value relative to a larger total so that places of different sizes can be compared on an equal footing. For example, when we represent wheat acreage as a proportion of the total acreage of a county, we normalize based on total area. When we represent Norwegian-born people as a proportion of all the people in the county, we normalize based on total population. It is often more accurate and honest to represent normalized data rather than raw data.
Inspect the un-normalized map as it is now. Note the large county in northeastern Minnesota that is one of the darkest on the map. This county may have a large Norwegian-born population simply because it is considerably bigger than the one just to the south of it. Mapping raw data doesn’t account for the variations in county size or for the fact that a county with a large city in it has a lot more total people than a rural county.
Revise your map to normalize for total population.
This tells the software to take the Norwegian-born number for each county and divide it by the total population of each county. The result will be the proportion Norwegian-born. [Remember your math? Example: A proportion of 0.11 = 11% of the population. To convert proportion to percentage, shift the decimal point 2 positions to the right.]
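The arithmetic behind normalization is simple enough to sketch in Python. The two counties and their counts below are invented for illustration, and the function names `proportion` and `as_percent` are hypothetical. The sketch shows how a county with more Norwegian-born residents in raw terms can have the smaller share once total population is taken into account.

```python
def proportion(part, total):
    """Return part / total, or None when the total is zero."""
    if total == 0:
        return None
    return part / total


def as_percent(value, digits=1):
    """Format a proportion such as 0.11 as a percentage string such as '11.0%'."""
    return f"{value * 100:.{digits}f}%"


# Made-up counts: (Norwegian-born, total population).
counties = {
    "Large urban county": (5_500, 100_000),
    "Small rural county": (1_100, 10_000),
}

for name, (norwegian_born, total) in counties.items():
    share = proportion(norwegian_born, total)
    print(name, norwegian_born, as_percent(share))
```

The urban county has five times as many Norwegian-born residents, yet the rural county has twice the proportion, which is why the normalized map can look so different from the raw one.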
Click Apply. The map changes quite dramatically.
- Which Great Plains states had many counties with high Norwegian-born proportions in 1930?
- Which Great Plains states had some counties with high Norwegian-born proportions in 1930?
Let’s adjust the map to use more intuitive break points in the legend. Click the Classify button and change the Break Values on the right side of the Classification box to the following:
Click OK. This setting groups together the counties with under 2%, 2-5%, 5-10% and 10-13% Norwegian-born population, with darker reds indicating higher percentages. Convert the proportions indicated under the Range heading to percentages under the Label heading by typing in the following Label items:
Click Apply, then click OK to close the Layer Properties dialog box.
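Converting proportion ranges into percentage labels can also be done mechanically. The Python sketch below assumes break values of 0.02, 0.05, 0.10 and 0.13, inferred from the ranges described above, and the label wording it produces is an assumption rather than the exact text the exercise asks you to type.

```python
def percent_labels(breaks):
    """Turn ascending proportion breaks (class upper bounds) into percent labels."""
    labels = []
    lower = None
    for upper in breaks:
        upper_text = f"{upper * 100:g}"      # 0.05 becomes "5"
        if lower is None:
            labels.append(f"Under {upper_text}%")
        else:
            labels.append(f"{lower}-{upper_text}%")
        lower = upper_text
    return labels


# Break values inferred from the ranges described in the text.
print(percent_labels([0.02, 0.05, 0.10, 0.13]))
```

Labels written as percentages are easier for map readers to interpret than raw proportions, which is the reason for retyping them in the Label column.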
Generate a finished map
Now that you have answered your analytical question, you may want to generate a map to accompany the historical discussion of this question in your text.
On the toolbar at the top of the ArcMap window look for a blue-green globe icon called Full Extent. Click on it to center your map. Now look in the bottom left corner of the map window. A small map icon there indicates that you are viewing your map in “Data View.” Hold your cursor over the icon to confirm this. Now we will switch to “Layout View” to complete our map. Just to the right of the small map icon is a page icon. Click on it to see a page layout mock-up of a publishable map.
The map we just created is already there. We need to add a legend, scale bar, and title that will help map readers interpret the meaning of the map. Because we are conscientious scholars, we will also add a credit line to acknowledge the source of our data. The steps below tell you how to add these items to your layout page. If you don’t like the initial result, you can always delete the element and try again.
From the menu at the top of the ArcMap window, select Insert, then select Title. A text box appears on the map layout. Type in “Percentage Norwegian-born, 1930” for the map’s title and click OK. Click on the title text box to select it, and use your arrow keys to move the title up above the map but still inside the solid black neat line.
From the menu at the top of the ArcMap window, select Insert, then select Legend. In the Legend Wizard set Legend Items to show only GP_county_1930_join. Click Next. Change the Legend Title to read “Percent Norwegian-born.” Continue to click Next through the rest of the windows and Finish to complete the legend. It shows up in the middle of the page. Click and drag the legend box to the bottom left corner of the page, inside the neat line.
From the menu at the top of the ArcMap window, select Insert, then select Scale Bar. In the Scale Bar Selector dialog box choose the option you like best. If you want to tinker with the scale bar properties, click the Properties button. For example you might want the scale bar to indicate kilometres instead of meters. When you are done, click OK. Select the scale bar box, click, and drag it to the bottom right of the layout, inside the neat line.
From the menu at the top of the ArcMap window, select Insert, then select Text. A small text box appears in the middle of the map. Type the following: “Map data and U.S. Census data from http://www.nhgis.org.” Then drag the box to the bottom of the page so that it doesn’t overlap with any other map elements.
When the layout is arranged to your satisfaction, you are ready to export your map. You could, at this point, print a paper copy. Instead, for today, we will export the map as a .jpg image that could be imported to a Word document, uploaded to a web site, or used in a PowerPoint presentation. From the menu at the top of the ArcMap window, select File, then select Export Map. Navigate to ArcLesson1\OutputMaps and name your file Norwegian1930.jpg. Set Save as type: to JPEG (*.jpg) and dpi to 200. Click Save.
Explore your own historical question using the 1930 data
Click the Save button to save your Norwegian-born map in its current form.
Now save the project file under a new name, 1930pop2.mxd. From the menu at the top of the ArcMap window, select File, then select Save As. Be sure to save the renamed file in ArcLesson1\ProjectFiles. Every time you are ready to generate a new map, start by re-saving the .mxd file with a new name.
It is time to pursue your own curiosity. Because the data are already set up for analysis, you don’t need to reload the data. Instead, repeat the procedures starting with the Symbolize attribute data section to ask historical questions of the data, symbolize the answers appropriately, and generate output maps. Try mapping lots of different attributes. Tinker with the best symbolization settings to represent them accurately and clearly. Find one or two with interesting results, and generate a couple of output maps. (If you come up with something good, post it on Twitter and tag @geospatialhist.)
Explore change through time
Key concept: GIS is useful to many academic disciplines and for a multitude of commercial applications. Historical GIS is an emerging methodology within the historical profession that uses GIS technology to analyze and understand the past. An aspect of HGIS that distinguishes it from other GIS uses is the need to represent time. One solution to the challenge of time in GIS is to make sequential maps that hold constant the attribute mapped, the scale, and the legend, but change the dates.
Choose one variable and map it for several time points. To explore change through time you’ll need to repeat some of the steps from the Load map data and Load attribute data sections above to import map and attribute data, join them, and then symbolize variables. The exercise files contain data for several census years. Keep in mind that the attributes available for each year are not always the same. One that is consistent is total population, so a relatively easy time series to construct would be changes in total population through time. To make maps that are truly comparable through time, they all need to have the same scale and use the same legend categories and colours.
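The value of holding the legend constant can be shown with a short Python sketch. The census years and county totals below are invented for illustration, and the break values reuse the population classes from earlier in the exercise. Applying one shared list of breaks to every year means a change in the class counts reflects a change in the data rather than a change in the legend.

```python
from bisect import bisect_left
from collections import Counter

SHARED_BREAKS = [10_000, 25_000, 80_000]   # same class limits for every year

# Made-up county totals keyed by hypothetical census years.
population_by_year = {
    1910: [3_000, 12_000, 26_000],
    1930: [9_000, 24_000, 90_000],
}


def class_counts(values, breaks=SHARED_BREAKS):
    """Count how many values fall in each class (index 0 is the lowest class)."""
    counts = Counter(bisect_left(breaks, v) for v in values)
    return [counts.get(i, 0) for i in range(len(breaks) + 1)]


for year, values in sorted(population_by_year.items()):
    print(year, class_counts(values))
```

If each year were classified with its own breaks, the darkest colour would mean something different on every map and the sequence could not be read as a time series.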