The spring General Meeting of the American Philosophical Society is April 25–27. Read the program and live stream the proceedings

Using Open Data - Franklin Ledgers

Using Open Data - Franklin Ledgers

By Craig Fox (Museum Guide) 

One of the hardest parts of creating projects with open data is figuring out how to extract the endless supply of data into something useful. Here’s a guide on how we turned some of the data from the Benjamin Franklin Postal Project into an educational resource for students and educators.

The goal is to convert the data file into something usable for young students to understand more about early American history. A lot of the modifications involve removing incomplete or unused data in order to simplify the spreadsheet. It helps to know what information you want to extract or be left with before you go through some of these steps. The steps below assume the reader has a basic level of understanding of how to use Google Sheets or Microsoft Excel. While the process is mostly the same for both programs, all major differences are mentioned. If you want to see the finished version and the original file look at the sheets here.

Setup

First, download the data used for the project. The data used can be gotten at https://diglib.amphilsoc.org/islandora/object/compound:53#page/44/mode/1up by clicking on the “Datasets and Documentation” icon or by clicking here. Unzip the downloaded folder and use the “BF85f6-30_Dataset A_Post office book, letters received, 1767-1768.csv” file. Other datasets and a guide to reading them can be found here. Any datasets used from this project will end in “.csv” which is an open source, human readable file format for storing tables and spreadsheets. We will then convert this to Microsoft Excel’s file format (xlsx) or to a Google Sheets file.

Steps for Google Sheets

Setup: Open up Google Sheets and click the folder icon (it should read “open file picker” when hovered over). Click the upload tab in the window that pops up and add the file.

  1. Insert 2 rows at the top of the document. Copy formatting from Initial Columns section located below. Pay attention to which cells are merged (cells are merged by highlighting them and then pressing the merge icon).
  2. Hide the following Columns: “Notes” (Not Used), “Transcribers Notes” (Not Used), “Belonging to this office only” (Empty for the data we are using), “Sorting ID” (Not Used), “Page Number” (Not Used), “URL” (Not Used)
  3. Highlight the third row, then click the “Data” tab and press “Create a new filter”. Select the filter icon next to “Data Completeness” and pick “Complete”.
  4. Hide “Data Completeness” column
  5. Create a new Column called “Time of Receiving” to the right of “Time of Receiving: Day”. Set the first row below the header equal to =DATEVALUE(CONCATENATE(TO_TEXT(F66), "-", TO_TEXT(G66), "-", TO_TEXT(E66))). Then drag this down to fill in all the rows.

(Note: The DATE function is not used because it will not accept pre-1900 years, so this workaround is needed to properly display those dates)

  1. Highlight the column and go click the “Format” tab, then “Number”, then “More Formats”, then “custom number formats” and enter: mmm" "d", "yyyy
  2. Hide “Time of Receiving: Year”, “Time of Receiving: Month”, “Time of Receiving: Day” columns
  3. Create a new Column called “Date of the Bills Received” to the right of “Date of the Bills Received: Day”. Set the first row below the header equal to =DATEVALUE(CONCATENATE(TO_TEXT(K66), "-", TO_TEXT(L66), "-", TO_TEXT(J66))). Then drag this down to fill in all the rows.
  4. Highlight the column and go click the “Format” tab, then “Number”, then “More Formats”, then “custom number formats” and enter: mmm" "d", "yyyy
  5. Hide “Date of the Bills Received: Year”, “Date of the Bills Received: Month”, “Date of the Bills Received: Day” columns
  6. Select the filter icon next to “Name of the Offices and Ships, from whence they were received” and pick everything except “#Value”.
  7. Repeat step 12 for “Date of the Bills received” column
  8. Select the filter icon next to “Number paid for” and pick everything except “null”.
  9. Hide row 3 (Note: This row exists to allow filtering since you can not use merged cells like those in rows 1 and 2 to do filtering. It is hidden to provide a cleaner view)
  10. Highlight the whole document and set horizontal align to center, vertical align to center, and width to auto

Steps for Microsoft Excel

Setup: Open Microsoft Office and click “upload and open”. Put in the file. Once the file loads, click “Edit Workbook” on the yellow bar that appears above the data, then press “Convert” in the pop-up that appears.

  1. Insert 2 rows at the top of the document. Copy formatting from Initial Columns section located below. Pay attention to which cells are merged (cells are merged by highlighting them and then pressing the merge icon).
  2. Hide the following Columns: “Notes” (Not Used), “Transcribers Notes” (Not Used), “Belonging to this office only” (Empty for the data we are using), “Sorting ID” (Not Used), “Page Number” (Not Used), “URL” (Not Used)
  3. Highlight the third row, then click the “Data” tab and press “Create a new filter”. Select the filter icon next to “Data Completeness” and pick “Complete”.
  4. Hide “Data Completeness” column
  5. Create a new Column called “Time of Receiving” to the right of “Time of Receiving: Day”. Set the first row below the header equal to =CONCATENATE(F66, "-", G66, "-", E66). Then drag this down to fill in all the rows.

(Note: The DATE function is not used because it will not accept pre-1900 years. Microsoft Excel does not have a concise workaround so the date is left as numbers in the form of mm-dd-yyyy)

  1. Highlight the column and go click the “Format” tab, then “Number”, then “More Formats”, then “custom number formats” and enter: mmm" "d", "yyyy
  2. Hide “Time of Receiving: Year”, “Time of Receiving: Month”, “Time of Receiving: Day” columns
  3. Create a new Column called “Date of the Bills Received” to the right of “Date of the Bills Received: Day”. Set the first row below the header equal to =CONCATENATE(K66, "-", L66, "-", J66). Then drag this down to fill in all the rows.
  4. Highlight the column and go click the “Format” tab, then “Number”, then “More Formats”, then “custom number formats” and enter: mmm" "d", "yyyy
  5. Hide “Date of the Bills Received: Year”, “Date of the Bills Received: Month”, “Date of the Bills Received: Day” columns
  6. Select the filter icon next to “Name of the Offices and Ships, from whence they were received” and pick everything except “#Value”.
  7. Repeat step 12 for “Date of the Bills received” column
  8. Select the filter icon next to “Number paid for” and pick everything except “null”.
  9. Hide row 3 (Note: This row exists to allow filtering since you can not use merged cells like those in rows 1 and 2 to do filtering. It is hidden to provide a cleaner view)
  10. Highlight the whole document and set horizontal align to center, vertical align to center, and width to auto

Initial Columns

Help Expand These Records

Postal records like these can be hard to find because they sometimes weren't considered valuable enough to put in an archive! But to tell a fuller story about relationships between people and places in colonial British North America, we'd love to be able to connect our records from Philadelphia to similar postal records from the other twelve colonies. We could use your help! If you know of any account books, postal records, or other data sources related to the postal service prior to 1800, please let us know at [email protected].