Our conversations with agronomists and soil scientists...
...revealed that the two components of the survey documents are typically used together, and that readers usually need to move from the map to the narrative based on the soil categories or phenomena that are common to both. The idea, then, was to strip both components of the publication down to their data elements -- text and map geometry, respectively -- then build them back up in a mutually-aware, interactive web application.
The original survey document...
...(narrative portion) was scanned at archive quality, OCR’d, and ingested into Purdue Libraries’ eArchives, a CONTENTdm-driven node in the Libraries’ Distributed Institutional Repository, where it waited to be connected to the map data.

Considerable time went into developing a process...

...that was automatable and efficient. Rather than fuss with pixel-level classification, for example, we used Definiens Professional Earth (formerly eCognition) to apply segmentation algorithms to the raster content. With this method, soil zones can be vectorized as continuous shapes, which are more suitable for classification and, later, for conversion into a geospatial format. These zones were extracted as polygons into a geodatabase. Once there, edges were smoothed, gaps were filled, and unclassed or mis-classed polygons were fixed through database topologies and proximity geoprocessing (e.g., assigning classes to “blank” polygons based on their nearness to classed polygons).
Through repetition and refinement, we were able to hone the process to the point where it took about 10 hours from start to finish (including as much as 2-3 hours of unnecessary control point placement). We’re pleased with this pace, though maps after 1906 tend to be much more complex.
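To make the proximity step concrete, here is a minimal sketch in Python of assigning a class to each “blank” polygon from its nearest classed neighbor. It is an illustration only, not the geodatabase tooling we actually used: the function names, the dictionary layout, and the use of centroid-to-centroid distance as the measure of “nearness” are all assumptions made for the example.

```python
from math import hypot


def centroid(ring):
    """Area-weighted centroid of a simple, non-degenerate polygon.

    `ring` is a list of (x, y) vertices without a repeated closing point.
    """
    area2 = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # area2 is twice the signed area, so the usual 6*A divisor becomes 3*area2.
    return (cx / (3 * area2), cy / (3 * area2))


def fill_blank_classes(polygons):
    """Return copies of `polygons` with blank classes filled in.

    Each polygon is a dict with a "ring" and a "soil_class" (None if blank).
    Assumption: nearness is measured between centroids, which is cruder
    than edge-to-edge distance. At least one polygon must be classed.
    """
    classed = [(centroid(p["ring"]), p["soil_class"])
               for p in polygons if p["soil_class"]]
    filled = []
    for p in polygons:
        if p["soil_class"]:
            filled.append(dict(p))
            continue
        px, py = centroid(p["ring"])
        _, nearest_class = min(
            classed, key=lambda c: hypot(c[0][0] - px, c[0][1] - py))
        filled.append({**p, "soil_class": nearest_class})
    return filled
```

Centroid distance keeps the sketch dependency-free; a production workflow would more likely compare polygon boundaries, or prefer the neighbor that shares the longest edge.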
Our online GIS...
...was built on top of a MapServer/PostGIS back end, and we used the open source ka-Map! API to precache and pre-tile MapServer’s output so that the map feels smoother and more intuitive to use. Querying the map returns a picklist of the results. Clicking the “Preview in eArchives” link pipes CONTENTdm imagery straight into a preview window with the query term (soil class) highlighted using CONTENTdm’s highlighting mechanism. The preview image itself is the link into the full CONTENTdm-hosted document. Additional (non-survey) map layers have been and will be added, but at this point the target functionality – moving from map to narrative based on shared semantic targets – is working.
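The map-to-narrative hand-off can be sketched roughly as follows. This is a simplified Python illustration, not our production code: the `PREVIEW_BASE` URL, the `doc` and `highlight` parameter names, and the shape of the query hits are hypothetical placeholders, and they do not reflect CONTENTdm’s real URL scheme.

```python
from urllib.parse import urlencode

# Hypothetical preview endpoint; the real CONTENTdm URL pattern differs.
PREVIEW_BASE = "https://example.org/earchives/preview"


def build_picklist(hits, document_id):
    """Turn map query hits into a de-duplicated picklist of preview links.

    `hits` is a list of dicts, each with a "soil_class" key (an assumed
    shape for the polygons returned by a map query). The soil class is
    the term shared by the map and the narrative, so it doubles as the
    term to highlight in the preview.
    """
    seen = set()
    picklist = []
    for hit in hits:
        soil_class = hit["soil_class"]
        if soil_class in seen:
            continue  # several polygons can share one soil class
        seen.add(soil_class)
        query = urlencode({"doc": document_id, "highlight": soil_class})
        picklist.append({
            "label": soil_class,
            "preview_url": f"{PREVIEW_BASE}?{query}",
        })
    return picklist
```

The point of the sketch is the join key: nothing connects the two systems except the soil class name, which is why the map attributes and the OCR’d text must agree on that vocabulary.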
The datasets...
...are available for download from within the map interface itself, but you're welcome to also download them here:
The extensibility of our model...
...has us thinking of ways to integrate many types of additional content, both explicitly and implicitly geospatial. The map speaks fluent XML, and we have successfully leveraged this capability to include wiki and content management system (CMS) content in the map itself. This allows us to annotate the map with anything that can itself be related to the earth, including citations to articles, images, snippets of other maps, or videos. Doing so will in some ways make the map itself a platform, an API that is easily mashable with content contributed by agronomists, high schoolers, historians, or those long-dead surveyors whose field notes have survived them. No knowledge of, or interest in, GIS required.

