Sustainable Archaeological Databases — a view from Digital Antiquity

This post is part of the May 2012 Technology Week, a quarterly topical discussion about technology and historical archaeology, presented by the SHA Technology Committee. This week’s topic examines the use and application of digital data in historical archaeology. Visit this link to view the other posts.

At the Center for Digital Antiquity (Digital Antiquity), we are committed to improving access to and preservation and use of archaeological information. Over the past four years, we’ve built tDAR (The Digital Archaeological Record), a digital repository designed to preserve the digital documents, data sets, images, and other digital results of archaeological investigations and excavations. tDAR is one of a number of discipline-specific repositories designed from the ground up to better support the needs of the content by providing rich, archaeologically-specific metadata along with tools to discover, access, and use the uploaded materials.

Looking into the crystal ball, there are a number of significant challenges and important opportunities ahead:

  1. creating and maintaining a stable foundation for future archaeological research and resource management
  2. access and use (and preservation too)
  3. collaboration

Sustainability

If there’s anything that we can learn from the basic practice of archaeology, it’s that things do not get preserved unless the environment is right to enable preservation. This works best if there are multiple sources and tools available. In the case of archaeological data, it means that there is a mixture of sustainable technology, organizations, and tools to enable and facilitate preservation.

A digital repository that has the ambition of providing long-term preservation for archaeological data must be sustainable for the long term.  There must be a realistic plan for funding the variety of activities required in order to ensure access and preservation of information, as well as succession plans.  These are core components of being certified as a “Trusted Digital Repository,” something that Digital Antiquity aspires to make tDAR in the near future.

At Digital Antiquity, we have a plan and a schedule for achieving it.  We see the development of a digital curation service useful for public agencies, research organizations, and individual researchers as key to sustaining the tDAR repository. We plan to charge for the deposit of information into tDAR to support the archiving of those materials, and are negotiating with other archives to serve as backup repositories for tDAR. The main point here is that any organization that is serious about providing long-term preservation must have a plan to ensure financial support and must work diligently to execute this plan.

Digital Antiquity cannot solve this problem alone, however. Sustainability requires multiple sources, technologies, and approaches: tools like LOCKSS and organizations like the Internet Archive or HathiTrust help ensure sustainable archaeological information.  Sustainability also requires a change in culture. It requires that public agencies, research organizations, and individual researchers who create data ensure that it is available and remains preserved for future access and use, and that they budget funds as part of their activities to support the digital repositories.

Access and Use

One of the easiest ways to understand the challenges of the future is to look at the problems we’re still struggling with from the past.  During the 1970s, 1980s, and 1990s, tremendous quantities of archaeological data, in the form of reports, documents, data sets, and other materials, were produced. Most of this data collected in the US has been funded by public undertakings conducted through cultural resource management (CRM) investigations.

The challenge is that much, perhaps most, of this information is on the verge of being forgotten about and lost. Almost all of the reports from the CRM era are available only as paper records. Unless systematic efforts to preserve, digitize, and make more widely available these older reports and data are undertaken, this body of work will be forgotten or essentially lost.

Recently produced archaeological reports and other data often are in digital formats. However, if these reside only on a floppy disk, they too are one step away from being lost. The digital analog to the situation with paper records is not much better: a broken hard drive or a corrupted Dropbox account, and the critical data is lost. When data is maintained and kept at the “personal” level without appropriate documentation and backup, it’s at risk.

With the advent of the web, some documents and databases have moved to the web as simple webpages or more complex websites.  Moving to the web has been a major step forward, enhancing discovery and providing easier access. Tools like Google may enable these materials to be discovered and used, but not all databases are “discoverable.” For example, the NADB database has been hosted for a number of years by the Center for Advanced Spatial Technologies (CAST) at the University of Arkansas. In this form, it was available online, but for potential users to use it, they had to know both about NADB and how to access the NADB web page in order to perform a search. Simply putting it on the web does not equate with accessibility.

From an archival standpoint, a database like NADB in its current form would not be preserved either. Services like the Internet Archive attempt to archive sites, but they capture only pages that can be reached by links, and many databases are accessible only via search forms. Furthermore, even when they are accessible, the data is preserved in a translated form: definitely better than not preserving the data at all, but not ideal.

The other challenge can be boiled down to a fundamental question: what will happen to the website in 20 years? Sites like GeoCities or Ma.gnolia are examples of what can happen to data on the web without stewardship. Software reaches end-of-life comparatively quickly (5 years in some cases), leaving backend software or hardware unsupported: tools like ColdFusion, early versions of Oracle, and older file formats such as WordPerfect are becoming scarcer and harder to use or access.  Over the next 10-20 years, these challenges will grow as computing continues to evolve. The growth of cloud computing has great potential: tools like Google Docs and online databases provide a myriad of features we could only have dreamed of in the past. But they pose new challenges for preservation and use, because the data may be dependent on the tool, and the tool may restrict access for preservation or use. These too will have time and costs involved and will require online migration and future support.

Regarding use, within the United States there are federal and state regulations that prohibit the general availability of some kinds of archaeological information, specifically detailed site location information. This protection is critical to the management and preservation of the physical site. This, however, requires that online tools be sensitive to this information and that repositories develop methods for screening access and dealing with this kind of information.

There are two aspects to consider. First, most information about archaeological resources need not be held as confidential.  In our experience, documents of several hundred pages may have only a few pages with specific site location information, and many reports contain no such detailed information at all.  The challenge is to ensure that the goal of site protection does not endanger the overall ability to preserve and provide access. tDAR addresses this by enabling documents to be marked as confidential (or redacted): the site location information is preserved and the document remains discoverable, but access to the sensitive information is restricted.
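
As a rough illustration of this idea, the sketch below shows one way a repository might keep a record discoverable while withholding its site location. It is not tDAR's implementation: the `Record` class, its fields, and the `view_for` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical repository record (not tDAR's actual data model)."""
    title: str
    description: str
    site_location: str            # sensitive: detailed site location
    confidential: bool = False    # set by the uploader
    authorized_users: set = field(default_factory=set)


def view_for(record: Record, user: str) -> dict:
    """Return the metadata a given user may see.

    Title and description are always returned so the record stays
    discoverable. The site location is returned unless the record is
    marked confidential and the user is not on the authorized list.
    """
    visible = {"title": record.title, "description": record.description}
    if not record.confidential or user in record.authorized_users:
        visible["site_location"] = record.site_location
    else:
        visible["site_location"] = "[restricted]"
    return visible


# Invented example data, for illustration only.
report = Record(
    title="Survey report (example)",
    description="Results of a hypothetical CRM survey.",
    site_location="example coordinates",
    confidential=True,
    authorized_users={"agency_archaeologist"},
)

print(view_for(report, "anonymous"))
print(view_for(report, "agency_archaeologist"))
```

In this sketch the full record is always stored, and the restriction is applied only when the record is displayed, so preservation and discovery do not depend on who is asking.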

The other aspect of this issue is how to ensure that those individuals and officials who need access to confidential information can get it. Over time, managing the identity of repository users will require tools that help manage identity and vet users, so that we can move away from each system managing separate credentials or requiring the initial uploader to validate every user.

Collaboration

With the advent of the web, real-time, large-scale collaboration has become feasible, and in many cases quite productive. It requires a shared knowledge base and interest between the parties, as well as trust. Examples of collaboration range from NSF projects that span a country to the development of the state site files. But for these collaborations to work, significant synthesis work must be accomplished first: agreeing upon terms, definitions, archaeological and data standards, etc. Within the world of archaeology, this is problematic. There are definitely some categories of classification that can be agreed upon, from faunal characteristics to scientific measurements, but many qualitative classifications do not have formal, agreed-upon meanings. Furthermore, significant work must be done once data has been collected in order to prepare it for collaborative endeavors. But for any of this to happen, there must be more data sharing and publication through tools like tDAR or Open Context.

Reuse

The technology visionary dreams of the semantic web and linked data, a world where data is infinitely accessible and any query can be answered with a quick search and a click of the mouse, and where data can be collated from multiple sources automatically to answer questions that were impossible otherwise. The dream of the semantic web is one where data is “free” of the database, there are no silos, and data is interconnected in ways that the original creator could never conceive. The theory of the semantic web is that if online databases of various types were linked together and available for users, they would enable complex, advanced searching functionality that would link the multiple databases together in new and unique ways.

The challenges, however, are great, ranging from data quality, to knowledge of external tools, to technical skill.  The last is, in some ways, the greatest challenge: archaeologists in general are a smart bunch, and often quite technically savvy, but these tools have a high barrier to entry.  Some of these barriers include:

  1. Perceived value and need. If putting data into a semantic format were as simple as clicking a button and hitting “save as” in Access, Excel, or Word, then this discussion would be moot. Instead, it’s a manual or technically involved process that requires users to isolate different types of data, evaluate it, standardize it, and map it.  It works best for quantitative measurements, and has some real challenges for qualitative data. But regardless of the ability to publish the data, this will remain a problem until there are a number of shining examples of how the data can be used in new, impactful, and significant ways that change the perceived work-to-reward ratio. Within tDAR, we have started to develop tools to help users go through the process of making their data accessible through simple web forms. This enables the analysis and mapping of data from coding sheets to shared knowledge structures (ontologies) that can be used in data analysis within tDAR and in the future outside as well.
  2. Once data is in a semantic form, it’s difficult to use. Most archaeologists are not, and do not want to be, programmers (though many programmers may want to be archaeologists). While large companies like Google, Microsoft, and Facebook are starting to make use of semantic data in searches (reviews, product searches, flight times, etc. are examples of this), the main way of integrating semantic data into your own data is to do it programmatically. Until off-the-shelf tools or discipline-specific tools make use of this information, most archaeologists will not be able to use it (or even understand its value). Within tDAR, we’ve started to build tools that enable users to map, collate, and integrate data sets without being programmers. Faunal analysts have used these tools to look at use patterns across sites and continents, among other uses.
  3. Once data is in semantic form, how do you evaluate its quality? This is likely the final challenge: semantic or open data is useful only insofar as you can evaluate its quality. Leveraging data from the semantic web often means joining or comparing data sets by one aspect in order to gain an understanding of another, but this requires that these connections be evaluated and that the quality of the data be vetted before those connections are made, something that may be hard within online data sets.
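
To make the mapping and integration steps above a little more concrete, here is a minimal sketch of how two data sets with different coding sheets might be mapped to a shared vocabulary and then combined. It is an illustration under stated assumptions, not a description of how tDAR works: the vocabulary, coding sheets, row layout, and the `map_rows` function are all hypothetical, and a plain set of taxon names stands in for a real ontology.

```python
from collections import Counter

# Hypothetical shared vocabulary (a stand-in for an ontology).
SHARED_TAXA = {"Odocoileus", "Sylvilagus", "Bos"}

# Each data set uses its own coding sheet; these mappings are invented.
CODING_SHEET_A = {"1": "Odocoileus", "2": "Sylvilagus"}
CODING_SHEET_B = {"DEER": "Odocoileus", "COW": "Bos", "RABBIT": "Sylvilagus"}


def map_rows(rows, coding_sheet, source):
    """Translate local codes to shared terms.

    Rows whose code has no mapping are returned separately so that a
    person can review them instead of having them silently dropped.
    """
    mapped, unmapped = [], []
    for row in rows:
        term = coding_sheet.get(row["taxon_code"])
        if term in SHARED_TAXA:
            mapped.append({"source": source, "taxon": term, "count": row["count"]})
        else:
            unmapped.append(row)
    return mapped, unmapped


# Invented example rows, for illustration only.
site_a_rows = [
    {"taxon_code": "1", "count": 12},
    {"taxon_code": "2", "count": 5},
    {"taxon_code": "9", "count": 1},   # code missing from coding sheet A
]
site_b_rows = [
    {"taxon_code": "DEER", "count": 7},
    {"taxon_code": "COW", "count": 3},
]

mapped_a, unmapped_a = map_rows(site_a_rows, CODING_SHEET_A, "site_a")
mapped_b, unmapped_b = map_rows(site_b_rows, CODING_SHEET_B, "site_b")

# Integrate: total counts per shared taxon across both data sets.
totals = Counter()
for row in mapped_a + mapped_b:
    totals[row["taxon"]] += row["count"]

print(dict(totals))
print("Needs review:", unmapped_a + unmapped_b)
```

Keeping the unmapped rows visible is one small way to address the quality question in the third item: a comparison across data sets is only as trustworthy as the mappings behind it.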

In summary, none of these challenges are insurmountable. We have organizations dedicated to the preservation and use of digital data, and we have tools that are evolving to make it easier to ask and answer questions that we could only dream of in the past, linking data together and making new connections.

What we must work together to do is to continue to change the culture of archaeology to ensure that both legacy and new data are properly archived and preserved. And the challenge for technologists is to build tools that empower non-programmers to analyze and re-use data in new ways.

Parks Canada Cuts

Many SHA members realize that Parks Canada has recently been subjected to absolutely draconian cuts that risk crippling one of the world’s most influential stewards for cultural and natural heritage and historical archaeological research.  Very few historical archaeology labs are not outfitted with a host of essential Parks Canada publications like Olive Jones and Catherine Sullivan’s Parks Canada Glass Glossary, Lynne Sussman’s The Wheat Pattern, its Archaeological Recording Manual, and many of the technical publications available on the SHA web page.  In January 2014, the SHA will hold its conference in Quebec City, so it is especially demoralizing to know that by the time we arrive most of Parks Canada’s archaeology staff will have been released.  At the Quebec center, a team of 12 archaeologists was reduced to one; in Cornwall six of seven staff members were eliminated; and just one archaeologist will be responsible for the whole 120,000 km² of the Canadian Arctic.

The SHA has written a letter to the Canadian Prime Minister, joining our international colleagues, including the Society for American Archaeology, who have appealed to the Canadian government to reconsider the scope of these transformations in one of the world’s models for historic preservation, cultural heritage, and historic archaeology.  Let’s hope that by the time we meet in Quebec in January 2014, the Canadian government will reconsider the breadth and sweep of these changes.

School’s Out for Summer: Explore Arcadia Mill

 

Entrance to the boardwalk at Arcadia Mill (Courtesy of Arcadia Mill Archaeological Site)

Arcadia Mill Archaeological Site in Milton, Florida provides a multi-disciplinary educational experience for people of all ages. Arcadia Mill represents the first and largest water-powered industrial complex in northwest Florida. Between 1828 and 1855, the industrial complex developed into a multi-faceted operation that included two water-powered sawmills, a railroad, bucket factory, shingle mill, textile mill, and an experimental silk cocoonery. In addition to the industrial facilities, Arcadia had an ethnically diverse community populated by enslaved African American laborers, Anglo American workers, and an elite Anglo American management class. In the late 1980s, local awareness and efforts made by the Santa Rosa Historical Society and the University of West Florida helped to save a portion of the Arcadia Mill site from residential development.

Today, Arcadia Mill functions as an archaeological site that is open to the public. Our facilities include an elevated boardwalk with interpretive signage, a newly renovated visitor’s center and museum, and an outdoor pavilion with working replicas. Arcadia hosts thousands of visitors annually including a large number of students on scheduled field trips. Our educational programming at Arcadia has made great strides over the last few years, but we are always looking for new ways to reach our younger audience.

During the summer months when field trips have tapered off, Arcadia hosts a portion of the University of West Florida archaeological field school. This gives our visitors a chance to see an active archaeological dig; however, we are missing part of our audience and the opportunity to use the dig as an educational tool for school children. With a little brainstorming, we came up with the first of several steps to take in order to beat the summertime slump.

A year ago we launched a pilot summer camp, Explore Arcadia Mill, as a new way to provide educational programming when school is out of session. The weeklong camp features a multi-disciplinary approach that is designed for upcoming 4th through 6th graders. Campers learn about geography, history, archaeology, and historic preservation through lessons that feature hands-on educational crafts, group projects, and outdoor activities. Arcadia Mill is a case study for many of the lessons such as understanding the landscape, how to use historical documents, and how historic preservation has helped to save the site.

Learning about stratigraphy (Courtesy of Arcadia Mill Archaeological Site)

The archaeology portion of the camp involves lessons and activities focused on principles and ethics. The campers learn about fundamental concepts such as the Law of Superposition and then test their knowledge on our stratigraphy canvas. We also teach them about the different tools that archaeologists use followed by a seek-and-find exercise using real photographs from our field school. Once we have completed the introduction to archaeology, the campers are taken to the field school excavations where they can visualize everything they’ve learned. The campers do not participate in the actual field work, but they observe and document the visit in their field books.

Campers visit the field school site to learn more about archaeological excavations (Courtesy of Arcadia Mill Archaeological Site)

The campers really enjoy the archaeology lessons and activities in the classroom, but the crowning achievement is the ability to incorporate an active archaeological dig. Aside from being an excellent visual aid, the ability to visit the field school helps us to educate the campers on ethics, stewardship, and professionalism. At the end of the week the campers combine everything they’ve learned and create a primary document, but for fun’s sake it is really a scrapbook! The parents or guardians of each camper are invited to come view the scrapbooks and learn about what went on throughout the week. In this way, the campers become the teachers and the camp directors stand by with pride.

With one successful camp season behind us and another just around the corner, the possibilities for activities and lessons have become endless. The camp was a giant lesson for us as professionals since we quickly learned what worked and what didn’t work. It will get much easier with time, but now we are ready to implement additional programming. Where do we go from here? The camp was such a great experience that we are now looking at large-scale or year-round programming. The idea of an after-school program came into question, but is that too much? There’s a fine line between educational programming and babysitting. It would be a large undertaking, but it could be very rewarding and worthwhile. Have you tried an after-school program or a similar concept?