Will today’s graduate training in Historical Archaeology predict the future of digital research archives?

This post is part of the May 2012 Technology Week, a quarterly topical discussion about technology and historical archaeology, presented by the SHA Technology Committee. This week’s topic examines the use and application of digital data in historical archaeology. Visit this link to view the other posts.

The Digital Archaeological Archive of Comparative Slavery (http://www.daacs.org/) provides standardized artifact, contextual, spatial, and image data from excavated sites of slavery throughout the early modern Atlantic World. Currently, DAACS is the largest archive, paper or digital, of standardized archaeological data related to slavery and slave societies. We have built it with grant funds and with the generous data sharing and intellectual input of more than 50 collaborating archaeologists and historians. For over ten years, these scholars and many others have contributed to DAACS’ overarching goal: to facilitate the comparative archaeological study of spatial and temporal variation in slavery and the archaeological record by providing standardized data from multiple archaeological sites that were once homes to enslaved Africans.

DAACS strives to achieve this goal by giving researchers access to detailed standardized archaeological data in a format that allows the assemblages to be seamlessly compared quantitatively without any additional processing by the researcher.  We do so by physically reanalyzing the assemblages, and their associated contexts, to the same classification and measurement protocols that were established with the help of the DAACS Steering Committee in 2000. This is the critical aspect of the DAACS program—providing the standardized data that are essential to any comparative archaeological study.

DAACS data are stored in a massive relational database, queried with Structured Query Language (SQL), and are delivered over the internet via the DAACS website. The website debuted in 2004 with complete data sets from 15 domestic slave sites in Virginia. They were made available then, as they are today, through an easy-to-use, point-and-click query interface. By the end of 2012, DAACS will contain complete archaeological datasets, including data on over 2 million artifacts, from sixty sites of slavery in Maryland, Virginia, South Carolina, Jamaica, Nevis, and St. Kitts.

During the past year, over 10,000 unique visitors have landed on the DAACS website. Many DAACS users go straight for the Archive’s meta-data: the section of the website that contains information on the DAACS data structures and authority terms, DAACS cataloging manuals and stylistic element guides, and research papers and posters.  Others spend time browsing and reading through the archaeological sites pages, the text-heavy portion of the DAACS website that provides extensive background data on each site, site chronologies, access to images and maps, and bibliographies. We consider these pages essential to anyone using the archaeological data accessible through the DAACS Query Module.

Visitors often move from the background pages to the DAACS Query Module, which provides access to standardized data on hundreds of thousands of artifacts and archaeological contexts. The query interface masks a complex set of queries to the relational database that contains the raw archaeological data from all sites in the Archive. Queried data are returned to users in the web browser and as downloadable ASCII files that can easily be imported into the user’s favorite statistical package.
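To make concrete what “masking a complex set of queries” means, here is a minimal sketch of the kind of join-and-aggregate query a point-and-click form might generate, followed by export to a delimited text file. It assumes a drastically simplified, hypothetical two-table schema (`contexts`, `artifacts`) and uses SQLite only so the example is self-contained; the real DAACS schema, table names, and database engine are not described in this post.

```python
import csv
import io
import sqlite3

# Hypothetical, heavily simplified schema; NOT the actual DAACS schema.
SCHEMA = """
CREATE TABLE contexts (
    context_id TEXT PRIMARY KEY,
    site       TEXT NOT NULL
);
CREATE TABLE artifacts (
    artifact_id INTEGER PRIMARY KEY,
    context_id  TEXT NOT NULL REFERENCES contexts(context_id),
    ware        TEXT NOT NULL,
    quantity    INTEGER NOT NULL
);
"""

# The kind of query a query form could hide behind a few checkboxes.
WARE_COUNTS_BY_SITE = """
SELECT c.site, a.ware, SUM(a.quantity) AS total
FROM artifacts AS a
JOIN contexts AS c ON c.context_id = a.context_id
WHERE c.site IN ({placeholders})
GROUP BY c.site, a.ware
ORDER BY c.site, a.ware
"""

def ware_counts(conn, sites):
    """Return (site, ware, total) rows for the selected sites."""
    placeholders = ", ".join("?" * len(sites))  # one "?" per selected site
    sql = WARE_COUNTS_BY_SITE.format(placeholders=placeholders)
    return conn.execute(sql, sites).fetchall()

def to_csv(rows):
    """Serialize rows as CSV text that a statistics package can import."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["site", "ware", "total"])
    writer.writerows(rows)
    return buf.getvalue()

def demo_connection():
    """Build an in-memory database filled with made-up illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO contexts VALUES (?, ?)",
                     [("A-1", "Site A"), ("A-2", "Site A"), ("B-1", "Site B")])
    conn.executemany(
        "INSERT INTO artifacts (context_id, ware, quantity) VALUES (?, ?, ?)",
        [("A-1", "Creamware", 12), ("A-2", "Creamware", 3),
         ("A-2", "Pearlware", 5), ("B-1", "Creamware", 7)])
    return conn

if __name__ == "__main__":
    print(to_csv(ware_counts(demo_connection(), ["Site A", "Site B"])))
```

The design point is that the user never writes the join: because every site has been cataloged to the same protocols, the same query works across all of them, and the output is already in a comparable, tabular form.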

DAACS is explicitly and clearly designed for large-scale comparative archaeological research. The website features—the Query Module, Archaeological Sites Pages, and corresponding meta-data—are critical to meeting the goals of the project.

In the evolving ecology of accessible digital data, digital archives vary in the extent to which they are designed to facilitate comparative research versus the extent to which they make possible the preservation of archaeological data. These elements of online archives and databases are not mutually exclusive; many research archives preserve data and preservation archives encourage research. Projects such as tDAR (The Digital Archaeological Record) and ADS (Archaeology Data Service) are essential to the preservation of born-digital data generated by individual researchers. These critical resources preserve and make searchable data from any type of archaeological project, regardless of region or time period. Data from projects range from digital reports and basic finds lists to full-blown archaeological databases. However, because contributing researchers use different classification and measurement protocols, these data are often difficult to compare.

To date, research archives have focused on specific regions and time periods in order to provide datasets that enable researchers to address synthetic research questions. Examples include the Chaco Research Archive, A Comparative Archaeological Study of Colonial Chesapeake Culture, and DAACS. These projects provide a venue in which protocols that work well in particular times and places encourage individual researchers to think seriously about how to ensure their data play well with others’ data, making it easier for researchers to glimpse the fruits of comparative analysis that shared protocols make possible.

But each archive type requires specific tradeoffs. For research archives, making comparative quantitative research easy requires standardization. However, it is not clear how, over the long term, the requisite standardization will emerge. Sites like DAACS may be one way forward. No matter where one sits on the continuum, a firm commitment to open and transparent data sharing underpins all digital archiving projects.

The demand for archives that specialize in digital data preservation and accessibility will continue to grow as individuals, museums, universities, and the government grapple with archiving the large quantities of archaeological data they curate and making them accessible. The success and growth of research archives that generate detailed, comparable digital data, accessible for explicit research purposes, will depend on how we meet the analytical needs of inquisitive archaeological researchers.

Over the past six years, we’ve seen a marked increase in the number of graduate students who approach us with the desire to pursue data-driven comparative research.  Their questions and needs may be a bellwether for the development, use and longevity of research archives.

Our experience at DAACS is that undergraduate and graduate students are eager to engage in archaeological data analysis, at both the single-site and comparative levels. They come to DAACS asking questions that require serious archaeological data analysis; however, many are missing two critical skills: the ability to link arguments about what happened in the past to archaeological variation, and the data-analysis skills that allow them to summarize patterns in the data that speak to those arguments.

A concrete example relates to chronology. Chronological control is the critical first analytical step in any archaeological study, whether of a single site or a comparative analysis: you do not want to mistake temporal change for synchronic variation. Yet we have found that graduate students who have completed their coursework in Historical Archaeology do not know how to get started. At every stage, from framing an argument to executing data retrieval, discovering patterns in the results, and linking those patterns back to the original argument, most historical archaeology students come to us seeking advice on where and how to begin working with their own data and the data in DAACS. An informal survey suggests that one reason is that only a handful of graduate programs offering advanced degrees with specializations in historical archaeology require students to take even a single course in statistical methods.

But it is clear that students (and our colleagues) want more resources for learning how to work with these data. We receive regular requests to provide training in statistical analysis and to teach the more arcane analytical methods that we occasionally use but which are necessary to fully engage with the quantity of fine-grained data available through DAACS.

As the promise of using online databases for research has become increasingly obvious over the past five years, the demand for data has risen. It is how we meet the demand not only for the data but also for the analytical skills to make sense of the data that will determine the trajectory of online databases in the next 5 to 10 years.

While I worry about the trajectory of archaeological training, I remain sanguine about the promise of research archives in large part because I am lucky enough to work with graduate and undergraduate students engaging with DAACS’s online database, students who work doggedly to learn methods they were never taught, and who have come to realize that the data in DAACS are so rich that the hard work it takes to learn analytical approaches to their data provides big payoffs and exciting answers to previously unanswerable questions.

Sustainable Archaeological Databases — a view from Digital Antiquity


At the Center for Digital Antiquity (Digital Antiquity), we are committed to improving access to, and the preservation and use of, archaeological information. Over the past four years, we’ve built tDAR (The Digital Archaeological Record), a digital repository designed to preserve the digital documents, data sets, images, and other digital results of archaeological investigations and excavations. tDAR is one of a number of discipline-specific repositories designed from the ground up to better support the needs of their content by providing rich, archaeologically specific metadata along with tools to discover, access, and use the uploaded materials.

Looking into the crystal ball, there are a number of significant challenges and important opportunities ahead:

  1. creating and maintaining a stable foundation for future archaeological research and resource management
  2. access and use (and preservation too)
  3. collaboration


If there’s anything that we can learn from the basic practice of archaeology, it’s that things do not get preserved unless the environment is right to enable preservation. This works best if there are multiple sources and tools available. In the case of archaeological data, it means that there is a mixture of sustainable technology, organizations, and tools to enable and facilitate preservation.

A digital repository that has the ambition of providing long-term preservation for archaeological data must be sustainable for the long term.  There must be a realistic plan for funding the variety of activities required in order to ensure access and preservation of information, as well as succession plans.  These are core components of being certified as a “Trusted Digital Repository,” something that Digital Antiquity aspires to make tDAR in the near future.

At Digital Antiquity, we have a plan and a schedule for achieving it. We see the development of a digital curation service useful for public agencies, research organizations, and individual researchers as key to sustaining the tDAR repository. We plan to charge for the deposit of information into tDAR to support the archiving of those materials, and are negotiating with other archives to serve as backup repositories for tDAR. The main point here is that any organization that is serious about providing long-term support for the data it maintains must have a plan to ensure financial support and must work diligently to execute that plan.

Digital Antiquity cannot solve this problem alone, however. Sustainability requires multiple sources, technologies, and approaches, such as tools like LOCKSS or organizations like the Internet Archive or HathiTrust, to help ensure that archaeological information is sustained. Sustainability also requires a change in culture: public agencies, research organizations, and individual researchers who create data must ensure that it is available and remains preserved for future access and use, and must budget funds as part of their activities to support the digital repositories.

Access and Use

One of the easiest ways to understand the challenges of the future is to look at the problems from the past that we’re still struggling with. From the 1970s through the 1990s, tremendous quantities of archaeological data were produced in the form of reports, documents, data sets, and other materials. Most of this data collected in the US was funded by public undertakings conducted through cultural resource management (CRM) investigations.

The challenge is that much, perhaps most, of this information is on the verge of being forgotten about and lost. Almost all of the reports from the CRM era are available only as paper records. Unless systematic efforts to preserve, digitize, and make more widely available these older reports and data are undertaken, this body of work will be forgotten or essentially lost.

Recently produced archaeological reports and other data are often in digital formats. However, if these reside only on a floppy disk, they too are one step away from being lost. The digital counterpart of the paper-record problem is not much better: a broken hard drive or a corrupted Dropbox account, and the critical data is lost. When data is maintained at the “personal” level without appropriate documentation and backup, it is at risk.

With the advent of the web, some documents and databases have moved online as simple webpages or more complex websites. Moving to the web has been a major step forward, enhancing discovery and providing easier access. Tools like Google may enable these materials to be discovered and used, but not all databases are “discoverable.” For example, the NADB database has been hosted for a number of years by the Center for Advanced Spatial Technology (CAST) at the University of Arkansas. In this form it was available online, but to use it, potential users had to know both about NADB and how to reach the NADB web page in order to perform a search. Simply putting a database on the web does not make it accessible.

From an archival standpoint, a database like NADB in its current form would not be preserved either. Services like the Internet Archive attempt to archive websites, but only pages that can be reached through links, and many databases are accessible only through search forms. Furthermore, even when such content is captured, the data is preserved in a translated form (as rendered web pages rather than as the underlying database), which is definitely better than not preserving the data at all, but not ideal.

The other challenge can be boiled down to a fundamental question: what will happen to the website in 20 years? Sites like GeoCities or Ma.gnolia are examples of what can happen to data on the web without stewardship. Software reaches end-of-life comparatively quickly (within 5 years in some cases), after which the backend software or hardware is no longer supported; tools like ColdFusion, early versions of Oracle, and older file formats such as WordPerfect are becoming scarcer and harder to use or access. Over the next 10 to 20 years, these challenges will grow as computing continues to evolve. The growth of cloud computing has great potential: tools like Google Docs and online databases provide a myriad of features we could only have dreamed of in the past. But they also pose new challenges for preservation and use, because the data may depend on the tool, and the tool may restrict access for preservation or use. These services, too, will involve time and costs, and the data they hold will require migration and ongoing support.

Regarding use, within the United States there are federal and state regulations that prohibit the general availability of some kinds of archaeological information, specifically detailed site location information. This protection is critical to the management and preservation of the physical sites. It does, however, require that online tools be sensitive to this information and that repositories develop methods for screening access and dealing with this kind of information.

There are two aspects to consider. First, most information about archaeological resources need not be held as confidential. In our experience, a document several hundred pages long may have only a few pages with specific site location information, and many reports contain no such detailed information at all. The challenge is to ensure that the goal of site protection does not endanger the overall ability to preserve and provide access. tDAR does this by enabling documents to be marked as confidential (or redacted): the site location information is preserved and the document remains discoverable, but access to that information is restricted.
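The idea of keeping a document discoverable while withholding only its sensitive parts can be sketched in a few lines. This is a minimal illustration assuming a hypothetical page-level confidentiality flag; the `Document`, `Page`, `search`, and `view` names are invented for the example and are not tDAR’s actual data model or implementation.

```python
from dataclasses import dataclass, field

REDACTED = "[redacted: confidential site location information]"

@dataclass
class Page:
    number: int
    text: str
    has_site_location: bool = False  # hypothetical flag set by depositor or reviewer

@dataclass
class Document:
    title: str
    keywords: list[str]
    pages: list[Page] = field(default_factory=list)

def search(documents, term):
    """Discovery uses metadata only, so confidential documents still turn up."""
    term = term.lower()
    return [d for d in documents
            if term in d.title.lower() or any(term == k.lower() for k in d.keywords)]

def view(document, authorized):
    """Return page texts, withholding flagged pages from unauthorized users."""
    return [p.text if authorized or not p.has_site_location else REDACTED
            for p in document.pages]
```

For example, a three-page report whose second page holds coordinates would appear in a keyword search for everyone, but `view(doc, authorized=False)` would return the first and third pages intact and a redaction notice in place of the second. Flagging at the page level, rather than locking the whole document, reflects the point above that only a few pages of a long report usually need protection.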

The other aspect of this issue is how to ensure that the individuals and officials who need access to confidential information can get it. Managing the identity of repository users will, over time, require tools that help manage identity and vet users, so that we can move away from each system managing its own separate credentials or requiring the original uploader to validate every user.

Collaboration

With the advent of the web, real-time, large-scale collaboration has become feasible and, in many cases, quite productive. It requires a shared knowledge base and shared interests among the parties, as well as trust. Examples of collaboration range from NSF projects that span a country to the development of state site files. But for these collaborations to work, significant synthesis work must be accomplished first: agreeing on terms, definitions, and archaeological and data standards. Within archaeology, this is problematic. Some categories of classification can be agreed upon, from faunal characteristics to scientific measurements, but many qualitative classifications have no formal, agreed-upon meaning. Furthermore, significant work must be done after data has been collected in order to prepare it for collaborative endeavors. And for any of this to happen, there must be more data sharing and publication through tools like tDAR or Open Context.


Technology visionaries dream of the Semantic Web and linked data: a world where data is infinitely accessible and any query can be answered with a quick search and a click of the mouse, and where data can be collated automatically from multiple sources to answer questions that were otherwise impossible. In the dream of the Semantic Web, data is “free” of the database: there are no silos, and data is interconnected in ways the original creator could never have conceived. The theory is that if online databases of various types were linked together and made available to users, they would enable complex, advanced searches that combine those databases in new and unique ways.
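The core mechanism behind linked data is that independent sources describe things as (subject, predicate, object) statements and reuse the same identifiers, so their statements can be merged and queried together. Real systems use RDF and SPARQL; the sketch below uses plain Python sets to stay dependency-free, and every identifier in it (`ex:site/42`, `ex:fromSite`, and so on) is a hypothetical placeholder rather than a real vocabulary.

```python
# Two independent, hypothetical sources publishing (subject, predicate, object) triples.
site_registry = {
    ("ex:site/42", "ex:name", "Example Plantation"),
    ("ex:site/42", "ex:region", "Chesapeake"),
}
faunal_dataset = {
    ("ex:assemblage/7", "ex:fromSite", "ex:site/42"),
    ("ex:assemblage/7", "ex:taxon", "ex:taxon/Bos_taurus"),
}

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Because both sources use the same site identifier, merging is a set union.
graph = site_registry | faunal_dataset

def taxa_by_region(graph, region):
    """Answer a question neither source can answer alone: which taxa occur in a region?"""
    taxa = set()
    for site, _, _ in match(graph, p="ex:region", o=region):
        for assemblage, _, _ in match(graph, p="ex:fromSite", o=site):
            for _, _, taxon in match(graph, s=assemblage, p="ex:taxon"):
                taxa.add(taxon)
    return taxa
```

The site registry knows regions and the faunal dataset knows taxa; only the shared identifier lets the two be combined. That dependence on shared, agreed-upon identifiers is exactly where the barriers discussed next come in.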

The challenges, however, are great, ranging from data quality to knowledge of external tools to technical skill. The last is, in some ways, the greatest challenge: archaeologists in general are a smart bunch, and often quite technically savvy, but these tools have a high barrier to entry. Some of these barriers include:

  1. Perceived value and need. If putting data into a semantic format were as simple as clicking a button and hitting “save as” in Access, Excel, or Word, this discussion would be moot. Instead, it’s a manual or technically involved process that requires users to isolate different types of data, evaluate them, standardize them, and map them. It works best for quantitative measurements and poses real challenges for qualitative data. But regardless of the ability to publish the data, until there are a number of shining examples of the data being used in new, impactful, and significant ways that change the work-to-reward ratio, this will remain a problem. Within tDAR, we have started to develop tools, built on simple web forms, that help users through the process of making their data accessible. These tools enable the analysis and mapping of data from coding sheets to shared knowledge structures (ontologies) that can be used in data analysis within tDAR and, in the future, outside it as well.
  2. Once data is in a semantic form, it’s difficult to use. Most archaeologists are not, and do not want to be, programmers (though many programmers may want to be archaeologists). While large companies like Google, Microsoft, and Facebook are starting to make use of semantic data in searches (reviews, product searches, and flight times are examples), the main way of integrating semantic data into your own data is to do it programmatically. Until off-the-shelf or discipline-specific tools make use of this information, most archaeologists will not be able to use it (or even understand its value). Within tDAR, we’ve started to build tools that enable users to map, collate, and integrate data sets without being programmers. Faunal analysts have used these tools to look at use patterns across sites and continents, among other uses.
  3. Once data is in semantic form, how do you evaluate its quality? This is likely the final challenge: semantic or open data is useful only insofar as you can evaluate its quality. Leveraging data from the semantic web often means joining or comparing data sets on one aspect in order to gain an understanding of another, but this requires that these connections be evaluated and that the quality of the data be vetted before the connections are made, something that may be hard to do with online data sets.
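The mapping and integration steps described in the list above (local coding sheets mapped to a shared ontology, then data sets combined, with questionable links vetted rather than silently joined) can be illustrated with a small sketch. This is not tDAR’s code; the coding sheets, ontology labels, and the `integrate` function are all hypothetical, chosen only to show the shape of the process.

```python
from collections import Counter

# Hypothetical coding sheets: each project's own codes and labels for taxa.
coding_sheet_a = {"1": "cow", "2": "pig", "3": "sheep/goat"}
coding_sheet_b = {"BOS": "Bos taurus", "SUS": "Sus scrofa", "CAP": "Capra hircus"}

# Hypothetical mapping from each project's labels to shared ontology nodes.
# Sheep and goat go to one coarser node because project A does not separate them.
ontology_map = {
    "cow": "Cattle", "Bos taurus": "Cattle",
    "pig": "Pig", "Sus scrofa": "Pig",
    "sheep/goat": "Caprine", "Capra hircus": "Caprine",
}

def integrate(datasets):
    """Combine (name, coding_sheet, [(code, count), ...]) data sets on shared concepts.

    Returns per-data-set totals keyed by ontology node, plus a list of
    (name, code) pairs that could not be mapped and need human review.
    """
    totals, unmapped = {}, []
    for name, sheet, rows in datasets:
        counts = Counter()
        for code, count in rows:
            concept = ontology_map.get(sheet.get(code))
            if concept is None:
                unmapped.append((name, code))  # flag for vetting instead of guessing
                continue
            counts[concept] += count
        totals[name] = counts
    return totals, unmapped
```

Used on two small faunal tables, `integrate` would report both sites’ counts under the same “Cattle”, “Pig”, and “Caprine” headings, and would list any code missing from a coding sheet separately. Surfacing unmapped codes, rather than dropping them quietly, is one simple way to address the quality concern in the third point.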

In summary, none of these challenges is insurmountable. We have organizations dedicated to the preservation and use of digital data, and we have tools that are evolving to make it easier to ask and answer questions we could only dream of in the past, linking data together and making new connections.

What we must work together to do is continue to change the culture of archaeology to ensure that both legacy and new data are properly archived and preserved. And the challenge for technologists is to build tools that empower non-programmers to analyze and reuse data in new ways.