Tech Week: Online Databases and Data Sharing

It’s Tech Week on the Blog and the Technology Committee has something special in store. We have brought together three innovators in the field of online databases and data sharing, and have asked each author to answer a question:

Where do you see online databases and data sharing in five to ten years? What role do you see your respective organization playing in the larger field of archaeological data sharing and online databases? What major hurdles do you think stand in the way of wide-scale acceptance and use of online databases in the archaeological community?

Our contributors:

Mark Freeman from Stories Past

  • Mark has worked with the National Park Service and a range of other groups to develop online projects ranging from data-driven research databases to interactive education modules. Working primarily with museums and governmental agencies, Mark represents the cutting edge in online databases and data sharing.

Jillian Galle from the Digital Archaeological Archive of Comparative Slavery (DAACS)

  • DAACS, which is based in Monticello’s archaeology department, is one of the largest and most respected online databases for Historical Archaeology. Starting in 2000, when many archaeologists hadn’t even thought of online databases, DAACS was working hard to provide researchers with information that would normally take years to get. Jillian has been the DAACS project manager for twelve years and is a pioneer in online databases and data sharing.

Adam Brin and Frank McManamon from the Center for Digital Antiquity

  • When you think of online databases and data sharing, the Digital Archaeological Record (tDAR) is probably one of the first things that come to mind. Adam and Frank work with a wide range of database professionals and archaeologists, and have created an extensive database for everything from digital documents to data sets to GIS files. tDAR represents a digital repository for archaeological data from all over the world. Perhaps the largest archaeological database, tDAR is constantly working to bring more information to researchers and to expand our understanding of the history and prehistory of the world.

Each author has provided an interesting viewpoint drawn from their own experience and organization. Read together, the posts should give a good understanding of where data sharing has come from, where it is going, and what is on the horizon. We encourage you to read the posts and join in the conversation in the comment section or on Twitter, using the #SHAtechWk hashtag.

Click on the banner at the bottom of each post to return to this page! Thanks for reading, and enjoy Tech Week!

Primary Archaeology data for non-archaeologists?

This post is part of the May 2012 Technology Week, a quarterly topical discussion about technology and historical archaeology, presented by the SHA Technology Committee. This week’s topic examines the use and application of digital data in historical archaeology. Visit this link to view the other posts.

Is there value in exposing archaeological primary data to non-professional audiences? Can online archaeology databases serve broader goals? Can they both inform and serve as a tool for advocacy at a time when the practice of archaeology is again being challenged in popular culture?

The National Park Service’s museum.nps.gov.

The National Park Service website, museum.nps.gov, is the online face of ICMS, the database tool that the Department of the Interior uses to manage its collections. In pre-launch testing, the most common reaction was surprise that the parks actually had collections. Individual parks decide what to present on the website, and it currently includes nearly 450,000 records, representing over four million objects, half of which are archaeological. Some information is removed before it reaches the web. Crucially for archaeology, this includes site name, site location, within-site provenience and UTM data, all excluded to protect sites from the very real threat of looting, and at the request of Native American groups.
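The filtering step described above can be pictured as a simple field-level redaction applied to each record before it is published. What follows is only a minimal sketch in Python, not the ICMS implementation: the field names (`site_name`, `site_location`, `provenience`, `utm`) and the sample record are hypothetical, chosen to mirror the categories listed in the paragraph.

```python
# Hypothetical field names; the real ICMS schema is not described in this post.
SENSITIVE_FIELDS = {"site_name", "site_location", "provenience", "utm"}


def redact_for_web(record):
    """Return a copy of a catalog record without location-sensitive fields."""
    return {key: value for key, value in record.items()
            if key not in SENSITIVE_FIELDS}


# Illustrative record, invented for this example.
catalog_record = {
    "catalog_number": "EXAMPLE-0001",
    "object_name": "bottle fragment",
    "material": "glass",
    "site_name": "Example Site",
    "utm": "10T 000000 0000000",
}

public_record = redact_for_web(catalog_record)
print(sorted(public_record))
```

Because the function builds a new dictionary rather than editing the record in place, the full catalog entry stays intact for internal use while only the reduced copy goes to the public site.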

But stripping the artifacts of physical context before they reach the web is problematic at best for archaeology, so an attempt has been made to restore some contextual information. Collection highlights were developed to let park staff group objects, creating a virtual context that can represent a physical space (a site or an archaeological feature), a thematic context, or a virtual exhibit. Fort Vancouver National Historic Site has created several highlights, including The Fort Vancouver Village. The highlight includes narrative text to explain the complex cultural landscape and is supported by 32 selected artifacts. Those artifacts are hyper-linked to the over two hundred thousand records which are part of Fort Vancouver’s online collection. I’d argue that even if most visitors never look at those records, they need to know that they are there. The National Park Service doesn’t just have great scenery; it has curated over forty million cataloged objects.

At Mount Vernon, George Washington’s Virginia plantation along the Potomac River, the South Grove midden excavation uncovered more than 60,000 artifacts. These represent almost 400 ceramic and glass vessels, hundreds of pounds of brick, mortar, and plaster fragments from renovating buildings, buckles, buttons, tobacco pipes, and more than 30,000 animal bones. A new website (in progress at www.mountvernonmidden.com) focuses on 400 objects, but the full database is there (and available on the Digital Archaeological Archive of Comparative Slavery site), and items are presented in the context of the wider collection. Additionally, the website includes a timeline, a map of the site in relation to the broader plantation landscape, historical notes and related published papers, and a database of the Washington family Invoices and Orders, all part of the larger data set that comprises the project.

So site databases, like the truth, need to be out there. Showing artifacts to the public, without this data-rich environment, suggests that just a few objects have primacy, elevating the qualitative over the quantitative. And if archaeologists want support for the process of archaeology and for digital preservation, then showing the volume of data makes sense.

The problem with exposing the soft underbelly of archaeological data is that at least some members of the public might start to question what’s presented. Why is it so hard to compare one site with another? Why are different methodologies used at different sites? Why does every project record different information? Why does the terminology differ between sites? There is a slow move forward in addressing all these issues (Kansa et al. 2011), but if archaeologists want to hammer home the point that pot hunting and looting are bad, then they should be willing to present and rationalize the datasets that professional archaeologists create.

I’m not suggesting that advocacy is the only reason to show data. As textbooks and other electronic publications slowly transition from electronic copies of physical books into fully interactive media, perhaps they’ll also start to include accessible databases, and not just as appendices. Databases could support graphs and result sets, allowing data to be manipulated, examined and even challenged. Perhaps eventually these datasets could be more than just one-way presentations of data. On websites, by recording the questions asked of the data and tracking the datasets produced, these databases might come to be a part of research as well as publication.

References Cited

Will today’s graduate training in Historical Archaeology predict the future of digital research archives?

The Digital Archaeological Archive of Comparative Slavery (http://www.daacs.org/) provides standardized artifact, contextual, spatial, and image data from excavated sites of slavery throughout the early modern Atlantic World. Currently, DAACS is the largest archive, paper or digital, of standardized archaeological data related to slavery and slave societies. We have built it with grant funds, generous data sharing, and the intellectual input of more than 50 collaborating archaeologists and historians. For over ten years, these scholars and many others have contributed to DAACS’ overarching goal: to facilitate the comparative archaeological study of the spatial and temporal variation in slavery and the archaeological record by providing standardized archaeological data from multiple archaeological sites that were once homes to enslaved Africans.

DAACS strives to achieve this goal by giving researchers access to detailed standardized archaeological data in a format that allows the assemblages to be compared quantitatively without any additional processing by the researcher. We do so by physically reanalyzing the assemblages, and their associated contexts, according to the same classification and measurement protocols that were established with the help of the DAACS Steering Committee in 2000. This is the critical aspect of the DAACS program: providing the standardized data that are essential to any comparative archaeological study.

DAACS data are stored in a massive relational Structured Query Language (SQL) database and are delivered over the internet via the DAACS website. The website debuted in 2004 with complete data sets from 15 domestic slave sites in Virginia. They were made available then, as they are today, through an easy-to-use, point-and-click query interface. By the end of 2012, DAACS will contain complete archaeological datasets, including data on over 2 million artifacts, from sixty sites of slavery in Maryland, Virginia, South Carolina, Jamaica, Nevis, and St. Kitts.

During the past year, over 10,000 unique visitors have landed on the DAACS website. Many DAACS users go straight for the Archive’s meta-data: the section of the website that contains information on the DAACS data structures and authority terms, DAACS cataloging manuals and stylistic element guides, and research papers and posters.  Others spend time browsing and reading through the archaeological sites pages, the text-heavy portion of the DAACS website that provides extensive background data on each site, site chronologies, access to images and maps, and bibliographies. We consider these pages essential to anyone using the archaeological data accessible through the DAACS Query Module.

Visitors often move from the background pages to the DAACS Query Module, which provides access to standardized data on hundreds of thousands of artifacts and archaeological contexts. The query interface masks a complex set of queries to the relational database that contains the raw archaeological data from all sites in the Archive. Queried data are returned and made available to users through the web browser and through downloadable ASCII files that can easily be imported into the user’s favorite statistical package.
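To make the idea concrete, here is a rough sketch of what a query module of this kind might do behind the scenes: join artifact and context tables, total the artifacts by site and ware type, and return the result as plain delimited text ready for a statistics package. It uses Python's built-in `sqlite3` and `csv` modules. The table names, column names, and sample rows are invented for illustration and do not reflect the actual DAACS schema or software.

```python
import csv
import io
import sqlite3

# In-memory database with a hypothetical two-table layout (not the DAACS schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contexts (context_id INTEGER PRIMARY KEY, site TEXT);
    CREATE TABLE artifacts (
        artifact_id INTEGER PRIMARY KEY,
        context_id INTEGER REFERENCES contexts(context_id),
        ware TEXT,
        quantity INTEGER
    );
    INSERT INTO contexts VALUES (1, 'Site A'), (2, 'Site B');
    INSERT INTO artifacts VALUES
        (1, 1, 'Creamware', 12),
        (2, 1, 'Pearlware', 5),
        (3, 2, 'Creamware', 7);
""")


def ware_counts_by_site(connection):
    """Total artifact quantities for each site and ware type."""
    cursor = connection.execute("""
        SELECT c.site, a.ware, SUM(a.quantity) AS total
        FROM artifacts AS a
        JOIN contexts AS c ON c.context_id = a.context_id
        GROUP BY c.site, a.ware
        ORDER BY c.site, a.ware
    """)
    return cursor.fetchall()


def to_delimited_text(rows, header):
    """Render query results as comma-delimited plain text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


rows = ware_counts_by_site(conn)
export_text = to_delimited_text(rows, ["site", "ware", "total"])
print(export_text)
```

The point of the sketch is the division of labor: the user picks options in a form, the join and aggregation happen out of sight, and what comes back is a flat table that any statistical package can read.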

DAACS is explicitly and clearly designed for large-scale comparative archaeological research. The website features—the Query Module, Archaeological Sites Pages, and corresponding meta-data—are critical to meeting the goals of the project.

In the evolving ecology of accessible digital data, digital archives vary in the extent to which they are designed to facilitate comparative research versus the extent to which they are designed to preserve archaeological data. These elements of online archives and databases are not mutually exclusive; many research archives preserve data and preservation archives encourage research. Projects such as tDAR (The Digital Archaeological Record) and ADS (Archaeology Data Service) are essential to the preservation of born-digital data generated by individual researchers. These critical resources preserve and make searchable data from any type of archaeological project, regardless of region or time period. Data from projects range from digital reports and basic finds lists to full-blown archaeological databases. However, comparability problems arise to the extent that the contributing researchers use different classification and measurement protocols.

To date, research archives have focused on specific regions and time periods in order to provide datasets that enable researchers to address synthetic research questions. Examples include the Chaco Research Archive, A Comparative Archaeological Study of Colonial Chesapeake Culture, and DAACS. These projects provide a venue in which protocols that work well in particular times and places encourage individual researchers to think seriously about how to ensure their data play well with others’ data, making it easier for researchers to glimpse the fruits of comparative analysis that shared protocols make possible.

But each archive type requires specific tradeoffs. For research archives, making comparative quantitative research easy requires standardization. However, it is not clear how, over the long term, the requisite standardization will emerge. Sites like DAACS may be one way forward. No matter where one sits on the continuum, a firm commitment to open and transparent data sharing underpins all digital archiving projects.

The demand for archives that specialize in digital data preservation and accessibility will continue to grow as individuals, museums, universities, and the government grapple with archiving the large quantities of archaeological data they curate and making them accessible. The success and growth of research archives that generate detailed, comparable digital data for explicit research purposes will depend on how we meet the analytical needs of inquisitive archaeological researchers.

Over the past six years, we’ve seen a marked increase in the number of graduate students who approach us with the desire to pursue data-driven comparative research.  Their questions and needs may be a bellwether for the development, use and longevity of research archives.

Our experience at DAACS is that undergraduate and graduate students are eager to engage in archaeological data analysis, both on the single-site and comparative levels. They come to DAACS asking questions that require serious archaeological data analysis. However, many are missing two critical skills: the ability to link arguments about what happened in the past to archaeological variation, and the data-analysis skills needed to summarize patterns in the data that speak to those arguments.

A concrete example relates to chronology. Chronological control is the critical first analytical step in any archaeological study, whether a single-site or a comparative analysis: you do not want to mistake temporal change for synchronic variation. Yet we have discovered that graduate students who have completed their coursework in Historical Archaeology do not know how to get started. Most historical archaeology students come to us seeking advice on where and how to begin working with their own data and the data in DAACS, at every stage from framing an argument to executing data retrieval, discovering patterns in the results, and linking those patterns back to the original argument. An informal survey suggests one reason: only a handful of graduate programs that provide advanced degrees with specializations in historical archaeology require students to take even a single course in statistical methods.

But it is clear that students (and our colleagues) want more resources for learning how to work with these data. We receive regular requests to provide training in statistical analysis and to teach the more arcane analytical methods that we occasionally use but which are necessary to fully engage with the quantity of fine-grained data available through DAACS.

As the promise of using online databases for research has become increasingly obvious over the past five years, the demand for data has risen. It is how we meet the demand not only for the data but also for the analytical skills to make sense of the data that will determine the trajectory of online databases in the next 5 to 10 years.

While I worry about the trajectory of archaeological training, I remain sanguine about the promise of research archives in large part because I am lucky enough to work with graduate and undergraduate students engaging with DAACS’s online database, students who work doggedly to learn methods they were never taught, and who have come to realize that the data in DAACS are so rich that the hard work it takes to learn analytical approaches to their data provides big payoffs and exciting answers to previously unanswerable questions.