This month marks the 50th anniversary of President John F. Kennedy's inauguration. It also marks the unveiling of the nation's largest online digitized presidential archive.
EMC, AT&T, Raytheon and Iron Mountain have partnered to create a searchable online archive, available to all at www.jfklibrary.org.
One of EMC's roles was to provide the core scanning, protection and storage capabilities at the heart of any digital archive.
It's safe to say that the internet has changed how we think about information access.
I think this announcement is only the tip of the iceberg as we apply new technologies to preserving our cultural heritage, and making it vastly more accessible to all.
But, as you'll see in a moment, it won't be an easy job.
The JFK Library and Museum
The JFK Library is one of 13 presidential libraries administered by the National Archives and Records Administration. The building is situated on a gorgeous site overlooking Boston Harbor.
But it's a physical building, with largely physical assets.
If you wanted access, you had to make the time to travel and pore over massive amounts of physical records that weren't born digital.
Today's announcement represents a four-year, $10m effort to digitize -- and make accessible -- a small fraction of the library's contents. And that's just one presidential library -- itself only a small subset of all the worthwhile archives and libraries around the world.
This short video does a great job of communicating the significance -- and some of the details -- behind this effort. I'd encourage you to watch it, as I'd like to build on a few points from the discussion here.
Go right ahead, I'll be here when you get back :-)
The Heavy Lifting Of Capture
Imagine having to scan millions of documents -- all around 50 years old or so -- one at a time -- and doing so in sufficient resolution that there's rarely a need for the original. That thankless activity is at the core of this effort.
It's easy to see that only a small fraction of the physical assets have been digitized at this point -- even though the current digital collection is the largest of its type. It speaks to the massive collective effort in front of all of us that will be required to digitize our physical history.
I found the "on-demand" approach to the remaining physical artifacts insightful: digitize core artifacts as a starting point, make the entire index searchable, and then digitize on-demand as researchers' interests dictate.
Perhaps that's the only logical way of tackling an effort of this magnitude.
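For the more technically inclined, here's a minimal sketch of how that on-demand flow might look in code. To be clear, this is my own illustration and not how the JFK Library's system is actually built: the `Artifact` and `Archive` classes, their fields and the simple first-come scan queue are all hypothetical. The idea is that every artifact has a searchable index entry whether or not it has been scanned, and a researcher's request for an unscanned item puts it in line for digitization.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Artifact:
    # Hypothetical record: the description is the index entry,
    # searchable even before the item itself has been scanned.
    artifact_id: str
    description: str
    digitized: bool = False


@dataclass
class Archive:
    # Hypothetical archive: an index of all artifacts plus a scan queue.
    artifacts: dict = field(default_factory=dict)
    scan_queue: deque = field(default_factory=deque)

    def add(self, artifact):
        self.artifacts[artifact.artifact_id] = artifact

    def search(self, term):
        # Search the whole index, digitized or not.
        term = term.lower()
        return [a for a in self.artifacts.values()
                if term in a.description.lower()]

    def request(self, artifact_id):
        # Already-scanned items are served directly; anything else
        # is queued for scanning (once) because a researcher asked for it.
        artifact = self.artifacts[artifact_id]
        if artifact.digitized:
            return "available"
        if artifact_id not in self.scan_queue:
            self.scan_queue.append(artifact_id)
        return "queued"

    def scan_next(self):
        # Stand-in for the real scanning work: mark the oldest request done.
        if not self.scan_queue:
            return None
        artifact_id = self.scan_queue.popleft()
        self.artifacts[artifact_id].digitized = True
        return artifact_id
```

A real system would likely prioritize the queue by demand rather than arrival order, but even this toy version shows why the index has to be complete before the scans are.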
The Importance of Metadata
The video did a good job of hinting at the significant effort required to standardize, capture and validate the metadata around each artifact. When you're talking about many millions of digital objects, the archive will only be as good as its metadata. It's unlikely that anyone will do a sequential scan :-)
Several parts of EMC tend to focus on issues around metadata in its hundreds of different forms -- how it's generated, how it's represented, how it's used -- making this aspect of the initiative especially interesting to us. You can get a sense of this from Suvig Boyed's comments in the video.
Lots of Storage?
You might imagine a data center with rack after rack of storage needed to house these digital archives.
That's not really the case -- at least, at this stage. Basically, the setup is a modest NS120 front-ending a few modest Centeras.
Put differently, the cost of physical storage isn't the economic barrier to large-scale digitization of cultural artifacts: it's human capital.
EMC Involvement
Besides the usual contributions of products and people, it was nice to see some of EMC's engineering crew involved. In particular, our own Steve Todd blogged extensively on the effort.
The effort appears to have got him thinking about other related issues, like information provenance. And, of course, other library archives to target :-)
The Bottom Line
EMC thinks the digital preservation of our cultural heritage is an important issue. It's not about profits; it's about giving back to current and future generations with a gift that we're unusually well placed to give.
Whether it's any of the several library digitization projects we're involved with, or the EMC Information Heritage Initiative, or perhaps the more community-oriented EMC Heritage Trust Project -- this is an area we really care about.
I think Bill Teuber's quote sums it up best:
“History’s treasures are easily lost. Through the power of advanced technology and innovation, we’re giving John F. Kennedy’s most important artifacts new life, on a global scale. What was destined to backroom shelves is now easily accessible for the first-time to students, historians and interested people around the world.”
And there's so much more to do ...
George Santayana, Harvard philosopher and poet, said it well:
“Those who cannot remember the past are condemned to repeat it.”
Digital records of this history that we can store, protect, replicate and distribute across the Internet are one way to preserve that history. Blogging about history is another. In a world where increasingly the event horizon for the past does not extend before one's birth, this is invaluable.
Bill Petro, your friendly neighborhood historian
Posted by: Bill Petro | January 13, 2011 at 12:52 PM
Thanks for sharing. We always follow your blog.
Posted by: Alex | February 10, 2013 at 06:03 PM