Saturday, January 23, 2016

Learning: The Trouble with Indexes

I am a huge proponent of indexing, especially as a volunteer. I believe that the volume of records still to be made available to the public is so enormous that volunteer indexing is the only way many records will ever see the light of day. There simply aren't enough resources, in money or time, for any one organization to pay to have every existing record indexed.

So it falls to every genealogist, the users of these records, to seek volunteer opportunities to index records wherever possible. Everyone who cares about these records, and the people listed on them, must become a well-trained custodian of the past. How we index records is a part of that stewardship.

So when we consider what makes up a good index of a record set, I think we can agree on some core elements.


The index that is created must be an accurate representation of what is written on the records. Some transcription errors will always creep in, and these are especially understandable with handwritten records. But an index loses much of its value when names and details that appear on the records are left out, or when the index misrepresents any data point.


The index should be searchable by all relevant data points. And I'm consistently in the camp that says that if it appears on the record, it's relevant information. While it may not be practical to make every record searchable by every data point, an index's search functionality should be as inclusive as possible of every name that appears on the record.

One significant example of a commonly missing search parameter is race. At the time of this writing, only one website of the Big Four (Ancestry, My Heritage, FindMyPast, and FamilySearch) allows you to search universally by race. This creates a significant impediment to doing research for anyone of African descent.

The problem is compounded when the black individuals for whom I'm searching are not African American and have no connection to the United States. But the example I'll be addressing below applies specifically to the challenges of indexing African American records.

Availability of Original Images

Not every index can provide images for original records. To do so is often cost prohibitive, especially for smaller organizations that are new to indexing.

However, every index should communicate the origins for the information being indexed. Date ranges, specific locations, repository, and all other information necessary to obtain copies of the original images should be present in the index description and item descriptions.

Not only does this aid in the crafting of quality source citations, but it also makes it possible for interested parties to request copies of the originals.

I think we can all agree that the organization that makes it easiest for genealogists to index records on a large scale is FamilySearch. Their interface has provided the standard for any organization wanting to engage their communities in indexing efforts. What they deliver, especially since it relies almost entirely on volunteers, is impressive.

But sometimes even they can miss the mark.

Pittsylvania County, Virginia Death Index on FamilySearch: A Bad Example

What happens when an index is created, but many of the above points are ignored? How does it affect a person's ability to find desired records?

After the Ben Affleck/Finding Your Roots controversy hit the fan, it really got me thinking about the best approach to documenting slavery. And while I think everyone has the right to decide what to share with, or withhold from, an international audience about their family, I think slavery demands more of genealogists and family historians. Because I have slave holders and slaves throughout various lines of my family, I've decided to document both groups with total openness and objectivity.

Pittsylvania County, Virginia has a treasure trove of records in comparison to other communities in the South, including for African Americans. I decided to start there, which led me to Virginia Deaths and Burials, 1853-1912 on FamilySearch.

When I first sat down to do this project in September of last year, there was no way to search this collection by race. As of this writing, that has changed. Because I was specifically interested in slaves held by the Keatts family in Pittsylvania County, my search parameters included the years 1853-1865.

The following are examples of records I found.

One issue I noticed, having seen both the originals and another index of these records, is the placement of the slave holder's name. As you can see here, it has been placed in the father's name position. While I cannot comment on the paternity of any of the slaves owned by my family, I can vouch for the fact that the original records made no such attempt. On the original records as provided by the other index, Richard Keatts is labelled in a column specifically designated for the owner of deceased slaves. He is also listed as the Consort, or informant.

The practice of putting the slave owner's name in the father's name field is consistent throughout the collection, regardless of the gender or relationship of the owner. Aletha "Letty" Keatts is female, the sister of Richard Keatts, and her name also appears in the father's field.

Upon closer inspection of the FamilySearch collection, giving the owner's name, followed by "(Owner)," and providing that information as the father's name appears to be the convention for all records related to slaves in Pittsylvania County. In order to have that degree of compliance with this many records, this has to be what the indexers of the collection were instructed to do. However, without the original images, I cannot say whether every slave holder in the FamilySearch collection has such a designation. Additionally, that convention is not disclosed in any description of the collection, or in the Known Issues page of the collection that I could find.

Additionally, the emancipation status of every African American was stated plainly on the original records. Whether the deceased was white was answered with a Yes or No. In a second column, labeled "Colored," their emancipation status was listed as either "Slave" or "Free." However, that information was not indexed in the FamilySearch collection.

While it may be possible to isolate all of the enslaved African Americans by using the race search box and searching for "Owner" in the father's name field, there are some issues with this approach. The first is that I cannot determine if every slave holder has such a designation. The second issue is that every search result with that designation appears twice--once as a result for the deceased individual, and once for the so-called father/owner.
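The workaround described above can be sketched in a few lines of Python. This is only an illustration of the filtering and de-duplicating a researcher would have to do by hand. The record layout, the field names, and the placeholder entries ("Deceased A" and so on, with made-up years) are my own assumptions and do not come from FamilySearch or from the original records.

```python
from dataclasses import dataclass
from typing import List

OWNER_MARKER = "(Owner)"  # the convention seen in the father's name field


@dataclass(frozen=True)
class SearchResult:
    """Hypothetical shape of one row in a results list."""
    deceased: str
    death_year: int
    father_field: str
    matched_on: str  # "deceased" or "father": which name produced the hit


def has_owner_marker(result: SearchResult) -> bool:
    """True when the father's name field carries the "(Owner)" marker."""
    return result.father_field.strip().endswith(OWNER_MARKER)


def isolate_owner_records(results: List[SearchResult]) -> List[SearchResult]:
    """Keep only marked rows, collapsing the deceased/father duplicate pair."""
    seen = set()
    unique = []
    for r in results:
        if not has_owner_marker(r):
            continue
        # The same death appears once per matched name, so ignore matched_on.
        key = (r.deceased, r.death_year, r.father_field)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


if __name__ == "__main__":
    # Placeholder rows for illustration only; not transcribed from any record.
    rows = [
        SearchResult("Deceased A", 1855, "Richard Keatts (Owner)", "deceased"),
        SearchResult("Deceased A", 1855, "Richard Keatts (Owner)", "father"),
        SearchResult("Deceased B", 1860, "Letty Keatts (Owner)", "deceased"),
        SearchResult("Deceased C", 1858, "Some Father", "deceased"),
    ]
    for r in isolate_owner_records(rows):
        print(r.deceased, "|", r.father_field)
```

Even in this simplified form, the filter only finds records where the marker was actually entered, which is exactly the first issue: nothing in the index can tell me about slave holders who were recorded without it.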

As a result, anyone trying to find enslaved ancestors in Pittsylvania County for this time period has to comb through a results list full of duplicates. Anyone trying to find emancipated ancestors for the same time period is unable to isolate these results from everything else. Given that this information is so clearly stated on the records, the real issue here is the way the records were indexed. The indexing program simply didn't have fields to index the names of slave holders.
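To make the missing fields concrete, here is a rough Python sketch of what an index entry might look like if the indexing program had separate places for the owner and for the "Slave" or "Free" status. The class and field names are hypothetical and are not FamilySearch's actual data model. The helper shows how an entry that follows the "(Owner)" convention maps onto that layout.

```python
from dataclasses import dataclass
from typing import Optional

OWNER_MARKER = "(Owner)"  # convention observed in the father's name field


@dataclass
class DeathEntry:
    """Hypothetical index entry with dedicated owner and status fields."""
    deceased: str
    father: Optional[str] = None
    owner: Optional[str] = None   # column for the owner of a deceased slave
    status: Optional[str] = None  # "Slave" or "Free", from the "Colored" column


def remap_owner(entry: DeathEntry) -> DeathEntry:
    """Move a name marked "(Owner)" out of the father field into owner."""
    father = (entry.father or "").strip()
    if not father.endswith(OWNER_MARKER):
        return entry  # nothing to move; leave the entry untouched
    owner_name = father[: -len(OWNER_MARKER)].strip()
    # Status is left as-is: it was never indexed, so it can only be
    # filled in from the original record, not guessed from the index.
    return DeathEntry(
        deceased=entry.deceased,
        father=None,
        owner=owner_name,
        status=entry.status,
    )
```

Note that the status field stays empty after the move. That information exists only on the original records, which is why I believe the collection needs to be re-indexed rather than patched.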

But if real efforts are going to be made to index records pertaining to African Americans and their ancestors, these improvements to the indexing program need to be made. And the fastest, simplest way to correct all of the issues related to these records is to re-index the collection.

While it is possible to rely on user-submitted corrections to individual records, which FamilySearch told me is their proposed solution, the failure to thoroughly correct all affected records shows a real lack of accountability for the situation they've created. Some might also say that an inaccurate or "quirky" index is better than no index at all. But for errors that so disproportionately affect the African American community, such a glib response is unconscionable. It leaves us to wonder how many other collections related to slavery have similarly botched indexes, and what FamilySearch is doing to identify and correct these issues.

A Lesson Learned

The most important lesson to take from this example is that indexing efforts should be well-planned if we expect them to be well-executed. Taking shortcuts, or trying to avoid properly adapting current resources to create a true derivative, ultimately creates more work than it saves.

Because of the way computer databases are constructed on the back end, the only chance there is to address multiple records at once is when the collection is indexed. After that, it is a one-by-one, tedious effort to do any corrections.

Indexes can be incredibly useful. They serve a necessary, low-cost function in providing free access to records. But unfortunately, they have a dark and messy underbelly that every genealogist should be aware of before using them.

The issues that come with them sometimes make them about as useful as another brick in the wall.