Tree Climber

In the interests of practicing some programming skills that I don’t use at work (I have written a few .NET apps at work, but I mainly code in ABAP), I have lately been trying to come up with fun personal projects. A while back Mom and I were talking about genealogy and the fact that her PAF data mainly covers her own family and not Dad’s (even though there is lots of information on his side in the Ancestral File on FamilySearch.org). I mentioned that I could probably write a program to extract that data, and she said it would be good to have. As you might have guessed, last week I took up the challenge.

I set out to write a C# app to extract all of the genealogical data in the Ancestral File for my direct ancestors. Of course, I had to give it a cool name. I called it Tree Climber…get it? You know, it’s a program that climbs a family tree. Most people reading this are probably shaking their heads at me, but I thought it was clever. I used a technique called scraping, in which the program downloads a web page’s HTML source and harvests information from it. Some websites consider this a violation of their terms of use, so I checked the FamilySearch.org terms of use, which say nothing against it. So here’s how it works:

  1. For a given person in the tree (I started the program at this node in the tree), download the GEDCOM file using the link found at the top of the page.
  2. Store nodes 8 through 15 for further processing.
  3. Repeat steps 1 & 2 for all of the stored nodes.
  4. Once all of the nodes have been visited and all GEDCOMs downloaded, merge all of the GEDCOMs into one combined file, which is then imported into PAF.
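The four steps above amount to a breadth-first walk over pedigree pages. Below is a minimal sketch of that walk, written in Python rather than the app's C#, and every name in it (`climb`, `merge_gedcoms`, `fetch_pedigree`, `download_gedcom`) is hypothetical. The two callables stand in for the actual HTTP requests and HTML scraping, which aren't shown. The sketch assumes each page numbers its people in Ahnentafel order (1 is the page's person, 2n and 2n+1 are the father and mother of n), which would make nodes 8 through 15 the fourth and last generation shown on a page. A visited set keeps an ancestor who turns up on more than one branch from being downloaded twice. The merge is deliberately naive: it keeps the first file's header, drops the other headers and all trailers, and prefixes each file's cross-reference IDs so that `@I1@` from two different files can't collide.

```python
import re
from collections import deque
from typing import Callable, Dict, List

# Hypothetical shape of one scraped pedigree page: Ahnentafel number -> person ID.
# Assumption: 1 is the page's root person, 2n is the father of n, 2n + 1 the mother,
# so a four-generation page covers numbers 1..15 and 8..15 are the top generation.
PedigreePage = Dict[int, str]


def climb(root: str,
          fetch_pedigree: Callable[[str], PedigreePage],
          download_gedcom: Callable[[str], str]) -> List[str]:
    """Breadth-first walk up the tree; returns one GEDCOM text per visited person."""
    queue = deque([root])
    visited = set()
    gedcoms: List[str] = []
    while queue:
        person = queue.popleft()
        if person in visited:  # same ancestor reached through two branches
            continue
        visited.add(person)
        gedcoms.append(download_gedcom(person))  # step 1: download this page's GEDCOM
        page = fetch_pedigree(person)
        for n in range(8, 16):  # step 2: store nodes 8 through 15
            ancestor = page.get(n)
            if ancestor is not None and ancestor not in visited:
                queue.append(ancestor)  # step 3: handled on a later iteration
    return gedcoms


# Matches a GEDCOM cross-reference ID such as @I12@ or @F3@.
XREF = re.compile(r"@([A-Za-z0-9]+)@")


def merge_gedcoms(files: List[str]) -> str:
    """Step 4, naive version: one header, one trailer, per-file ID prefixes."""
    out: List[str] = []
    for i, text in enumerate(files):
        skipping = False
        for line in text.splitlines():
            level, _, rest = line.strip().partition(" ")
            if level == "0":
                # Keep only the first file's HEAD record; drop every TRLR.
                skipping = rest == "TRLR" or (rest == "HEAD" and i > 0)
            if not skipping:
                # Prefix IDs so @I1@ from file 0 and from file 1 stay distinct.
                out.append(XREF.sub(lambda m: f"@G{i}X{m.group(1)}@", line))
    out.append("0 TRLR")
    return "\n".join(out) + "\n"
```

In a real run, the list returned by `climb` would go straight into `merge_gedcoms`. Because each stored person appears both at the top of one page and as the root of the next, the naive merge will contain repeated individuals; this sketch makes no attempt to recognize the same person across files.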


After a couple of days of coding and testing, I was ready to try it out. As mentioned, I started the processing at Caitlin (Mom’s Caitlin, not Shannon’s) to include both Mom and Dad’s genealogical information. The program has a textbox that shows each name as it is extracted. While it was running, I was amazed at how many names there were and how interesting many of them were. We were always told that a whole lot of genealogical work had been done on Dad’s mother’s side. Now I have an idea of just how true that is. The program ran for 10 hours before it finished! It took 6 hours to extract 5,683 GEDCOMs and an additional 4 hours to merge them into one combined file. After importing the combined GEDCOM into PAF, I found that 19,580 names had been extracted. Holy cow! Even so, the sheer number of names is not nearly as impressive as some of the cool people I found I’m related to.


I was actually going to list the complete line of descent from Otto I down to me (just because I can), but I figure I have been long-winded enough. If anyone wants the data I extracted, or actually wants to see that line of descent, just let me know.

7 Comments

  1. shane says:

    What a genius program! Also, I’m glad to hear that I’m not the only one who programs outside of work.

  2. shane says:

    Also, great name for the app!

  3. Andrew says:

    Thanks. Maybe next I will add functionality to extract the data from new.familysearch.org (as it has more data). The only problem is that the new FamilySearch is painfully slow when you use it manually. Hmm.

  4. shane says:

    I’m surprised that they don’t have some sort of open API that gives outside developers select access to their data.

  5. Andrew says:

    They do (https://devnet.familysearch.org/), but it looked like a lot of trouble to get access to it. If I were developing a product that I was going to market to people, that would be the way to go. Wouldn’t you know, I found it right before I finished my program.

  6. shane says:

    That’s how things usually go; you can never find something when it’s needed.

  7. Russell says:

    Every family tree has some sap in it.
