Thinking About Gedcom — #Meme, #Genealogy, #RootsTech

by C. Michael Eliasz-Solomon

Stanczyk has been thinking about GEDCOM a lot these days. As you may know, GEDCOM is the de facto standard file format for genealogical family trees, which lets a tree be shared among the many genealogy software programs, websites, and apps. Most genealogy programs still store data in their own proprietary format, but they will import and export it as GEDCOM so that you can exchange data with another program or genealogist.

Did you catch the phrase ‘de facto standard’? OK, it is NOT an open standard maintained by a standards organization such as ISO or ANSI. But it is widely supported, and in fact you should NOT buy or use software that does not support exporting and importing GEDCOM files!

Well, we are coming up on RootsTech 2013, and my mind is turning back to the technical side of genealogy again!

Today’s blog is about the GEDCOM file produced by Ancestry.com. Were you aware that you can export your family tree from Ancestry.com? You can: click ‘Tree Settings’ under the ‘Tree pages’ drop-down menu (Tree Settings is second from the bottom of the menu list). Clicking ‘Tree Settings’ brings up a screen similar to:

[Screenshot: Ancestry.com ‘Tree Settings’ page]

Notice that after you click the ‘Export tree’ button, a new button named ‘Download your GEDCOM file’ appears in that same place.

In all likelihood, clicking the ‘Download your GEDCOM file’ button will save a file in the Downloads directory on your local hard drive. It will be named:

<your-family-tree-name>.GED

The ‘<your-family-tree-name>’ part will actually be your tree’s name, giving something like ‘Eliasz Family Tree.GED’, so your Downloads directory will have a similarly named file (complete with blanks in the file name). The size of the file depends on how many individuals, families, sources, etc. you have recorded in your family tree. Figure on a file size of about 2 MB for about 1,100 people.
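To put that rule of thumb in numbers: 2 MB for 1,100 people works out to roughly 1,800 bytes per person. Below is a minimal Python sketch that turns the ratio into an estimate and compares it with the actual size of a downloaded file. The function names are hypothetical, and reading ‘2 MB’ as 2,000,000 bytes is an assumption. Python’s pathlib copes with the blanks in a name like ‘Eliasz Family Tree.GED’ without any quoting.

```python
from pathlib import Path

# Rule of thumb from the post: about 2 MB for about 1,100 people.
# Assumption: "2 MB" is taken to mean 2,000,000 bytes.
BYTES_PER_PERSON = 2_000_000 / 1_100  # roughly 1,818 bytes per person


def estimate_ged_size(people: int) -> int:
    """Hypothetical helper: rough size in bytes of a GEDCOM export for `people` individuals."""
    return round(people * BYTES_PER_PERSON)


def describe_export(path: Path, people: int) -> str:
    """Hypothetical helper: compare a downloaded file's real size with the estimate."""
    actual = path.stat().st_size  # Path objects handle blanks in file names; no quoting needed
    expected = estimate_ged_size(people)
    return (f"{path.name}: {actual:,} bytes "
            f"(rule of thumb for {people:,} people: about {expected:,} bytes)")
```

For example, estimate_ged_size(1100) returns 2000000. Calling describe_export(Path.home() / "Downloads" / "Eliasz Family Tree.GED", 1100) returns a one-line comparison for that file; the Downloads location is itself an assumption about your browser settings.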

This file you just downloaded from Ancestry.com is really just a plain text file using a set of standardized ‘tags’ defined by the GEDCOM standard. Software vendors are free to define their own custom tags too, but CUSTOM tags must begin with an underscore (‘_’). I was curious how well Ancestry.com adheres to the GEDCOM standard, so I wrote a little program (in Perl, for you programmer types) to analyze the GEDCOM file I had just downloaded.
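For the programmer types: my analyzer is in Perl, but the line structure it relies on can be sketched in a few lines of Python. Each GEDCOM line is a level number, an optional cross-reference ID such as @I1@, a tag, and an optional value. The regular expression and helper names below are my own illustration, not read_gedcom.pl itself.

```python
import re
from typing import NamedTuple, Optional

# One GEDCOM line: level, optional @XREF@, tag, optional value.
# Illustrative pattern only; it is not taken from read_gedcom.pl.
GEDCOM_LINE = re.compile(
    r"^\s*(?P<level>\d+)\s+"        # level number (0 starts a new record)
    r"(?:(?P<xref>@[^@]+@)\s+)?"    # optional cross-reference ID, e.g. @I1@
    r"(?P<tag>[A-Za-z0-9_]+)"       # tag; a leading '_' marks a custom tag
    r"(?:\s(?P<value>.*))?$"        # optional value: the rest of the line
)


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: str


def parse_line(line: str) -> Optional[GedcomLine]:
    """Split one GEDCOM line into its parts; None for blank or malformed lines."""
    # Drop a byte-order mark (it may appear on the first line) and the line ending.
    m = GEDCOM_LINE.match(line.lstrip("\ufeff").rstrip("\r\n"))
    if m is None:
        return None
    return GedcomLine(int(m["level"]), m["xref"], m["tag"], m["value"] or "")


def is_custom_tag(tag: str) -> bool:
    """Vendor-defined (custom) tags begin with an underscore."""
    return tag.startswith("_")
```

For example, parse_line("0 @I1@ INDI") gives GedcomLine(level=0, xref='@I1@', tag='INDI', value=''), while a line such as "3 _APID 1,7602::0" (a made-up value) yields the tag "_APID", for which is_custom_tag returns True.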

[Screenshot: read_gedcom.pl output for the Ancestry.com GEDCOM file]

My program, read_gedcom.pl, spits out a slew of stats about the GEDCOM file, including the tags used. As you may be able to see from the screenshot, sorted at the end of the list were 5 custom tags:

_APID,  _FREL,  _MILT,  _MREL,  _ORIG

These names have no meaning except to Ancestry.com and its website’s program(s). What you can also see is that, of the 48,538 lines in the downloaded GEDCOM file, 5,158 lines carry one of these five custom tags. Normally, I just ignore these tags: I import the GEDCOM file into my laptop’s genealogy software (REUNION, RootsMagic, PAF, etc.), let that software ignore the tags it does not understand, and within seconds my Ancestry.com family tree is imported into my computer’s genealogy software. That is fine; no problems.
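Here is a rough Python sketch of the kind of counting read_gedcom.pl does. Again, this is not the actual Perl program, and the function names are hypothetical. It tallies every tag, lists the underscore tags in sorted order, and works out what share of the lines carry a custom tag. With the figures above, 5,158 of 48,538 lines is about 10.6%.

```python
from collections import Counter
from typing import Iterable, Optional


def tag_of(line: str) -> Optional[str]:
    """Tag on a GEDCOM line (skipping an optional @XREF@), or None if there is none."""
    parts = line.lstrip("\ufeff").split()
    if len(parts) < 2 or not parts[0].isdigit():
        return None
    if parts[1].startswith("@"):  # record line such as "0 @I1@ INDI"
        return parts[2] if len(parts) > 2 else None
    return parts[1]


def gedcom_stats(lines: Iterable[str]) -> dict:
    """Hypothetical summary: total lines, tag counts, custom tags, and their share."""
    total = 0
    custom_lines = 0
    tags: Counter = Counter()
    for line in lines:
        total += 1
        tag = tag_of(line)
        if tag is None:  # blank or malformed line
            continue
        tags[tag] += 1
        if tag.startswith("_"):  # custom (vendor-defined) tag
            custom_lines += 1
    return {
        "lines": total,
        "tags": tags,
        "custom_tags": sorted(t for t in tags if t.startswith("_")),
        "custom_lines": custom_lines,
        "custom_share_pct": 100.0 * custom_lines / total if total else 0.0,
    }
```

To run it on a download, something like `with open("Eliasz Family Tree.GED", encoding="utf-8-sig", errors="replace") as f: stats = gedcom_stats(f)` should work, assuming the file is UTF-8; the utf-8-sig codec also accepts files without a byte-order mark.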

But what do you think happens if you turn right around and upload that GEDCOM file to your RootsWeb family tree? If you use RootsWeb, then you know: you get a LOT of _APID notes across all of your ancestors, and if an ancestor has many facts/citations, his or her RootsWeb page will be horribly marred by all of these _APID tags!

TIP

Remember, I said the GEDCOM file is a TEXT file. As such, it can be edited with your favorite text editor. If your editor does global search/replace, then you can easily remove these CUSTOM tags (_APID, etc.). That will make your RootsWeb family tree’s individual pages look MUCH better.
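If your editor supports regular expressions in its search/replace, the global replace amounts to something like the pattern below. This is a Python sketch of the same idea; the function name and the keep_blank option are my own, and keep_blank mimics editors that can only blank a line out rather than delete it. It matches only whole lines whose tag is exactly the one given, so a note that merely mentions ‘_APID’ in its text is left alone.

```python
import re


def strip_tag_lines(text: str, tag: str = "_APID", keep_blank: bool = False) -> str:
    """Global search/replace sketch: remove (or blank out) every line whose tag is `tag`."""
    pattern = re.compile(
        r"^[ \t]*\d+[ \t]+"              # level number at the start of a line
        r"(?:@[^@\r\n]+@[ \t]+)?"        # optional @XREF@
        + re.escape(tag) +               # the exact tag, e.g. _APID
        r"(?:[ \t][^\r\n]*)?"            # optional value
        r"(?P<eol>\r?\n|\Z)",            # the line ending (or end of file)
        re.MULTILINE,                    # make ^ match at the start of every line
    )
    # Deleting drops the line ending too; blanking keeps the original ending.
    return pattern.sub(lambda m: m.group("eol") if keep_blank else "", text)
```

Because the tag must follow the level number directly and be followed by a blank or the end of the line, a longer tag such as a hypothetical _APIDX is not caught by a search for _APID.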

Now I know what you are thinking: do NOT go editing your GEDCOM file! I agree. Make a copy of the downloaded GEDCOM file and edit the copy to remove the lines with ‘_APID’ on them. You can remove all of the custom tags, but I only bother with the _APID tags, which are the most irksome. If your editor can delete the lines containing ‘_APID’, then that is what you should do. But if all your editor can do is replace those lines with blank lines, that is OK too. Make those edits and save the edited copy. RootsWeb seems to ignore the blank lines, thank goodness.
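Here is a sketch of the whole ‘copy, then clean the copy’ routine as a small Python function (a hypothetical name, not a tool from this post). It reads the original download, writes a separate cleaned file, and refuses to overwrite the original. It works on raw bytes, so for an ASCII-compatible encoding such as UTF-8 the file’s characters and line endings pass through untouched. One design choice goes beyond the steps above: when a tagged line is dropped, any deeper-level lines directly beneath it are dropped too, because in GEDCOM those lines belong to the removed tag. Set all_custom=True to strip every underscore tag instead of just _APID.

```python
from pathlib import Path
from typing import Iterable, Optional, Tuple


def _level_and_tag(line: bytes) -> Tuple[Optional[int], Optional[bytes]]:
    """(level, tag) for one raw GEDCOM line; (None, None) if it has no level number."""
    parts = line.split()
    if not parts or not parts[0].isdigit():
        return None, None
    level = int(parts[0])
    if len(parts) < 2:
        return level, None
    if parts[1].startswith(b"@"):  # record line such as b"0 @I1@ INDI"
        return level, parts[2] if len(parts) > 2 else None
    return level, parts[1]


def clean_gedcom_copy(src: Path, dst: Path, tags: Iterable[str] = ("_APID",),
                      all_custom: bool = False) -> int:
    """Write a cleaned copy of `src` to `dst`; return how many lines were dropped.

    Drops lines whose tag is in `tags` (or every '_' tag if all_custom=True),
    plus any deeper-level lines nested under a dropped line.
    """
    if Path(src).resolve() == Path(dst).resolve():
        raise ValueError("write the cleaned copy to a different file")
    drop = {t.encode("ascii") for t in tags}
    removed = 0
    skip_deeper_than: Optional[int] = None  # level of the last dropped tag line
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            level, tag = _level_and_tag(line)
            if skip_deeper_than is not None:
                if level is not None and level > skip_deeper_than:
                    removed += 1  # child line of a dropped tag
                    continue
                skip_deeper_than = None
            if tag is not None and (tag in drop or (all_custom and tag.startswith(b"_"))):
                removed += 1
                skip_deeper_than = level
                continue
            fout.write(line)  # unchanged bytes, original line ending included
    return removed
```

For example, clean_gedcom_copy(Path("Eliasz Family Tree.GED"), Path("Eliasz Family Tree RootsWeb.GED")) would leave the download alone and write the _APID-free copy that you then upload.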

Now you can upload the edited file, with the _APID custom tags removed, to RootsWeb, and your family tree will look the way it used to, without these irksome custom tags.

Next time I will tell you what I found when I looked closely at what Ancestry.com was putting into the downloaded GEDCOM file.


2 Comments to “Thinking About Gedcom — #Meme, #Genealogy, #RootsTech”

  1. I know the word used to be “don’t use a program that does not support GEDCOM”, but I don’t think that blanket statement is as true as it used to be. I have a post on some issues with writing software for the GEDCOM format, and there is a discussion of the topic here as well: http://www.youtube.com/watch?v=eU67WKpdMFw&feature=plcp

    I believe Ancestry is particularly bad at putting its own spin on interpreting GEDCOM fields, and it is VERY difficult to import their data with any sensible results (importing into Family Tree Maker aside; they have the inside scoop).

    I won’t repeat all the arguments about GEDCOM’s limitations here, but basically it’s a bit long in the tooth and has been for years. There are several efforts out there to address its limitations, but for some people those efforts are taking too long, and unless you only want to see developers push out revisions to existing software applications, the old adage just can’t be carved in stone anymore.

    There is a new generation of software apps coming that are trying to fill gaps left by the current software offerings, and they can’t afford to wait for a new standard. So they do what they can: support the parts of GEDCOM that make sense so that there can be SOME data sharing, and wish there were a true standard.

    • Ed,
      Welcome to the blog and thanks for writing. I too am frustrated with FamilySearch and their guardianship of GEDCOM.

      I want software vendors to:
      (1) continue to support import/export of the last GEDCOM standard
      (2) create USEFUL enhancements via NEW tags (custom tags beginning with an ‘_’ underscore)
      (3) add functionality based upon the new tags
      (4) go to RootsTech to lobby for their new tags to be added to the GEDCOM standard (thus removing the ‘_’)
      (5) RootsTech attendees vote on vendor tags to be included in GEDCOM
      (6) RootsTech attendees vote on USER-promoted tags to be dropped, altered, moved, or created in GEDCOM, which go into a provisional set of CUSTOM tags for the next year’s vote
      (7) vendors attempt to utilize USER-promoted tags where it makes sense for them and their software and is viable

      Both VENDORS and USERS need to ‘Own’ GEDCOM, and FAMILYSEARCH should just become the ‘ISO-like’ standard maintainer and publisher on the basis of annual RootsTech votes.

      The GEDCOM standard needs to evolve and grow again. What is it, over a decade out of date?

      If FamilySearch does not agree, then we should form our own Genealogy/Technology conference and create a new GEDCOM standard (GEDCOM++, anyone?) that is vendor neutral and allows users to vote annually on new features proposed by vendors, users, genealogy societies, and other industry proponents.

      Interchangeable genealogy data, which lets users move to new software, is good for all. Also, the ability of separate genealogy researchers to trade genealogy data via export/import must be protected. Finally, we MUST NOT lose old GEDCOM data from prior researchers who may have died but whose GEDCOM lives on somewhere on the Internet (e.g. Ancestry.com, RootsWeb, etc.).

      GENEALOGISTS must learn the GEDCOM tags and understand their implications; SOFTWARE vendors must educate genealogists on their tags and how those tags support current or future features.

      --Stanczyk
