Stanczyk has been thinking about GEDCOM a lot these days. As you may know, GEDCOM is the de facto standard format for a genealogical family tree file, in order for it to be shared amongst the many genealogical software programs / websites / apps. Most genealogy programs still use their own proprietary format for storing data but will import / export the data in the GEDCOM standard for you to exchange data with another program or genealogist.
Did you catch the phrase ‘de facto standard’? OK, it is NOT an open standard maintained by a standards organization such as ISO or ANSI. But it is widely supported, and in fact you should NOT buy or use software that does not support the export and import of GEDCOM files!
Well we are coming up on RootsTech 2013 and my mind is turning back to the technical part of genealogy again!
Today’s blog is about the GEDCOM used by Ancestry.com. Were you aware that you can export your family tree from Ancestry.com? You can, by clicking on ‘Tree Settings’ under the ‘Tree pages’ drop-down menu (‘Tree Settings’ will be second from the bottom in the menu list). If you click on ‘Tree Settings’ you will see a screen similar to:
Notice that after you click on the ‘Export tree’ button, a new button named ‘Download your GEDCOM file’ appears in that same place.
In all likelihood, if you click on the ‘Download your GEDCOM file’ button, you will get a file in the Downloads directory on your local hard drive. It will have a name of:
Now the phrase ‘<your-family-tree-name>’ will actually be something like ‘Eliasz Family Tree.GED’. So your Downloads directory will have a similarly named file (complete with blanks in the file name). The size of the file will depend on how many individuals, families, sources, etc. you have recorded in your family tree. Figure on a file size of 2MB for about 1,100 people.
Now this file you just downloaded from Ancestry.com is really just a plain text file with a set of standardized ‘tags’ defined by the GEDCOM standard. Software vendors are free to define their own custom tags too, although CUSTOM tags must begin with an underscore (‘_’). I was curious as to how well Ancestry.com implements/adheres to the GEDCOM standard, so I wrote a little program (in Perl, for you programmer types) to analyze the GEDCOM file that I just downloaded.
My program, read_gedcom.pl, spits out a slew of stats about the GEDCOM, including the tags used. As you may be able to see from the screenshot, sorted at the end were 5 custom tags:
_APID, _FREL, _MILT, _MREL, _ORIG
These names do not have any meaning except to Ancestry.com and their website’s program(s). What you also see is that, of the 48,538 lines in the downloaded GEDCOM file, 5,158 lines (about 10.6%) have one of these five custom tags. Normally, I just ignore these tags and import the GEDCOM file into my laptop’s genealogy software (REUNION, RootsMagic, PAF, etc.) and let that software skip over the tags it does not understand. Within seconds I have my Ancestry.com family tree imported into my computer’s genealogy software. That is fine, no problems.
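For the programmer types, here is a minimal sketch of that kind of tag count, written in Python rather than Perl. It is not the read_gedcom.pl program itself. It assumes the usual GEDCOM line layout of a level number, an optional @xref@ identifier, a tag, and an optional value. The function names and the sample lines (including the _APID value) are made up for illustration.

```python
from collections import Counter


def parse_tag(line):
    """Return the tag on one GEDCOM line, or None for blank/malformed lines."""
    parts = line.split()
    if len(parts) < 2 or not parts[0].isdigit():
        return None
    # Record lines look like "0 @I1@ INDI": field 2 is a cross-reference id,
    # so the tag is field 3.
    if parts[1].startswith("@") and parts[1].endswith("@"):
        return parts[2] if len(parts) >= 3 else None
    return parts[1]


def count_tags(lines):
    """Count how many lines carry each tag."""
    counts = Counter()
    for line in lines:
        tag = parse_tag(line)
        if tag is not None:
            counts[tag] += 1
    return counts


def custom_tags(counts):
    """Keep only the custom tags, i.e. those beginning with an underscore."""
    return {tag: n for tag, n in counts.items() if tag.startswith("_")}


# Made-up sample lines, not taken from a real Ancestry.com export.
sample = [
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
    "1 BIRT",
    "2 SOUR @S1@",
    "3 _APID example-value",
    "",
]

counts = count_tags(sample)
print("total tagged lines:", sum(counts.values()))
print("custom tags:", sorted(custom_tags(counts).items()))
```

To try it on a real export, pass the lines of the downloaded file to count_tags instead of the sample list.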
But what do you think happens if you turn right around and upload that GEDCOM file into your RootsWeb family tree? If you use RootsWeb, then you know you get a LOT of _APID notes across all of your ancestors and sometimes, if you have many facts/citations for an ancestor, the RootsWeb page for him/her will be horribly marred by all of these _APID tags!
Remember I said the GEDCOM file is a TEXT file. As such, it can be edited with whatever text editor you like to use. If your editor does global search/replace, then you can easily remove these CUSTOM tags (_APID, etc.). That will make the individual pages in your RootsWeb family tree look MUCH better.
Now I know what you are thinking: do NOT go editing your GEDCOM file! I agree. Make a copy of your downloaded GEDCOM file and edit the copy to remove the lines with ‘_APID’ on them. You can remove all custom tags, but I only bother with the _APID lines, which are so irksome. If your editor can delete the lines with ‘_APID’, then that is what you should do. But if all your editor can do is replace the lines that have _APID on them with a blank line, then that is OK too. Make those edits and save the edited (copy) file. The blank lines seem to be ignored by RootsWeb, thank goodness.
Now you can upload the edited file, with the _APID custom tags removed, to RootsWeb and your family tree will again look the way it used to, without these irksome custom tags.
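If you would rather not do this by hand in an editor, the same cleanup can be sketched in a few lines of Python. This is only an illustration under some assumptions: the file is UTF-8 text, the lines being dropped have no subordinate lines beneath them, and the helper names and file names are hypothetical. It writes a new file and never touches the original download.

```python
from pathlib import Path


def line_tag(line):
    """Return the GEDCOM tag on a line, or None if the line has no tag."""
    parts = line.split()
    # "0 @I1@ INDI" style lines carry an @xref@ id before the tag.
    if len(parts) >= 3 and parts[1].startswith("@") and parts[1].endswith("@"):
        return parts[2]
    return parts[1] if len(parts) >= 2 else None


def strip_tag_lines(lines, tag="_APID"):
    """Return only the lines whose tag is not `tag`."""
    return [line for line in lines if line_tag(line) != tag]


def write_stripped_copy(src, dst, tag="_APID"):
    """Write a copy of src to dst without `tag` lines; return lines removed."""
    # Assumption: the export is UTF-8; "utf-8-sig" also tolerates a leading BOM.
    original = Path(src).read_text(encoding="utf-8-sig").splitlines()
    kept = strip_tag_lines(original, tag)
    Path(dst).write_text("\n".join(kept) + "\n", encoding="utf-8")
    return len(original) - len(kept)


# Made-up sample lines to show the filter; the _APID value is a placeholder.
sample = [
    "1 BIRT",
    "2 SOUR @S1@",
    "3 _APID example-value",
    "1 NOTE this note mentions _APID but is not an _APID line",
]
for line in strip_tag_lines(sample):
    print(line)
```

A call such as write_stripped_copy("Eliasz Family Tree.GED", "Eliasz Family Tree - no APID.GED") would then produce the copy to upload; the second file name is just an example. Matching on the tag field, rather than searching for the text ‘_APID’ anywhere on the line, keeps notes that merely mention the tag.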
Next time I will tell you what I found when I looked closely at what ANCESTRY.com was putting into the downloaded GEDCOM file.