Posts tagged ‘GEDCOM’

April 22, 2013

Ancestry Adds Diacriticals to Their GEDCOM Exports — #Genealogy, #Polish, #Diacriticals, #GEDCOM

by C. Michael Eliasz-Solomon

AncestryFixed

Left – Ancestry fixed (diacriticals) • Right – Data missing diacriticals before fix

Ancestry.com, you must forgive Stanczyk. With the cacophony of the terrorist bombing at the Boston Marathon, the fertilizer explosion in West, TX, the earthquake in China, and so many other big news stories holding my attention, I neglected to read your email.

This week Ancestry fixed at least two of my bugs. I cannot test the other bug. Let me back up a bit. Long-time readers may remember my blog article, “Thinking About @Ancestrydotcom’s GEDCOM” (1-March-2013). In that article, I asked Ancestry.com to fix three things:

  1. CHAR tag in their GEDCOM export
  2. Support diacriticals ex.:    ą ć ę ł ń ó ś ź ż   (in proper Polish nouns)
  3. Phantom Notes ???

I am here to publicly THANK YOU, Ancestry.com, for taking my tweets (Twitter: @Ancestrydotcom) and fixing these problems. I assume #3 was fixed too, but I have no way to test your fix (as I had deleted the offending comments in March). But I am here to tell the INTERNET that Ancestry.com took my tweets, opened a ticket, and fixed the bugs!

Now the diacriticals I entered into my Ancestry.com family tree were exported from the Ancestry.com website and downloaded to my laptop, where I examined the GEDCOM file in gVIM (I am still a techy at heart) and saw the new CHAR tag in the GEDCOM file!

I then imported the GEDCOM into RootsMagic (which I knew supported diacriticals) and voilà, there was my data, all proper in RootsMagic!

AncestryFixed_UTF8

Ancestry used a value of UTF-8 for the CHAR tag. This allowed me to keep my diacriticals on the export. So now my Family Tree in Ancestry and my Family Tree in RootsMagic can have identical data. I no longer lose my valuable work from Ancestry (or have to re-enter the data on my laptop).

So I hope my readers follow my example and Tweet at Ancestry. You too can help improve their software when you find bugs and request bug fixes. I hope my fixes help other genealogists (like the many Polish genealogists who read this blog). NOW we have diacriticals. See, they are a kindly 800 lb. (362.87 kg) Gorilla of the genealogy world. That is why I am a subscriber: they value me as a customer.

THANKS AGAIN, Ancestry!

–Stanczyk

4/22/2013

March 1, 2013

Thinking About @Ancestrydotcom ‘s GEDCOM — #Genealogy, #GEDCOM

by C. Michael Eliasz-Solomon

GorillaFamilyTree

Ancestry.com (Twitter: @Ancestrydotcom) is the proverbial 800 lb (362.87 kg) gorilla in the genealogical archive. You cannot miss him, and mostly he’s lovable. So today after you read this blog post, Stanczyk wants you to tweet at him (see Twitter link above). I am hoping the big ape will make some improvements to their software. Hint, hint!

A couple of days ago (25-Feb-2013), I ran my PERL program against the GEDCOM file I exported from my family tree on Ancestry.com’s website. That tree, the RootsWeb tree, and this blog are Stanczyk’s main tools for collaboration with near and distant cousin-genealogists (2nd, 3rd, 4th, 5th cousins: all are welcome).

Quick Facts —

  1. No invalid tags  – Good
  2. Five custom tags – Also Good
  3. CHAR tag misused – ANSI [not good]
  4. My Ancestry Family Tree uses diacriticals: ą ć ę ł ń ó ś ź ż   in proper nouns [not good]
  5. Phantom Notes ??? [really not good]

So, Mr. Ancestry (sir), can you please fix #’s 3, 4, and 5?

CHAR -  I think Ancestry should use what is in the standards: ANSEL | UTF-8 | UNICODE | ASCII . I think this is easily do-able (even if all you do is just substitute ASCII).

This is not a picayune, nit-picky, persnickety, or snarky complaint. In fact, it leads right into the next problem (#4 above). Not only does Ancestry export the GEDCOM file as “ANSI”, it strips out my diacriticals too (as a result?). So now I have potentially lost valuable information from my research. For Slavic researchers, these diacriticals can be vital to finding an ancestor, as they guide how the original name was pronounced and how it might have been misspelled or mistranscribed in the many databases. Without the diacriticals that vital link is lost.

The last criticism is an insidious problem. Every time I exported the GEDCOM, I would get a note on one person in the tree. I would carefully craft the note on Ancestry, but what I received in the GEDCOM file downloaded would be different ???

I reported the problem to no avail and no response. This is not very good for an 800 lb gorilla.

Digging Deeper

I have since gone on to do some experiments and the results may astound you (or not). I copied the NOTE I was getting in my GEDCOM and saved it off to a text file, perplexed as to where it came from, since it was not the NOTE I was editing on Ancestry??? Now I did something bold. I deleted the note from that person on Ancestry and then downloaded the GEDCOM file again. Do you know what I got? Wrong! I did not get my carefully crafted NOTE, I got yet another NOTE. I copied that note’s text and repeated my process of deleting the note and downloading the GEDCOM file a 3rd time. This time when I edited my GEDCOM file, I found MY note!!!

But where/how did the other two notes come about? Why were there three notes? Why could I see and edit the 3rd note, but only get the first note when I downloaded the GEDCOM file? How did notes 2 & 3 get there? Why did I not get all three notes when I downloaded the GEDCOM? All good questions that I have no answer to.

My suspicion is that Ancestry should not allow more than one EDITOR on a tree; other contributors should only be allowed to comment, or maybe be given an ability to leave sticky-notes on a person [that do not go into a GEDCOM file]. I do not think the notes were created by their mobile app since I always saw my NOTE (and not the other two notes). I am chalking this up to an Ancestry.com bug and urging others who see strange things in their notes to take deliberate steps to unravel their notes. I hope Ancestry will fix this and let people know. I hope they fix all of items #3, #4, and #5.

So, my dear readers, I am asking you to tweet to Ancestry (as I will too) and ask them for bug fixes. Perhaps if enough people tweet at @Ancestrydotcom, they will respond and not give us the cold gorilla shoulder.

February 25, 2013

Thinking About Gedcom — #Meme, #Genealogy, #RootsTech

by C. Michael Eliasz-Solomon

Stanczyk has been thinking about GEDCOM a lot these days. As you may know, GEDCOM is the de facto standard format for a genealogical family tree file, in order for it to be shared amongst the many genealogical software programs / websites / apps. Most genealogy programs still use their own proprietary format for storing data but will import / export the data in the GEDCOM standard for you to exchange data with another program or genealogist.

Did you catch the phrase ‘de facto standard’ ? OK it is NOT an open standard maintained by ISO or ANSI standards organizations. But it is widely supported and in fact you should NOT buy or use software that does not support the export and import of GEDCOM files!

Well we are coming up on RootsTech 2013 and my mind is turning back to the technical part of genealogy again!

Today’s blog is about the GEDCOM used by Ancestry.com. Were you aware that you can export your family tree from Ancestry.com? You can by selecting/clicking on ‘Tree Settings‘ under the ‘Tree pages‘ drop down menu (Tree Settings will be the second from the bottom in the menu list). If you click on ‘Tree Settings’ you will see a screen similar to:

ANCESTRY_TreeSettings

Notice that after you click on the ‘Export tree’ button, you get a new button named ‘Download your GEDCOM file’ in that same place.

In all likelihood if you click on the  ‘Download your GEDCOM file‘ button you will get a file in your Downloads directory on your local hard drive. It will have a name of:

<your-family-tree-name>.GED

Now the phrase ‘<your-family-tree-name>’ will actually be something like ‘Eliasz Family Tree.GED’. So your Downloads directory will have a similarly named file (complete with blanks in the file name). The size of the file will depend on how many individuals, families, sources, etc. you have recorded in your family tree. Figure on a file size of 2MB for about 1,100 people.

Now this file you just downloaded from Ancestry.com is really just a plain text file with a set of standardized ‘tags’ defined by the GEDCOM standard. Software vendors are free to define their own custom tags too, although CUSTOM tags must begin with an underscore (‘_’). I was curious as to how well Ancestry.com implements/adheres to the GEDCOM standard, so I wrote a little program (in PERL for you programmer types) to analyze the GEDCOM file that I had just downloaded.

ReadGedcom_ANCESTRY

My program, read_gedcom.pl, spits out a slew of stats about the GEDCOM including the tags used. As you may be able to see from the screenshot, sorted at the end were 5 custom tags:

_APID,  _FREL,  _MILT,  _MREL,  _ORIG

These names do not have any meaning except to Ancestry.com and their website’s program(s). What you also see is that, of the 48,538 lines in the downloaded GEDCOM file, 5,158 lines have one of these five custom tags. Normally, I will just ignore these tags and import the GEDCOM file into my laptop’s genealogy software (REUNION, RootsMagic, PAF, etc.) and let that software ignore these non-understandable tags, and within seconds I have my Ancestry.com family tree imported into my computer’s genealogy software. That is fine, no problems.

But what do you think happens if you turn right around and upload that GEDCOM file into your RootsWeb family tree? If you use RootsWeb, then you know you get a LOT of _APID notes across all of your ancestors and sometimes, if you have many facts/citations for any ancestor, then the RootsWeb page for him/her will be horribly marred by all of these _APID tags!
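For the programmer types, here is a minimal sketch of how such a tag count could be done. It is written in Python rather than PERL and it is not read_gedcom.pl itself; the function name `count_tags` and the sample lines are hypothetical. It assumes the simple GEDCOM line shape of a level number, an optional @xref@ id, and then the tag.

```python
from collections import Counter


def count_tags(lines):
    """Count every tag in a GEDCOM file; return (all_tags, custom_tags)."""
    tags = Counter()
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        if parts[1].startswith("@") and len(parts) > 2:
            # Record line such as "0 @I1@ INDI": the tag follows the xref id.
            tag = parts[2].split(" ", 1)[0]
        else:
            tag = parts[1]
        tags[tag] += 1
    # Custom (vendor) tags are the ones that begin with an underscore.
    custom = Counter({t: n for t, n in tags.items() if t.startswith("_")})
    return tags, custom


# Hypothetical sample lines, not taken from a real export.
sample = [
    "0 HEAD",
    "1 CHAR UTF-8",
    "0 @I1@ INDI",
    "1 NAME Jan /Przykladowy/",
    "2 _APID 1,1::1",
    "1 BIRT",
    "2 _APID 1,2::2",
    "0 TRLR",
]
all_tags, custom_tags = count_tags(sample)
print(sum(custom_tags.values()), "of", sum(all_tags.values()), "lines carry a custom tag")
```

Run against a real export, the two totals would correspond to figures like the 5,158 and 48,538 lines quoted above.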

TIP

Remember I said the GEDCOM file is a TEXT file. As such it can be edited in whatever text editor you favor. If your editor does global search/replace, then you can easily remove these CUSTOM tags (_APID, etc.). That will make your RootsWeb family tree individual pages look MUCH better.

Now I know what you are thinking. Do NOT go editing your GEDCOM file!  I agree.  Make a copy of your GEDCOM file and edit the copy of the downloaded GEDCOM file to remove the lines with ‘_APID’ on them. You can remove all custom tags, but I just bother with the _APID which are so irksome. If your editor can remove the lines with ‘_APID’ then that is what you should do. But if all your editor can do is replace the lines that have _APID on them with a blank line then that is OK too. Make those edits and save the edited (copy) file.  The blank lines seem to be ignored by RootsWeb — thank goodness.
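If you would rather script the edit than do it by hand, the idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the encoding default are mine, and it assumes the _APID lines have no subordinate lines of their own hanging beneath them. It always writes to a second file, so the downloaded original is never touched.

```python
def strip_custom_tag(lines, tag="_APID"):
    """Return only the lines that do not carry the given custom tag."""
    kept = []
    for line in lines:
        parts = line.split()
        # In "3 _APID 1,7602::123" the tag is the second token on the line.
        if len(parts) >= 2 and parts[1] == tag:
            continue
        kept.append(line)
    return kept


def clean_copy(src_path, dst_path, encoding="utf-8"):
    """Write a cleaned COPY of the GEDCOM; the source file is left as-is."""
    with open(src_path, encoding=encoding) as src:
        lines = src.readlines()
    with open(dst_path, "w", encoding=encoding) as dst:
        dst.writelines(strip_custom_tag(lines))
```

A call might look like clean_copy("Eliasz Family Tree.GED", "Eliasz Family Tree - RootsWeb.GED"), where the second file name is just an example.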

Now you can upload the edited file, with the _APID custom tags removed to RootsWeb and your family tree will again look the way it used to before,  without these irksome custom tags.

Next time I will tell you what I found when I looked closely at what ANCESTRY.com was putting into the downloaded GEDCOM file.

May 4, 2012

BIG Genealogy — #Genealogy, #FamilyTree, #GEDCOM

by C. Michael Eliasz-Solomon

When Stanczyk wrote the title, he was not referring to Ancestry.com or any other endeavor by genealogical companies from the western USA. No, Stanczyk is fascinated with numbers .. of people.

Yesterday, this jester wrote about the Confucius Family Tree. It is commonly accepted to be the largest genealogy (family tree). But I had to wonder … Why?

It is an old genealogy, dating back to Confucius’ birth in 551 BCE. It is now 2012, so we have a genealogy that is 2,563 years old. My much beloved wife/kids are Jewish. In the Hebrew calendar we are presently in the year 5772. I have even been to a Jewish Genealogical Conference, where I met a man who told me his genealogy went back to King David. [This jester resisted the rude/snarky comment that if he researched using both Old & New Testaments he could push his research back to Adam.]

I also did not ask him to show me his documentation, but assuming he could, his genealogy would start another 500 years earlier (~1050 BCE). Mathematically speaking (assuming there are other Judeo-Christian couplings before me and my wife), his tree has the potential, if you follow all or many branches and not just the direct lineal trunk, to span approximately 100 generations (adding another 17 generations to the 83 for Confucius). This assumes a generation is 30 years. Now if we look at Confucius and see 2560 years = 83 generations, we see an average of 30.84 years per generation, so 30 years per generation is not a bad estimate.

What genealogy could be older still? Well, according to the Bible we record the Jewish peoples in Babylonia. So perhaps we can extend King David and/or one of his citizens back to King Hammurabi of Babylonia. That would yield another 650 years (~1700 BCE) or about another 22 generations. Let me see: if Confucius’ family tree is about 2 Million people for 83 generations, we get about 24,096 people per generation. So by adding 39 more generations (17 + 22), Hammurabi’s Family Tree should contain approximately another 940,000 people. So come on, Iraq, produce your family tree of nearly 3 Million people!

What genealogy could be older than that? There is a quote that goes something like, “History knows no time when the Egyptians were not highly developed both physically and intellectually.” True enough, recorded history does go back furthest in the Pharaonic dynasties. That takes genealogy back to the first dynasty King (Pharaoh) Menes, who sure enough had a son who wrote about Astronomy [source: Timechart History Of The World, ISBN 0-7607-6534-0]. That takes us to approximately 3,000 BCE, another 1300 years/44 generations/1.06 Million people! Ok, since there is no recorded history earlier than that, we will not have a properly sourced genealogy older than this. So people who are Elizabeth Shown Mills devotees turn your heads away …

What genealogy could possibly be older than that? I read that the indigenous peoples of Australia have an oral history of 48,000 generations! The aboriginal people of Australia date back to about 50,000 BCE, which would be 52,000 years ago/1734 generations/41.8Million people in their family tree. That’s not 48,000 generations, but that is more than twice as much as genealogy researchers test using their FAN24.ged file which has 24 completely full generations with 16.8Million pseudo people.
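For anyone who wants to check this jester’s arithmetic, here is a small Python sketch of the estimates above. The 30-year generation and the 2 Million / 83 generations rate come from the post; rounding up to whole generations is my reading of how the figures were reached, and the variable names are mine.

```python
import math

YEARS_PER_GENERATION = 30                  # the post's working assumption
PEOPLE_PER_GENERATION = 2_000_000 // 83    # Confucius' tree: about 24,096


def generations(years):
    """Generations needed to cover a span of years, rounded up."""
    return math.ceil(years / YEARS_PER_GENERATION)


def people(gens):
    """People added by that many generations at the Confucius rate."""
    return gens * PEOPLE_PER_GENERATION


david = generations(500)          # 551 BCE back to ~1050 BCE
hammurabi = generations(650)      # ~1050 BCE back to ~1700 BCE
menes = generations(1300)         # ~1700 BCE back to ~3000 BCE
australia = generations(52_000)   # ~50,000 BCE to the present
fan24 = 2 ** 24                   # 16,777,216, the post's 16.8 Million

for label, gens in [("David + Hammurabi", david + hammurabi),
                    ("Menes", menes),
                    ("Australia", australia)]:
    print(f"{label}: {gens} generations, about {people(gens):,} people")
print(f"Australia vs FAN24.ged: {people(australia) / fan24:.2f} times as many")
```

The flat people-per-generation rate is of course a crude yardstick; a real tree does not add the same number of people in every generation.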

Now that is what I call BIG Genealogy. But where is that family tree (not FAN24.ged)? Why has no genealogy older than Confucius’ genealogy been found and carried forward to the present day? Is it possible that such a family tree exists?

–Email me!

Related Blog Articles …

“Random Musings” (10-March-2010, see musing #2)

March 12, 2012

Ancestry.com Fixed My GEDCOM Export !!!

by C. Michael Eliasz-Solomon

Stanczyk is happy once again !

The folks at Ancestry.com fixed my GEDCOM Export. It was about 10-14 days, but at least the job got done and my ability to export my research is back to normal.

The timing of the infinitely spinning icon could not have been worse. I had just imported a great deal of photos and I continued to do so even with the export problem. But all is well that ends well. So I did one more export (and it worked) to get myself to a valid checkpoint of my work.

Whew! What a relief. I did not want to have to once again re-enter my multimedia. Nor was I previously aware that I would also have lost my valued contributors too. Who knows if their emails have changed since I invited them???

At any rate, thank you Ancestry.com for fixing my GEDCOM Export!

March 10, 2012

Ancestry.com Broken ? Is Your GEDCOM Export OK? — #Genealogy, #Technology

by C. Michael Eliasz-Solomon

Stanczyk wants to know if anyone else is having problems exporting their GEDCOM from Ancestry.com?


 This is what I see when I try to export my gedcom from the tree settings screen. It never gets past 0% complete.

I have tried to submit a Help Ticket for technical support and so far I have not received any response. What gives Ancestry?

I can still work on my tree and updates appear to be saved. I can synch to the Ancestry App (on the iPhone) and the changes are there too. 

March 2, 2012

Diacritical Redux – Ancestry GEDCOM — #Genealogy, #Technology

by C. Michael Eliasz-Solomon

As Stanczyk has been writing about the GEDCOM standard since #RootsTech 2012, I began to pick apart my own GEDCOM file (*.ged). I did this as I was engaged with Tamura Jones (a favorite foil to debate Genealogy Technology with). During our tête-à-tête, I noticed that my GEDCOM lacked diacriticals???

What happened? At first I thought it was the software that Tamura had recommended I use, but it was not the problem of that software (PAF). So I looked at the gedcom file that I had imported and the diacriticals were missing from there, meaning my export software was the culprit.

I looked at the GEDCOM’s  HEAD tag and the CHAR sub-tag, and it said “ANSI” [no quotes] was the value. That is not even a valid possible value! According to the GEDCOM 5.5.1 standard [on page 44 of the FamilySearch PDF document]:

CHARACTER_SET:= {Size=1:8}
[ ANSEL |UTF-8 | UNICODE | ASCII ]
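To make the check concrete, here is a minimal Python sketch that pulls the CHAR value out of the HEAD record and tests it against the four values quoted above. The function names are hypothetical and this is only an illustration, not a full GEDCOM validator; it assumes the CHAR line sits at level 1 inside the HEAD record.

```python
VALID_CHARSETS = {"ANSEL", "UTF-8", "UNICODE", "ASCII"}


def header_charset(lines):
    """Return the CHAR value found in the HEAD record, or None if absent."""
    in_head = False
    for raw in lines:
        # Drop a possible byte-order mark and surrounding whitespace.
        parts = raw.lstrip("\ufeff").strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        if level == "0":
            if in_head:
                break  # the next level-0 record ends the header
            in_head = tag == "HEAD"
        elif in_head and level == "1" and tag == "CHAR":
            return parts[2].strip() if len(parts) > 2 else ""
    return None


def is_valid_charset(value):
    """True only for the four values the 5.5.1 grammar lists."""
    return value in VALID_CHARSETS


# Hypothetical header, modeled on the "ANSI" value described above.
header = ["0 HEAD", "1 SOUR Example", "1 CHAR ANSI", "0 TRLR"]
value = header_charset(header)
print(value, "is valid" if is_valid_charset(value) else "is NOT valid")
```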

Who is this dastardly purveyor of substandard GEDCOM that strips out your diacriticals (that I assume you have been working so hard to add since my article on Tuesday, “Dying For Diacriticals”)? I’ll give you a HINT, it is the #1 Genealogy Website. Yes, it is ANCESTRY.COM!

Now what makes this error even more dastardly is that the website shows you the diacriticals in the User Interface (UI), but when you go to export/download the diacriticals are not there in the gedcom and unless you study things closely, you may be oblivious (as Stanczyk was for a long time) that these errors have crept into your research. I also found a spurious NOTE that I cannot find anywhere on anyone in my tree — which gets attributed to my home person (uh, me). This is very alarming to me too !!!

Tim Sullivan (CEO of Ancestry.com), I expected better of you and your website. I entrusted my family tree to you and that is what you did with my gedcom? Now I did some more investigating and I found that Ancestry does not strip ALL diacriticals. My gedcom had diacriticals in the PLAC tags and in NOTE tags. But NOT (I repeat NOT) in the NAME tags.

So Tim [pretend there is a shaky leaf here] , if you or a reputation defender or some other minion skims the Internet (for your name) here is what  I hope You/Ancestry.com will do:

  1. Do NOT strip diacriticals from the NAME tag !!!
  2. Fix the Export GEDCOM to create a gedcom file with diacriticals in NAME tags
  3. Fix the Export GEDCOM to create a valid CHAR tag value: UNICODE, UTF-8, ASCII, ANSEL. I put them in my prioritized/preferred order [from left-to-right]. I hope you will not use ASCII or ANSEL.
  4. Run a GEDCOM validator against the gedcom file your Export GEDCOM software creates to download and fix the other “little things” too (Mystery NOTEs ???).

February 26, 2012

Responses – Exploring Gedcom — #Technology, #Genealogy

by C. Michael Eliasz-Solomon

Mail Room

The mailroom received three emails / comments from the “Exploring Gedcom” article.

They came from Tamura Jones (Modern Software Experience), Louis Kessler (BeholdGenealogy.com), and Stan Mitchell (GenApps.net / ezGED Viewer).

Good solid GEDCOM experts all (unlike this jester who is only a journeyman apprentice to these fine men) and as you can see they are also bloggers themselves.

Here’s my summation:

  1. WHITESPACE – All three disputed the whitespace proposal. Even though it was an optional feature accessed via an on/off check-box, I yield to what I see as a rising tide that I cannot swim against. I assume that XML will also be treated as badly by all for EXACTLY the same reasons: too verbose, and a poor data transfer mechanism because of the bloat.
  2. UNICODE – Tamura pointed out that it was in the 5.5.1 standard. I said maybe so, but it is hardly implemented and needs to be mandatory. I also hate the two hex bytes of binary debris that make an otherwise TEXT file into a partial binary file. Tamura points out that this byte-order indicator is commonly hidden by PC editors (I am old school and use vi, where nothing is hidden). Besides, the HEAD tag’s CHAR sub-tag could be used to determine the character set and keep the file textual. Tamura said that would be a catch-22 (since you do not know the file’s encoding). Tamura points out that everyone does as I suppose and uses ASCII (or UTF-16LE or UTF-16BE) to determine the encoding. Documentation needs to be updated too. There is really almost no support for generating UNICODE chars in apps; it should be required, otherwise data entry is limited to clever users (i.e. tech-types or their friends).
  3. DATES – nobody liked DATES (or NAMES) as a zero level. I can live without dates, as I can always create a dimension with every possible date to slice & dice my genealogy facts. Names was also not part of my vision, other than I want a bunch of AKA names for an INDI.
  4. LOCN – Everyone agreed, the PLAC tag did provide a minimalist capability. 2/3 saw a good reason to have LOCN at the zero level, as I proposed. Let’s hope this feature gets in.  We may need to keep PLAC tag for backwards compatibility until all gedcoms have been converted.
  5. EVENTS as a zero level tag seemed to interest people, as MULTI-PERSON events (aka EVENT_TYPE_FAMILY) are NOT adequately dealt with in GEDCOM 5.5.1. I also think people want to standardize these events as much as possible and leave open the ability for a user to add their own events. EVENTS was also related to GROUPS, which people seem to want in some fashion. Analyzing a social network needs better GROUP/EVENT/ROLE visibility than the current standard provides. I think we really need EVENT and EVENT_TYPE tags to keep from adding a new GEDCOM tag every time someone says we need a new event (BIRT | CHR | BAPM | BARM | BASM | BLES | SLGC). All TYPE tags should be from a standard list that a user can add to. The list should be allowed to be localized (into a native language). This keeps parsing to a minimum while allowing for expansion. OLD tags are kept for backwards compatibility until a gedcom is upgraded to a later version. I think we also need a ROLE_TYPE to replace ROLE_IN_EVENT and add more standard roles (i.e. GodMother, GodFather, Witness, Neighbor, MidWife, Rabbi, Brother, Sister, Aunt, Uncle, Cousin, Boarder, etc.), and this should also be localized and user upgradeable. Keep EVENT_TYPE_FAMILY and EVENT_TYPE_INDIVIDUAL for backwards compatibility.
  6. DOCUMENTS (deferred) – The case was not made nor was the concept adequately explained. To Stanczyk this is related to multimedia and is for making it possible to locate all documents of a certain type (i.e.  Declaration of Intent) with its date and location and a locator to the multimedia  representation and pull these documents out in their own right regardless of the individual or family or group.
  7. GROUPS (deferred) – Some interest in defining a non-family group (ex. military unit, college fraternity/sorority, religious society, etc.). These groups would be interesting in their own right to study. At RootsTech 2012, this seemed to be a novel idea that had a positive feel to the audience.

GROUPS is not intended for a non-traditional family unit which needs some thought and design in this Modern Family World.
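Returning to the UNICODE item (#2) for a moment: the catch-22 is less severe than it sounds, because every GEDCOM file begins with the characters “0 HEAD”, so the first few bytes give the encoding away. Here is a minimal Python sketch of that idea. The function name `sniff_encoding` is hypothetical, and this illustrates the approach Tamura described rather than any vendor’s actual code.

```python
import codecs


def sniff_encoding(first_bytes):
    """Guess a GEDCOM file's encoding from its first few bytes, or None."""
    # A byte-order mark, when present, settles the question.
    if first_bytes.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # this codec drops the mark while decoding
    if first_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"      # this codec reads the mark to pick byte order
    # No mark: the leading "0" of "0 HEAD" still shows the UTF-16 layout.
    if first_bytes.startswith(b"0\x00"):
        return "utf-16-le"
    if first_bytes.startswith(b"\x000"):
        return "utf-16-be"
    # Otherwise the header is ASCII-readable; consult HEAD/CHAR instead.
    return None


print(sniff_encoding("0 HEAD".encode("utf-16-le")))
```

When the function returns None, the header can safely be read as plain ASCII up to the CHAR line, and the value found there decides how to read the rest of the file.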

Tamura chided me that many of my “wants” are a part of an offshoot of GEDCOM called GEDCOM 5.5EL (Extended Location GEDCOM derived from version 5.5). The only difference is that I want to get rid of the need for _LOC (a custom tag under the GEDCOM definition) and use LOCN instead. Also I would want their undefined tag called NAMC (possibly renamed NAMA for NAMe Alias or NAMe Alternate) to be 0:M, meaning that you can have zero, one or many alternate/alias names for this LOCN (or INDI, why not?).

Also the NAMC (or NAMA) should have a subtype FONE and FONETYPE (soundex, DM, Beider-Morse, etc.) to aid in advanced searches or Google searches. But this is the argument for NAMES at zero level. The last names are usually where the soundex/phonetic matching needs to be stored. We do not need to repeat this data for each individual (INDI), just for each unique last name or alternate name. These things get created in Surname Index pages – how much easier if the NAMES (re surnames and alternate surnames) had a zero level with the FONE and FONETYPE, and the INDI had an XREF pointing to each NAME/NAME ALTERNATE s/he had used during their life. One might even hope that the zero level name had an XREF to each INDI too. If I were the Data Architect for GEDCOM, I would have a zero level NAMES (for SURNAMES) with their soundex/phonetic codes, plus XREFs back and forth to INDI.

I need to cut this response short, but a great thanks to all who read the article and the above three for improving my thoughts by their comments/emails.

P.S. – If you follow the GEDCOM 5.5EL link, you will notice they show their gedcom indented and then go on to say about the whitespace:

” improves readability … should (and will) not be performed at real Gedcom-output.”

My sentiments exactly. Sometimes you just need to show the gedcom (and indentation improves the presentation and understanding). I too never intended it to be processed into the gedcom output/exported to a file for transport (just for my own personal examination or writing purposes). However, I have capitulated on whitespace — so please no more email on whitespace and bloat.

February 23, 2012

Meme: Exploring GEDCOM – Gedcom Lines — #Genealogy, #Technology, #Mashup

by C. Michael Eliasz-Solomon

Stanczyk wants to introduce a new meme, “Exploring GEDCOM”. I was musing upon why the state of the GEDCOM standard is … so CHAOTIC. GEDCOM has languished for about a decade and a half now with no new standard, hence my article, “Is GEDCOM dead?” (2/5/2012). I was left in a perplexed state after RootsTech 2012. Why is FamilySearch working on a “standard” in a vacuum? Why is there so little communication with the existing software vendors, the purveyors of GEDCOM, and why do the end users have no voice in what is needed in a GEDCOM standard?

So I decided that GEDCOM needed an Evangelist. I believe there are already a plethora of GEDCOM Evangelists, so perhaps I will just add to the milieu (or is it the meme). To be frank, most GEDCOM Evangelists are really GEDCOM complainers. Nay, I think we are all complainers, because there are no GEDCOM complimenters, not even amongst the GEDCOM purveyors. Even FamilySearch, which “owns” GEDCOM (how can that be a standard?), wants to make their latest effort (GEDCOMX) a “clean sheet” project. No backwards compatibility even!

Is GEDCOM  just an ugly baby whose parentage is in doubt?

So this meme is on Exploring GEDCOM. What is it? How can it be improved? What should a TRUE gedcom standard include? I’ll probably write anywhere from once to three or four times a month on this meme until I have exhausted myself on this topic. My ultimate goal is to get this to be a part of RootsTech and to be an OPEN STANDARD with an open, transparent definition and process for change, which I hope to have tied to RootsTech attendees voting on this, possibly via the RootsTech App.

Allow non-attendees to vote if they register who they are and their role: genealogist, technologist, software vendor, etc. and why they want to be a voter. I think conference attendees (genealogists, technologist, or vendor-of-any-kind, organizer) get an automatic vote, prior attendees get a vote, gedcom software vendors get a vote. All prior voters get to vote in all future votes on the open standard (as long as their email address works or when it is corrected again). OPEN STANDARD means that all stakeholders need to have an opportunity to influence the standard.

Let me start the Meme by revisiting graphic syntax diagrams  …

I started with this railroad track (2/16/2012) to define a gedcom file. Our discussion will focus upon gedcom v5.5.1 and launch from that rocket pad into some far flung future gedcom feature(s). This diagram was derived from the standard in PDF form. I have attempted to make the standard more “grammatical” and formularize/define ambiguities to my genealogical/technological world view. We see a HEAD tag, a TRLR tag and an optional SUBN tag with a whole bunch of “gedcom lines”.

Gedcom Line

A V5.5.1 Gedcom Line

This is what a gedcom line looks like. I have added a wish for optional whitespace at the beginning of a line. That is my first proposal. The number at the beginning of each line is meant to be “an outline level”. So I wanted the option of outputting lines with leading blanks corresponding to the level of indentation appropriate for the outline level — to aid readability of seeing what inner outline indentations go  with which outermost level. Make the whitespace a checkbox on export (directed at you software vendor guys) and default it to off.
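To show what I mean, here is a minimal Python sketch of the indentation idea, meant for viewing only and not for producing a file to exchange. The function name `indent_by_level` and the sample lines are hypothetical.

```python
def indent_by_level(lines, width=2):
    """Prefix each GEDCOM line with blanks according to its outline level."""
    out = []
    for raw in lines:
        line = raw.strip()
        # The level is the number that starts the line, e.g. the "2" in "2 DATE".
        level = line.split(" ", 1)[0]
        pad = " " * (width * int(level)) if level.isdigit() else ""
        out.append(pad + line)
    return out


# Hypothetical sample lines.
sample = ["0 @I1@ INDI", "1 BIRT", "2 DATE 1 JAN 1900", "1 DEAT"]
print("\n".join(indent_by_level(sample)))
```

The level-2 DATE line ends up visibly nested under its level-1 BIRT line, which is all the proposal asks for.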

We see that a gedcom line at its (current) core describes: families, individuals, notes, repositories, sources, submitters & their multimedia (digital documents, notes, memories, etc.). This is still a very high level discussion. We have only spoken of 3 of the 136 tags. But already this jester has a suggestion/complaint.  Let me defer a discussion of Multimedia_Records to its own article as this requires many words, a lot of which are jargon. The complaint – we need more zero level tags!

So deferring multimedia, we have six types of records. A software vendor might think six different tables (or objects) that need to be described and stored as we “parse” each gedcom line in the file that stores our family tree. Do not lose sight that these files are family trees of some researcher — not abstract or theoretical data. These are research from current or prior genealogists and they need to be preserved …  without loss.

At its inner core is a set of individuals (INDI tag). I once wrote a PERL script to pull out all individuals with their vital data (B/M/D). Very easy thing to do. I mention this now to illustrate that these compact files are at the intersection of genealogy and technology. These gedcom files are emblematic of the technology / genealogy mashup that is RootsTech! They are also the way we can interface our genealogies with other non-family tree tools to do additional things. Let’s call those gedcom ADD-ONS (or PLUG-INS or APPs). I am hopeful that, with a standard API, they will be able to pull this info out, just like my PERL script pulled out the individuals. That is the essence of an INDIVIDUAL gedcom record.
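To give a flavor of how little code this takes, here is a Python sketch of the same idea. It is not my PERL script; the function names and the sample person are hypothetical. It collects each individual’s name plus birth and death dates, and it leaves marriages out because those live in the FAM records rather than under INDI.

```python
def parse_line(raw):
    """Split a GEDCOM line into (level, xref, tag, value), or None if malformed."""
    parts = raw.strip().split(" ", 2)
    if len(parts) < 2 or not parts[0].isdigit():
        return None
    level = int(parts[0])
    if parts[1].startswith("@"):
        # Record line such as "0 @I1@ INDI": the tag follows the xref id.
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        return level, parts[1], rest[0], rest[1] if len(rest) > 1 else ""
    return level, None, parts[1], parts[2] if len(parts) > 2 else ""


def individuals(lines):
    """Return one dict per INDI record with the name, birth and death dates."""
    people, current, event = [], None, None
    for raw in lines:
        parsed = parse_line(raw)
        if parsed is None:
            continue
        level, xref, tag, value = parsed
        if level == 0:
            current, event = None, None
            if tag == "INDI":
                current = {"id": xref, "name": "", "birth": "", "death": ""}
                people.append(current)
        elif current is not None:
            if level == 1:
                event = tag  # remember which level-1 fact we are inside
                if tag == "NAME" and not current["name"]:
                    current["name"] = value.replace("/", "").strip()
            elif level == 2 and tag == "DATE":
                if event == "BIRT" and not current["birth"]:
                    current["birth"] = value
                elif event == "DEAT" and not current["death"]:
                    current["death"] = value
    return people


# Hypothetical sample record, not a real ancestor.
sample = [
    "0 HEAD", "1 CHAR UTF-8",
    "0 @I1@ INDI", "1 NAME Jan /Przykładowy/",
    "1 BIRT", "2 DATE 1 JAN 1880", "2 PLAC Biechów",
    "1 DEAT", "2 DATE 2 FEB 1950",
    "0 @F1@ FAM", "1 MARR", "2 DATE 3 MAR 1905",
    "0 TRLR",
]
for person in individuals(sample):
    print(person["id"], person["name"], person["birth"], person["death"])
```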

There are also FAMILY gedcom records that are defined by FAM,  FAMC,  FAMS and the temple ordinance (i.e. LDS) FAMF tags. Likewise, we have NOTE (NOTE), SUBMITTER (SUBM),  and REPOSITORY (REPO)/SOURCE (SOUR) records too. I mentioned the FAMC/FAMS tags in addition to FAM which really equates to the FAMILY-RECORD, in order to point out that an individual is part of two families. S/He is a part of a family where they are a child(FAMC) and they are also part of the family where they are a parent (re SPOUSE, hence FAMS). This is evident when you realize that we are speaking of a family tree and that a tree really goes forward and backward linking the present to the past (and logically,  vice-versa).

What’s Missing? – A Proposal (the first of many)

I am still ignoring MULTIMEDIA — so that is not it. If we believe in Jay Verkler‘s RootsTech 2012 vision for genealogy, then we need to conform (i.e. standardize):  Dates, Locations, Names. I would also add: Events, Documents,  and possibly Groups. So that is six more zero level RECORDS.

DATES I assume need to be standardized because of the many problems: missing date, partial date, estimated date, various calendars, etc.

NAMES are also a problem area. For example, how do I record my ancestor’s name? Do I conform his name to ENGLISH (i.e. does Piotr become Peter)? Should I record it in his context (i.e. Pawel for Paul)? Should I record it in the language of the record (my ancestors come in Latin (Paulus), Polish, and Russian)? Oh, and some of those names do not translate to the other language, so we have adopted names/name changes/nicknames. Then there is the Latin alphabet versus Cyrillic characters versus Hebrew characters, or even just recording diacritical letters like slashed-l (ł).

UNICODE support is a MUST in any new standard.

We also need Locations, Events, Documents, and Groups as zero level “records”, so that we can pull those out of the file, just as I pulled Individuals out of the file. Take Locations (i.e. Biechów, Busko, Kielce, Poland): that is the administrative hierarchy of one of my ancestral villages. Of course, it changed over time or by whoever occupied Poland (or should I view it as Congress Poland/Vistulaland as a part of the Russian Empire’s many gubernias?). Clearly locales have a time component.

I deferred MULTIMEDIA because it is technical and also because I want to make the case that we need EVENTS and/or DOCUMENTS instead and that MULTIMEDIA are just NOTES that are not textual and often this is congruent with the fact that this digital media is a representation of some document(s) that documented an event. I also propose GROUPS as a record because people want to record connections to MILITARY units, CHURCH SOCIETIES,  SCHOOLS, BUSINESSES/ORGANIZATIONS, REUNIONS, or GOVERNMENTAL/HISTORICAL units that may be of a historical or a strong emphasis within a family history. I think the GROUPS could all be user-defined, with maybe a conformed group-type (i.e. military, religious, government, historical, etc.). This does not feel like the same level of importance as the others: Names, Dates, Locations, Events or Documents.

Summary of Proposed GEDCOM Enhancements

(excluding MULTIMEDIA)
  1. whitespace – for readability
  2. UNICODE support so proper nouns can be recorded in their context with diacriticals or character sets (that are not Latin).
  3. New Zero Level TAGS:  NAME, DATE (not mine, but Jay Verkler’s emphasis)
  4. New Zero TAGS (that Stanczyk wants):  EVNT,  DOCS, &  LOCN (Jay also wanted locn).
  5. Possibly GRUP – to support development of non-familial group memberships in trees

The new zero level tags are to support future CONFORMATION (standardization) efforts and also are the most likely to be sought after via any future API for enhanced analyses or specialized output in reports/charts.

Stanczyk views the Zero Level TAGs as possible dimensions for slicing-dicing a genealogy cube, what Data Architects see as OLAP analysis/reporting (sorry, that jargon just slipped out).

The vision is cross family tree bumping or cross website bumping of gedcom data against databases to accomplish new and novel approaches to searching, merging or analyzing. This genealogy data could also be of use to historians or scientists as new sources of data to be mined for their research.

That’s the gedcom exploration for today!

 

P.S. 

Please read the comments too. Apparently, I was wrong. There is a GEDCOM Evangelist who is not a gedcom complainer.

February 16, 2012

GEDCOM “RailRoad Tracks” (aka Graphic Syntax Diagram) – #Genealogy, #Technology

by C. Michael Eliasz-Solomon

The above diagram is what Stanczyk had been jabbering about since the #RootsTech conference. Isn’t that much easier on the eyes and the grey matter than a complex UML diagram? Who even knows what a UML diagram is or if it is correct or not?

What does it say is in a GEDCOM file (ex.  Eliasz.ged)?

A HEAD tag  optionally followed by a SUBmissioN Record followed by 1 or more GEDCOM lines followed by a TRLR tag.

ex. gedcom lines  that can be “traced” along the railroad tracks at the top.

 0 HEAD
 1 SOUR Stanczyk_Software
 1 SUBM @1@
 1 GEDC
 2 VERS   5.5.1
 2 FORM  LINEAGE-LINKED
 1 CHAR  UNICODE
 0 @1@ SUBM
 ...
 0 TRLR

OK, Stanczyk_Software does not exist, but was made up as a fictitious valid SOURce System Identifier name. The GEDCOM file (*.ged) is a text file and you can view/edit the file with any text editor (vi | NotePad | WordPad | etc.). I do not recommend editing your gedcom outside of your family tree software, but there is certainly nothing stopping you from doing that (DO NOT TRY THIS AT HOME). If you knew gedcom, you could correct those erroneous/buggy gedcom statements that are generated by so many programs, the ones that cause poor Dallan Quass to ONLY achieve 94% compatibility with his GEDCOM parser.

Have you ever downloaded your gedcom from ANCESTRY and then uploaded it to RootsWeb? Then you might see all those crazy _APID  tags.   It is a custom tag (since it begins with an underscore  — GEDCOM rules dear boy/girl).   It really messed up my RootsWeb pages with gobbledygook. I finally decided to edit one gedcom and remove all of the _APID tags before I uploaded the file to RootsWeb. Aaah that is SO much better on the eyes. Oh I probably do not want to re-upload the edited gedcom into ANCESTRY, but at least my RootsWeb pages are so much better!   The _APID is just a custom tag for ANCESTRY (who knows what they do with it) so to appeal to my sense of aesthetics, I just removed them — no impact on the RootsWeb pages, other than improved readability. [If you try this, make a backup copy of the gedcom and edit the backup copy!]
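For anyone who would rather not do that edit by hand, the same clean-up can be sketched in a few lines of Python. This is only an illustration, assuming that a custom tag should be dropped together with any lines nested beneath it; the function name and the sample _APID value are made up.

```python
def strip_custom_tag(lines, tag="_APID"):
    """Yield GEDCOM lines, dropping every `tag` line and anything nested under it."""
    skip_below = None  # level of the dropped line while we are inside its subtree
    for line in lines:
        parts = line.split(maxsplit=2)
        if not parts or not parts[0].isdigit():
            yield line  # not a recognizable GEDCOM line; pass it through untouched
            continue
        level = int(parts[0])
        if skip_below is not None:
            if level > skip_below:
                continue        # still inside the dropped subtree
            skip_below = None   # back at the same or a shallower level
        if len(parts) > 1 and parts[1] == tag:
            skip_below = level
            continue
        yield line

# Hypothetical fragment; the _APID value is invented.
sample = [
    "0 @I1@ INDI",
    "1 NAME Jan /Eliasz/",
    "1 BIRT",
    "2 SOUR @S1@",
    "3 _APID 1,0000::0",
    "2 DATE 1900",
    "0 TRLR",
]
cleaned = list(strip_custom_tag(sample))
```

In practice the cleaned lines would be written to a new file rather than over the original, which keeps the backup copy intact.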

Now obviously the above graphic syntax diagram is not complete. It needs to be resolved to a very low level of detail such that all valid GEDCOM lines can be traced. It also requires me/you to add in some definitional things (like exactly what is a level# — you know those numbers at the beginning of each line).
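As a rough illustration of what "tracing" means, the Python sketch below splits each line into its level#, optional cross-reference, tag and value, then checks the three things the top-level track promises: the file starts with 0 HEAD, ends with 0 TRLR, and never jumps more than one level deeper at a time. The names are hypothetical, the checks are far from a full validator, and the simple split collapses repeated spaces inside values.

```python
from typing import List, NamedTuple, Optional

class GedcomLine(NamedTuple):
    level: int            # nesting depth; 0 starts a new record
    xref: Optional[str]   # e.g. "@1@", present only when the line defines a record
    tag: str              # e.g. "HEAD", "SUBM", "TRLR"
    value: str            # the rest of the line, possibly empty

def parse_line(text: str) -> GedcomLine:
    parts = text.split()
    level = int(parts[0])
    rest = parts[1:]
    xref = None
    # A leading @...@ token followed by a tag is a record identifier.
    if len(rest) > 1 and rest[0].startswith("@") and rest[0].endswith("@"):
        xref, rest = rest[0], rest[1:]
    return GedcomLine(level, xref, rest[0], " ".join(rest[1:]))

def trace(lines: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the file traced cleanly."""
    parsed = [parse_line(t) for t in lines if t.strip()]
    problems = []
    if not parsed or (parsed[0].level, parsed[0].tag) != (0, "HEAD"):
        problems.append("file must start with 0 HEAD")
    if not parsed or (parsed[-1].level, parsed[-1].tag) != (0, "TRLR"):
        problems.append("file must end with 0 TRLR")
    for prev, cur in zip(parsed, parsed[1:]):
        if cur.level > prev.level + 1:
            problems.append(
                f"level jumps from {prev.level} to {cur.level} at tag {cur.tag}"
            )
    return problems

# The example lines from this post, without the elided middle.
EXAMPLE = [
    "0 HEAD",
    "1 SOUR Stanczyk_Software",
    "1 SUBM @1@",
    "1 GEDC",
    "2 VERS 5.5.1",
    "2 FORM LINEAGE-LINKED",
    "1 CHAR UNICODE",
    "0 @1@ SUBM",
    "0 TRLR",
]
problems = trace(EXAMPLE)
```

A complete set of diagrams would add one such check per track, so each rule a reader can follow with a finger becomes a rule a program can test.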

I have a somewhat mid-level graphic syntax diagram that I generated using an Open Source (i.e. free) graphic syntax diagrammer. As I said in one of my comments, I will send it to whoever asks (already sent it to Ryan Heaton & Tamura Jones). You can get a copy of Ryan Heaton’s presentation from RootsTech 2012 and compare it to his UML diagram (an object model). I think you will quickly realize that you cannot see how GEDCOM relates to the UML diagram, and therefore it is difficult to ask questions or make suggestions. A skilled data architect/data modeler or a high-level object-oriented programmer could make the comparison and intuit what FamilySearch is proposing, but a genealogist without those technical skills could NOT.

I am truly asking the question, “Can a genealogist without a computer science degree or job read the above diagram?” and trace with his finger a valid path of correct GEDCOM syntax [assuming a whole set of diagrams were published]. The idea is to see how the GEDCOM LINES (in v5.5.1 parlance FAMILY_RECORD, INDIVIDUAL_RECORD, SOURCE_RECORD, etc.) are defined and whether or not what FamilySearch is proposing is something complete/usable that advances the capabilities of the current generation of software without causing incompatibilities (ruining poor Dallan Quass’s 94% achievement). Will it finally allow us to move the images/audio/video multimedia types along with the textual portion of our family trees and keep those digital objects connected to the correct people when moving between software programs?

 

GEDCOM files are like pictures of our beloved ancestors. They live on many years beyond those that created them. Let’s not lose any of them OK?

February 13, 2012

Blog Bigos …

by C. Michael Eliasz-Solomon

Stanczyk added a new Page (Tech Diary) to record my technology doings.

While doing that and reading from my blogroll (and emails), I discovered some history about the “defacto standard GEDCOM” (wiki: GEDCOM ). Now I strongly recommend you start from “defacto” link rather than the wikipedia link.

  • RootsTech 2012 – had two GEDCOM presentations by Ryan Heaton (FamilySearch, GEDCOMX project).
  • RootsTech 2012 – had one open source GEDCOM parser presentation by Dallan Quass. Dallan was quite remarkable in his efforts to achieve a 94% commonality amongst 7,000 different GEDCOM files. Dallan Quass has a GitHub project for his Open Source GEDCOM parser.
  • Modern Software Experience (Tamura Jones) had a couple articles that caused me to write this article. His most recent GEDCOM article that caught my eye was:  BetterGEDCOM (2/2/2012). I also noticed he had a GEDCOMX article from 12/12/2011. These two articles provide a good discussion. I also noticed that the BetterGEDCOM project had their own project blog. [also see his Gentle Introduction to GEDCOM  article].

I believe those provide the most recent current thoughts on GEDCOM (that I have not penned).

  • I have been studying GEDCOM v5.5 (the last GEDCOM standard).
  • I produced a partial Graphic Syntax Diagram of GEDCOM v5.5 [what I had been calling "Railroad Tracks"] just to demonstrate how I thought this diagram was a better vehicle to communicate the standard [than say UML object models].
  • I could not resist making slight tweaks to GEDCOM v5.5 even in my preliminary studies. Mostly so we could discuss GEDCOM in a readable fashion (i.e. whitespace for formatting, and comment lines ) or because the language cries out for consistency (i.e. requiring the HEAD tag to be a zero level, just like the TRLR tag).

My Graphic Syntax Diagram of GEDCOM v5.5 was produced using an open source tool. It is partial and still high level. I did put in a construct so that you can clearly see all 128 standard tags. The Graphic Syntax Diagrammer is an excellent tool. I will have to offer the author a suggestion for the PNG images that it outputs. I need to take my diagram and manually edit it to make the drawing a better fit for 8.5″ x 11.0″ (US Letter, close to A4) paper. I need to graphically wrap the railroad tracks and to add page breaks so that the image is itself usable for viewing/discussions. I will offer this sample drawing to any interested parties, including emailing the edited product to Ryan Heaton and Dallan Quass [who, since they did not request it, can feel free to ignore it].

My goal is to make minor tweaks to  GEDCOM v5.5 via this diagram [not programming] and try and get DallanQ to produce a one-off parser for it (call it, say GEDCOM 5.5.999) and hope that my tweaks will not lower Dallan’s hard work of achieving 94% compatibility. If it turns out to have virtually no effect on Dallan’s 94% compatibility in his Open Source parser, then I can think about  getting some software vendors to utilize the enhancements (via end user requests), since they are trivial, just to move the standard forward and to open an interest in the vendors to looking at how we create a new Open Standard for GEDCOM.

P.S.

Thanks to Tamura Jones, I now know I need to update my diagram to GEDCOM v5.5.1 first.

February 12, 2012

GEDCOM Standards – Where Genealogy Meets Technology — #Genealogy, #Technology, #Standards

by C. Michael Eliasz-Solomon

Stanczyk, has been churning since about November of last year (2011).  I have a number of ideas rummaging around my brain for genealogy apps. For over a quarter century, I have been a computer professional and used and/or developed a lot of  programs using a myriad of technologies. At my core, I am a data expert: design it, store it, query it, manage it, analyze it and protect it. It being the data.

Before going to #RootsTech 2012, I knew GEDCOM was the core of our hobby/business/research. GEDCOM is our defacto standard. It is how data is exchanged between us and our various programs. I say defacto because as a standard goes it is not a very open standard (one organization “owns” it, and the rest of us go along with it). It also has not changed in about a decade and a half, so Ryan Heaton was correct in calling it “stale”. It does still work ... mostly. Although if a standard does not progress, then you get a lot of proprietary “enhancements” that prevent the interchange of data completely, since one vendor does not know how to deal with another vendor’s file in totality.

At present, GEDCOM maxes out at version 5.5, although there are various other variations you might  see. But 5.5 was the last standard version. I counted 128 total tags and a provision for creating non-standard tags (they start with an underscore).

[Mike thanks to Tamura Jones! Even though GEDCOM v5.5.1 was never finalized, it IS the defacto max version of GEDCOM. GEDCOM v5.5.1 added 9 tags, removed the BLOB tag, so we now have a total of 136 tags.   -- I will need to update even my high level graphic syntax diagram]

Tags are like:

INDI,   FAMC,   FAMS,   SOUR,   REPO,   HEAD,   TRLR    etc.   -or-      ALIA,   ANCE

The first bunch is familiar and is probably in your family tree (if you ever exported the GEDCOM file). The ALIA tag is one that Dallan Quass said was universally used wrong by all programs. After seeing its definition, I can see how it is confusing. As for the ANCE tag, I do not recall seeing any program letting me do any functionality that might utilize this tag. This tag is probably one of those tags that Dallan said is not used at all.

I looked at the “MULTIMEDIA” section of the standard. It looks like it is woefully out of date and probably not used at all (at least not in any standard way), which is probably why our pics, audio, and video (or any other media file like PDF, MS Word) do not move with the GEDCOM. Has any program ever used the ENCODING/DECODING of a multimedia file? The standard seems to imply a buffer of only 32K (for a line) and even if you used a large number of  CONC tags strung one after another you need 100 lines to store a 3.2MB file in-line in the GEDCOM. I do not think I have seen that in a GEDCOM. They probably stored these binary large objects (BLOBs) outside the gedcom and refer to their path on the computer/network.  I did some noodling. I have 890 MB (or approximately  890,000 KB) in pictures and scanned source documents for about 1,000 people in my family tree. So I use nearly a gigabyte (1GB) for my family tree and all other multimedia — and I do not have any audio or video!  So I use almost 1MB/person.

If we did have this magical new GEDCOM standard that could carry all of our multimedia from one GEDCOM program to another GEDCOM program, the copying would take a long time. If I uploaded/download it to/from the Internet, I might incur an overage on my ISP’s usage charges, if this were technically feasible!   Imagine if I did this multiple times a month (as I got updates). I am beginning to understand why no vendor has tackled the problem. I would also like to store PDFs and other documents besides GIF/JPG/PNG which can be displayed on the Internet web pages natively in a browser. Those are not a part of the existing GEDCOM standard. Let me sling some jargon — I’d want to store any file type that there is a MIME type definition for,  that I can currently embed in emails,  or utilize in Java programs or that the HTML5 standard will allow for multimedia.

The GEDCOM 5.5 was in its infancy on dealing with character sets. It was predominantly ASCII with some funky ANSEL coding of characters to handle latin alphabet diacriticals, although it is not clear how I would do the data entry for those and it looks incomplete. It did mention UNICODE, but only in a cursory way and just to remind us that the lengths in the GEDCOM standard were in ‘characters’ not bytes, which was correct. Those multibyte characters (say in Hebrew, Russian or Japanese or Chinese) would, however, quickly use up the 32K byte line buffer limit, which would effectively become as few as about 8K characters per line (at four bytes per character). In fact, GEDCOM 5.5 says it will only deal with LATIN alphabets and leave Cyrillic, Hebrew and Kanji for some far flung future. Stanczyk is Slavic, I need UNICODE to represent my ancestor’s names and places. Fortunately, I do not feel the need for Cyrillic (Russian, Ukrainian, Belorussian, Macedonian, etc.) or I’d be out of luck. I’ll just use the Polish version of those names in their ‘Latinized’ forms.
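The characters-versus-bytes point is easy to check. The Python sketch below is an illustration only: the 32K figure is the line limit as described above, and the Russian and Japanese sample strings are my own picks, not data from any tree. It measures how many bytes a character costs in three common Unicode encodings and how many such characters would fit in one line.

```python
LINE_BUFFER_BYTES = 32 * 1024  # the 32K byte line limit discussed above

# Sample names; the Russian and Japanese strings are illustrative only.
SAMPLES = {
    "English": "Walter",
    "Polish": "Władysław",
    "Russian": "Владислав",
    "Japanese": "家系図",
}

def bytes_per_char(text, encoding):
    """Average number of bytes each character occupies in the given encoding."""
    return len(text.encode(encoding)) / len(text)

def chars_that_fit(text, encoding, budget=LINE_BUFFER_BYTES):
    """Roughly how many characters like those in `text` fit in `budget` bytes."""
    return int(budget / bytes_per_char(text, encoding))

for label, name in SAMPLES.items():
    for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
        print(f"{label:9} {encoding:10} "
              f"{bytes_per_char(name, encoding):.2f} bytes/char, "
              f"about {chars_that_fit(name, encoding)} chars per line")
```

The 8K figure is the worst case of four bytes per character, which is what a fixed-width encoding such as UTF-32 costs. In UTF-8, Cyrillic takes two bytes per character and most Japanese characters take three, while Polish text costs only slightly more than ASCII because just the diacritical letters take two bytes.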

Oh that is another area the standard needs to be enhanced. NAMES. Dallan mentioned that Personal Names do not get a thorough treatment in the standard (I am refusing to read the data model and I am a Data Architect). Location Names get almost no treatment — they do give you a place to store your locations  (PLAC tag). What language should I use, after all my ancestors are from POLAND for God’s sake. Besides the obvious Polish, I have German, Russian and Latin to deal with and being American I prefer English. Slavic names often do not translate well. For example Wladyslaw is Ladislaus in Latin, but in English there is no equivalent — maybe that is why my ancestors use ‘Walter’ instead. But the point is, how should I store the name? Can I store all of the equivalents and search on any of them? Nope.

Damn, Russian is Cyrillic. GEDCOM doesn’t deal with non Latin alphabets; and even though I can read the Russian genealogy records, I’d rather not, nor would I want to try and do data entry that way either. Besides, the communists reformed the language in 1918 (making War & Peace considerably shorter in Russian); that reform eliminated several characters. Most modern software is not aware of the eliminated characters, much less able to generate them. This whole Language/Unicode/Name thing is complicated and I have not even mentioned the changing borders or the renaming of cities in different languages or over time or their changing jurisdictions. I cannot fault GEDCOM for all of these woes. I have them in my own research and I have not yet found any satisfying way to handle them. I find it helps to have a very good memory and keep these things in my head, but there is no backup for that.

How are we ever going to arrive at the vision Jay Verkler put forth at #RootsTech? GEDCOM needs to become an open standard. Once it is standardized again, then it needs to become modern again and deal with the current technology, so we can get around to the tough problems of conforming: names, places, sources/repositories, calendars/dates and doing complex analyses like Social Network Analysis as a way to gather wayward ancestors into a family for which we lack documentation to prove (Genealogically). I hope the future includes Beider-Morse phonetic matching and can deal with folding diacritical characters into a base character (ex. change ę into e) for searches.
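Folding diacriticals for search is something Python’s standard unicodedata module can already do for most letters. The sketch below is one possible approach, not a feature of any genealogy program: it decomposes each letter, drops the combining marks, and uses a small hand-made table for letters such as ł that have no Unicode decomposition. The table and the function name are assumptions, and the table is far from complete.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit mapping.
# This short table is illustrative, not complete.
NO_DECOMPOSITION = str.maketrans({
    "ł": "l", "Ł": "L",
    "ø": "o", "Ø": "O",
    "đ": "d", "Đ": "D",
})

def fold(text):
    """Return text with diacritics removed, for use as a search key only."""
    decomposed = unicodedata.normalize("NFD", text.translate(NO_DECOMPOSITION))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

for name in ("Biechów", "Pacanów", "Władysław"):
    print(name, "->", fold(name))
```

The folded form should only ever be a search key stored next to the real name. The record itself keeps Biechów and Władysław exactly as entered, so nothing is lost on export.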

FamilySearch, if you are going to register GEDCOM tools, then please do a few more things for the NEW standard. First, make each vendor add to an APPENDIX the name and complete definition of their NON-STANDARD tags, in case anyone else wishes to implement or deal with them. Put a section in the header (HEAD tag) that lists all NON-STANDARD tags (just once each) along with their vendor so that someone else can go look at the standard and see what these tags mean and possibly implement the good ones. Forget that two byte thing before the HEAD tag. Just make the HEAD tag’s CHAR sub-tag indicate the character set (ANSI | ANSEL | UNICODE). Please administer a #RootsTech keynote to vote on annual changes to the GEDCOM standard. Provide a GEDCOM validator and also a GEDCOM converter webpage to allow users/vendors to validate/convert their gedcom file(s).

Make multimedia be meta-data and allow users to define “LOCATIONS” where multimedia files can be found using either a PATH or a URL (or a relative path / URL). Make it a part of the standard that the meta-data must move, but the multimedia files can optionally stay put. Multimedia should be able to be placed on a LOCAL/NETWORK, or on the INTERNET or on a multimedia removable volume(s) [thumb drives, CDs, DVDs, etc.]. Make the multimedia “LOCATIONS” editable so a user can switch between LOCAL/NETWORK, INTERNET, or REMOVABLE including using some of each type of LOCATION. Allow these files to exist or not (show “UNAVAILABLE” or some equivalent visual clue, if accessed and they do not exist). The mapping between an Individual (INDI) or a family (FAM) or some other future GROUP and its multimedia file(s) must move as a part of the meta-data (even if the multimedia file(s) do not). That way the end-user need only edit his LOCATIONS meta-data (and ensure the files are in that/those location(s)) when he runs the software.
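A minimal Python sketch of that lookup follows, assuming the LOCATIONS meta-data is simply an ordered, user-editable list of folders and that the tree stores only a relative file name. Everything here (the function name, the file names, the folder paths) is hypothetical, and only local or mounted paths are covered; URL locations would need a network check that is left out.

```python
import tempfile
from pathlib import Path

UNAVAILABLE = "UNAVAILABLE"

def resolve_media(relative_name, locations):
    """Return the first existing copy of a media file, or UNAVAILABLE."""
    for base in locations:
        candidate = Path(base) / relative_name
        if candidate.is_file():
            return str(candidate)
    return UNAVAILABLE

# Demo with a throwaway directory standing in for a thumb drive.
with tempfile.TemporaryDirectory() as drive:
    (Path(drive) / "marcin_death_1879.jpg").write_bytes(b"")
    locations = ["/no/such/network/share", drive]  # user-editable search order
    found = resolve_media("marcin_death_1879.jpg", locations)
    missing = resolve_media("not_scanned_yet.jpg", locations)

print(found)
print(missing)
```

Because the tree refers to the file only by its relative name, moving the pictures to another drive means editing one list of locations rather than re-linking every person.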

Define an API for GEDCOM plug-ins so that new software can access the GEDCOM without parsing the gedcom file. The API should give the external plug-in a wrapped interface to the underlying data model without having to know the data model, just the individual, family, or location, or a name list of individuals, families, or locations. This will allow new software to provide additional functionality to a family tree or to provide inter-operability between trees/websites. Obviously security/privacy rules would limit this kind of  plug-in access.

That’s Stanczyk’s vision of the GEDCOM future!

February 9, 2012

RootsTech 2012 Post Conference To Do List — #Genealogy, #Conference, #RootsTech

by C. Michael Eliasz-Solomon

Stanczyk has his work cut out for him…


I guess this is what happens when you come back from a conference. Your juices are flowing and you cannot wait to get to work on using the new knowledge you acquired to further your aims.

You can see the impact that Steve Morse had and Dallan Quass, Ryan Heaton, FamilySearch Cross Platform, Amy Johnson Crow, my Family History Library research results, Brooke Ganz and Google had upon me.

I have to admit Jay Verkler had a HUGE impact upon me, but I do not see what I can do about his vision???

Did you go to RootsTech 2012? If so, please comment/email on what you are doing.

February 5, 2012

Is GEDCOM Dead? Date/Place of Death, Please?

by C. Michael Eliasz-Solomon

The RootsTech Conference is living up to its name. Everywhere there was a sea of: iPhones/Androids, iPads (in huge numbers), and laptops. Even the very elderly were geared up. Google, Dell, and Microsoft were at RootsTech. Why not Apple, especially since their customers were present in LARGE numbers??? [Note to Tim Cook: have Apple sponsor and show up as a vendor.]

According to Ryan Heaton (FamilySearch), “GEDCOM is stale.” He went on to speak about GEDCOMX as the next standard as if GEDCOM were old and/or dead. They were not even going to make GEDCOMX backwards compatible! In a later session I had with Heaton I asked the Million dollar question, “How do I get my GEDCOM into GEDCOMX?” After a moment’s pause he said they’d write some sort of tool to import or convert the existing GEDCOM files. Well that was reassuring??? So they want GEDCOMX to be a standard, but FamilySearch are the only ones working on it and they have not had the ability to reach out to the software vendors yet (I know, I asked).

My suggestion was to publish the language (like HTML, SQL, or GEDCOM). I asked for “railroad tracks“, what we used to call finite state automata, and what Oracle uses to demonstrate SQL syntax, statements that are valid with options denoted and even APIs for embedding SQL into other programming languages. Easy to write a parser or something akin to a validator (like W3C has for HTML).

GEDCOM Tags

Dallan Quass took a better tack on GEDCOM. His approach was more evolutionary, rather than revolutionary. He collected some 7,000+ gedcoms and wrote an open source parser for the current GEDCOM standard (v5.5). He analyzed the flaws in the current standard and saw unused tags, tags like ALIA that were always used wrong, custom tags and errors in applying the standard. He also pointed out that the concept of a NAME is not fully defined in the standard and so is left to developers (i.e. vendors) to implement as they want. These were the issues making gedcoms incompatible between vendors. He said his open source parser could achieve 94% round trip from one vendor to another vendor.

Now that made the GEDCOMX guys take notice — here was their possible import/conversion tool.

The users just want true portability of their own gedcoms and the ability to not have to re-enter pics, audio, movies over and over again. RootsTech’s vision of APIs that would allow the use of “authorities” to conform names, places, and sources would also help move genealogy to the utopian future Jay Verkler spoke of at the keynote. APIs would also provide bridges into the GEDCOM for chart/output tools, utilities(merge trees), Web 2.0 sharing across websites / search engines / databases (more utopian vision).

GEDCOM is the obvious path forward. Why not improve what is mostly working and focus on the end users and their needs?

FamilySearch, get vendors involved and for God’s sake get Dallan Quass involved. Publish a new GEDCOM spec with RailRoad tracks (aka Graphic Syntax Diagrams) and then educate vendors and Users on the new gedcom/gedcomx. Create a new gedcom validator and let users run their current gedcoms against it to produce new gedcoms (which should be backward compatible with old gedcom to get at least the 94% compliance that Quass can already do)!

Ask users for new “segments” in the railroad tracks to get new features that real users and possibly vendors want in future gedcoms. Let there be an annual RootsTech keynote where all attendees can vote via the RootsTech app on the proposed new gedcom enhancements.

How about that, FamilySearch? Is that doable? What do you, my readers, think? Email me (or comment below).


P.S.
       Do Not use UML models to communicate the standard. It is simply not accessible to genealogists. Trust me I am a Data Architect.

August 7, 2011

#Polish, #Genealogy – The Pillars of the Eliasz Social Network of Pacanow

by C. Michael Eliasz-Solomon

Stanczyk was very sleepy/tired when the last posting was written! As I looked at this Social Network Analysis (SNA) that I performed and the resulting diagram from the data, I realized two more things.

There were five old men, the pillars of this Social Network, who were the progenitors of this data, if not literally, then at least figuratively. These august gentlemen were Marcin Elijasz (about 1819), Pawel (abt. 1825) & Antoni (abt. 1830) Odomski [undoubtedly brothers], Antoni Wojtys (abt. 1823) and Franciszek Zwolski (abt. 1823). In fact, Franciszek Zwolski & Antoni Wojtys were the witnesses at my 2nd great-grandfather Marcin Elijasz’s death in 1879. If you have one of those five men in your family tree, then welcome, for we are surely relatives. Indeed it is true for just about everyone in the diagram.

Second, this SNA diagram (that messy scribble from my last posting, with the nodes and the connecting lines) is properly viewed in two ways. First off, the SNA diagram is a road-map for reading these church records (in Pacanow and to some degree the adjoining parishes) and providing a much richer/complete context for understanding the families: Elijasz (Heliasz), Zasucha, Wojtys, Zwolski, Odomski, Siwiec, Paluch, Lewinski, Piotrowski, Major and Wlecialowski. However, the SNA diagram is a bit unwieldy in being able to quickly read/find any single individual. So the second view is that it is a database. Now Stanczyk is a database architect and data analyst by trade. So I will reorganize this data from its visual representation into a more “tabular” data friendly representation that is searchable/sortable. I will also redraw the diagram and organize its visual presentation because that visual road-map is invaluable. It is easy to count the hops between nodes (people) and get a sense of connectedness or remoteness between two individuals in quick fashion.
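Once the diagram is held as data, counting hops is a textbook breadth-first search. The Python sketch below is only an illustration of that idea, not the tabular design itself; the function names are made up, and the only links used are the two witness relationships mentioned above.

```python
from collections import deque

def build_graph(links):
    """Turn (person, person) pairs into an undirected adjacency map."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def hops(graph, start, goal):
    """Return the fewest links between start and goal, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

# Each pair is one documented connection (here: witness at a death record).
links = [
    ("Marcin Elijasz", "Franciszek Zwolski"),
    ("Marcin Elijasz", "Antoni Wojtys"),
]
graph = build_graph(links)
print(hops(graph, "Franciszek Zwolski", "Antoni Wojtys"))
```

The same list of pairs doubles as the searchable, sortable table: one row per connection, with columns that could later carry the record type and year.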

I urge people to incur the pain of producing such a diagram and then re-viewing your church records and/or family group sheets again. It also shows the clear import of transcribing witness names and AGEs, as well as the mother and father’s ages and the godparents’ names. It is too bad that GEDCOM, the file format of our family trees, mostly buries this info in NOTES/COMMENTS because it is hard to query/report/analyze these pieces of data that link/glue nuclear families together.

My family tree never indicated to me that it was important to take note of the ODOMSKICH. Nor really the Zwolski or Wojtys and certainly not the Zasucha. The Lewinski and Piotrowski were not even on the radar before. The SNA diagram really shows the rich/complex tapestry of the social network in Pacanow for my ancestors.
