February 26, 2012

Responses – Exploring Gedcom — #Technology, #Genealogy

by C. Michael Eliasz-Solomon

Mail Room

The mailroom received three emails / comments about the “Exploring Gedcom” article.

They came from Tamura Jones (Modern Software Experience), Louis Kessler (BeholdGenealogy.com), and Stan Mitchell (GenApps.net / ezGED Viewer).

Good solid GEDCOM experts all (unlike this jester who is only a journeyman apprentice to these fine men) and as you can see they are also bloggers themselves.

Here’s my summation:

  1.  WHITESPACE – All three disputed the whitespace proposal. Even though it was an optional feature accessed via an on/off check-box, I yield to what I see as a rising tide that I cannot swim against. I assume that XML will also be treated as badly by all for EXACTLY the same reasons: it is too verbose and makes for a poor data transfer mechanism because of the bloat.
  2. UNICODE – Tamura pointed out that it was in the 5.5.1 standard. I said maybe so, but it is hardly implemented and needs to be mandatory. I also hate the two-byte binary debris that makes an otherwise TEXT file into a partial binary file. Tamura points out that this byte-order indicator is commonly hidden by PC editors (I am old school and use vi, where nothing is hidden). Besides, the HEAD tag's CHAR sub-tag could be used to determine the character set and keep the file textual. Tamura said that would be a catch-22 (since you do not know the file's encoding before you read it). Tamura also points out that everyone does as I supposed and reads the start of the file as ASCII (or UTF-16LE or UTF-16BE) to determine the encoding. The documentation needs to be updated too. There is really almost no support for generating UNICODE chars within apps; it should be required, otherwise data entry is limited to clever users (i.e. tech-types or their friends).
  3. DATES – Nobody liked DATES (or NAMES) as a zero level. I can live without dates, as I can always create a dimension with every possible date to slice & dice my genealogy facts. Names were also not part of my vision, other than that I want a bunch of AKA names for an INDI.
  4. LOCN – Everyone agreed, the PLAC tag did provide a minimalist capability. 2/3 saw a good reason to have LOCN at the zero level, as I proposed. Let’s hope this feature gets in.  We may need to keep PLAC tag for backwards compatibility until all gedcoms have been converted.
  5. EVENTS as a zero level tag seemed to interest people, as MULTI-PERSON events (aka EVENT_TYPE_FAMILY) are NOT adequately dealt with in GEDCOM 5.5.1. I also think people want to standardize these events as much as possible and leave open the ability for a user to add their own events. EVENTS was also related to GROUPS, which people seem to want in some fashion. Analyzing a social network needs better GROUP/EVENT/ROLE visibility than the current standard provides. I think we really need EVENT and EVENT_TYPE tags to keep from adding a new GEDCOM tag every time someone says we need a new event (BIRT | CHR | BAPM | BARM | BASM | BLES | SLGC). All TYPE tags should be from a standard list that a user can add to. The list should be allowed to be localized (into a native language). This keeps parsing to a minimum while allowing for expansion. OLD tags are kept for backwards compatibility until a gedcom is upgraded to a later version. I think we also need a ROLE_TYPE to replace ROLE_IN_EVENT and add more standard roles (e.g. GodMother, GodFather, Witness, Neighbor, MidWife, Rabbi, Brother, Sister, Aunt, Uncle, Cousin, Boarder, etc.), and this should also be localized and user upgradeable. Keep EVENT_TYPE_FAMILY and EVENT_TYPE_INDIVIDUAL for backwards compatibility.
  6. DOCUMENTS (deferred) – The case was not made, nor was the concept adequately explained. To Stanczyk this is related to multimedia. The point is to make it possible to locate all documents of a certain type (e.g. Declaration of Intent), each with its date, its location and a locator to the multimedia representation, and to pull these documents out in their own right regardless of the individual, family or group.
  7. GROUPS (deferred) – There was some interest in defining a non-family group (e.g. military unit, college fraternity/sorority, religious society, etc.). These groups would be interesting to study in their own right. At RootsTech 2012, this seemed to be a novel idea that had a positive feel with the audience.
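To make the catch-22 in item 2 concrete, here is a minimal Python sketch of how a reader might guess the encoding before trusting the CHAR sub-tag. It is only an illustration under stated assumptions: the function name `sniff_gedcom_encoding` is mine, it relies on every GEDCOM file opening with a level-0 HEAD line, and it returns the declared CHAR value as written rather than validating it.

```python
import codecs


def sniff_gedcom_encoding(raw: bytes):
    """Guess a GEDCOM file's encoding from its opening bytes (hypothetical helper)."""
    # A byte-order mark, when present, settles the question outright.
    if raw.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    # No BOM: the file still begins with "0 HEAD", so the byte layout of
    # the first two characters gives away a 16-bit encoding.
    if raw.startswith("0 ".encode("utf-16-le")):
        return "UTF-16LE"
    if raw.startswith("0 ".encode("utf-16-be")):
        return "UTF-16BE"
    # Otherwise the header is ASCII-compatible, so read it as ASCII and
    # look for the CHAR sub-tag inside the HEAD record.
    for line in raw.decode("ascii", errors="replace").splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "1" and parts[1] == "CHAR":
            return parts[2]
        if parts and parts[0] == "0" and parts[1:2] != ["HEAD"]:
            break  # the next level-0 record means the header is over
    return None  # nothing declared; the caller has to decide


print(sniff_gedcom_encoding(b"0 HEAD\n1 CHAR ANSEL\n0 TRLR\n"))
```

The order matters: the byte checks come first precisely because the CHAR line cannot be read until the reader has already guessed something ASCII-compatible or 16-bit.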

GROUPS is not intended for a non-traditional family unit which needs some thought and design in this Modern Family World.
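Going back to item 5, the idea of a standard TYPE list that a user can extend and localize can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the class names `TypeList` and `Event`, the helper `make_event`, and the Polish label are my own inventions, not anything defined by GEDCOM 5.5.1.

```python
from dataclasses import dataclass, field


class TypeList:
    """A standard list of type codes that users may extend and localize."""

    def __init__(self, standard):
        self.standard = frozenset(standard)
        self.custom = set()
        self.labels = {}  # (code, language) -> localized label

    def add(self, code):
        self.custom.add(code)

    def __contains__(self, code):
        return code in self.standard or code in self.custom

    def localize(self, code, language, label):
        self.labels[(code, language)] = label

    def label(self, code, language):
        # Fall back to the code itself when no translation was supplied.
        return self.labels.get((code, language), code)


@dataclass
class Event:
    event_type: str
    participants: list = field(default_factory=list)  # (INDI xref, role) pairs


def make_event(event_type, participants, event_types, role_types):
    """Build a multi-person event, rejecting unknown event types and roles."""
    if event_type not in event_types:
        raise ValueError(f"unknown event type: {event_type}")
    for xref, role in participants:
        if role not in role_types:
            raise ValueError(f"unknown role for {xref}: {role}")
    return Event(event_type, list(participants))


event_types = TypeList({"BIRT", "CHR", "BAPM", "BARM", "BASM", "BLES", "SLGC"})
role_types = TypeList({"GodMother", "GodFather", "Witness", "MidWife"})
role_types.add("Principal")  # a user-added role, not from any standard list
role_types.localize("Witness", "pl", "Świadek")

baptism = make_event(
    "BAPM",
    [("@I1@", "Principal"), ("@I2@", "GodMother"), ("@I3@", "Witness")],
    event_types,
    role_types,
)
print(len(baptism.participants), role_types.label("Witness", "pl"))
```

The parser only ever needs to understand one EVENT structure; new event types and roles become data in the list rather than new tags.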

Tamura chided me that many of my “wants” are part of an offshoot of GEDCOM called GEDCOM 5.5EL (Extended Location GEDCOM, derived from version 5.5). The only difference is that I want to get rid of the need for _LOC (a custom tag under the GEDCOM definition) and use LOCN instead. I would also want their undefined tag called NAMC (possibly renamed NAMA for NAMe Alias or NAMe Alternate) to be 0:M, meaning that you can have zero, one or many alternate/alias names for this LOCN (or INDI, why not?).

Also, the NAMC (or NAMA) should have sub-tags FONE and FONETYPE (soundex, DM, Beider-Morse, etc.) to aid in advanced searches or Google searches. But this is the argument for NAMES at the zero level. The last names are usually where the soundex/phonetic matching needs to be stored. We do not need to repeat this data for each individual (INDI), just for each unique last name or alternate name. These things get created in Surname Index pages. How much easier it would be if the NAMES (i.e. surnames and alternate surnames) had a zero level with the FONE and FONETYPE, and the INDI had an XREF pointing to each NAME/NAME ALTERNATE s/he had used during their life. One might even hope that the zero level name had an XREF to each INDI too. If I were the Data Architect for GEDCOM, I would have a zero level NAMES (for SURNAMES) with their soundex/phonetic codes, plus XREFs back and forth to INDI.
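As a rough illustration of that zero level NAMES idea, the Python sketch below stores each surname once with its phonetic code and keeps XREFs in both directions. The names `NameRecord`, `NameIndex`, `link` and `sounds_like` are hypothetical, and plain American Soundex stands in for whichever FONETYPE an application would actually choose.

```python
from dataclasses import dataclass, field

# Letter-to-digit table for American Soundex.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODES = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            result += code
        if c not in "HW":  # H and W do not separate equal codes; vowels do
            prev = code
    return (result + "000")[:4]


@dataclass
class NameRecord:
    surname: str
    fone: str
    fonetype: str = "soundex"
    indi_xrefs: set = field(default_factory=set)


class NameIndex:
    """Surnames stored once, with links back and forth to INDI records."""

    def __init__(self):
        self.names = {}    # surname -> NameRecord
        self.by_indi = {}  # INDI xref -> set of surnames used in life

    def link(self, indi_xref, surname):
        record = self.names.get(surname)
        if record is None:
            record = self.names[surname] = NameRecord(surname, soundex(surname))
        record.indi_xrefs.add(indi_xref)
        self.by_indi.setdefault(indi_xref, set()).add(surname)
        return record

    def sounds_like(self, surname):
        code = soundex(surname)
        return sorted(r.surname for r in self.names.values() if r.fone == code)


index = NameIndex()
index.link("@I1@", "Eliasz")
index.link("@I2@", "Eliasz")
index.link("@I1@", "Elias")  # an alternate name used by the same person
print(index.sounds_like("Elyasz"))
```

The phonetic code is computed once per unique surname, not once per INDI, which is the whole point of lifting names to their own level.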

I need to cut this response short, but a great thanks to all who read the article and the above three for improving my thoughts by their comments/emails.

P.S. – If you follow the GEDCOM 5.5EL link, you will notice they show their gedcom indented and then go on to say about the whitespace:

“improves readability … should (and will) not be performed at real Gedcom-output.”

My sentiments exactly. Sometimes you just need to show the gedcom (and indentation improves the presentation and understanding). I too never intended it to be processed into the gedcom output/exported to a file for transport (just for my own personal examination or writing purposes). However, I have capitulated on whitespace — so please no more email on whitespace and bloat.
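For anyone who wants the same presentation-only indentation, a small Python sketch follows. The function name `indent_for_display` and the two-space default are my own choices; the helper only builds a string for viewing and never writes anything back to the gedcom file.

```python
def indent_for_display(gedcom_text, width=2):
    """Indent each GEDCOM line by its level number, for reading only."""
    shown = []
    for line in gedcom_text.splitlines():
        stripped = line.strip()
        level = stripped.split(" ", 1)[0]
        # Lines that do not start with a level number are left flush left.
        pad = " " * (width * int(level)) if level.isdigit() else ""
        shown.append(pad + stripped)
    return "\n".join(shown)


sample = "0 @I1@ INDI\n1 NAME Jan /Eliasz/\n2 SURN Eliasz"
print(indent_for_display(sample))
```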
