Archive for ‘Technology’

March 18, 2012

Dziennik Polski Detroit Newspaper Database App Search Page

by C. Michael Eliasz-Solomon


Stanczyk was finally able to use his training from Steve Morse’s presentation at RootsTech 2012 to create a One-Step Search App for the Dziennik Polski Detroit Newspaper Database.

To search on 30,920 Polish Vital Record Events, just go to the new Dziennik Polski Detroit Newspaper Database App Search page (on the right, under PAGES, for future reference).


For more background on the Dziennik Polski Detroit Newspaper click on the link.

You can search on the following fields:

Last Name – exact means the full last name exactly as you typed it. You can also select the ‘starts with’ radio button and just provide the first few starting characters. Do not use any wild card characters!

First Name - exact means the full first name exactly as you typed it. You can also select the ‘starts with’ radio button and just provide the first few starting characters. Do not use any wild card characters!

Newspaper Date - exact means that you need to enter the full date. Dates are of the format:

06/01/1924 (for June 1st, 1924). Format is MM/DD/YYYY. Leading zeros are required for a match.

You can use the ‘contains’ radio button to enter a partial date. The most useful partial is just to provide the Year (YYYY). Do not use any wild card characters!

Event Type - exact means the full event type. This is not recommended. You SHOULD select the ‘starts with’ radio button and just provide the first few starting characters. Do not use any wild card characters! Uppercase is not required.

Valid Event Types: BIRTH, CONSULAR, DEATH, or MARRIAGE

Indexer - exact means the full indexer name exactly as you typed it. You can also select the ‘starts with’ radio button and just provide the first few starting characters. Do not use any wild card characters!

The Indexer is meant to be informational only, but you could conceivably want to search on this field too, so it is provided.
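The matching rules above (exact, starts with, contains, and MM/DD/YYYY dates with leading zeros) can be sketched in a few lines of Python. This is not the App’s actual code, which the post does not show; it is a minimal sketch that assumes case-insensitive comparisons, uses hypothetical field names (`last`, `first`, `date`, `event`, `indexer`), and runs against made-up sample rows.

```python
from datetime import date

# Made-up sample rows; the real database layout is not published in this post.
SAMPLE = [
    {"last": "Nowak", "first": "Anna", "date": "06/01/1924", "event": "BIRTH", "indexer": "Volunteer A"},
    {"last": "Nowakowski", "first": "Jan", "date": "11/15/1925", "event": "MARRIAGE", "indexer": "Volunteer B"},
]

def to_search_date(d: date) -> str:
    """Format a date as MM/DD/YYYY with the leading zeros a match requires."""
    return d.strftime("%m/%d/%Y")

def field_matches(value: str, query: str, mode: str) -> bool:
    """Compare one field using 'exact', 'starts with', or 'contains'.

    Assumption: case is ignored. There are no wildcards; '*' or '%' are compared literally.
    """
    v, q = value.casefold(), query.casefold()
    if mode == "exact":
        return v == q
    if mode == "starts with":
        return v.startswith(q)
    if mode == "contains":
        return q in v
    raise ValueError(f"unknown mode: {mode!r}")

def search(records, **criteria):
    """Keep records matching every criterion; each criterion is field=(query, mode)."""
    return [
        r for r in records
        if all(field_matches(r[name], query, mode) for name, (query, mode) in criteria.items())
    ]

if __name__ == "__main__":
    print(to_search_date(date(1924, 6, 1)))
    print([r["last"] for r in search(SAMPLE, last=("Nowak", "exact"))])
    print([r["last"] for r in search(SAMPLE, last=("Nowak", "starts with"))])
    print([r["first"] for r in search(SAMPLE, date=("1924", "contains"))])
    print(field_matches("06/01/1924", "6/1/1924", "exact"))  # no leading zeros, no match
```

The last line shows why leading zeros matter for an exact date search, and why ‘contains’ with just the year is the more forgiving choice.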

March 17, 2012

1940 US Census – 16 Days Away — #Genealogy

by C. Michael Eliasz-Solomon

Stanczyk apologizes for being away for a few days. I have spent some of that time preparing for the 1940 US Census (sans index).

So I made extensive use of Drs. Steve Morse and Joel Weintraub’s 1940 Census Tool.

I created a spreadsheet. I listed the most important people I wanted to find in 1940. I used the 1930 US Census and recorded their Enumeration District (ED). This is a necessary precursor to looking up the EDs for 1940; the only other way is to start from a street address. Now use the link to the 1940 Census Tool [see above] to convert your 1930 EDs to 1940 EDs (or your last known address to 1940 EDs).
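For a longer list, a small script can turn that spreadsheet into a lookup worklist so that each 1930 ED is converted only once. This is a minimal sketch, assuming the conversion is keyed on state, county, and 1930 ED; the rows and ED labels are placeholders, not real enumeration districts, and the 1940 column is left blank to be filled in from the 1940 Census Tool (one 1930 ED can correspond to more than one 1940 ED).

```python
import csv
import sys
from collections import defaultdict

# Hypothetical worksheet rows: (person, state, county, 1930 ED).
# The ED values are placeholders, not real enumeration districts.
ROWS = [
    ("Person A", "Michigan", "Wayne", "ED-1"),
    ("Person B", "Michigan", "Wayne", "ED-1"),
    ("Person C", "Michigan", "Wayne", "ED-2"),
]

def lookup_worklist(rows):
    """Group people by (state, county, 1930 ED) so each ED is converted only once."""
    groups = defaultdict(list)
    for person, state, county, ed_1930 in rows:
        groups[(state, county, ed_1930)].append(person)
    return dict(groups)

def write_worksheet(rows, out=sys.stdout):
    """Write one CSV row per person with an empty column for the 1940 ED(s)."""
    writer = csv.writer(out)
    writer.writerow(["person", "state", "county", "ed_1930", "ed_1940"])
    for person, state, county, ed_1930 in rows:
        writer.writerow([person, state, county, ed_1930, ""])  # fill in from the tool

if __name__ == "__main__":
    for (state, county, ed), people in lookup_worklist(ROWS).items():
        print(f"{state} / {county} / {ed}: {', '.join(people)}")
    write_worksheet(ROWS)
```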

How are you preparing? This is what I used …

March 12, 2012

ScribeFire – Blog Software / Chrome Browser Extension — #Technology, #Blog

by C. Michael Eliasz-Solomon

Stanczyk likes genealogy and Stanczyk loves technology. That is why I had to go to RootsTech 2012. You are reading a blog article that I created in ScribeFire 4.1. Actually, I have been dabbling with ScribeFire since I saw it mentioned on WordPress.

It gives me greater control over my fonts (something I have been missing) without getting my hands dirty with CSS/Styles. I have resisted doing too much HTML coding of my blog; I just want to muse and not have to do a lot of bit-fiddling to get my thoughts down on … uh, CRT glass (or whatever glass you have on your mobile device).

Besides fonts, font sizes, superscripts, and subscripts, it also gives me a convenient table tool, and it will seek out related links for my article too (using Zemanta, see below)!

Related articles, courtesy of Zemanta:

So I recommend adding this extension to your browser if you use Chrome. I have gone back and forth: I write some in ScribeFire and perhaps finish the article in WordPress (or vice versa). ScribeFire and WordPress play well together, and I get the best of both worlds. If you need these features, then get ScribeFire from the Chrome Store today.

March 12, 2012

Ancestry.com Fixed My GEDCOM Export!!!

by C. Michael Eliasz-Solomon

Stanczyk is happy once again !

The folks at Ancestry.com fixed my GEDCOM Export. It took about 10-14 days, but at least the job got done and my ability to export my research is back to normal.

The timing of the infinitely spinning icon could not have been worse. I had just imported a great many photos, and I continued to do so even with the export problem. But all’s well that ends well. So I did one more export (and it worked) to get myself to a valid checkpoint of my work.

Whew! What a relief. I did not want to have to re-enter my multimedia once again. Nor was I previously aware that I would have lost my valued contributors too. Who knows if their emails have changed since I invited them???

At any rate, thank you, Ancestry.com, for fixing my GEDCOM Export!

March 10, 2012

Broken? Is Your GEDCOM Export OK? – #Genealogy, #Technology

by C. Michael Eliasz-Solomon

Stanczyk wants to know if anyone else is having problems exporting their GEDCOM from Ancestry.com.

This is what I see when I try to export my GEDCOM from the tree settings screen. It never gets past 0% complete.

I have tried to submit a Help Ticket for technical support, and so far I have not received any response. What gives, Ancestry?

I can still work on my tree, and updates appear to be saved. I can sync to the Ancestry App (on the iPhone) and the changes are there too.

March 9, 2012

WordPress Blogs Now Have Stats By Country!

by C. Michael Eliasz-Solomon

WordPress - Views By Country

Most Recent Flag Counter

Stanczyk has for a long time been using Flag Counter to get some idea of how far my blog reaches into the Old World.

The image to the far left is WordPress and is just for today (so far). The image to the near left is a cumulative count by country of Flag Counter for the last year. So I am thankful to WordPress for providing this analytic for my blog. It was always my hope to reach Poland and the other Central European nations where potential family tree members still reside. When I look at the analytics for the last year from WordPress, it seems people from about 60-70% of the world’s landmass visit this blog! Come on China, you can bring that percentage up.

Thanks WordPress!

– Stańczyk kocha Polskę! (Stańczyk loves Poland!)

March 7, 2012

Wordless Wednesday – Diacriticals/Cyrillic Glyphs On Your iPhone? — #Mobile, #Technology

by C. Michael Eliasz-Solomon

Here’s what I do …

Diacriticals Under The E

Russian/Cyrillic Keyboard

On the left (above), press and hold a key like a, e, o, c, l, n, s, or z, …
On the left (below) is an International Keyboard for Russian/Cyrillic characters …

Do you enter diacriticals in your Family Tree?


March 4, 2012

To Tweet or Not To Tweet … That is the Question

by C. Michael Eliasz-Solomon

Pardon the Bard in me. But I had to soliloquize.

To tweet or not to tweet — that is the question! Whether it is nobler to suffer the slings and arrows of outrageous Ads or by opposing end them.

Stanczyk™ knew this would happen. First the IPO, then the looting of your privacy and the deluge of adverts that must accomplish a $100 Billion justification of Twitter’s existence. I was frustrated by sponsored tweets spamming my tweet stream already … it just Zucks! There are better ways to monetize the website that are far less intrusive. Zuckerberg, call me!

Now we find Twitter wants to sell our tweets too??? Isn’t that our IP (Intellectual Property)? Do I need to put a © [copyright] on my tweets? Now I am down to 139 characters to be a content provider. I expect royalties for my copyrighted tweets; please sign the licensing agreement, Mr. Zuckerberg, before you sell mine. This is your only legal notice! … ©™® [Date: 3/4/12]

Let’s Go To Google+

Oh, didn’t Google just change its privacy rules? I am sure they will do NO evil. Let’s see if this jester can summarize their new privacy statement briefly …. Hmmmm …

“You have no privacy if you use any of our software.”

That pretty much summarizes the non-Evil gobbledygook. I’m here from Google and I am here to help you. Damn you: Larry Page, Sergey Brin, and Eric Schmidt (Google’s founders and Executive Chairman). Now we have to dump our search history and … everything else. Did everyone dump their search history by the March 1st deadline? Do not criticize Google’s efforts or our robo spiders will leave you in the dusty cobwebs of isolated Internet ignominy (go use Bing, you miscreants).

When did Silicon Valley become such lecherous corporate Privaphiles? Is their software so bug-free they can GUARANTEE nobody will be harmed by their intrusiveness? Nobody will be slammed for their Internet address? Nobody will be crossmerged incorrectly with other similarly named nefarious netizens?

Time will tell whether we live happily ever after … or not. Right now it’s just … “Something is rotten in the state of Denmark”.

March 3, 2012

Library of Congress – Chronicling America — #Genealogy, #Newspapers

by C. Michael Eliasz-Solomon

Stanczyk is a Library of Congress (LOC) researcher. Mostly, I have done my research in the Madison Building, where they keep the Newspapers / Periodicals.

Today they (LOC) sent me an email announcing another 100+ newspapers digitized, with another 550,000+ newly digitized pages available via their Chronicling America – Historical Newspaper program. I have written about this worthy program before. Whether you research history or genealogy, these newspapers can help by providing evidence or even just adding context to your ancestors’ lives.

Did you know that the LOC has over 220 Polish language newspapers on microfilm (and/or digitized)? To help out Polish genealogists, I have compiled and published a list of the LOC’s Polish Language Newspapers: here.

Make newspapers a part of your research to fill the gaps or to provide context!


March 3, 2012

Google’s Chrome Browser For Genealogy — #Genealogy, #Technology

by C. Michael Eliasz-Solomon

Stanczyk was a big Mozilla/Firefox browser user. On Mac or Windows, it did not matter. So it was a shock that I switched to Chrome (Google’s browser).

I did so mostly on Google’s promise that “microdata” would be another widget that would greatly enhance the search experience for genealogy data. I have been waiting on that feature, and I am still waiting.

On Tuesday I mentioned Virtual Keyboard 1.45, for entering your diacriticals through your browser into, say, your online family tree. Today, I was reading Kathy Judge Nemaric’s blog, “Dead Reckoning” [nice name for a genealogy blog], and she mentioned an extension to the Chrome Browser. It is called Ancestry Family Search Extension 2.4.

Open up a new Tab (Ctrl-T works) and click on Chrome Web Store. In the “Search Store” field, type in “Ancestry Family Search” and press the Enter key to bring up the extension (see on the left).

Click on the Add to Chrome button and then click on the Install button in the dialog box that pops up to confirm your wish. Once you have installed the extension into your Chrome browser, it will look like the following screen:

Now you are ready to reap the rewards of that hard work. Go to Ancestry.com and perhaps open up your family tree on an individual you are working on. Now your browser’s address bar has a new “widget”. Next to the STAR widget you have been using to bookmark pages is a new widget shaped like a TREE.

See the red circle (and arrow)? Just click on that and it will bring up a new window on top of the current TAB in your browser with (in my case) the Tomasz Leszczynski result set from the FamilySearch databases. If you click on one result, a new TAB will open to the exact record in FamilySearch.

This is a very nice synergy between the two websites. So I am thinking that if Google produces their microdata widget, 2012 will be the year of the widget in Genealogy and perhaps the year of the CHROME browser too.

There is one microdata Schema Explorer browser extension already in the Chrome Web Store. But you will want to wait for Google’s, which will use the website: . I am guessing Google will use this website to develop schemas to guide its browser.

2012 is shaping up to be a very good year for genealogy and to switch to CHROME!

March 2, 2012

Diacritical Redux – Ancestry GEDCOM — #Genealogy, #Technology

by C. Michael Eliasz-Solomon

As Stanczyk has been writing about the GEDCOM standard since #RootsTech 2012, I began to pick apart my own GEDCOM file (*.ged). I did this as I was engaged with Tamura Jones (a favorite foil to debate Genealogy Technology with). During our tête-à-tête, I noticed that my GEDCOM lacked diacriticals???

What happened? At first I thought it was the software that Tamura had recommended I use, but the problem was not in that software (PAF). So I looked at the gedcom file that I had imported, and the diacriticals were missing from there, meaning my export software was the culprit.

I looked at the GEDCOM’s HEAD tag and the CHAR sub-tag, and it said “ANSI” [no quotes] was the value. That is not even a valid possible value! According to the GEDCOM 5.5.1 standard [on page 44 of the FamilySearch PDF document]:

CHARACTER_SET:= {Size=1:8} [ ANSEL | UTF-8 | UNICODE | ASCII ]
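Checking the CHAR value of an export does not require a full GEDCOM parser. Here is a minimal Python sketch that finds the level-1 CHAR line inside the HEAD record and tests it against the four GEDCOM 5.5.1 values. The function names are illustrative, and the reader helper assumes an ANSEL or UTF-8 file; UTF-16 (“UNICODE”) files are not handled.

```python
# GEDCOM 5.5.1 CHARACTER_SET values.
VALID_CHAR_VALUES = {"ANSEL", "UTF-8", "UNICODE", "ASCII"}

def header_char(lines):
    """Return the value of the level-1 CHAR line inside the HEAD record, or None."""
    in_head = False
    for raw in lines:
        parts = raw.lstrip("\ufeff").strip().split(" ", 2)  # drop a decoded byte order mark
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        if level == "0":
            if in_head:
                break  # the next level-0 record ends the header
            in_head = tag == "HEAD"
        elif in_head and level == "1" and tag == "CHAR":
            return parts[2].strip() if len(parts) > 2 else ""
    return None

def char_is_valid(lines) -> bool:
    return header_char(lines) in VALID_CHAR_VALUES

def read_lines(path):
    """Read a .ged file well enough to inspect its tags (not UTF-16)."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]  # UTF-8 byte order mark
    # latin-1 maps every byte to a character, so ASCII tag names stay readable
    return data.decode("latin-1").splitlines()

if __name__ == "__main__":
    exported = ["0 HEAD", "1 SOUR Stanczyk_Software", "1 CHAR ANSI", "0 TRLR"]
    print(header_char(exported), char_is_valid(exported))
```

Run against the export described above, `header_char` reports `ANSI` and `char_is_valid` returns `False`.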

Who is this dastardly purveyor of substandard GEDCOM that strips out your diacriticals (that I assumed you have been working so hard to add since my article on Tuesday, “Dying For Diacriticals”)? I’ll give you a HINT: it is the #1 Genealogy Website – Yes, it is ANCESTRY.COM!

Now what makes this error even more dastardly is that the website shows you the diacriticals in the User Interface (UI), but when you export/download, the diacriticals are not in the gedcom. Unless you study things closely, you may be oblivious (as Stanczyk was for a long time) to these errors creeping into your research. I also found a spurious NOTE that I cannot find anywhere on anyone in my tree, which gets attributed to my home person (uh, me). This is very alarming to me too!!!

Tim Sullivan (CEO of Ancestry.com), I expected better of you and your website. I entrusted my family tree to you, and this is what you did with my gedcom? Now, I did some more investigating and found that Ancestry does not strip ALL diacriticals. My gedcom had diacriticals in the PLAC tags and in NOTE tags. But NOT (I repeat NOT) in the NAME tags.
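The investigation above (which tags kept their diacriticals and which lost them) can be automated. A minimal Python sketch, assuming the export has already been decoded to text with the right encoding; the sample lines are made up to mirror the pattern described in this post, and the helper names are illustrative.

```python
from collections import Counter

def split_tag_value(line: str):
    """Return (tag, value) for a GEDCOM line, skipping the level and any @XREF@."""
    parts = line.strip().split(" ", 1)
    if len(parts) < 2:
        return None, ""
    rest = parts[1]
    if rest.startswith("@"):  # e.g. '0 @I1@ INDI'
        xref_and_rest = rest.split(" ", 1)
        rest = xref_and_rest[1] if len(xref_and_rest) > 1 else ""
    tag, _, value = rest.partition(" ")
    return tag, value

def non_ascii_by_tag(lines) -> Counter:
    """Count, per tag, the lines whose value holds any non-ASCII character."""
    counts = Counter()
    for line in lines:
        tag, value = split_tag_value(line)
        if tag and any(ord(ch) > 127 for ch in value):
            counts[tag] += 1
    return counts

if __name__ == "__main__":
    exported = [
        "0 @I1@ INDI",
        "1 NAME Wladyslaw /Nowak/",      # diacriticals missing
        "1 BIRT",
        "2 PLAC Łódź, Poland",           # diacriticals kept
        "1 NOTE Parish of św. Jadwiga",  # diacriticals kept
    ]
    print(non_ascii_by_tag(exported))
```

A NAME count of zero next to non-zero PLAC and NOTE counts is the symptom described above.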

So Tim [pretend there is a shaky leaf here], if you or a reputation defender or some other minion skims the Internet (for your name), here is what I hope you/Ancestry.com will do:

  1. Do NOT strip diacriticals from the NAME tag !!!
  2. Fix the Export GEDCOM to create a gedcom file with diacriticals in NAME tags.
  3. Fix the Export GEDCOM to create a valid CHAR tag value: UNICODE, UTF-8, ASCII, ANSEL. I put them in my prioritized/preferred order [from left-to-right]. I hope you will not use ASCII or ANSEL.
  4. Run a GEDCOM validator against the gedcom file your Export GEDCOM software creates for download, and fix the other “little things” too (Mystery NOTEs ???).

February 29, 2012

Wordless Wednesday – What You May Want To Do on the iPhone? — #Mobile, #Technology

by C. Michael Eliasz-Solomon

Here’s an interesting App (Flipboard – don’t forget to add this blog to your Flipboard pages !!)

Also, I know there are some tech types amongst the readers and those who use Big Tech in their genealogy. Try this URL (web link) to Tech Visualizer

– Stanczyk

February 22, 2012

Wordless Wednesday – What Do You Do On Your iPhone? — #Mobile, #Technology

by C. Michael Eliasz-Solomon

Here’s what I do …

Home - 1st Screen -- most used Apps

2nd Screen - Social, Genealogy, & Informed

3rd Screen - Some Tools & Some Classes

There are still 3 more screens and part of another that I’ll spare you from …

What do you do on your iPhone?


February 19, 2012

Meme: #RootsTech — #Genealogy, #Technology

by C. Michael Eliasz-Solomon

A while ago, Stanczyk bemoaned iOS5. Therefore, I owe it an update …

  • Portable Genealogy is sound – Ancestry App better than ever
  • The Camera App in iOS5 does have a zoom. In fact, if you use the familiar “pinch-gesture” you can zoom in/out, and the old zoom slider appears too. Also, you can use the Volume Up button (on the side of the phone) to take a picture, which is helpful when the camera is rotated.
  • Just having the iPhone was very useful during the #RootsTech conference as my note-taking device. Until an iPad 2 (or 3) with both WiFi and 3G/LTE arrives, the iPhone is my backup: without it I would have been without blogging capabilities in the Salt Palace convention center when its WiFi went down. I utilized the #RootsTech App (for iPhone; there was one for Android too).
  • In the library it was my digital camera.
  • In fact, the ImageToText App came in handy to OCR an image of text for me.
  • I used the Ancestry App to enter the transcribed text from the microfilm images right into the evidence (note area) of an individual in the app, and attached the iPhone picture too.
  • In one case, I was able to get an immediate shaky leaf as a result of my data entry, much to my disbelief (and it was correct). So I could do an immediate on-site analysis and do further microfilm searching as a result.
  • I used the Bump App to swap contact info with one genealogist. I cannot wait until all genealogists become mobile-enabled and I can lose my business cards altogether. Hint to RootsTech vendors: you should use Bump too to collect user info. Why do I have to drop a business card into a fishbowl??? Do a BUMP, get a tchotchke (swag). Leave the fishbowl for the Luddites.
  • Are you a Slavic (Czech, Pole, Russian, etc.) genealogist? Then you must be dying for diacriticals. You could add an international keyboard. But why? In iOS5, just press and hold down the ‘l’ key and up will come a list including the slashed-l. Just slide your finger over onto the slashed-l to enter it. Likewise for entering ‘S, E, A, Z, C, N, etc.’; it works in upper and lower case. Of course, if you have German ancestors, you can get your umlauts in the same fashion. That trick is a Latin Alphabet data entry trick (sorry, Cyrillic or Hebrew readers; try the International Keyboard trick).

February 16, 2012

1940 US Census – Blank Forms — #Genealogy, #US, #Census

by C. Michael Eliasz-Solomon

Legacy Family Tree has released blank US Census Forms (page1 | page2) for the 1940 US Census. April 2nd is coming; are you prepared? Is Ancestry.com prepared?

At #RootsTech 2012, the 3rd keynote was an Ancestry talking-head panel. They joked about whether the website could withstand the crush on April 2nd. Let’s see how this experiment goes.

This is the first US Census to be released in an all digital format.


February 16, 2012

GEDCOM “RailRoad Tracks” (aka Graphic Syntax Diagram) – #Genealogy, #Technology

by C. Michael Eliasz-Solomon

The above diagram is what Stanczyk had been jabbering about since the #RootsTech conference. Isn’t that much easier on the eyes and the grey matter than a complex UML diagram? Who even knows what a UML diagram is or if it is correct or not?

What does it say is in a GEDCOM file (ex. Eliasz.ged)?

A HEAD tag, optionally followed by a SUBmissioN record, followed by 1 or more GEDCOM lines, followed by a TRLR tag.

ex. gedcom lines that can be “traced” along the railroad tracks at the top.

0 HEAD
1 SOUR Stanczyk_Software
1 SUBM @1@
1 GEDC
2 VERS 5.5.1
2 FORM LINEAGE-LINKED
0 @1@ SUBM

OK, Stanczyk_Software does not exist, but was made up as a fictitious valid SOURce System Identifier name. The GEDCOM file (*.ged) is a text file, and you can view/edit the file with any text editor (vi | Notepad | WordPad | etc.). I do not recommend editing your gedcom outside of your family tree software, but there is certainly nothing stopping you from doing that (DO NOT TRY THIS AT HOME). If you knew gedcom, you could correct those erroneous/buggy gedcom statements that are generated by so many programs, the ones that cause poor Dallan Quass to ONLY achieve 94% compatibility with his GEDCOM parser.
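The top-level outline traced above (HEAD, an optional SUBN, one or more records, TRLR) is simple enough to check mechanically. Here is a minimal Python sketch that looks only at level-0 lines; it is not a full GEDCOM validator, and the function names are illustrative.

```python
def top_level_tags(lines):
    """Tags of the level-0 lines, skipping any @XREF@ (e.g. '0 @1@ SUBM' -> 'SUBM')."""
    tags = []
    for line in lines:
        parts = line.strip().split()
        if len(parts) >= 2 and parts[0] == "0":
            has_xref = parts[1].startswith("@") and len(parts) >= 3
            tags.append(parts[2] if has_xref else parts[1])
    return tags

def check_outline(lines):
    """Return a list of problems with the HEAD [SUBN] records... TRLR outline."""
    tags = top_level_tags(lines)
    problems = []
    if not tags or tags[0] != "HEAD":
        problems.append("first record is not HEAD")
    if not tags or tags[-1] != "TRLR":
        problems.append("last record is not TRLR")
    records = tags[1:-1]
    if records[:1] == ["SUBN"]:
        records = records[1:]  # the optional submission record
    if "SUBN" in records:
        problems.append("SUBN is not directly after HEAD")
    if not records:
        problems.append("no records between HEAD and TRLR")
    return problems

if __name__ == "__main__":
    sample = ["0 HEAD", "1 SOUR Stanczyk_Software", "0 @1@ SUBM", "0 TRLR"]
    print(check_outline(sample) or "outline looks right")
```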

Have you ever downloaded your gedcom from ANCESTRY and then uploaded it to RootsWeb? Then you might see all those crazy _APID tags. It is a custom tag (since it begins with an underscore – GEDCOM rules, dear boy/girl). It really messed up my RootsWeb pages with gobbledygook. I finally decided to edit one gedcom and remove all of the _APID tags before I uploaded the file to RootsWeb. Aaah, that is SO much better on the eyes. Oh, I probably do not want to re-upload the edited gedcom into ANCESTRY, but at least my RootsWeb pages are so much better! The _APID is just a custom tag for ANCESTRY (who knows what they do with it), so to appeal to my sense of aesthetics, I just removed them, with no impact on the RootsWeb pages other than improved readability. [If you try this, make a backup copy of the gedcom and edit the backup copy!]
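Removing the _APID lines by hand works, but a short script does it without typos. A minimal Python sketch, assuming a UTF-8 export; besides each _APID line, it also drops any lines subordinate to it (deeper level numbers), so no orphaned children are left behind. The function names are illustrative, and as the post says, run it on a backup copy.

```python
def gedcom_level_and_tag(line: str):
    """Return (level, tag) for a GEDCOM line, or (None, None) if it doesn't parse."""
    parts = line.strip().split()
    if len(parts) < 2 or not parts[0].isdigit():
        return None, None
    if parts[1].startswith("@") and len(parts) >= 3:
        return int(parts[0]), parts[2]  # skip a cross-reference id
    return int(parts[0]), parts[1]

def strip_tag(lines, tag="_APID"):
    """Drop every line carrying `tag`, along with any lines subordinate to it."""
    kept = []
    removed_level = None  # level of the last removed line, while skipping its children
    for line in lines:
        level, line_tag = gedcom_level_and_tag(line)
        if removed_level is not None and level is not None and level > removed_level:
            continue  # a child of the removed line
        removed_level = None
        if line_tag == tag:
            removed_level = level
            continue
        kept.append(line)
    return kept

def strip_file(src: str, dst: str, tag: str = "_APID") -> None:
    """Write a copy of src without `tag` lines (UTF-8 in; a BOM is not written back)."""
    with open(src, encoding="utf-8-sig") as f:
        lines = f.read().splitlines()
    with open(dst, "w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(strip_tag(lines, tag)) + "\n")
```

For example, `strip_file("Eliasz-backup.ged", "Eliasz-rootsweb.ged")` (hypothetical file names) writes a RootsWeb-ready copy and leaves the backup untouched.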

Now obviously the above graphic syntax diagram is not complete. It needs to be resolved to a very low level of detail such that all valid GEDCOM lines can be traced. It also requires me/you to add in some definitional things (like exactly what is a level# — you know those numbers at the beginning of each line).
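One way to pin down “what is a level#” is a precise rule for a single line. GEDCOM 5.5 describes a line as a level number (no leading zeros), an optional cross-reference id, a tag, and an optional value, separated by spaces. The Python sketch below encodes that as a regular expression. The id pattern `@[^@ ]+@` is a simplification of the standard’s xref rules, and the class and function names are illustrative.

```python
import re
from typing import NamedTuple, Optional

LINE_RE = re.compile(
    r"^(?P<level>0|[1-9][0-9]?)"     # level 0-99, no leading zeros
    r"(?: (?P<xref>@[^@ ]+@))?"      # optional cross-reference id, e.g. @I1@ (simplified)
    r" (?P<tag>[A-Za-z0-9_]+)"       # tag; a leading underscore marks a custom tag
    r"(?: (?P<value>.*))?$"          # optional line value: the rest of the line
)

class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]

def parse_line(text: str) -> GedcomLine:
    """Split one GEDCOM line into its parts, or raise ValueError."""
    m = LINE_RE.match(text.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    return GedcomLine(int(m["level"]), m["xref"], m["tag"], m["value"])

if __name__ == "__main__":
    for text in ["0 HEAD", "1 SOUR Stanczyk_Software", "2 VERS 5.5.1", "0 @1@ SUBM"]:
        print(parse_line(text))
```

Note how `1 SUBM @1@` (a pointer in the value) and `0 @1@ SUBM` (a record with an id) come apart cleanly: the id can only appear before the tag.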

I have a somewhat mid-level graphic syntax diagram that I generated using an Open Source (i.e. free) graphic syntax diagrammer. As I said in one of my comments, I will send it to whoever asks (I already sent it to Ryan Heaton & Tamura Jones). You can get a copy of Ryan Heaton’s presentation from RootsTech 2012 and compare it to his UML diagram (an object model). I think you will quickly realize that you cannot see how GEDCOM relates to the UML diagram; therefore it is difficult to ask questions or make suggestions. A skilled data architect/data modeler or a high-level object-oriented programmer could make the comparison and intuit what FamilySearch is proposing, but a genealogist without those technical skills could NOT.

I am truly asking the question, “Can a genealogist without a computer science degree or job read the above diagram?” and trace with his finger a valid path of correct GEDCOM syntax [assuming a whole set of diagrams were published]. The idea is to see how the GEDCOM LINES (in v5.5.1 parlance FAMILY_RECORD, INDIVIDUAL_RECORD, SOURCE_RECORD, etc.) are defined, and whether or not FamilySearch is proposing something complete/usable that advances the capabilities of the current generation of software without causing incompatibilities (ruining poor Dallan Quass’s 94% achievement). Will it finally allow us to move the images/audio/video multimedia types along with the textual portion of our family trees, and keep those digital objects connected to the correct people when moving between software programs?


GEDCOM files are like pictures of our beloved ancestors. They live on many years beyond those that created them. Let’s not lose any of them OK?

February 13, 2012

Blog Bigos …

by C. Michael Eliasz-Solomon

Stanczyk added a new Page (Tech Diary) to record my technology doings.

While doing that and reading from my blogroll (and emails), I discovered some history about the “de facto standard GEDCOM” (wiki: GEDCOM). I strongly recommend you start from the “de facto” link rather than the Wikipedia link.

  • RootsTech 2012 – had two GEDCOM presentations by Ryan Heaton (FamilySearch, GEDCOMX project).
  • RootsTech 2012 – had one open source GEDCOM parser presentation by Dallan Quass. Dallan was quite remarkable in his efforts to achieve a 94% commonality amongst 7,000 different GEDCOM files. Dallan Quass has a GitHub project for his Open Source GEDCOM parser.
  • Modern Software Experience (Tamura Jones) had a couple of articles that caused me to write this article. His most recent GEDCOM article that caught my eye was BetterGEDCOM (2/2/2012). I also noticed he had a GEDCOMX article from 12/12/2011. These two articles provide a good discussion. I also noticed that the BetterGEDCOM project has its own project blog. [Also see his Gentle Introduction to GEDCOM article.]

I believe those provide the most current thoughts on GEDCOM (that I have not penned).

  • I have been studying GEDCOM v5.5 (the last GEDCOM standard).
  • I produced a partial Graphic Syntax Diagram of GEDCOM v5.5 [what I had been calling "Railroad Tracks"] just to demonstrate how I thought this diagram was a better vehicle to communicate the standard [than say UML object models].
  • I could not resist making slight tweaks to GEDCOM v5.5 even in my preliminary studies. Mostly so we could discuss GEDCOM in a readable fashion (i.e. whitespace for formatting, and comment lines ) or because the language cries out for consistency (i.e. requiring the HEAD tag to be a zero level, just like the TRLR tag).

My Graphic Syntax Diagram of GEDCOM v5.5 was produced using an open source tool. It is partial and still high level. I did put in a construct so that you can clearly see all 128 standard tags. The Graphic Syntax Diagrammer is an excellent tool. I will have to offer the author a suggestion for the PNG images that it outputs. I need to take my diagram and manually edit it to make the drawing a better fit for 8.5″ x 11.0″ (US Letter, roughly the size of A4) paper. I need to graphically wrap the railroad tracks and add page breaks so that the image itself is usable for viewing/discussions. I will offer this sample drawing to any interested parties, including emailing the edited product to Ryan Heaton and Dallan Quass [who, since they did not request it, can feel free to ignore it].

My goal is to make minor tweaks to GEDCOM v5.5 via this diagram [not programming], try to get DallanQ to produce a one-off parser for it (call it, say, GEDCOM 5.5.999), and hope that my tweaks will not lower Dallan’s hard-won 94% compatibility. If it turns out to have virtually no effect on the 94% compatibility of his Open Source parser, then I can think about getting some software vendors to utilize the enhancements (via end user requests), since they are trivial, just to move the standard forward and to interest the vendors in looking at how we create a new Open Standard for GEDCOM.

Thanks to Tamura Jones, I now know I need to update my diagram to GEDCOM v5.5.1 first.

February 12, 2012

GEDCOM Standards – Where Genealogy Meets Technology — #Genealogy, #Technology, #Standards

by C. Michael Eliasz-Solomon

Stanczyk has been churning since about November of last year (2011). I have a number of ideas rummaging around my brain for genealogy apps. For over a quarter century, I have been a computer professional and have used and/or developed a lot of programs using a myriad of technologies. At my core, I am a data expert: design it, store it, query it, manage it, analyze it and protect it. It being the data.

Before going to #RootsTech 2012, I knew GEDCOM was the core of our hobby/business/research. GEDCOM is our de facto standard. It is how data is exchanged between us and our various programs. I say de facto because, as standards go, it is not a very open standard (one organization “owns” it, and the rest of us go along with it). It also has not changed in about a decade and a half, so Ryan Heaton was correct in calling it “stale”. It does still work … mostly. But if a standard does not progress, then you get a lot of proprietary “enhancements” that prevent the complete interchange of data, since one vendor does not know how to deal with another vendor’s file in totality.

At present, GEDCOM maxes out at version 5.5, although there are other variations you might see. But 5.5 was the last standard version. I counted 128 total tags and a provision for creating non-standard tags (they start with an underscore).

[Mike’s thanks to Tamura Jones! Even though GEDCOM v5.5.1 was never finalized, it IS the de facto max version of GEDCOM. GEDCOM v5.5.1 added 9 tags and removed the BLOB tag, so we now have a total of 136 tags. I will need to update even my high level graphic syntax diagram.]

Tags are like:

INDI, FAMC, FAMS, SOUR, REPO, HEAD, TRLR, etc. -or- ALIA, ANCE

The first bunch is familiar and probably in your family tree (if you ever exported the GEDCOM file). The ALIA tag is one that Dallan Quass said was universally used wrong by all programs. After seeing its definition, I can see how it is confusing. As for the ANCE tag, I do not recall any program offering functionality that might utilize this tag. This tag is probably one of those tags that Dallan said is not used at all.

I looked at the “MULTIMEDIA” section of the standard. It looks woefully out of date and probably not used at all (at least not in any standard way), which is probably why our pics, audio, and video (or any other media file like PDF or MS Word) do not move with the GEDCOM. Has any program ever used the ENCODING/DECODING of a multimedia file? The standard seems to imply a buffer of only 32K (for a line), and even if you strung a large number of CONC tags one after another, you would need 100 lines to store a 3.2MB file in-line in the GEDCOM. I do not think I have seen that in a GEDCOM. Programs probably stored these binary large objects (BLOBs) outside the gedcom and referred to their path on the computer/network. I did some noodling. I have 890 MB (approximately 890,000 KB) in pictures and scanned source documents for about 1,000 people in my family tree. So I use nearly a gigabyte (1GB) for my family tree and its multimedia, and I do not have any audio or video! So I use almost 1MB/person.

If we did have this magical new GEDCOM standard that could carry all of our multimedia from one GEDCOM program to another, the copying would take a long time. If I uploaded/downloaded it to/from the Internet, I might incur an overage on my ISP’s usage charges, if this were technically feasible! Imagine if I did this multiple times a month (as I got updates). I am beginning to understand why no vendor has tackled the problem. I would also like to store PDFs and other documents besides the GIF/JPG/PNG images that can be displayed natively on web pages in a browser. Those are not a part of the existing GEDCOM standard. Let me sling some jargon: I’d want to store any file type that has a MIME type definition, that I can currently embed in emails, or utilize in Java programs, or that the HTML5 standard will allow for multimedia.
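The back-of-the-envelope numbers in the two paragraphs above can be checked with a few lines of Python. This is a sketch under stated assumptions: “32K” is taken as 32,000 bytes (with 32,768 bytes the 3.2MB file needs 98 pieces), encoding overhead is ignored, and the 1 Mbit/s upload speed is a hypothetical figure, not a measurement.

```python
import math

def segments_needed(file_bytes: int, segment_bytes: int = 32_000) -> int:
    """Number of fixed-size pieces a file must be split into, rounding up."""
    return math.ceil(file_bytes / segment_bytes)

def mb_per_person(total_mb: float, people: int) -> float:
    """Average multimedia storage per person in the tree."""
    return total_mb / people

def transfer_hours(total_mb: float, megabits_per_second: float) -> float:
    """Hours to move total_mb at a given line speed (8 bits per byte, no protocol overhead)."""
    return total_mb * 8 / megabits_per_second / 3600

if __name__ == "__main__":
    print(segments_needed(3_200_000))          # pieces of 32,000 bytes for a 3.2MB file
    print(round(mb_per_person(890, 1000), 2))  # MB per person for 890 MB over 1,000 people
    print(round(transfer_hours(890, 1.0), 1))  # hours at a hypothetical 1 Mbit/s
```

At that assumed speed, one full copy of 890 MB takes roughly two hours each way, before any repeat uploads for monthly updates.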

GEDCOM 5.5 was in its infancy in dealing with character sets. It was predominantly ASCII, with some funky ANSEL coding of characters to handle Latin alphabet diacriticals, although it is not clear how I would do the data entry for those and it looks incomplete. It did mention UNICODE, but only cursorily, and just to remind us that the lengths in the GEDCOM standard were in ‘characters’ not bytes, which was correct. However, multibyte characters (say in Hebrew, Russian, Japanese or Chinese) would quickly use up the 32K byte line buffer limit; at 4 bytes per character, that becomes about 8K characters per line. In fact, GEDCOM 5.5 says it will only deal with LATIN alphabets and leave Cyrillic, Hebrew and Kanji for some far-flung future. Stanczyk is Slavic; I need UNICODE to represent my ancestors’ names and places. Fortunately, I do not feel the need for Cyrillic (Russian, Ukrainian, Belorussian, Macedonian, etc.) or I’d be out of luck. I’ll just use the Polish version of those names in their ‘Latinized’ forms.
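To see why multibyte characters eat into a byte-limited buffer, compare character counts with UTF-8 byte counts. In UTF-8, ASCII letters take 1 byte, Polish letters such as ł or ź take 2, Cyrillic takes 2, and common Chinese/Japanese characters take 3; 4 bytes per character is the worst case, which is where “about 8K characters” in a 32K buffer comes from. A minimal sketch; the 32,000-byte buffer is the post’s figure, taken as decimal.

```python
def utf8_bytes(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

def chars_that_fit(buffer_bytes: int, bytes_per_char: int) -> int:
    """How many characters of a fixed byte width fit in a byte buffer."""
    return buffer_bytes // bytes_per_char

if __name__ == "__main__":
    for word in ["Krakow", "Łódź", "Москва", "北京"]:
        print(word, len(word), "chars,", utf8_bytes(word), "bytes")
    for width in (1, 2, 3, 4):
        print(f"{width} byte(s)/char -> {chars_that_fit(32_000, width)} chars in 32,000 bytes")
```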

Oh, that is another area where the standard needs to be enhanced: NAMES. Dallan mentioned that Personal Names do not get a thorough treatment in the standard (I am refusing to read the data model, and I am a Data Architect). Location Names get almost no treatment; they do give you a place to store your locations (PLAC tag). What language should I use? After all, my ancestors are from POLAND, for God’s sake. Besides the obvious Polish, I have German, Russian and Latin to deal with, and being American I prefer English. Slavic names often do not translate well. For example, Wladyslaw is Ladislaus in Latin, but in English there is no equivalent; maybe that is why my ancestors used ‘Walter’ instead. But the point is, how should I store the name? Can I store all of the equivalents and search on any of them? Nope.
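The kind of name record asked for here can be sketched as a small data structure: one person, many spellings keyed by language, and a search that accepts any of them. This is a hypothetical Python sketch, not anything defined in GEDCOM 5.5; the class name is illustrative, and the keys are ISO 639-1 language codes. Note that “Wladyslaw” typed without the ł still does not match “Władysław”, which is the diacritic-folding problem the post raises later.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PersonName:
    """Hypothetical name record that keeps every language form of one name."""
    variants: Dict[str, str] = field(default_factory=dict)  # language code -> spelling

    def matches(self, query: str) -> bool:
        """True if the query equals any stored form, ignoring case only."""
        q = query.casefold()
        return any(form.casefold() == q for form in self.variants.values())

def find(people: List[PersonName], query: str) -> List[PersonName]:
    """All people whose name matches the query in any stored language."""
    return [p for p in people if p.matches(query)]

if __name__ == "__main__":
    wladyslaw = PersonName({"pl": "Władysław", "la": "Ladislaus", "en": "Walter"})
    print(wladyslaw.matches("ladislaus"))
    print(wladyslaw.matches("Walter"))
    print(wladyslaw.matches("Wladyslaw"))  # False: "ł" is not "l" without folding
```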

Damn, Russian is Cyrillic. GEDCOM doesn’t deal with non-Latin alphabets. And even though I can read the Russian genealogy records, I’d rather not, nor would I want to try to do data entry that way either. Besides, the communists reformed the language in 1918 (making War & Peace considerably shorter in Russian); that reform eliminated several characters. Most modern software is not aware of the eliminated characters, much less able to generate them. This whole Language/Unicode/Name thing is complicated, and I have not even mentioned the changing borders, the renaming of cities in different languages or over time, or their changing jurisdictions. I cannot fault GEDCOM for all of these woes. I have them in my own research, and I have not yet found any satisfying way to handle them. I find it helps to have a very good memory and keep these things in my head, but there is no backup for that.
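The eliminated letters do exist in Unicode, so software could at least map pre-1918 spellings onto modern ones for searching. This Python sketch covers only the best-known letter changes of the 1918 reform (ѣ became е, і became и, ѳ became ф, and a hard sign ъ at the end of a word was dropped). The full reform also changed some grammatical endings and prefixes, which this simplified sketch ignores.

```python
import re

# Simplified mapping of letters abolished by the 1918 Russian spelling reform.
# Escapes are used so the Cyrillic letters cannot be confused with Latin ones.
PRE_REFORM_LETTERS = str.maketrans({
    "\u0463": "\u0435", "\u0462": "\u0415",  # yat ѣ/Ѣ -> е/Е
    "\u0456": "\u0438", "\u0406": "\u0418",  # decimal i і/І -> и/И
    "\u0473": "\u0444", "\u0472": "\u0424",  # fita ѳ/Ѳ -> ф/Ф
})

# A hard sign (ъ/Ъ) at the very end of a word was dropped by the reform;
# a hard sign inside a word (e.g. объявление) is kept.
FINAL_HARD_SIGN = re.compile(r"[ъЪ]\b")


def modernize(text: str) -> str:
    """Rewrite a pre-1918 Russian spelling in (approximate) modern orthography."""
    return FINAL_HARD_SIGN.sub("", text.translate(PRE_REFORM_LETTERS))


if __name__ == "__main__":
    for old in ("Петръ", "Ѳома", "Росс\u0456я", "м\u0463сто"):
        print(old, "->", modernize(old))
```

Indexing both the original and the modernized spelling would let a search for either form find the record.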

How are we ever going to arrive at the vision Jay Verkler put forth at #RootsTech? GEDCOM needs to become an open standard. Once it is standardized again, it then needs to become modern again and deal with current technology, so we can get around to the tough problems of conforming names, places, sources/repositories and calendars/dates, and to doing complex analyses like Social Network Analysis as a way to gather wayward ancestors into a family when we lack the documentation to prove the relationship genealogically. I hope the future includes Beider-Morse phonetic matching and can fold diacritical characters into a base character (e.g., change ę into e) for searches.
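Folding diacritics is mostly a job for Unicode normalization: decompose each character (NFD) and drop the combining marks. One Polish wrinkle is that ł has no Unicode decomposition, so it needs an explicit mapping. Here is a minimal Python sketch of plain folding; Beider-Morse phonetic matching is a much larger algorithm and is not attempted here, and the short `EXTRA_FOLDS` list is an assumption to extend for other languages.

```python
import unicodedata

# Letters with no canonical decomposition need explicit folding.
# Assumed minimal list for Polish; extend as needed.
EXTRA_FOLDS = str.maketrans({"ł": "l", "Ł": "L"})


def fold_diacritics(text: str) -> str:
    """Reduce accented letters to their base letters, e.g. 'ę' -> 'e'."""
    decomposed = unicodedata.normalize("NFD", text.translate(EXTRA_FOLDS))
    # Combining marks (acute, ogonek, dot above, ...) have a nonzero class.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_when_folded(a: str, b: str) -> bool:
    """Case- and diacritic-insensitive comparison for searches."""
    return fold_diacritics(a).casefold() == fold_diacritics(b).casefold()


if __name__ == "__main__":
    print(fold_diacritics("Stańczyk"))                  # Stanczyk
    print(fold_diacritics("Łódź"))                      # Lodz
    print(same_when_folded("Władysław", "WLADYSLAW"))   # True
```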

FamilySearch, if you are going to register GEDCOM tools, then please do a few more things for the NEW standard. First, make each vendor add to an APPENDIX the name and complete definition of each of their NON-STANDARD tags, in case anyone else wishes to implement or deal with them. Put a section in the header (HEAD tag) that lists every NON-STANDARD tag (just once each) along with its vendor, so that someone else can look up what these tags mean and possibly implement the good ones. Forget that two-byte thing (the byte order mark) before the HEAD tag. Just make the HEAD tag’s CHAR sub-tag indicate the character set (ANSI | ANSEL | UNICODE). Please use a #RootsTech keynote to vote on annual changes to the GEDCOM standard. Provide a GEDCOM validator and also a GEDCOM converter webpage to allow users/vendors to validate/convert their GEDCOM file(s).
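Under GEDCOM 5.5, user-defined (non-standard) tags must begin with an underscore, so gathering them for such a header section is straightforward. This Python sketch parses GEDCOM lines (level, optional @xref@, tag, value), reads the SOUR and CHAR values under HEAD, and lists each underscore tag once. The proposed header section itself is hypothetical, so the sketch only collects the data; attributing the tags to the HEAD.SOUR product is my assumption about how the vendor would be identified.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# One GEDCOM line: level, optional @xref@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


@dataclass
class HeaderSummary:
    source_system: Optional[str] = None   # HEAD.SOUR: product that wrote the file
    charset: Optional[str] = None         # HEAD.CHAR: e.g. ANSEL, UNICODE, ASCII
    custom_tags: list[str] = field(default_factory=list)  # each "_TAG" once


def summarize(gedcom_text: str) -> HeaderSummary:
    """Collect HEAD.SOUR, HEAD.CHAR and every user-defined (_) tag."""
    summary = HeaderSummary()
    seen: set[str] = set()
    in_head = False
    for raw in gedcom_text.splitlines():
        m = LINE_RE.match(raw)
        if m is None:
            continue  # a real validator would report malformed lines here
        level, tag, value = int(m.group(1)), m.group(3), m.group(4)
        if level == 0:
            in_head = tag == "HEAD"
        elif in_head and level == 1 and tag == "SOUR":
            summary.source_system = value
        elif in_head and level == 1 and tag == "CHAR":
            summary.charset = value
        if tag.startswith("_") and tag not in seen:
            seen.add(tag)
            summary.custom_tags.append(tag)
    return summary


# Hypothetical file: "EXAMPLE_APP" and "_XYZ" are made up for illustration.
SAMPLE = """0 HEAD
1 SOUR EXAMPLE_APP
1 CHAR ANSEL
0 @I1@ INDI
1 NAME Jan /Kowalski/
1 _XYZ first custom value
0 @I2@ INDI
1 _XYZ second custom value
0 TRLR"""

if __name__ == "__main__":
    s = summarize(SAMPLE)
    print(s.source_system, s.charset, s.custom_tags)
```

A validator webpage could run the same scan and warn about any underscore tag that is missing from the vendor’s published appendix.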

Make multimedia references metadata, and allow users to define “LOCATIONS” where multimedia files can be found using either a PATH or a URL (or a relative path/URL). Make it a part of the standard that the metadata must move, but the multimedia files can optionally stay put. Multimedia should be able to be placed on a LOCAL/NETWORK drive, on the INTERNET, or on removable multimedia volume(s) [thumb drives, CDs, DVDs, etc.]. Make the multimedia “LOCATIONS” editable so a user can switch between LOCAL/NETWORK, INTERNET, or REMOVABLE, including using some of each type of LOCATION. Allow these files to exist or not (show “UNAVAILABLE” or some equivalent visual clue if they are accessed and do not exist). The mapping between an Individual (INDI), a family (FAM) or some other future GROUP and its multimedia file(s) must move as a part of the metadata (even if the multimedia file(s) do not). That way the end-user need only edit his LOCATIONS metadata (and ensure the files are in that/those location(s)) when he runs the software.
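A sketch of how those editable LOCATIONS might work: each media link stores a location name plus a relative path, and a separate, user-editable table maps location names to a base directory or URL. All names here (`MediaLink`, `resolve`, the `"family-photos"` location) are hypothetical. Local and removable files are checked for existence; URLs are returned without checking, since that would need a network request.

```python
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import urljoin

UNAVAILABLE = "UNAVAILABLE"


@dataclass(frozen=True)
class MediaLink:
    """Metadata that travels with the tree: owner, LOCATION, and file."""
    owner_xref: str      # e.g. "@I1@" (an INDI) or "@F1@" (a FAM)
    location: str        # name of a user-defined LOCATION
    relative_path: str   # file path relative to that LOCATION


def resolve(link: MediaLink, locations: dict[str, str]) -> str:
    """Turn a media link into a concrete path or URL, or UNAVAILABLE."""
    base = locations.get(link.location)
    if base is None:
        return UNAVAILABLE  # LOCATION not defined on this machine
    if base.startswith(("http://", "https://")):
        # INTERNET location: join URLs; availability is not checked here.
        return urljoin(base.rstrip("/") + "/", link.relative_path)
    # LOCAL/NETWORK or REMOVABLE location: the file must actually exist.
    candidate = Path(base) / link.relative_path
    return str(candidate) if candidate.is_file() else UNAVAILABLE


if __name__ == "__main__":
    photo = MediaLink("@I1@", "family-photos", "scans/1924.jpg")
    print(resolve(photo, {"family-photos": "/media/usb0/genealogy"}))
    print(resolve(photo, {"family-photos": "https://example.com/media"}))
```

Moving the photos from a thumb drive to a web server then means editing one entry in the `locations` table, not every media link in the tree.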

Define an API for GEDCOM plug-ins so that new software can access the GEDCOM data without parsing the GEDCOM file. The API should give an external plug-in a wrapped interface to the underlying data model, so the plug-in need not know the data model, just the individual, family or location, or a name list of individuals, families or locations. This will allow new software to provide additional functionality to a family tree or to provide interoperability between trees/websites. Obviously, security/privacy rules would limit this kind of plug-in access.
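A minimal sketch of such a plug-in API in Python, using `typing.Protocol` so a plug-in depends only on the interface and never on the host program’s storage. Every name here (`TreeView`, `Individual`, `PrivacyFilteredTree`, `count_by_place`) is hypothetical, and the privacy rule shown (hide living people) is just one assumed example of those limits.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


@dataclass(frozen=True)
class Individual:
    xref: str
    name: str
    birth_place: Optional[str]
    living: bool


class TreeView(Protocol):
    """What the host program exposes to plug-ins (hypothetical API)."""
    def individuals(self) -> Iterable[Individual]: ...
    def individual(self, xref: str) -> Optional[Individual]: ...


class PrivacyFilteredTree:
    """Host-side wrapper: plug-ins only ever see deceased individuals."""
    def __init__(self, people: list[Individual]) -> None:
        self._people = {p.xref: p for p in people}

    def individuals(self) -> Iterable[Individual]:
        return [p for p in self._people.values() if not p.living]

    def individual(self, xref: str) -> Optional[Individual]:
        p = self._people.get(xref)
        return p if p is not None and not p.living else None


def count_by_place(tree: TreeView) -> dict[str, int]:
    """An example plug-in: tally birth places, knowing nothing about storage."""
    counts: dict[str, int] = {}
    for person in tree.individuals():
        if person.birth_place:
            counts[person.birth_place] = counts.get(person.birth_place, 0) + 1
    return counts
```

Because `count_by_place` only sees the `TreeView` interface, the same plug-in could run against any program (or website) that implements those two methods, and the host decides what the plug-in is allowed to see.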

That’s Stanczyk’s vision of the GEDCOM future!

