Archive for ‘GEDCOM’

April 1, 2013

World Backup Day

by C. Michael Eliasz-Solomon

1 April 2013 – Dateline Philadelphia

Yes, this jester knows it’s April Fool’s Day; but who better than a jester to speak truth to the people (uh … genealogists, librarians, archivists, & researchers) on this day? The first of April has become the impetus for backup and preservation.


You need only look at today’s world of crazy dictators or Mali terrorists to see that cultural/historical artifacts can disappear in an instant. Cyberwarfare can claim your hard disk. The cloud could crash or Hurricane Sandy can happen (please donate to the Ellis Island Foundation to help in that restoration effort). Libraries and Archives need to safeguard your artifacts too! Are you motivated yet? Good!

There are backup solutions, including some free options to the “cloud”. Apple even provides a free 5GB iCloud. So save your GEDCOM file. If you still have free space then back up pictures or scans that are CRITICAL. You can save/backup to media: CDs, USB thumb drives, etc. But be aware that backups to electronic media need to be refreshed yearly to avoid stranding your backups on outmoded technology (e.g. 8-Track tapes or even floppy disks).

Be careful out there and have a Happy April Fool’s Day!

March 9, 2013

Niagara Falls Gazette – 1937 — #Genealogy, #Newspaper

by C. Michael Eliasz-Solomon

Today’s blog is an intersection of some prior Social Network Analysis (aka Cluster Genealogy) and  EOGN‘s mention of FultonHistory.com (the website of Historic Newspapers). Stanczyk, waaay back discovered FultonHistory.com – An Historical Newspaper  (mostly NY) website. I was not aware that the owner (Tom Tryniski) was still adding content and that the content had grown to about 21.8 million pages, rivaling the Library of Congress’s efforts of digitized newspapers.  Each scan is a single page PDF document that is zoomable.

So the idea presented itself: why not see if any ZASUCHA in Niagara Falls can be located in those 21.8 million scanned pages? I am happy to report a very good success. Take a look at the image. It is from the Tuesday, January 19th, 1937 edition of the Niagara Falls Gazette. [You will need to click to read death notices - Jacobs, Geraud, Kochan, Laydon, Mahoney, Morrison and ZASUCHA].

Now I said this was a part of a long standing (i.e. “incomplete”) SNA project of mine. I am trying to do ELIASZ/ELIJASZ research by analyzing the affiliated families in the ELIASZ Social Network in Biechow/Pacanow (Poland) and Detroit/Toledo/Cleveland/Buffalo/Niagara Falls/Syracuse (USA).  My thesis is that all of these people are closely inter-related from Poland and they continued/extended their villages in the USA.

So my hope was that by following these “genetic markers” (literally) of my family tree, the affiliated families, I would be led to new facts about my direct lineage and possibly artifacts (pictures, etc.) of my ancestors. I was also hoping to lure my distant 2nd/3rd/4th cousins to me via this blog and my research in hopes of a second bump beyond my circumstantial info of the SNA. You see they would see their family names and realize the connection and we would be able to do that genealogy swapping of intelligence and/or pictures and documents.

First, an aside [skip ahead to next paragraph if you are not a ZASUCHA], the death notice transcription:

ZASUCHA – Died in Mount St Mary’s hospital, January 19, 1937, Andrew Zasucha, beloved husband of Catherine, father of Helen and Joseph, son of Martin in Poland; brother of Roman of this city. Funeral services at 9:30 Thursday, January 21, from his home, 423 Eighteenth street and 10 o’clock in Holy Trinity church. Burial at Holy Trinity cemetery.

That is some excellent genealogy info there for Andrew Zasucha of Niagara Falls who was born in Pacanow, [old wojewodztwo Kielce], Poland !

Now I am spending many hours in Ancestry/Ellis Island ship manifests, Ancestry city directories, censuses, WWI draft registrations, etc. and now historic NY newspaper scans. I am matching people up (my nodes in the picture) and drawing lines connecting the people (nodes) to other people. I have to take some care to get the nodes right in order to draw inferences, so I tend toward a conservative approach of keeping nodes separate until I have a high degree of certainty they are the same node. I use spreadsheets to collect a timeline of data and then match up people before drawing the picture. This is my SNA methodology.

I did this current project because I noticed that my grand-aunt Mary came to my grand-uncle John Eliasz and that they were in Niagara Falls (not Buffalo/Depew like most and not Detroit). I was always puzzled about why Niagara Falls. Who or What drew them there (Niagara Falls) before their sojourn to Detroit? Now grand-aunt Mary came from Ksiaznice in Pacanow parish from her brother-in-law Jan Leszczynski to her brother Jan Eliasz in Niagara Falls in 1910. All of these facts matched my family tree (except for the Niagara Falls which nobody alive had any memory of anyone living there). Nonetheless, I slavishly recorded the address: 235 11th Street, Niagara Falls, NY.

Now let me digress. This is why I want the PLAC tag in GEDCOM to be elevated to a Level 1 tag. I want to do these analyses in my family tree. I want to find people who shared the same/similar places for family events and see if there is any connection that I am not aware of, i.e. SNA (aka Cluster Genealogy). I need it in the genealogy file and I need reports to allow me to search on place and to conform these places into a hierarchy for analysis.
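The kind of place report wished for above can be approximated outside of any genealogy program. What follows is only a minimal sketch in Python, under stated assumptions: it reads GEDCOM lines as plain text, treats every PLAC line found inside an INDI record as belonging to that individual, and ignores places recorded on family (FAM) records. The function name and the record IDs (@I1@ etc.) in the sample are hypothetical; only the shared address comes from the research described here.

```python
from collections import defaultdict

def index_people_by_place(gedcom_lines):
    """Map each PLAC value to the set of individual IDs with an event there."""
    places = defaultdict(set)
    current_person = None
    for raw in gedcom_lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level = int(parts[0])
        if level == 0:
            # Individual records look like "0 @I1@ INDI"; any other
            # level-0 line (HEAD, FAM, TRLR ...) ends the current person.
            is_person = len(parts) == 3 and parts[2].strip() == "INDI"
            current_person = parts[1] if is_person else None
        elif parts[1] == "PLAC" and current_person and len(parts) == 3:
            places[parts[2].strip()].add(current_person)
    return dict(places)

# Hypothetical sample records (IDs and structure invented for illustration).
sample = [
    "0 @I1@ INDI",
    "1 NAME Jan /Eliasz/",
    "1 RESI",
    "2 PLAC 235 11th Street, Niagara Falls, NY",
    "0 @I2@ INDI",
    "1 NAME Karol /Zasucha/",
    "1 RESI",
    "2 PLAC 235 11th Street, Niagara Falls, NY",
    "0 @I3@ INDI",
    "1 NAME Andrew /Zasucha/",
    "1 DEAT",
    "2 PLAC Niagara Falls, NY",
]

# Keep only the places shared by more than one person.
shared = {place: ids
          for place, ids in index_people_by_place(sample).items()
          if len(ids) > 1}
print(shared)
```

The match here is on the exact place string, so "Niagara Falls, NY" and "235 11th Street, Niagara Falls, NY" are treated as unrelated. That is precisely why the places would need to be conformed into a hierarchy before a report like this becomes really useful.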

Fortunately, Stanczyk still has a good memory. I was gathering data about: Zasucha, Zdziebko, Zwolski, Hajek, Leszczynski, Eliasz/Elijasz, etc. These are all families found in Pacanow parish who came to the USA and settled in: Buffalo/Depew, Niagara Falls, Syracuse, some moving onward to Cleveland, Toledo and my grandparents moving onward further from Toledo to Detroit. When I was recording addresses from the city directories, I noticed a few Zasucha being at the 235 11th Street address. That address rang a bell in my memory and I went back through my family’s ship manifests to see who had been at that same address. That is when I saw that my grand-aunt and my grand-uncle had been there. So now I had a thesis that any ZASUCHA at 235 11th Street, or the surrounding environs, would be close family to my grand-aunt/grand-uncle and be direct relatives of ANNA ZASUCHA, my great-great-grandmother, wife of MARTIN ELIASZ of Pacanow. In fact, I am pretty certain now that I have gotten this far in my SNA, that ANNA ZASUCHA had a brother(s) who had sons: Martin, Adam, Josef, Jan. These four men had children as follows who came to Niagara Falls:

Martin (father of Andrew in the above death notice) – Andrew(the deceased), Roman, and Jan

Adam – Michal, John, Karol, Marya, and Feliks

Josef – Benedykt (son of Josef), Feliks (a 2nd much-younger Feliks, son of Benedykt)

Jan – Roman (a 2nd Roman), Teofil, Josef, and Pawel

Now the ones of greatest interest to me are the children of Adam. This is because Karol and his brother Feliks lived at 235 11th Street, the same address that my ELIASZ ancestors had lived at, in the same year! That shows a pretty strong family connection in my family tree (I cannot say whether the same holds for your tree) whenever I find it happening. Of course, the other ZASUCHA of Niagara Falls are also of some interest to me as they ALL came from Pacanow. I can be pretty sure that everyone from Pacanow (or Biechow) parish is likely to share a distant (non-linear) family relationship as determined by connecting family trees.

So I owe some thanks to FultonHistory.com - An Historical Newspaper (mostly NY) website and its creator Tom Tryniski. Tom’s efforts have provided me the above death notice. I also found an Emil C. Mrozek (a physician) from Erie County, NY and his exploits of winning a bronze star in WWII. I also found an article of a Richard (aka Ryszard) Kryszewski who died tragically at the age of 18 in a car-train crash in Depew, NY. Now I had Richard’s cause of death from the newspaper article. So some articles are uplifting and some are tragic, but I collect them all for my ancestors.

Some people mock my genealogical research as chasing down dead people. My wife, Teréza, takes the learned Jewish position that I am doing a good deed (mitzvah) in keeping these ancestral memories alive. Tereza likes to call me the “Soul Keeper”. This blog of my musings is filled with my genealogical / family stories. Besides being a “cousin magnet”, this blog is my effort to record these stories.

 

PLACes: Biechow, Pacanow [in Poland],  Detroit, Toledo, Cleveland, Buffalo/Depew, Niagara Falls, Syracuse

NAMEs: ELIASZ/Elijasz, Kedzierski/Kendzierski, Leszczynski, Sobieszczanski, Fras(s), Mylek, Hajek, Mrozek, Kryszewski

March 1, 2013

Thinking About @Ancestrydotcom ‘s GEDCOM — #Genealogy, #GEDCOM

by C. Michael Eliasz-Solomon

Ancestry.com (Twitter: @Ancestrydotcom ) is the proverbial 800 lb (362.87 kg) gorilla in the genealogical archive. You cannot miss him; mostly he’s lovable. So today after you read this blog post, Stanczyk wants you to tweet at him (see Twitter link above). I am hoping the big ape will make some improvements to their software. Hint .. Hint !

A couple of days ago (25-Feb-2013), I ran my PERL program against the GEDCOM file I exported from my family tree on Ancestry.com’s website. That tree, the RootsWeb tree, and this blog are Stanczyk’s main tools for collaboration with near and distant cousin-genealogists (2nd cousins, 3rd, 4th, 5th cousins; all are welcome).

Quick Facts —

  1. No invalid tags  - Good
  2. Five custom tags – Also Good
  3. CHAR tag misused – ANSI [not good]
  4. My Ancestry Family Tree uses diacriticals: ą ć ę ł ń ó ś ź ż   in proper nouns [not good]
  5. Phantom Notes ??? [really not good]

So, Mr. Ancestry (sir), can you please fix #’s 3, 4, and 5?

CHAR -  I think Ancestry should use what is in the standards: ANSEL | UTF-8 | UNICODE | ASCII . I think this is easily do-able (even if all you do is just substitute ASCII).

This is not a picayune, nit-picky, persnickety, or snarky complaint. In fact, it leads right into the next problem (#4 above). Not only does Ancestry export the GEDCOM file as “ANSI”, it strips out my diacriticals too (as a result?). So now I have potentially lost valuable information from my research. For Slavic researchers, these diacriticals can be vital to finding an ancestor as they guide how the original name was pronounced and how it might have been misspelled or mistranscribed in the many databases. Without the diacriticals that vital link is lost.
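To see whether an exported file has lost its diacriticals, one quick check is to look for any non-ASCII characters at all. The Python sketch below is only an illustration: `lines_with_diacritics` and `strip_diacritics` are hypothetical helper names, and the stripping function merely imitates a lossy export. It makes no claim about how Ancestry actually produces its file.

```python
import unicodedata

def strip_diacritics(text):
    """Imitate a lossy export by reducing accented letters to plain ASCII."""
    # The slashed l has no Unicode decomposition, so map it by hand first.
    text = text.replace("ł", "l").replace("Ł", "L")
    decomposed = unicodedata.normalize("NFD", text)
    # After NFD, each accent is a separate combining character; drop those.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def lines_with_diacritics(gedcom_lines):
    """Return only the lines that still contain a non-ASCII character."""
    return [line for line in gedcom_lines if not line.isascii()]

original = ["1 NAME Jan /Frąc/", "1 NAME Jan /Eliasz/"]
exported = [strip_diacritics(line) for line in original]

print(lines_with_diacritics(original))   # the Frąc line is reported
print(lines_with_diacritics(exported))   # nothing left to report
```

If a tree that you know contains names like Frąc produces an empty list from `lines_with_diacritics`, the export has flattened them, and a search for Frąc versus Frac may now behave differently in other databases.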

The last criticism is an insidious problem. Every time I exported the GEDCOM, I would get a note on one person in the tree. I would carefully craft the note on Ancestry, but what I received in the GEDCOM file downloaded would be different ???

I reported the problem to no avail and no response. This is not very good for an 800 lb gorilla.

Digging Deeper

I have since gone on to do some experiments and the results may astound you (or not). I copied the NOTE I was getting in my GEDCOM and saved it off to a text file, perplexed as to where it came from, since it was not the NOTE I was editing on Ancestry??? Now I did something bold. I deleted the note from that person on Ancestry and then downloaded the GEDCOM file again. Do you know what I got? Wrong! I did not get my carefully crafted NOTE, I got yet another NOTE. I copied that note’s text and repeated my process of deleting the note and downloading the GEDCOM file a 3rd time. This time when I edited my GEDCOM file, I found MY note!!!

But where/how did the other two notes come about? Why were there three notes? Why could I see and edit the 3rd note, but only get the first note when I downloaded the GEDCOM file? How did notes 2 & 3 get there? Why did I not get all three notes when I downloaded the GEDCOM? All good questions that I have no answer to.

My suspicion is that Ancestry should not allow more than one EDITOR on a tree; other contributors should only be allowed to comment, or maybe be given an ability to leave sticky-notes on a person [that do not go into a GEDCOM file]. I do not think the notes were created by their mobile app since I always saw my NOTE (and not the other two notes). I am chalking this up to an Ancestry.com bug and urging others who see strange things in their notes to take deliberate steps to unravel their notes. I hope Ancestry will fix this and let people know. I hope they fix all of items #’s: 3, 4, and 5.

So, my dear readers, I am asking you to tweet to Ancestry (as I will too) and  ask them for bug fixes. Perhaps if enough people tweet at @Ancestrydotcom, they will respond and not give us the cold  gorilla shoulder.

February 25, 2013

Thinking About Gedcom — #Meme, #Genealogy, #RootsTech

by C. Michael Eliasz-Solomon

Stanczyk has been thinking about GEDCOM a lot these days. As you may know, GEDCOM is the de facto standard format for a genealogical family tree file, in order for it to be shared amongst the many genealogical software programs / websites / apps. Most genealogy programs still use their own proprietary format for storing data but will import / export the data in the GEDCOM standard for you to exchange data with another program or genealogist.

Did you catch the phrase ‘de facto standard’ ? OK it is NOT an open standard maintained by ISO or ANSI standards organizations. But it is widely supported and in fact you should NOT buy or use software that does not support the export and import of GEDCOM files!

Well we are coming up on RootsTech 2013 and my mind is turning back to the technical part of genealogy again!

Today’s blog is about the GEDCOM used by Ancestry.com. Were you aware that you can export your family tree from Ancestry.com? You can by selecting/clicking on ‘Tree Settings‘ under the ‘Tree pages‘ drop down menu (Tree Settings will be the second from the bottom in the menu list). If you click on ‘Tree Settings’ you will see a screen similar to:

[Screenshot: Ancestry.com Tree Settings page]

Notice that after you click on the ‘Export tree‘ button, that you get a new button named, ‘Download your GEDCOM file‘  in that same place.

In all likelihood if you click on the  ‘Download your GEDCOM file‘ button you will get a file in your Downloads directory on your local hard drive. It will have a name of:

<your-family-tree-name>.GED

Now the phrase ‘<your-family-tree-name>’  will actually be something like ‘Eliasz Family Tree.GED’ . So your Downloads directory will have a similar named file (complete with blanks in the file name). The size of the file will be dependent on how many individuals, families, sources, etc. that you have recorded in your family tree. Figure on a file size of 2MB for about 1,100 people.

Now this file you just downloaded from Ancestry.com is really just a plain text file with a set of standardized ‘tags’ defined by the GEDCOM standard. Software vendors are free to define their own custom tags too, although CUSTOM tags must begin with an underscore (‘_’). I was curious as to how well Ancestry.com implements/adheres to the GEDCOM standard, so I wrote a little program (in PERL for you programmer types) to analyze my GEDCOM file that I just downloaded.

[Screenshot: GEDCOM analysis program output]

My program, read_gedcom.pl, spits out a slew of stats about the GEDCOM including the tags used. As you may be able to see from the screenshot, there sorted at the end were 5 custom tags:

_APID,  _FREL,  _MILT,  _MREL,  _ORIG

These names do not have any meaning except to Ancestry.com and their website’s program(s). What you also see is that of the 48,538 lines in the downloaded GEDCOM file, 5,158 lines have one of these five custom tags. Normally, I will just ignore these tags and import the GEDCOM file into my laptop’s genealogy software (REUNION, RootsMagic, PAF, etc.) and let that software ignore these non-understandable tags and within seconds I have my Ancestry.com family tree imported into my computer’s genealogy software. That is fine  – no problems.
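For readers who want to run a similar tally themselves, here is a rough Python sketch of the idea. It is not the read_gedcom.pl program mentioned above (that is written in PERL and is not reproduced here); the function names and the sample lines are assumptions made up for illustration. It relies only on the facts already stated: each GEDCOM line starts with a level number, and custom tags begin with an underscore.

```python
from collections import Counter

def count_tags(gedcom_lines):
    """Count how often each GEDCOM tag appears."""
    counts = Counter()
    for raw in gedcom_lines:
        parts = raw.strip().split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # blank or malformed line
        # A line is "LEVEL [@XREF@] TAG [VALUE]"; skip the optional
        # cross-reference ID that precedes the tag on record lines.
        has_xref = parts[1].startswith("@") and len(parts) > 2
        tag = parts[2] if has_xref else parts[1]
        counts[tag] += 1
    return counts

def custom_tags(counts):
    """Keep only the vendor-defined tags, which start with an underscore."""
    return {tag: n for tag, n in counts.items() if tag.startswith("_")}

# Hypothetical sample; the _APID value is a placeholder, not real data.
sample = [
    "0 HEAD",
    "1 CHAR ANSI",
    "0 @I1@ INDI",
    "1 NAME Jan /Eliasz/",
    "1 BIRT",
    "2 SOUR @S1@",
    "3 _APID placeholder-value",
    "0 TRLR",
]

counts = count_tags(sample)
print(sorted(counts.items()))
print(custom_tags(counts))
```

Run against a real export, summing the values of `custom_tags(counts)` gives the number of lines carrying custom tags, the same kind of figure as the 5,158 quoted above.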

But what do you think happens if you turn right around and upload that GEDCOM file into your RootsWeb family tree? If you use RootsWeb, then you know you get a LOT of _APID notes across all of your ancestors and sometimes, if you have many facts/citations for any ancestor, then the RootsWeb page for him/her will be horribly marred by all of these _APID tags!

TIP

Remember I said the GEDCOM file is a TEXT file. As such it can be edited with your favorite text editor. If your editor does global search/replace, then you can easily remove these CUSTOM tags (_APID, etc.). That will make your RootsWeb family tree individual pages look MUCH better.

Now I know what you are thinking. Do NOT go editing your GEDCOM file!  I agree.  Make a copy of your GEDCOM file and edit the copy of the downloaded GEDCOM file to remove the lines with ‘_APID’ on them. You can remove all custom tags, but I just bother with the _APID which are so irksome. If your editor can remove the lines with ‘_APID’ then that is what you should do. But if all your editor can do is replace the lines that have _APID on them with a blank line then that is OK too. Make those edits and save the edited (copy) file.  The blank lines seem to be ignored by RootsWeb – thank goodness.
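If your text editor cannot delete whole lines, the same clean-up can be scripted. The following Python sketch is one possible way to do it, not a tool from Ancestry or RootsWeb; the function names and file names are hypothetical. It never touches the downloaded file: it reads it and writes a separate cleaned copy. The `encoding` default is an assumption, so pass whatever encoding your export really uses.

```python
def drop_tag(gedcom_lines, tag="_APID"):
    """Return the lines with every `tag` line (and anything nested under it) removed."""
    kept = []
    skip_below = None  # level of the dropped line while we are inside its sub-lines
    for line in gedcom_lines:
        parts = line.split()
        level = int(parts[0]) if parts and parts[0].isdigit() else None
        if skip_below is not None:
            if level is not None and level > skip_below:
                continue  # subordinate line of a dropped tag
            skip_below = None
        if level is not None and len(parts) > 1 and parts[1] == tag:
            skip_below = level
            continue
        kept.append(line)
    return kept

def write_cleaned_copy(source_path, cleaned_path, tag="_APID", encoding="utf-8"):
    """Read the download, write a cleaned COPY, and leave the original alone."""
    with open(source_path, encoding=encoding) as src:
        lines = src.read().splitlines()
    with open(cleaned_path, "w", encoding=encoding) as dst:
        dst.write("\n".join(drop_tag(lines, tag)) + "\n")

# Hypothetical sample; the _APID value is a placeholder.
sample = ["0 @I1@ INDI", "1 BIRT", "2 SOUR @S1@",
          "3 _APID placeholder-value", "3 PAGE 12", "0 TRLR"]
print(drop_tag(sample))
```

A call such as `write_cleaned_copy("Eliasz Family Tree.GED", "Eliasz Family Tree - RootsWeb.GED")` would then produce the file to upload. Removing the lines outright, rather than blanking them, avoids depending on blank lines being ignored.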

Now you can upload the edited file, with the _APID custom tags removed to RootsWeb and your family tree will again look the way it used to before,  without these irksome custom tags.

Next time I will tell you what I found when I looked closely at what ANCESTRY.com was putting into the downloaded GEDCOM file.

January 26, 2013

RootsMagic iPhone/iPad App — #Genealogy #Software #Review, #RootsTech

by C. Michael Eliasz-Solomon

This jester has been a big proponent of Ubiquitous Genealogy – i.e. genealogy is portable and everywhere. I have used the Ancestry App for a long while and am well satisfied. They use a concept of synching the App with your tree and their website. Now that the kinks are worked out, I am well satisfied. There are also MyHeritage and Heredis Apps. These do not synch over the “air”. You need to use your iTunes application on your desktop/laptop to move files into the App’s “sandbox” via synching your iPhone/iPad with the laptop/desktop over the iPhone/iPad cable. Tethered synching is ok but a hindrance.

Ok so the new App on the block is an offering called RootsMagic. Stanczyk likes the Roots Magic laptop application as a full blown offering for working on your genealogy and documenting the tree and finding data on the Internet and keeping track of to-do lists, publishing your tree on CD/Web and all sorts of work that you do when your research spans years (or decades) – does anyone ever finish their genealogy? It’s modern and uses Universal Character sets (so us Slavic Researchers can use our slashed Ł’s or Cyrillic Я’s) and other features that the Internet Genealogists have grown up with. So I was hopeful when I received an email from Roots Magic touting their iOS offerings – It’s free!

The App starts with the familiar Roots Magic splash screen that you may have grown accustomed to from the laptop application. You are then presented with a list of files from their sandbox (ugh, tethered synching). Once you select a file to work with, your family tree is presented in a Pedigree form (with three generations visible on iPhone/ four generations on iPad). At the top left is a green/white button with three lines (see image) that will allow you to pick the particular person you wish to work on. At the bottom of the screen are four buttons:

Files,   Views,   Lists,   Tools

Files – Lets you select the family tree file you want to work on from your Device or from DropBox (a cloud-based file storage service). It also has HELP (files??) which tell you how to use your Device or DropBox to get a file loaded into the App. Sadly, the RootsMagic app does not read standard gedcom (ged) files. It only reads files with rmgc extension (i.e. created by Roots Magic laptop application). However, it does load their database extremely fast from those rmgc files.

Views – Lets you choose to view the data in a PEDIGREE tree  or a FAMILY tree or in a DESCENDENTS outline  or in the detailed FACTS (events), NOTE, direct family members of the current INDIVIDUAL. I prefer working in FAMILY (as seen in image) view mode, then switching to INDIVIDUAL view mode for any details on that person. Clicking on NOTE really gives you access to NOTE(S), SOURCE(S), and MEDIA for that individual (and a BACK button at the top to return to INDIVIDUAL view mode).

Lists – This just gives a list of your: Sources, To-Dos, Research, Media, Addresses, Repositories, Correspondences, and PLACES. I liked places (which showed that this jester really needs to make his Places (Locations) conform to some kind of standard).

Tools – Date Calculator, Relationship Calculator, Soundex Calculator, and Calendar. Unimpressive to say the least. Lest you get your hopes up, the Calendar tool only displays the Calendar for a Month/Year of your choice [I did not verify the Julian/Gregorian boundary to see if it calculates a proper month calendar for dates before 1582]. It was not worth the effort as I did not see why I would want to see what day June 3rd, 1700 would fall on (Thursday), and that is readable only if your eyes are young or your glasses are a good prescription. Otherwise, you will not notice the day names on an iPhone [perhaps a black font, instead of gray, would give better contrast]. The Soundex is only American Soundex – why not Daitch-Mokotoff or Beider-Morse codes too? Really, we Slavic researchers get short shrift in the software world. Never fear, just create a desktop icon of Steve Morse’s Soundex page to see all three Soundex/Pattern Matching methods for your family names.
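To make the Soundex complaint concrete, here is a small Python sketch of the standard American Soundex rules (first letter kept, consonants mapped to digits, adjacent duplicates collapsed, H and W not separating duplicates, result padded to four characters). It is an independent illustration and has nothing to do with how the RootsMagic app computes its codes; the function name is made up. Letters outside A-Z, including Ł, are simply dropped in this sketch.

```python
# Digit assigned to each consonant group in American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit

def american_soundex(name):
    """Return the 4-character American Soundex code for a surname."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "HW":
            # Vowels reset the previous code; H and W leave it in place.
            previous = digit
    return (code + "000")[:4]

for surname in ("Eliasz", "Elijasz", "Zasucha", "Leszczynski"):
    print(surname, american_soundex(surname))
```

Under these rules ELIASZ codes to E420 while ELIJASZ codes to E422, so a search on one American Soundex code would miss the other spelling of the same family name. That is a small demonstration of why the alternative codes matter for Slavic surnames.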

iPad vs iPhone

For some reason the iPad interface treats the buttons (Files, Views, Lists, Tools) differently than the iPhone does. That was a bit confusing until I got used to the difference. Rotating the iPad to landscape also brings the Surnames/Search view alongside whatever view you are in. The Calendar is a bit easier to see on the iPad, but I’d still like to see the day name text in black (or at least a MUCH darker gray).

The app is strictly for viewing your family tree (et al.). There are no tools for modifying the tree for re-import into the desktop application. Shortcomings aside, it is still a very good first effort by Roots Magic. If you have the Roots Magic laptop application, then download the free app for your smartphone or tablet and go Ubiquitous. If you do not have the Roots Magic laptop application and do not have a way to get your ged converted to Roots Magic format (rmgc) then do NOT bother to download the app – you will not be able to use it.

P.S.  Do you spell  “DESCENDENT”  – as  D-E-S-C-E-N-D-E-N-T   or    D-E-S-C-E-N-D-A-N-T ?  Both spellings are correct, but I guess I use “DESCENDANT” all of the time and so the Roots Magic use startled me.

I also would love to see the REUNION app (the Mac Software vendor) make their iOS App free or low-cost – then I’d review it here too. I am a BIG Apple eco-system fan and as such have used REUNION Mac software for a very LONG time. I would be remiss not to mention that REUNION App does exist, but its cost is a bit steep relative to the other iPhone genealogy APPs in this article.

September 6, 2012

Fras | Frass | Frasowa | Frasskosz — #Genealogy, #Cousin, #NewLineOfResearch

by C. Michael Eliasz-Solomon

A week or two ago, Stanczyk got a bolt out of the blue. It was another genealogist; she was inquiring after my Leszczynski lineage, specifically Agnieszka Leszczynski.

Well a long time ago I got used to the fact that there were so MANY Leszczynskich out there that the possibility that any were directly related was infinitesimally small. Now to be sure a few second cousins have re-connected and it was good to get updates on the American branches. But in my 17 years as a genealogist — I had not received an inquiry on the line of Leszczynskich from my great-grandfather, Tomasz Leszczynski’s first wife or their children.

Old Tomasz lived a long time … to be 104 years of age from about 1835 to 1939 (give or take). He had two wives and bless his heart he had 14 children by them. From his first wife, he started to have children in 1860. Agnieszka (or Agnes as the inquiry was for) was born 9th December 1866. I had her birth record from the church in that lovely Latin Box format and I had deciphered all that was written. But I had no idea if Agnes made it to adulthood or married or even when she died.

Well this genealogist said her great-grandfather had a mother named Agnes Leszczynski (from his death certificate). Yes, I said, but there are so many Leszczynski families, where was your great-grandfather from? She had a vague idea of the area and the names seemed to be close to a village that I had ancestors from but it was horribly misspelled if it was from that area at all. I was still skeptical, but she sent me an Ellis Island ship manifest (actually a tiny bit of transcription from one). So I thought I would go take a look and see if I could decipher where her ancestor was from; it would be an RAOGK. I was going to help her out.

Well imagine my surprise! Her great-grandfather was from an ancestral village of mine, coming from his father Wladyslaw Fras in Piesciec [sic  -> Piestrzec, today; Piersiec back then, although I had seen it spelled Piersciec many times too]. Now I had never seen any Fras before in those villages, maybe some Franc (Frąc) which was close. But then I went to page two of the ship manifest and he was going to Depew, NY to his uncle, Teofil Lezczynski!!! That was my grand-uncle. OK, I was now getting interested in Jozef Fras.

Now, I had to do some research, but I found him with his family in Toledo, Ohio. Well I had some family from Toledo. In fact, my grandmother’s sister Antonina Leszczynska Sobieszczanski lived there. Well this jester had a few St Anthony baptismal register images that I could peruse. Now I was even more amazed. Jozef Fras’ wife, BENIGNA (not a common name) was the god-mother of one of Antonina’s sons. Benigna Fras was god-mother to Matthew Sobieszczanski. Those percentages kept going up. I said, perhaps the Fras had children baptised in St Anthony too. I examined their birth years and looked in the register images and there was their first child Helen Fras whose god-mother was my Antonina Sobieszczanski (to Jozef and Benigna’s daughter). Ok, in my head, we are now at 99+% related.

1 Wladyslaw FRAS d: 11 Feb 1919
  + Agnieszka LESZCZYNSKI b: 09 Dec 1866
    2 Josef Edward FRAS b: 16 Mar 1893 d: 08 Aug 1935
      + Benigna PALICKI FRASS b: abt 1897
        3 Helen FRASS b: 25 October 1917 d: 23 May 1982
        3 Joseph Radislaus FRASS b: 25 March 1922 d: 14 March 1934
        3 Eleanor FRASS b: 15 Jan 1926 d: 25 Oct 1988
        3 Melvin R FRASS b: 15 Jun 1930 d: 10 Dec 2006

So now my next goal is to find the church marriage record of Wladyslaw Fras and Agnieszka Leszczynski (probably in Biechow parish), since Jozef Fras’ ship manifest said he was born in Piestrzec. This would give me the certain Genealogical Standard of Proof — but I have already added the above to my tree.

Thanks second cousin, twice removed, Mindy! By the way, this line of reasoning I am leaning on is again the Social Network Analysis (what Thomas MacEntee calls cluster genealogy).

Don’t you wish you could search Ellis Island by whom people were going to or coming from? Better database search capabilities are needed and the GEDCOM standard needs to be enhanced to handle these social network/cluster analyses.

May 4, 2012

BIG Genealogy — #Genealogy, #FamilyTree, #GEDCOM

by C. Michael Eliasz-Solomon

When Stanczyk wrote the title, he was not referring to Ancestry.com or any other endeavor by genealogical companies from the western USA. No, Stanczyk is fascinated with numbers .. of people.

Yesterday, this jester wrote about the Confucius Family Tree. It is commonly accepted to be the largest genealogy (family tree). But I had to wonder … Why?

It is an old genealogy, dating back to Confucius’ birth in 551 BCE. It is now 2012, so we have a genealogy that is 2,562 years old (there is no year zero). My much beloved wife/kids are Jewish. In the Hebrew calendar we are presently in the year 5772. I have been to a Jewish Genealogical Conference and met a man who told me his genealogy went back to King David. [This jester resisted the rude/snarky comment that if he researched using both Old & New Testaments he could push his research back to Adam.]

I also did not ask him to show me his documentation, but assuming he could, his genealogy would start another 500 years earlier (~1050 BCE). Mathematically speaking (and assuming there were other Judeo-Christian couplings before me and my wife), if you could follow all or many branches and not just the direct lineal trunk, his tree would have the potential to span approximately 100 generations (adding another 17 generations to the 83 for Confucius). This assumes a generation is 30 years. Now if we look at Confucius and see 2560 years = 83 generations, we see an average of 30.84 years per generation, so 30 years per generation is not a bad estimate.

What genealogy could be older still? Well according to the Bible we record the Jewish peoples in Babylonia. So perhaps we can extend King David and/or one of his citizens back to King Hammurabi of Babylonia; that would yield another 650 years (~1700 BCE) or about another 22 generations. Let me see: if Confucius’ family tree is about 2 Million for 83 generations, we get about 24,096 people per generation. So by adding 39 more generations (17 + 22), Hammurabi’s Family Tree should contain approximately another 940,000 people. So come on Iraq produce your family tree of nearly 3 Million people!

What genealogy could be older than that? There is a quote that goes something like, “History knows no time when the Egyptians were not highly developed both physically and intellectually.” True enough, recorded history does go back furthest in the Pharaonic dynasties. That takes genealogy back to the first dynasty King (Pharaoh) Menes, who sure enough had a son who wrote about Astronomy [source: Timechart History Of The World, ISBN 0-7607-6534-0 ]. That takes us to approximately, 3,000 BCE, another 1300 years/44 generations/1.06Million people! Ok, since there is no recorded history earlier than that, we will not have a properly sourced genealogy older than this. So people who are Elizabeth Shown Mills devotees turn your heads away …

What genealogy could possibly be older than that? I read that the indigenous peoples of Australia have an oral history of 48,000 generations! The aboriginal people of Australia date back to about 50,000 BCE, which would be 52,000 years ago/1734 generations/41.8Million people in their family tree. That’s not 48,000 generations, but that is more than twice as much as genealogy researchers test using their FAN24.ged file which has 24 completely full generations with 16.8Million pseudo people.
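For anyone who wants to check the jester's arithmetic, the estimates above can be reproduced with a few lines of Python. This is just the back-of-the-envelope model from the post (30 years per generation, about 2 Million people spread evenly over 83 generations, partial generations rounded up); the function names are made up for the sketch.

```python
import math

YEARS_PER_GENERATION = 30
TREE_SIZE = 2_000_000        # rough size of the Confucius tree used in this post
TREE_GENERATIONS = 83

# About 24,096 people per generation if the tree were spread evenly.
people_per_generation = TREE_SIZE // TREE_GENERATIONS

def generations_for(years):
    """Generations needed to span a number of years, rounded up."""
    return math.ceil(years / YEARS_PER_GENERATION)

def people_for(generations):
    """People added by that many generations under the even-spread assumption."""
    return generations * people_per_generation

spans = [("King David", 500), ("Hammurabi", 650),
         ("Menes", 1300), ("Australia", 52_000)]
for label, years in spans:
    gens = generations_for(years)
    print(f"{label}: {years} years = {gens} generations = {people_for(gens):,} people")

# David plus Hammurabi: the 39 generations added on top of Confucius' 83.
print(f"{people_for(17 + 22):,}")
# A completely full 24th generation, as in the FAN24.ged comparison.
print(f"{2 ** 24:,}")
```

The even-spread assumption is of course the weakest link: a real tree has far fewer people in its oldest generations than in its youngest, so these are upper-level guesses rather than predictions.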

Now that is what I call BIG Genealogy. But where is that family tree (not FAN24.ged)? Why has no genealogy older than Confucius’ genealogy been found and carried forward to the present day? Is it possible that such a family tree exists?

–Email me!

Related Blog Articles …

“Random Musings” (10-March-2010, see musing #2)

March 10, 2012

Ancestry.com Broken ? Is Your GEDCOM Export OK? — #Genealogy, #Technology

by C. Michael Eliasz-Solomon

Stanczyk wants to know if anyone else is having problems Exporting their GEDCOM from Ancestry.com?


 This is what I see when I try to export my gedcom from the tree settings screen. It never gets past 0% complete.

I have tried to submit a Help Ticket for technical support and so far I have not received any response. What gives, Ancestry?

I can still work on my tree and updates appear to be saved. I can synch to the Ancestry App (on the iPhone) and the changes are there too. 

March 2, 2012

Diacritical Redux – Ancestry GEDCOM — #Genealogy, #Technology

by C. Michael Eliasz-Solomon

As Stanczyk was writing about the GEDCOM standard since #RootsTech 2012, I began to pick apart my own GEDCOM file (*.ged). I did this as I was engaged with Tamura Jones (a favorite foil to debate Genealogy Technology with). During our tête-à-tête, I noticed that my GEDCOM lacked diacriticals???

What happened? At first I thought it was the software that Tamura had recommended I use, but it was not the problem of that software (PAF). So I looked at the gedcom file that I had imported and the diacriticals were missing from there, meaning my export software was the culprit.

I looked at the GEDCOM’s  HEAD tag and the CHAR sub-tag, and it said “ANSI” [no quotes] was the value. That is not even a valid possible value! According to the GEDCOM 5.5.1 standard [on page 44 of the FamilySearch PDF document]:

CHARACTER_SET:= {Size=1:8}
[ ANSEL |UTF-8 | UNICODE | ASCII ]
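A minimal sketch of how one might check this on an exported file: it walks the HEAD record, pulls out the CHAR value, and compares it with the four values quoted above. The function names and the sample lines are hypothetical; only the tag names and the allowed values come from the standard as quoted.

```python
# The four values quoted from the GEDCOM 5.5.1 CHARACTER_SET definition.
VALID_CHARSETS = {"ANSEL", "UTF-8", "UNICODE", "ASCII"}

def head_charset(lines):
    """Return the CHAR value found inside the HEAD record, or None."""
    in_head = False
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        if level == "0":
            if in_head:
                break                     # the HEAD record has ended
            in_head = (tag == "HEAD")
        elif in_head and level == "1" and tag == "CHAR":
            return parts[2].strip() if len(parts) > 2 else ""
    return None

def is_valid_charset(value):
    """True when the value is one of the four allowed by the standard."""
    return value in VALID_CHARSETS

# Hypothetical header lines mimicking the export described above.
sample = ["0 HEAD", "1 SOUR EXAMPLE", "1 CHAR ANSI", "0 TRLR"]
charset = head_charset(sample)
print(charset, is_valid_charset(charset))
```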

Who is this dastardly purveyor of substandard GEDCOM that strips out your diacriticals (that I assumed you have been working so hard to add since my article on Tuesday, “Dying For Diacriticals”)? I’ll give you a HINT, it is the #1 Genealogy Website. Yes, it is ANCESTRY.COM !

Now what makes this error even more dastardly is that the website shows you the diacriticals in the User Interface (UI), but when you export/download, the diacriticals are not there in the gedcom. Unless you study things closely, you may be oblivious (as Stanczyk was for a long time) that these errors have crept into your research. I also found a spurious NOTE that I cannot find anywhere on anyone in my tree, which gets attributed to my home person (uh, me). This is very alarming to me too !!!

Tim Sullivan (CEO of Ancestry.com), I expected better of you and your website. I entrusted my family tree to you and that is what you did with my gedcom? Now I did some more investigating and I found that Ancestry does not strip ALL diacriticals. My gedcom had diacriticals in the PLAC tags and in NOTE tags. But NOT (I repeat NOT) in the NAME tags.

So Tim [pretend there is a shaky leaf here] , if you or a reputation defender or some other minion skims the Internet (for your name) here is what  I hope You/Ancestry.com will do:

  1. Do NOT strip diacriticals from the NAME tag !!!
  2. Fix the Export GEDCOM to create a gedcom file with diacriticals in NAME tags
  3. Fix the Export GEDCOM to create a valid CHAR tag value: UNICODE, UTF-8, ASCII, ANSEL. I put them in my prioritized/preferred order [from left-to-right]. I hope you will not use ASCII or ANSEL.
  4. Run a GEDCOM validator against the gedcom file your Export GEDCOM software creates to download and fix the other “little things” too  (Mystery NOTEs ???).
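
Short of a full validator, a quick way to see which tags kept their diacriticals is to count the lines with non-ASCII values, tag by tag. This is only a sketch: the function name is hypothetical and the sample lines are invented to mirror the symptom described above (diacriticals surviving in PLAC and NOTE but not in NAME).

```python
from collections import Counter

def non_ascii_by_tag(lines):
    """Count, per tag, the lines whose value has a non-ASCII character."""
    counts = Counter()
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 3:
            continue                      # no value on this line
        tag, value = parts[1], parts[2]
        if not value.isascii():
            counts[tag] += 1
    return counts

# Invented export fragment: the NAME lost its accent, PLAC and NOTE did not.
exported = [
    "0 @I1@ INDI",
    "1 NAME Jan /Leszczynski/",
    "1 BIRT",
    "2 PLAC Biechów, Busko, Kielce, Poland",
    "1 NOTE Nazwisko: Leszczyński",
]
counts = non_ascii_by_tag(exported)
print(counts["NAME"], counts["PLAC"], counts["NOTE"])
```

A NAME count of zero in a tree full of Polish surnames is the warning sign.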

February 28, 2012

Dying For Diacriticals … Beyond ASCII — #HowTo, #Genealogy, #Polish

by C. Michael Eliasz-Solomon

Stanczyk mused recently upon a few of the NAMEs in my genealogy:

Bębel, Elijasz, Guła, Leszczyński, Kędzierski, Wątroba, Wleciał, Biechów, Pacanów, Żabiec

If you want to write Elijasz (or any of its variants) you are golden. But each of the other names requires a diacritic (aka diacritical mark). Early on, I had to drop the diacritics, because I did not have computer software to generate these characters (aka glyphs). So my genealogy research and my family tree were recorded in ASCII characters. For the most part that is not a concern, unless you are like John Rys and trying to find all of the possible ways your Slavic name can be spelled/misspelled/transliterated and eventually recorded in some document and/or database that you will need to search. Then the import becomes very clear. Also, letters with a diacritic sort differently than letters without one. For years, I thought Żabiec was not in a particular Gazetteer I use, until I realized there was a dot above the Z, and the dotted-Z villages came after all of the plain Z (no dot) villages. There was Żabiec, many pages later! The dot was not recorded in the Ship Manifest, nor in a Declaration of Intent document, so I might not have found the parish that Żabiec belongs to so easily. I hope you are beginning to see the import of recording diacritics in your family tree.
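The gazetteer surprise can be reproduced with a small sketch that sorts names by the order of the Polish alphabet, where Ż is its own letter after Z. The village list is illustrative (apart from Żabiec, the names are not taken from the gazetteer), and `polish_key` is a hypothetical helper, not a library function.

```python
# Polish alphabet order; q, v and x are included for foreign words.
POLISH_ALPHABET = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"
ORDER = {letter: rank for rank, letter in enumerate(POLISH_ALPHABET)}

def polish_key(word):
    """Sort key ranking letters by their place in the Polish alphabet."""
    # Characters outside the alphabet sort after every letter.
    return [ORDER.get(ch, len(ORDER)) for ch in word.lower()]

with_dot = ["Zofiówka", "Żabiec", "Zagaje", "Zborów"]
without_dot = ["Zofiówka", "Zabiec", "Zagaje", "Zborów"]

print(sorted(with_dot, key=polish_key))     # Żabiec lands after every plain Z
print(sorted(without_dot, key=polish_key))  # Zabiec lands at the front
```

Search for the name without its dot and you are looking at the wrong end of the Z section.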

How?

The rest of my article today teaches you how to do this. Mostly we are in a browser, surfing the ‘net, in all its www glory. After my “liberal indoctrination” (aka #RootsTech 2012), I have switched browsers to Google’s Chrome (from Mozilla Firefox). I did this to await the promised “microdata” technology that will improve my genealogical search experience. I am still waiting, Mr Google !!! But while I am waiting, I did find a new browser extension that I am rather fond of that solves my diacritical problem: Virtual Keyboard Interface 1.45. I just double-click in a text field and a keyboard pops up:

Just double-click on a text field, say at Ancestry.com. Notice the virtual keyboard has a drop down (see “Polski”), so I could have picked Русский (for Russian) if I were entering Cyrillic characters into my family tree.

But I want to keep using my browser … OK! I used to prepare an MS Word document or maybe a Wordpad document with just the diacriticals I need (say Polish, Russian, and Hebrew); then I could cut & paste them from that editor into my browser or computer application as needed. A bit tedious, and how did I create those diacritical characters anyway?

I use Character Map in Windows and Character Palette -or- Keyboard Viewer on the Mac:

Now if I use one of these Apps, then I can forgo the Wordpad document (of special chars.) altogether and just copy / paste from these to generate my diacritical characters.

What I would like to see from web 2.0 pages and websites is what Logan Kleinwaks did on his WONDERFUL GenealogyIndexer.org website. Give us a keyboard widget like Logan’s, please ! What does a near perfect solution look like …

Logan has thoughtfully provided ENglish, HEbrew, POlish, HUngarian, ROmanian, DEutsche (German), Slavic, and RUssian characters. Why is it only nearly perfect? Logan, may I please have a SHIFT (CAPITAL) key on the BKSP / ENTER line for uppercase characters? That’s it [I know it is probably a tedious bit of work to do this].

Beyond ASCII ?

The title said beyond ASCII, and so is everything we have spoken about. ASCII is a standard that is essentially a typewriter keyboard, plus the extra keys (ex. Backspace, Enter, Ctrl-F, etc.) that do special things on a computer. So what is beyond ASCII? Hebrew characters, Chinese/Japanese glyphs (串), Cyrillic (Я), Polish slashed-L (Ł), or Dingbats (❦ – Floral Heart). You can now enter any of these beyond-ASCII characters (UNICODE) in any program with the above suggestions.

Programmer Jargon – others  proceed with caution …

The above are all UNICODE characters. UTF-8 can encode every UNICODE code point (there is room for about 1.1 Million of them) as a sequence of one to four 8bit bytes (called octets; because the unit is a single byte, UTF-8 is not concerned with big/little endianness). In fact, UTF-8’s first 128 characters are an exact 1:1 mapping of ASCII, making any ASCII text valid UTF-8. More than half of all web pages out on the WWW (‘Net) are encoded with UTF-8, so it makes sense that our gedcom files should be too! UTF-8 can have that byte-order-mark (BOM) at the front of our gedcom or not and it is still UTF-8. The UNICODE standard neither requires nor recommends a byte order mark for UTF-8 [see Chapter 2 of UNICODE]. So please, FamilySearch, remove the BOM from the GEDCOM standard.
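To make the byte talk concrete, here is a small sketch that encodes a few of the characters mentioned in this article and shows how many octets each one takes. The helper name `utf8_bytes` is made up for illustration.

```python
def utf8_bytes(character):
    """Return the UTF-8 octets for a single character."""
    return character.encode("utf-8")

# A plain ASCII letter, then characters from the examples in this article.
for character in "EŁЯ串❦":
    octets = utf8_bytes(character)
    code_point = f"U+{ord(character):04X}"
    print(code_point, character, len(octets), octets.hex(" "))

# ASCII text is byte-for-byte identical in ASCII and in UTF-8.
print("Elijasz".encode("ascii") == "Elijasz".encode("utf-8"))
```

The plain letter takes one octet, the Polish and Cyrillic letters take two, and the Chinese glyph and the dingbat take three.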

If FamilySearch properly defines the newline character in the gedcom grammar [see Chapter 5, specifically 5.8 of UNICODE], then there is nothing in the HEAD tag that would be unreadable to a program written in, say, Java (whose strings are UTF-16, so a single char covers U+0000 to U+FFFF), unless there is an invalid character, which then makes the gedcom invalid. Every character in the HEAD tag is actually defined within 7bit ASCII, and ASCII bytes mean the same thing in UTF-8. So for a gedcom saved as ASCII, ANSEL or UTF-8 you could use any computer language that is at least UTF-8 compliant to read/parse the HEAD tag (which has the CHAR tag and its value that defines the character set). Everything in the HEAD tag, with the exception of the BOM, is within the ASCII character set. Using UTF-8 as a default encoding to read the HEAD will work even if there is a BOM. (The one exception is a gedcom saved in a 16bit UNICODE encoding, i.e. UTF-16, which has to be recognized from its BOM before the HEAD can be read.)
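As a hedged illustration of reading the HEAD "with or without a BOM": Python's standard `utf-8-sig` codec decodes UTF-8 and silently drops a leading byte order mark if one is present. The function name and the three sample lines are assumptions made for the sketch.

```python
import codecs

def read_head_text(raw):
    """Decode gedcom bytes as UTF-8, dropping a leading BOM if present."""
    return raw.decode("utf-8-sig")

# Hypothetical minimal file, once without and once with a UTF-8 BOM.
body = "0 HEAD\n1 CHAR UTF-8\n0 TRLR\n".encode("utf-8")
with_bom = codecs.BOM_UTF8 + body

print(read_head_text(body) == read_head_text(with_bom))
print(read_head_text(with_bom).splitlines()[1])
```

Both inputs decode to the same text, so a parser can find the CHAR line without caring whether the exporter wrote a BOM.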

February 23, 2012

Meme: Exploring GEDCOM – Gedcom Lines — #Genealogy, #Technology, #Mashup

by C. Michael Eliasz-Solomon

Stanczyk wants to introduce a new meme, “Exploring GEDCOM”. I was musing upon why the state of the GEDCOM standard is … so CHAOTIC. GEDCOM has languished for about a decade and a half now with no new standard, hence my article, “Is GEDCOM dead?” (2/5/2012). I was left in a perplexed state after RootsTech 2012. Why is FamilySearch working on a “standard” in a vacuum? Why is there so little communication with the existing software vendors (the purveyors of GEDCOM), and why do the end users have no voice in what is needed in a GEDCOM standard?

So I decided that GEDCOM needed an Evangelist. I believe there are already a plethora of GEDCOM Evangelists, so perhaps I will just add to the milieu (or is it the meme). To be frank, most GEDCOM Evangelists are really GEDCOM complainers. Nay, I think we are all complainers, because there are no GEDCOM complimenters, not even amongst the GEDCOM purveyors. Even FamilySearch, which “owns” GEDCOM (how can that be a standard?), wants to make their latest effort (GEDCOMX) a “clean sheet” project. No backwards compatibility even!

Is GEDCOM  just an ugly baby whose parentage is in doubt?

So this meme is on Exploring GEDCOM. What is it? How can it be improved? What should a TRUE gedcom standard include? I’ll probably write anywhere from once to three or four times a month on this meme until I have exhausted myself on this topic. My goal, ultimately, is to get this to be a part of RootsTech and to be an OPEN STANDARD with an open, transparent definition and process for change, which I hope to have tied to RootsTech attendees voting on this, possibly via the RootsTech App.

Allow non-attendees to vote if they register who they are and their role (genealogist, technologist, software vendor, etc.) and why they want to be a voter. I think conference attendees (genealogists, technologists, vendors-of-any-kind, organizers) get an automatic vote, prior attendees get a vote, gedcom software vendors get a vote. All prior voters get to vote in all future votes on the open standard (as long as their email address works or when it is corrected again). OPEN STANDARD means that all stakeholders need to have an opportunity to influence the standard.

Let me start the Meme by revisiting graphic syntax diagrams  …

I started with this railroad track (2/16/2012) to define a gedcom file. Our discussion will focus upon gedcom v5.5.1 and launch from that rocket pad into some far flung future gedcom feature(s). This diagram was derived from the standard in PDF form. I have attempted to make the standard more “grammatical” and to formalize/define ambiguities according to my genealogical/technological world view. We see a HEAD tag, a TRLR tag and an optional SUBN tag with a whole bunch of “gedcom lines”.

Gedcom Line

A V5.5.1 Gedcom Line

This is what a gedcom line looks like. I have added a wish for optional whitespace at the beginning of a line. That is my first proposal. The number at the beginning of each line is meant to be “an outline level”. So I wanted the option of outputting lines with leading blanks corresponding to the level of indentation appropriate for the outline level, to make it easier to see which inner lines go with which outermost level. Make the whitespace a checkbox on export (directed at you software vendor guys) and default it to off.
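Here is a rough sketch of what the whitespace proposal could look like in practice: a parser that tolerates leading blanks and an export helper that adds them. The pattern is a simplification of the 5.5.1 line grammar (level, optional cross-reference id, tag, optional value), and every name in it is hypothetical rather than part of any standard or product.

```python
import re
from typing import NamedTuple, Optional

class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]

LINE_PATTERN = re.compile(
    r"^[ \t]*"                     # proposed: optional leading whitespace
    r"(?P<level>\d{1,2})"          # outline level
    r"(?: (?P<xref>@[^@ ]+@))?"    # optional cross-reference id, e.g. @I1@
    r" (?P<tag>\w+)"               # tag such as HEAD, INDI, NAME
    r"(?: (?P<value>.*))?$"        # optional line value
)

def parse_line(text: str) -> Optional[GedcomLine]:
    """Parse one gedcom line; return None if it does not fit the pattern."""
    match = LINE_PATTERN.match(text.rstrip("\r\n"))
    if match is None:
        return None
    return GedcomLine(int(match["level"]), match["xref"],
                      match["tag"], match["value"])

def indent_line(text: str, width: int = 2) -> str:
    """Prefix a line with blanks proportional to its outline level."""
    level = int(text.split(" ", 1)[0])
    return " " * (width * level) + text

for line in ["0 @I1@ INDI", "1 BIRT", "2 PLAC Biechów, Busko, Kielce, Poland"]:
    print(indent_line(line))
    print(parse_line(indent_line(line)))
```

Because the leading blanks are ignored on the way in, an indented export parses to exactly the same records as a flat one, which is why the checkbox could safely default to off.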

We see that a gedcom line at its (current) core describes: families, individuals, notes, repositories, sources, submitters & their multimedia (digital documents, notes, memories, etc.). This is still a very high level discussion. We have only spoken of 3 of the 136 tags. But already this jester has a suggestion/complaint. Let me defer a discussion of Multimedia_Records to its own article, as this requires many words, a lot of which are jargon. The complaint: we need more zero level tags!

So deferring multimedia, we have six types of records. A software vendor might think of six different tables (or objects) that need to be described and stored as we “parse” each gedcom line in the file that stores our family tree. Do not lose sight that these files are family trees of some researcher, not abstract or theoretical data. This is research from current or prior genealogists and it needs to be preserved … without loss.

At its inner core is a set of individuals (INDI tag). I once wrote a PERL script to pull out all individuals with their vital data (B/M/D). Very easy thing to do. I mention this now to illustrate that these compact files are at the intersection of genealogy and technology. These gedcom files are emblematic of the technology / genealogy mashup that is RootsTech! They are also the way we can interface our genealogies with other non-family tree tools to do additional things. Let’s call those gedcom ADD-ONS (or PLUG-INS or APPs). I am hopeful that, with a standard API, they will be able to pull this info out, just like my PERL script pulled out the individuals. That is the essence of an INDIVIDUAL gedcom record.
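The original PERL script is not shown here, so what follows is only a guess at the idea, sketched in Python: walk the lines, collect NAME plus BIRT and DEAT dates from each INDI record, and pick up MARR dates from the FAM records that list the person as HUSB or WIFE. The people and dates in the sample are invented, and `extract_vitals` is a hypothetical name.

```python
def extract_vitals(lines):
    """Collect name, birth, death and marriage dates per individual."""
    people, families = {}, {}
    record, event = None, None
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, rest = parts[0], parts[1:]
        if level == "0":
            record, event = None, None
            if len(rest) == 2 and rest[1] == "INDI":
                record = people.setdefault(rest[0], {
                    "name": None, "birth": None, "death": None,
                    "marriages": []})
            elif len(rest) == 2 and rest[1] == "FAM":
                record = families.setdefault(rest[0], {
                    "spouses": [], "marriage": None})
            continue
        if record is None:
            continue                       # not inside an INDI or FAM record
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
        if level == "1":
            event = tag
            if tag == "NAME" and "name" in record and record["name"] is None:
                record["name"] = value
            elif tag in ("HUSB", "WIFE") and "spouses" in record:
                record["spouses"].append(value)
        elif level == "2" and tag == "DATE":
            if event == "BIRT" and "birth" in record:
                record["birth"] = value
            elif event == "DEAT" and "death" in record:
                record["death"] = value
            elif event == "MARR" and "marriage" in record:
                record["marriage"] = value
    # Marriages live on FAM records, so copy them over to each spouse.
    for family in families.values():
        for xref in family["spouses"]:
            if xref in people and family["marriage"]:
                people[xref]["marriages"].append(family["marriage"])
    return people

# Invented sample data, for illustration only.
sample = """\
0 HEAD
1 CHAR UTF-8
0 @I1@ INDI
1 NAME Jan /Kędzierski/
1 BIRT
2 DATE 12 MAR 1860
2 PLAC Biechów
1 DEAT
2 DATE 3 JAN 1937
1 FAMS @F1@
0 @I2@ INDI
1 NAME Anna /Wątroba/
1 BIRT
2 DATE ABT 1865
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
2 DATE 1885
0 TRLR
""".splitlines()

for xref, person in extract_vitals(sample).items():
    print(xref, person)
```

A standard API for add-ons would presumably hand back something shaped like this dictionary, without each tool having to re-parse the file.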

There are also FAMILY gedcom records that are defined by FAM, FAMC, FAMS and the temple ordinance (i.e. LDS) FAMF tags. Likewise, we have NOTE (NOTE), SUBMITTER (SUBM), and REPOSITORY (REPO)/SOURCE (SOUR) records too. I mentioned the FAMC/FAMS tags in addition to FAM, which really equates to the FAMILY-RECORD, in order to point out that an individual is part of two families. S/He is a part of a family where they are a child (FAMC) and they are also part of the family where they are a parent (or rather a SPOUSE, hence FAMS). This is evident when you realize that we are speaking of a family tree and that a tree really goes forward and backward, linking the present to the past (and logically, vice-versa).

What’s Missing? – A Proposal (the first of many)

I am still ignoring MULTIMEDIA, so that is not it. If we believe in Jay Verkler’s RootsTech 2012 vision for genealogy, then we need to conform (i.e. standardize): Dates, Locations, Names. I would also add: Events, Documents, and possibly Groups. So that is six more zero level RECORDS.

DATES I assume need to be standardized because of the many problems: missing date, partial date, estimated date, various calendars, etc.

NAMES are also a problem area. For example, how do I record my ancestor’s name? Do I conform his name to ENGLISH (i.e. does Piotr become Peter)? Should I record it in his context (i.e. Pawel for Paul)? Should I record it in the language of the record (my ancestors come in Latin (Paulus), Polish, and Russian)? Oh, and some of those names do not translate to the other language, so we have adopted names/name changes/nicknames. Then there are Latin alphabet versus Cyrillic characters versus Hebrew characters, or even just recording diacritical letters like slashed-l (ł).

UNICODE support is a MUST in any new standard.

We also need Locations, Events, Documents, and Groups as zero level “records”, so that we can pull those out of the file, just as I pulled Individuals out of the file. Take Locations: Biechów, Busko, Kielce, Poland is the administrative hierarchy of one of my ancestral villages. Of course, it changed over time or by whoever occupied Poland (or should I view it as Congress Poland/Vistulaland, a part of the Russian Empire with its many gubernias?). Clearly locales have a time component.

I deferred MULTIMEDIA because it is technical and also because I want to make the case that we need EVENTS and/or DOCUMENTS instead. MULTIMEDIA are just NOTES that are not textual, and often the digital media is a representation of some document(s) that documented an event. I also propose GROUPS as a record because people want to record connections to MILITARY units, CHURCH SOCIETIES, SCHOOLS, BUSINESSES/ORGANIZATIONS, REUNIONS, or GOVERNMENTAL/HISTORICAL units that may be of historical interest or carry a strong emphasis within a family history. I think the GROUPS could all be user-defined, with maybe a conformed group-type (i.e. military, religious, government, historical, etc.). This does not feel like the same level of importance as the others: Names, Dates, Locations, Events or Documents.

Summary of Proposed GEDCOM Enhancements

(excluding MULTIMEDIA)
  1. whitespace – for readability
  2. UNICODE support so proper nouns can be recorded in their context with diacriticals or character sets (that are not Latin).
  3. New Zero Level TAGS: NAME, DATE (not mine, but Jay Verkler’s emphasis)
  4. New Zero Level TAGS (that Stanczyk wants): EVNT, DOCS, & LOCN (Jay also wanted locn).
  5. Possibly GRUP – to support development of non-familial group memberships in trees

The new zero level tags are to support future CONFORMATION (standardization) efforts and also are the most likely to be sought after via any future API for enhanced analyses or specialized output in reports/charts.

Stanczyk views the Zero Level TAGs as possible dimensions for slicing-dicing a genealogy cube, what Data Architects see as OLAP analysis/reporting (sorry, that jargon just slipped out).

The vision is cross family tree bumping or cross website bumping of gedcom data against databases to accomplish new and novel approaches to searching, merging or analyzing. This genealogy data could also be of use to historians or scientists as new sources of data to be mined for their research.

That’s the gedcom exploration for today!

 

P.S. 

Please read the comments too. Apparently, I was wrong. There is a GEDCOM Evangelist who is not a gedcom complainer.
