Here’s what I do …
There are still 3 more screens and part of another that I’ll spare you from …
What do you do on your iPhone?
–Stanczyk
A Muse-ing
A while ago, Stanczyk bemoaned iOS5. Therefore, I owe it an update …
Legacy Family Tree has released blank US Census Forms (page1 | page2) for the 1940 US Census. April 2nd is coming. Are you prepared? Is Ancestry.com prepared?
At #RootsTech 2012, the 3rd keynote was an Ancestry talking-head panel. They joked about whether the website could withstand the crush on April 2nd. Let’s see how this experiment goes.
This is the first US Census to be released in an all digital format.
The above diagram is what Stanczyk had been jabbering about since the #RootsTech conference. Isn’t that much easier on the eyes and the grey matter than a complex UML diagram? Who even knows what a UML diagram is or if it is correct or not?
What does it say is in a GEDCOM file (ex. Eliasz.ged)?
A HEAD tag optionally followed by a SUBmissioN Record followed by 1 or more GEDCOM lines followed by a TRLR tag.
ex. gedcom lines that can be “traced” along the railroad tracks at the top.
0 HEAD
1 SOUR Stanczyk_Software
1 SUBM @1@
1 GEDC
2 VERS 5.5.1
2 FORM LINEAGE-LINKED
1 CHAR UNICODE
0 @1@ SUBM
...
0 TRLR
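The track along the top of the diagram can also be traced in a few lines of Python. This is only a minimal sketch, assuming one GEDCOM line per text line in the shape "level, optional @xref@, tag, optional value". The names parse_line and follows_top_track are made up for illustration and belong to no real GEDCOM tool, and the check looks only at the level-0 lines (HEAD first, TRLR last), not at the full grammar.

```python
import re

# One GEDCOM line: level number, optional @xref@, tag, optional value.
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")

def parse_line(text):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    match = LINE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = match.groups()
    return int(level), xref, tag, value

def follows_top_track(lines):
    """True if the level-0 tags start with HEAD and end with TRLR."""
    parsed = [parse_line(line) for line in lines if line.strip()]
    top = [tag for level, _xref, tag, _value in parsed if level == 0]
    return len(top) >= 2 and top[0] == "HEAD" and top[-1] == "TRLR"

# The sample from the post, minus the elided part.
SAMPLE = """0 HEAD
1 SOUR Stanczyk_Software
1 SUBM @1@
1 GEDC
2 VERS 5.5.1
2 FORM LINEAGE-LINKED
1 CHAR UNICODE
0 @1@ SUBM
0 TRLR""".splitlines()

print(follows_top_track(SAMPLE))
```

The number at the start of each line is the level#: level 0 starts a new record, and a higher number means the line belongs to the nearest line above it with a lower number.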
OK, Stanczyk_Software does not exist; it was made up as a fictitious but valid SOURce System Identifier name. The GEDCOM file (*.ged) is a text file and you can view/edit the file with any text editor (vi | NotePad | WordPad | etc.). I do not recommend editing your gedcom outside of your family tree software, but there is certainly nothing stopping you from doing that (DO NOT TRY THIS AT HOME). If you knew gedcom, you could correct those erroneous/buggy gedcom statements that are generated by so many programs, the ones that cause poor Dallan Quass to ONLY achieve 94% compatibility with his GEDCOM parser.
Have you ever downloaded your gedcom from ANCESTRY and then uploaded it to RootsWeb? Then you might see all those crazy _APID tags. It is a custom tag (since it begins with an underscore – GEDCOM rules dear boy/girl). It really messed up my RootsWeb pages with gobbledygook. I finally decided to edit one gedcom and remove all of the _APID tags before I uploaded the file to RootsWeb. Aaah that is SO much better on the eyes. Oh I probably do not want to re-upload the edited gedcom into ANCESTRY, but at least my RootsWeb pages are so much better! The _APID is just a custom tag for ANCESTRY (who knows what they do with it) so to appeal to my sense of aesthetics, I just removed them — no impact on the RootsWeb pages, other than improved readability. [If you try this, make a backup copy of the gedcom and edit the backup copy!]
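For anyone who would rather not do that edit by hand, here is a minimal Python sketch of the same clean-up. It assumes the custom tag sits on its own line and that anything nested under it (a higher level#) should go with it. The function name strip_custom_tag and the sample values after _APID are made up for illustration.

```python
def strip_custom_tag(lines, tag="_APID"):
    """Drop every line carrying `tag`, plus any deeper lines nested under it."""
    kept = []
    skip_below = None  # level of the removed line whose children we are skipping
    for line in lines:
        parts = line.split()
        if not parts or not parts[0].isdigit():
            kept.append(line)  # leave anything unexpected untouched
            continue
        level = int(parts[0])
        if skip_below is not None and level > skip_below:
            continue  # child of a removed line
        skip_below = None
        # The tag is the second token, or the third when an @xref@ comes first.
        if len(parts) > 2 and parts[1].startswith("@"):
            line_tag = parts[2]
        else:
            line_tag = parts[1] if len(parts) > 1 else ""
        if line_tag == tag:
            skip_below = level
            continue
        kept.append(line)
    return kept

sample = [
    "0 @I1@ INDI",
    "1 NAME Jozef /Elijasz/",
    "1 BIRT",
    "2 SOUR @S1@",
    "3 _APID made-up-value-1",
    "1 _APID made-up-value-2",
    "0 TRLR",
]
for line in strip_custom_tag(sample):
    print(line)
```

As with the manual edit, run it against a backup copy of the gedcom, never the original.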
Now obviously the above graphic syntax diagram is not complete. It needs to be resolved to a very low level of detail such that all valid GEDCOM lines can be traced. It also requires me/you to add in some definitional things (like exactly what is a level# — you know those numbers at the beginning of each line).
I have a somewhat mid-level graphic syntax diagram that I generated using an Open Source (i.e. free) graphic syntax diagrammer. As I said in one of my comments, I will send it to whoever asks (already sent it to Ryan Heaton & Tamura Jones). You can get a copy of Ryan Heaton’s presentation from RootsTech 2012 and compare it to his UML diagram (an object model). I think you will quickly realize that you cannot see how GEDCOM relates to the UML diagram, so it is difficult to ask questions or make suggestions. A skilled data architect/data modeler or a high-level object-oriented programmer could make the comparison and intuit what FamilySearch is proposing, but a genealogist without those technical skills could NOT.
I am truly asking the question, “Can a genealogist without a computer science degree or job read the above diagram and trace with his finger a valid path of correct GEDCOM syntax [assuming a whole set of diagrams were published]?” The idea is to see how the GEDCOM LINES (in v5.5.1 parlance FAMILY_RECORD, INDIVIDUAL_RECORD, SOURCE_RECORD, etc.) are defined and whether or not what FamilySearch is proposing is something complete/usable that advances the capabilities of the current generation of software without causing incompatibilities (ruining poor Dallan Quass’s 94% achievement). Will it finally allow us to move the images/audio/video multimedia types along with the textual portion of our family trees and keep those digital objects connected to the correct people when moving between software programs?
GEDCOM files are like pictures of our beloved ancestors. They live on many years beyond those that created them. Let’s not lose any of them OK?
Stanczyk added a new Page (Tech Diary) to record my technology doings.
While doing that and reading from my blogroll (and emails), I discovered some history about the “defacto standard GEDCOM” (wiki: GEDCOM). Now I strongly recommend you start from the “defacto” link rather than the wikipedia link.
I believe those provide the most recent current thoughts on GEDCOM (that I have not penned).
My Graphic Syntax Diagram of GEDCOM v5.5 was produced using an open source tool. It is partial and still high level. I did put in a construct so that you can clearly see all 128 standard tags. The Graphic Syntax Diagrammer is an excellent tool. I will have to offer the author a suggestion for the PNG images that it outputs. I need to take my diagram and manually edit it to make the drawing a better fit for 8.5″ x 11.0″ (US Letter, close to A4) paper. I need to graphically wrap the railroad tracks and to add page breaks so that the image is itself usable for viewing/discussions. I will offer this sample drawing to any interested parties, including emailing the edited product to Ryan Heaton and Dallan Quass [who since they did not request it -- can feel free to ignore it].
My goal is to make minor tweaks to GEDCOM v5.5 via this diagram [not programming] and try and get DallanQ to produce a one-off parser for it (call it, say GEDCOM 5.5.999) and hope that my tweaks will not lower Dallan’s hard work of achieving 94% compatibility. If it turns out to have virtually no effect on Dallan’s 94% compatibility in his Open Source parser, then I can think about getting some software vendors to utilize the enhancements (via end user requests), since they are trivial, just to move the standard forward and to open an interest in the vendors to looking at how we create a new Open Standard for GEDCOM.
Thanks to Tamura Jones, I now know I need to update my diagram to GEDCOM v5.5.1 first.
Stanczyk has been churning since about November of last year (2011). I have a number of ideas rummaging around my brain for genealogy apps. For over a quarter century, I have been a computer professional and used and/or developed a lot of programs using a myriad of technologies. At my core, I am a data expert: design it, store it, query it, manage it, analyze it and protect it. It being the data.
Before going to #RootsTech 2012, I knew GEDCOM was the core of our hobby/business/research. GEDCOM is our defacto standard. It is how data is exchanged between us and our various programs. I say defacto because as a standard goes it is not a very open standard (one organization “owns” it, and the rest of us go along with it). It also has not changed in about a decade and a half, so Ryan Heaton was correct in calling it “stale”. It does still work .. mostly. Although if a standard does not progress, then you get a lot of proprietary “enhancements” that prevent the interchange of data completely, since one vendor does not know how to deal with another vendor’s file in totality.
At present, GEDCOM maxes out at version 5.5, although there are various other variations you might see. But 5.5 was the last standard version. I counted 128 total tags and a provision for creating non-standard tags (they start with an underscore).
[Mike thanks to Tamura Jones! Even though GEDCOM v5.5.1 was never finalized, it IS the defacto max version of GEDCOM. GEDCOM v5.5.1 added 9 tags, removed the BLOB tag, so we now have a total of 136 tags. -- I will need to update even my high level graphic syntax diagram]
Tags are like:
INDI, FAMC, FAMS, SOUR, REPO, HEAD, TRLR etc. -or- ALIA, ANCE
The first bunch is familiar and is probably in your family tree (if you ever exported the GEDCOM file). The ALIA tag is one that Dallan Quass said was universally used wrong by all programs. After seeing its definition, I can see how it is confusing. As for the ANCE tag, I do not recall seeing any program letting me do any functionality that might utilize this tag. This tag is probably one of those tags that Dallan said is not used at all.
I looked at the “MULTIMEDIA” section of the standard. It looks like it is woefully out of date and probably not used at all (at least not in any standard way), which is probably why our pics, audio, and video (or any other media file like PDF, MS Word) do not move with the GEDCOM. Has any program ever used the ENCODING/DECODING of a multimedia file? The standard seems to imply a buffer of only 32K (for a line) and even if you used a large number of CONC tags strung one after another you need 100 lines to store a 3.2MB file in-line in the GEDCOM. I do not think I have seen that in a GEDCOM. They probably stored these binary large objects (BLOBs) outside the gedcom and refer to their path on the computer/network. I did some noodling. I have 890 MB (or approximately 890,000 KB) in pictures and scanned source documents for about 1,000 people in my family tree. So I use nearly a gigabyte (1GB) for my family tree and all other multimedia — and I do not have any audio or video! So I use almost 1MB/person.
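The noodling above, written out as a few lines of Python so the numbers can be swapped for your own tree. It is just the post's arithmetic, with KB and MB counted in round decimal units as the post does; the variable names are made up.

```python
LINE_BUFFER_KB = 32   # per-line limit as the post reads the standard
FILE_SIZE_KB = 3_200  # a 3.2MB file, counted in decimal KB

# How many full 32K lines it takes to hold the file in-line.
lines_needed = FILE_SIZE_KB // LINE_BUFFER_KB

TREE_MEDIA_MB = 890   # pictures and scanned source documents
PEOPLE = 1_000        # approximate size of the tree

mb_per_person = TREE_MEDIA_MB / PEOPLE

print(lines_needed, "lines for one 3.2MB file")
print(mb_per_person, "MB per person")
```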
If we did have this magical new GEDCOM standard that could carry all of our multimedia from one GEDCOM program to another GEDCOM program, the copying would take a long time. If I uploaded/downloaded it to/from the Internet, I might incur an overage on my ISP’s usage charges, if this were technically feasible! Imagine if I did this multiple times a month (as I got updates). I am beginning to understand why no vendor has tackled the problem. I would also like to store PDFs and other documents besides GIF/JPG/PNG which can be displayed on the Internet web pages natively in a browser. Those are not a part of the existing GEDCOM standard. Let me sling some jargon: I’d want to store any file type that there is a MIME type definition for, that I can currently embed in emails, or utilize in Java programs or that the HTML5 standard will allow for multimedia.
GEDCOM 5.5 was in its infancy in dealing with character sets. It was predominantly ASCII with some funky ANSEL coding of characters to handle Latin alphabet diacriticals, although it is not clear how I would do the data entry for those and it looks incomplete. It did mention UNICODE, but only cursorily and just to remind us that the lengths in the GEDCOM standard were in ’characters’ not bytes, which was correct. Those multibyte characters (say in Hebrew, Russian, Japanese or Chinese) would, however, quickly use up the 32K byte line buffer limit, which would effectively become about 8K characters per line at four bytes per character. In fact, GEDCOM 5.5 says it will only deal with LATIN alphabets and leave Cyrillic, Hebrew and Kanji for some far flung future. Stanczyk is Slavic, I need UNICODE to represent my ancestor’s names and places. Fortunately, I do not feel the need for Cyrillic (Russian, Ukrainian, Belorussian, Macedonian, etc.) or I’d be out of luck. I’ll just use the Polish version of those names in their ‘Latinized’ forms.
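To see the characters-versus-bytes difference for yourself, here is a small Python sketch. It assumes UTF-8, which is my choice for the illustration and not something GEDCOM 5.5 prescribes; the 8K figure above corresponds to four bytes per character, while UTF-8 uses between one and four depending on the letter. The helper name utf8_size is made up.

```python
NAMES = {
    "Latinized": "Wladyslaw",
    "Polish": "Władysław",
    "Cyrillic": "Владислав",
}

def utf8_size(text):
    """Return (characters, bytes) for text encoded as UTF-8."""
    return len(text), len(text.encode("utf-8"))

for label, name in NAMES.items():
    chars, size = utf8_size(name)
    print(f"{label}: {chars} characters, {size} bytes")
```

All three spellings have the same number of characters, but the byte counts differ, which is exactly why a limit stated in bytes fills up sooner for non-Latin names.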
Oh that is another area the standard needs to be enhanced. NAMES. Dallan mentioned that Personal Names do not get a thorough treatment in the standard (I am refusing to read the data model and I am a Data Architect). Location Names get almost no treatment — they do give you a place to store your locations (PLAC tag). What language should I use, after all my ancestors are from POLAND for God’s sake. Besides the obvious Polish, I have German, Russian and Latin to deal with and being American I prefer English. Slavic names often do not translate well. For example Wladyslaw is Ladislaus in Latin, but in English there is no equivalent — maybe that is why my ancestors use ‘Walter’ instead. But the point is, how should I store the name? Can I store all of the equivalents and search on any of them? Nope.
Damn, Russian is Cyrillic. GEDCOM doesn’t deal with non-Latin alphabets, and even though I can read the Russian genealogy records, I’d rather not, nor would I want to try and do data entry that way either. Besides, the communists reformed the language in 1918 (making War & Peace considerably shorter in Russian); that reform eliminated several characters. Most modern software is not aware of the eliminated characters, much less able to generate them. This whole Language/Unicode/Name thing is complicated and I have not even mentioned the changing borders or the renaming of cities in different languages or over time or their changing jurisdictions. I cannot fault GEDCOM for all of these woes. I have them in my own research and I have not yet found any satisfying way to handle them. I find it helps to have a very good memory and keep these things in my head, but there is no backup for that.
How are we ever going to arrive at the vision Jay Verkler put forth at #RootsTech? GEDCOM needs to become an open standard. Once it is standardized again, then it needs to become modern again and deal with the current technology, so we can get around to the tough problems of conforming: names, places, sources/repositories, calendars/dates and doing complex analyses like Social Network Analysis as a way to gather wayward ancestors into a family for which we lack documentation to prove (Genealogically). I hope the future includes Beider-Morse phonetic matching and can deal with folding diacritical characters into a base character (ex. change ę into e) for searches.
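The diacritic folding part is easy to try today. Here is a minimal Python sketch using the standard unicodedata module: decompose each letter, then drop the combining marks. One wrinkle for Polish is that ł has no Unicode decomposition, so it needs its own mapping. The function name fold_diacritics is made up, and this does nothing about phonetic matching.

```python
import unicodedata

# Letters such as ł do not decompose into base letter + mark, so map them by hand.
NO_DECOMPOSITION = str.maketrans({"ł": "l", "Ł": "L"})

def fold_diacritics(text):
    """Fold accented letters to their base letters for use as a search key."""
    decomposed = unicodedata.normalize("NFD", text.translate(NO_DECOMPOSITION))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

for name in ["Władysław", "Pacanów", "ę"]:
    print(name, "->", fold_diacritics(name))
```

A search would fold both the stored name and the typed query, then compare the folded forms, leaving the original spelling untouched in the tree.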
FamilySearch, if you are going to register GEDCOM tools, then please do a few more things for the NEW standard. First, make each vendor add to an APPENDIX the name and complete definition of their NON-STANDARD tags, in case anyone else wishes to implement or deal with them. Put a section in the header (HEAD tag) that lists all NON-STANDARD tags (just once each) along with their vendor so that someone else can go look at the standard and see what these tags mean and possibly implement the good ones. Forget that two byte thing before the HEAD tag. Just make the HEAD tag’s CHAR sub-tag indicate the character set (ANSI | ANSEL | UNICODE). Please administer a #RootsTech keynote to vote on annual changes to the GEDCOM standard. Provide a GEDCOM validator and also a GEDCOM converter webpage to allow users/vendors to validate/convert their gedcom file(s).
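Until a header section like that exists, the list of NON-STANDARD tags in a file can be gathered by scanning it. This is a rough Python sketch that relies only on the rule that custom tags begin with an underscore. The function name custom_tags and the tag _FOO in the sample are made up.

```python
from collections import Counter

def custom_tags(lines):
    """Count how often each underscore-prefixed (non-standard) tag appears."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # not a normal "level tag ..." line
        # The tag is the second token, or the third when an @xref@ comes first.
        if parts[1].startswith("@") and len(parts) > 2:
            tag = parts[2]
        else:
            tag = parts[1]
        if tag.startswith("_"):
            counts[tag] += 1
    return counts

sample = ["0 HEAD", "0 @I1@ INDI", "1 _APID a", "2 _APID b", "1 _FOO c", "0 TRLR"]
print(sorted(custom_tags(sample)))  # each custom tag listed just once
```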
Make multimedia be meta-data and allow users to define “LOCATIONS” where multimedia files can be found using either a PATH or a URL (or a relative path / URL). Make it a part of the standard that the meta-data must move, but the multimedia files can optionally stay put. Multimedia should be able to be placed on a LOCAL/NETWORK, or on the INTERNET or on a multimedia removable volume(s) [thumb drives, CDs, DVDs, etc.]. Make the multimedia “LOCATIONS” editable so a user can switch between LOCAL/NETWORK, INTERNET, or REMOVABLE including using some of each type of LOCATION. Allow these files to exist or not (show “UNAVAILABLE” or some equivalent visual clue, if accessed and they do not exist). The mapping between an Individual (INDI) or a family (FAM) or some other future GROUP and its multimedia file(s) must move as a part of the meta-data (even if the multimedia file(s) do not). That way the end-user need only edit his LOCATIONS meta-data (and ensure the files are in that/those location(s)) when he runs the software.
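One possible shape for that meta-data, as a Python sketch. Everything here is hypothetical: the class names MediaLocation and MediaLink, the field names, and the resolve function are invented to illustrate the idea, and the sketch handles only file paths, not URLs.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class MediaLocation:
    name: str  # user-chosen label, e.g. "LOCAL" or "REMOVABLE"
    root: str  # folder where the files currently live; editable by the user

@dataclass
class MediaLink:
    owner: str          # xref of the INDI or FAM record, e.g. "@I1@"
    location: str       # name of a MediaLocation
    relative_path: str  # file path relative to that location's root

def resolve(link, locations):
    """Return the file's full path, or "UNAVAILABLE" if it cannot be found."""
    location = locations.get(link.location)
    if location is None:
        return "UNAVAILABLE"
    candidate = Path(location.root) / link.relative_path
    return str(candidate) if candidate.exists() else "UNAVAILABLE"

locations = {"LOCAL": MediaLocation("LOCAL", "/no/such/folder")}
link = MediaLink("@I1@", "LOCAL", "scans/birth_record.jpg")
print(resolve(link, locations))
```

The links travel with the tree. After moving to another program or computer, only the root of each location needs editing, and anything still missing shows up as UNAVAILABLE instead of breaking the tree.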
Define an API for GEDCOM plug-ins so that new software can access the GEDCOM without parsing the gedcom file. The API should give the external plug-in a wrapped interface to the underlying data model without having to know the data model, just the individual, family, or location, or a name list of individuals, families, or locations. This will allow new software to provide additional functionality to a family tree or to provide inter-operability between trees/websites. Obviously security/privacy rules would limit this kind of plug-in access.
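To make the plug-in idea concrete, here is a toy Python sketch. No such API exists in GEDCOM today; the interface TreeAPI, the host class InMemoryTree, the plug-in surname_count, and the two sample people are all invented for illustration. The point is only that the plug-in talks to a small wrapped interface and never parses the gedcom file or sees the data model.

```python
from typing import Protocol

class TreeAPI(Protocol):
    """What a plug-in is allowed to ask for; it never sees the gedcom file."""
    def individual_names(self) -> list: ...
    def individual(self, xref: str) -> dict: ...

class InMemoryTree:
    """Toy host program holding made-up people keyed by xref."""
    def __init__(self, people):
        self._people = dict(people)

    def individual_names(self):
        return sorted(person["name"] for person in self._people.values())

    def individual(self, xref):
        return dict(self._people[xref])  # a copy, so a plug-in cannot edit the tree

def surname_count(tree: TreeAPI, surname: str) -> int:
    """Example plug-in: count people whose name ends with the given surname."""
    return sum(1 for name in tree.individual_names() if name.endswith(surname))

tree = InMemoryTree({
    "@I1@": {"name": "Jozef Elijasz"},
    "@I2@": {"name": "Maryanna Elijasz"},
})
print(surname_count(tree, "Elijasz"))
```

Handing out copies instead of the live records is one simple way a host program could start to enforce the security/privacy limits mentioned above.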
That’s Stanczyk’s vision of the GEDCOM future!
Last week Stanczyk attended Roots Tech 2012. The Roots Tech conference started in 2011, when it was attended by 3,000 people. Their concept was to mash-up genealogy Users & Developers in one conference and see what would happen: a grand experiment. I missed the first one so I cannot compare 2012 with 2011, except to note that there was a 40% increase in attendance, since 2012 had roughly 4,200 registered attendees.
Now I have been to genealogy conferences before. Most are smaller, much smaller and not held in Convention Centers. This conference was also higher energy/excitement. So what were the highlights from this year’s mash-up ?
Jay Verkler delivered the day 1 keynote and as I have written before he was the most effective speaker and laid out the vision for our hobby (uh industry) in a highly charged and entertaining fashion. A genealogical visionary — he should work for Apple.
Josh Coates was another impressive keynoter, who delivered the day 2 talk. His theme was Big Space and he talked about Exabyte monsters. It was funny and erudite. It reminded me very much of Isaac Asimov’s essays collected into his book, “Asimov on Numbers” and in particular his essay on large numbers, entitled “T-Formation“. Josh’s only failure was not tailoring his talk to the genealogical audience and topics. He was a very humorous speaker and his intelligence is unimpeachable.
There was a third day’s keynote from Ancestry.com but I did not like the form or the function of their address, although they did have some nice technology to demo. Perhaps they should stick to demoing what is new or coming. The talking panel format (for a keynote ??) is dead [date of death: 4-Feb-2012, place of death: Salt Lake City, UT].
I was pleasantly surprised that Roots Tech streamed some sessions. That was a positive thing to encourage future participation. I was also surprised that there were ASL signers for the deaf at some sessions — nice outreach. I hope they keep the Late Night At The Library going — some stayed until midnight (this jester wilted about 9:45).
Google was at RootsTech 2012. Google was a Keynoter, Google was a Vendor and Google was a presenter. Google was in the house. The tech gear had some Android devices in the audience too.
Only Apple had more technology there. Unfortunately, it was among the users, developers, and presenters. Tim Cook, bring Apple to RootsTech 2013!!! Your customers deserve Apple to give the same presence as Google. As I said in my last article, iPads, iPhones, MacBooks (mostly Pro, but some Air): the attendees were so tech laden you would have thought Ubiquitous Computing had arrived. Isn’t there a recession? Where did all these tech warriors come from? These were users a bit more than developers. Bloggers were numerous, most wore Mardi Gras beaded necklaces so they were recognizable. Then you had secret bloggers such as Stanczyk. Everyone was a genealogist. Users encouraged Vendors/Developers with praise and requests for more/better technology. Oh and make the tech transparent.
But this is about Google. Before the conference I had written the Google tech off as too low brow to bother with. Then Jay Verkler showed up, who is apparently the Steve Jobs of genealogy. He was the Keynoter on day one. Stanczyk is a genealogist and I have been to genealogy conferences before. These are usually staid affairs. Genealogists are … how should I put it … umm, old. It is not unusual to see octogenarians and nonagenarians (90s). But the energy in the auditorium of 4,200 conference attendees was electric. These were not stodgy Luddites. Notebooks and pens were almost nonexistent!! People were excited and very much anticipating something. What, I do not think we had a clue, but expectations were off the charts.
Jay did not disappoint. He was personable and masterful in his presentation skills. Mr Verkler is a Visionary like Steve Jobs and the audience knew it and responded. It was Jay who wove the vision which everyone now wants ASAP. He brought up Google and my eyes were prepared to glaze over. I did not even record the Google execs’ names [shame on me]. They were good! They had prepared for RootsTech and they showed brand new tech and also Microcode. I do not have words to express what I saw, but everyone in the audience wanted it.
Google showed Microcode which would be a Google Chrome plug-in and appear as a widget/icon in the address bar that can do amazing search/exchange tricks in a Web 2.0+ way. It would utilize Historical-Data.org in some unspecified way to do this genealogy magic. It was beyond amazing. Google created a genealogy plug-in!! Google is apparently also coordinating in an API-like way to transfer these search result magics into other websites like FamilySearch, Ancestry, etc. that put this magic into the beyond amazing realm.
Firefox and Safari take note if you do not want to see a massive shift to Chrome. I am pretty sure all genealogists will use Chrome when the Microcode widget arrives.
The RootsTech Conference is living up to its name. Everywhere there was a sea of: iPhones/Androids, iPads (in huge numbers), and laptops. Even the very elderly were geared up. Google, Dell, and Microsoft were at RootsTech, so why not Apple, especially since their customers were present in LARGE numbers??? [note to Tim Cook: have Apple sponsor and show up as a vendor.]
According to Ryan Heaton (FamilySearch), “GEDCOM is stale.” He went on to speak about GEDCOMX as the next standard as if GEDCOM were old and/or dead. They were not even going to make GEDCOMX backwards compatible! In a later session I had with Heaton I asked the Million dollar question, “How do I get my GEDCOM into GEDCOMX?” After a moment’s pause he said they’d write some sort of tool to import or convert the existing GEDCOM files. Well that was reassuring??? So they want GEDCOMX to be a standard but FamilySearch are the only ones working on it and they have not had the ability to reach out to the software vendors yet (I know, I asked).
My suggestion was to publish the language (like HTML, SQL, or GEDCOM). I asked for “railroad tracks“, what we used to call finite state automata, and what Oracle uses to demonstrate SQL syntax, statements that are valid with options denoted and even APIs for embedding SQL into other programming languages. Easy to write a parser or something akin to a validator (like W3C has for HTML).
Dallan Quass took a better tack on GEDCOM. His approach was more evolutionary, rather than revolutionary. He collected some 7,000+ gedcoms and wrote an open source parser for the current GEDCOM standard (v5.5). He analyzed the flaws in the current standard and saw unused tags, tags like ALIA that were always used wrong, custom tags and errors in applying the standard. He also pointed out that the concept of a NAME is not fully defined in the standard and so is left to developers (i.e. vendors) to implement as they want. These were the issues making gedcoms incompatible between vendors. He said his open source parser could achieve 94% round trip from one vendor to another vendor.
[Image: GEDCOM Tags]
Now that made the GEDCOMX guys take notice — here was their possible import/conversion tool.
The users just want true portability of their own gedcoms and the ability to not have to re-enter pics, audio, movies over and over again. RootsTech’s vision of APIs that would allow the use of “authorities” to conform names, places, and sources would also help move genealogy to the utopian future Jay Verkler spoke of at the keynote. APIs would also provide bridges into the GEDCOM for chart/output tools, utilities (merge trees), and Web 2.0 sharing across websites / search engines / databases (more utopian vision).
GEDCOM is the obvious path forward. Why not improve what is mostly working and focus on the end users and their needs?
FamilySearch get vendors involved and for God’s sake get Dallan Quass involved. Publish a new GEDCOM spec with RailRoad tracks (aka Graphic Syntax Diagrams) and then educate vendors and Users on the new gedcom/gedcomx. Create a new gedcom validator and let users run their current gedcoms against it to produce new gedcoms (which should be backward compatible with old gedcom to get at least 94% compliance that Quass can already do)!
Ask users for new “segments” in the railroad tracks to get new features that real users and possibly vendors want in future gedcoms. Let there be an annual RootsTech keynote where all attendees can vote via the RootsTech app on the proposed new gedcom enhancements.
How about that FamilySearch? Is that doable? What do you my readers think? Email me (or comment below).
P.S. Do Not use UML models to communicate the standard. It is simply not accessible to genealogists. Trust me I am a Data Architect.
To Stanczyk, it appears that 2012 has gotten off to a sluggish start (genealogically speaking). How about for you genealogists (email or comment)? Well that is all about to change ! Lisa Kudrow‘s Who Do You Think You Are?, returns this Friday with Martin Sheen as the subject.
RootsTech 2012 kicks off this week too. Did you notice, they have an app (it’s free) for that? Even better, they will STREAM some of the conference for the benefit of all genealogists! Kudos to Roots Tech. All conferences (genealogical or not) should do these two things: offer an app and stream conference proceedings. This should definitely jump start genealogy.
Read these blogs. Yes, I am telling you it’s OK to read other blogs than this one. These people are “official Roots Tech bloggers”.
I discovered that I missed one of my holiday blogs (in my backlog) about the happy married couples in Pacanów parish from 1881. So I will post the names of 40 Happy couples and what record # (Akt #) they are in the Pacanów parish church book. This is two years after my great-grandparents got married, but there is still a Jozef & Mary who are getting married (Jozef Elijasz). I once had to sort out the two Jozef Elijasz from 1879 and the one from 1881 who all married women named Mary in the village of Pacanów! Genealogy is hard.
Oh and Punxsutawney Phil will make an appearance this week and offer his weather prognostication skills (I really think his predecessor Pete was much better and more alliterative too). I am pretty sure Phil & Pete are German, so you will need a German genealogy site for their lineage. Quaint tradition (Pennsylvania), dragging a Ground Hog from its home to ask him about weather. I think Bill Murray’s movie captured it well. So be careful what you do this week, or you may be repeating it a few times.
If you follow Stanczyk‘s posts, then you know the first 2012 Genealogical Website Ratings were published yesterday. I wanted to follow-up on that article’s meme with yet a further muse.
The ratings show that there was quite a bit of a shuffling around. Overall though, genealogy websites are nascent. That is my meme for today: The State of Genealogy is Very Good and Is Improving. In a little over a week, RootsTech 2012 conference will happen. The convention shows many of the top web sites are attending: Ancestry.com, Fold3.com, FamilySearch.org, Mocavo.com, LegacyFamilyTree, MyHeritage, RootsMagic, Geni.com, AgesOnline, etc. In the middle of this conference, the “Who Do You Think You Are“, show will debut (3-Feb-2012). Late March brings us PBS’s “FINDING YOUR ROOTS…” So the first quarter looks promising. Do you doubt this jester?
Perhaps the Barron’s Online article, “‘Tis the Season For Ancestry.com” will convince you. Bob O’Brien (the author) analyzes the stock performance of Ancestry in light of this convergence. He does not reference RootsTech nor PBS, but this jester does. Also adding to the synergy for 2012 Genealogy is the release of the 1940 US Census on April 2. So 2012 has all the makings for genealogy’s best year ever. Barron’s does mention the 1940 Census too.
Now a successful business climate for genealogy – software, hardware, and services can only mean many good things will be coming for us genealogists. Let me urge you to greater heights in your research by lending your efforts in your research and also in collaborating on the Internet. We can all push our own research (and of course those distantly related to us) forward and ride the rising tides of the 2012 Genealogy Surge.
For good measure the biennial United Polish Genealogical Societies Conference in late April is also happening this year. So Polish Genealogy should be able to ride the tide of popularity too.
RootsTech looks like it will have its emphasis on the Internet with its evolving collaborative tools (social networks, HTML5, new databases, blogs, developer tools/frameworks/standards to enhance the collaborative/connection making nature of genealogy and provide richer search/match tools/techniques, etc.). Catch this break-out year!
That’s the Meme – The State of Genealogy in 2012 is very promising.
Welcome to Stanczyk’s 2012 First Quarter Genealogy Website Rankings. I know I am a week early; c’est la vie! Since my last rankings an array of rank postings [uh, pun partly intended] have appeared. Stanczyk has also received exactly one request for inclusion in his rankings, from .. Tamura Jones about his website: www.tamurajones.net [#58 on the new Rankings]. He also has a worthy Twitter page. Keep sending in recommendations; I will keep thinking about them or including them if they are worthy. I liked Tamura’s stuff so MUCH, that I added his genealogy page to my blogroll [Modern Software Experience at the right].
I really liked the survey from the Canadian website: Genealogy In Time. I added their magazine/website (#13) as well to my rankings. I found them because they produced an excellent Genealogy Website Ranking (mid January 2012), that included a very thorough discussion of their methodology. They neglected a few Polish Websites that SHOULD have made their list. Also they list Ancestry.com in all of its many global incarnations and this eats up an unnecessary number of the top 125 poll slots. But aside from those minor criticisms, their ranking is very GLOBAL and very good. Who knew there was a Chinese (makes sense, considering their billion plus citizens and their excellent genealogical records) genealogy website or a Finnish website too in the top 125???
OK, Stanczyk will keep his Rankings list, because of the emphasis on Polish / Slavic genealogical websites. Stanczyk also has many in the range 100-125 that are very useful though not popular enough to be in the Genealogy in Time Rankings. However, the Genealogy-In-Time-Poll makes a very useful tool in another way. They have graciously included the website links (URLs) of each site, making it rather easy to build a genealogical Favorites/Bookmarks list that is broadly useful. Stanczyk admits to his list being somewhat selective in the lower 1/3 in order to be more valuable to Polish Researchers (in particular to English speaking, though not exclusively so). On a personal note, this blog you are reading is in the top 5.8Million (of all websites world-wide) and is #120 on my Website rankings, so come on readers, give me a boost, please!
Needless to say, all website rankings I read, agree on the top 20-40 websites (putting aside the multiple listing of Ancestry.com).
Here is a snippet of the Rankings and the rest are on the Rankings Page:
To: Jim Delany (Big 10) John D. Swofford (ACC) Larry Scott (PAC 12)
An Open Letter to: Jim Delany (Big 10), John D. Swofford (ACC), Larry Scott (PAC 12)
12/6/2011
Re: BCS Poll
You should immediately quit the BCS. It is rigged against you and your three conferences. If you read my letter, you should see from my analysis that the “computer polls” are inherently biased (and perhaps worse than the two human polls that make up the other 2/3 of the BCS rankings).
First off, I used the Human Polls (Harris Poll & USA Today/ESPN) as the normative index. If you say this is OK, then you can accept my analysis. If you reject it, then you should be pitting LSU against Oklahoma State in the BCS Championship Bowl Game, because that is the matchup the Computer Polls would have produced if there were no human polls as a part of the BCS Index.
My analysis clearly shows that the computer polls OVERWHELMINGLY favor the BIG12 and have a strong bias in favor of the SEC too. At the same time it is OVERWHELMINGLY rigged against the BIG 10 and strongly biased against the ACC and the PAC 12 conferences.
The analysis shows that the Big 10, ACC and the PAC 12 would have to overcome a huge bias in the computer polls via the Human Polls to have any chance to reach the BCS Championship Game. You should realize that by selecting the SEC every year to play in the BCS Championship Game, you keep the bias in the computer polls and it becomes a self-fulfilling prophecy each and every year. That means the BIG Money will continue to flow unchecked into the SEC (and also, to a lesser degree, to the BIG 12), as it is a “virtuous cycle” upwards for these two conferences, who get the best recruits and booster money because they are ALWAYS in the BCS Championship Game.
Now that you have given in to the precedent of two teams from the same conference in the BCS Championship Game (should be a rule against this) you will see a heavy bias to that year after year, since that is all new recruits will see and the “virtuous cycle” will persist. Also, did you realize that the computer and the human polls will emphasize the next year’s polls based upon the previous year, via the pre-season polls?
The root cause, you will see, lies in two computer polls in particular: Kenneth Massey & Jeff Sagarin. These two polls strongly overemphasize Big 12 teams and show a strong positive bias toward SEC teams. At the same time, they under-emphasize the Big 10 and show a strong negative bias against the ACC and PAC 12. The effect is what we have seen for the last few years, culminating in this year’s SEC-only Championship.
If you want to keep the BCS Polls, then you will need to do five things to improve them and their perception as fair:
Mind you, the Anderson & Hester computer poll exhibits some bias too, but at least it is not in COMPLETE lock step with the Kenneth Massey or Jeff Sagarin polls. Otherwise, please dismantle the BCS system and just have 4 super football conferences, take the conference champion from each, and have these four teams play a semi-final and a final game to determine the national champion fairly.
See the attached spreadsheet data, cut/pasted into the next page, and do your own analysis to validate my findings and see if you reach the same conclusion. Please pay special attention to TEXAS in the final rankings if you wish to be totally disgusted by the computer polls. There is no mathematics that can justify that conclusion by computers, unless there is a BIG12 bias. The computer polls would have made TEXAS, a 7-5 team, the 19th ranked team overall in the whole country, and the two offending computer polls would have made TEXAS 13th in the country and eligible for a BCS At-Large Bowl Game. Can you imagine? Only TEXAS and AUBURN (BIG12 & SEC) have 5 losses in the BCS Top 25. In fact there are no other 5 loss or any 4 loss teams!
Someone should commend the Richard Billingsley, Colley Matrix and Peter Wolfe computer polls for their ability to keep bias from skewing their rankings.
Anderson & Hester can and should do better in their computer algorithm.
2011 FINAL BCS POLL

| Team | Conf | Human Polls | A/H | RB | CM | KM | JS | PW | Comp Polls | Comp Diff | Diff Sum |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LSU | SEC | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| BAMA | SEC | 2 | -1 | -1 | -1 | -1 | 0 | 0 | 3 | -1 | -4 |
| OKLA St | B12 | 3 | 1 | 1 | 1 | 1 | 0 | 0 | 2 | 1 | 4 |
| Stanford | P12 | 4 | 0 | 0 | -1 | -4 | -6 | -3 | 5 | -1 | -14 |
| Oregon | P12 | 5 | -7 | 0 | -3 | -5 | -4 | -1 | 8 | -3 | -20 |
| Arkansas | SEC | 6 | -1 | -2 | -6 | 1 | 2 | 2 | 5 | 1 | -4 |
| Boise St | MWC | 7 | -2 | 1 | 0 | -6 | -6 | -1 | 9 | -2 | -14 |
| Kans. St | B12 | 8 | 3 | 1 | 4 | 4 | 3 | 3 | 4 | 4 | 18 |
| SCaro | SEC | 9 | -1 | 1 | -2 | 0 | 1 | 0 | 10 | -1 | -1 |
| Wisc | B10 | 10 | -5 | 0 | -5 | -6 | -9 | -2 | 14 | 4 | -27 |
| VaTech | ACC | 11 | -2 | 0 | -2 | -3 | -10 | -6 | 13 | 2 | -23 |
| Baylor | B12 | 12 | 1 | 2 | -5 | 2 | 6 | 5 | 11 | -1 | 11 |
| UMich | B10 | 13 | 2 | -3 | 4 | -6 | -9 | -5 | 15 | 2 | -17 |
| OKLA | B12 | 14 | 8 | 5 | 8 | 7 | 8 | 4 | 7 | -7 | 40 |
| Clemson | ACC | 15 | -4 | 0 | -3 | -5 | -2 | 2 | 16 | 1 | -12 |
| Georgia | SEC | 16 | 2 | -4 | 0 | 5 | 5 | 2 | 12 | -4 | 10 |
| Mich St. | B10 | 17 | -3 | 4 | -4 | -7 | -7 | -5 | 21 | 4 | -22 |
| TCU | MWC | 18 | -4 | 4 | -1 | -5 | 0 | 3 | 17 | -1 | -3 |
| Houston | CUSA | 19 | 3 | 0 | 5 | -2 | -6 | 0 | 18 | -1 | 0 |
| Nebraska | B10 | 20 | 3 | 2 | 3 | -5 | -3 | 0 | 19 | -1 | 0 |
| So. Miss | CUSA | 21 | 25 | -1 | -1 | 25 | 25 | 5 | 24 | 3 | 78 |
| Penn St. | B10 | 22 | 1 | 1 | 2 | 25 | 25 | -1 | 23 | 1 | 53 |
| West VA | Beast | 23 | 25 | 25 | -1 | 25 | 25 | 25 | 25 | 2 | 124 |
| Texas | B12 | 24 | 7 | 25 | 2 | 11 | 11 | 0 | 19 | -5 | 56 |
| Auburn | SEC | 25 | 0 | 1 | 25 | 8 | 11 | 4 | 21 | -4 | 49 |
| -7 | 11 | -8 | -35 | -37 | -2 | -17 | -78 | ||||

Skew By Conference

| Conf | A/H | RB | CM | KM | JS | PW | Total |
|---|---|---|---|---|---|---|---|
| ACC | -6 | 0 | -5 | -8 | -12 | -4 | -35 |
| B10 | -2 | 4 | 0 | -24 | -28 | -13 | -63 |
| B12 | 20 | 7 | 10 | 25 | 28 | 12 | 102 |
| PAC12 | -7 | 0 | -4 | -9 | -10 | -4 | -34 |
| SEC | -1 | -5 | -9 | 13 | 19 | 8 | 25 |

Source: 12/5/2011 Philadelphia Inquirer Final BCS Standings
The bottom five teams were unranked in one or more computer polls making their data unfit for some of the analyses – these were not used in the bottom analysis of Skew By Conference.
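For readers who want to re-check the Skew By Conference arithmetic, here is a minimal Python sketch. It sums the per-poll differences for a handful of rows copied from the table above. The function name `skew_by_conference` and the reading that a value of 25 marks "unranked in that poll" (and is skipped) are assumptions for illustration, not part of the original spreadsheet.

```python
from collections import defaultdict

UNRANKED = 25  # assumption: 25 in the table means "not ranked by this poll"
POLLS = ("A/H", "RB", "CM", "KM", "JS", "PW")

# (team, conference, per-poll differences in POLLS order), copied from the table.
ROWS = [
    ("LSU", "SEC", (0, 0, 0, 0, 0, 0)),
    ("BAMA", "SEC", (-1, -1, -1, -1, 0, 0)),
    ("Stanford", "P12", (0, 0, -1, -4, -6, -3)),
    ("Oregon", "P12", (-7, 0, -3, -5, -4, -1)),
    ("Arkansas", "SEC", (-1, -2, -6, 1, 2, 2)),
    ("SCaro", "SEC", (-1, 1, -2, 0, 1, 0)),
    ("VaTech", "ACC", (-2, 0, -2, -3, -10, -6)),
    ("Clemson", "ACC", (-4, 0, -3, -5, -2, 2)),
    ("Georgia", "SEC", (2, -4, 0, 5, 5, 2)),
    ("Auburn", "SEC", (0, 1, 25, 8, 11, 4)),
]


def skew_by_conference(rows):
    """Sum each poll's differences per conference, skipping unranked entries."""
    totals = defaultdict(lambda: [0] * len(POLLS))
    for _team, conf, diffs in rows:
        for i, diff in enumerate(diffs):
            if diff != UNRANKED:
                totals[conf][i] += diff
    return {conf: tuple(values) for conf, values in totals.items()}


if __name__ == "__main__":
    for conf, values in sorted(skew_by_conference(ROWS).items()):
        print(conf, dict(zip(POLLS, values)), "total:", sum(values))
```

With these rows, the ACC, PAC 12 and SEC sums can be compared line by line against the Skew By Conference figures; add the remaining rows from the table to check the other conferences.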
Stanczyk is very old … My portrait by Jan Matejko dates back to 1862 alone. So perhaps you can forgive me if I blog about an antiquarian notion today … BOOKS. First off, I hope everyone had a Blessed and Family/Food Filled Thanksgiving Holiday (4th Thursday in November in the USA).
As I was saying I want to write about books today. I provided a handy photo for the reference of my younger readers who may need a refresher on the concept. Before you run off … Here’s my list:
No Amazon.com or Barnes & Noble today, although they are worthy purveyors — nor will I speak of Antiquarian Books, though I reserve that topic for another day.
Google Books (books.google.com/books) – I love finding public domain books there, and for books under copyright I can search and perhaps get at least a snippet view of my search topic. Google now lets you keep the public domain books on their “Cloud” (no space on your hard disk). At present, my Google eBooks include:
Stanczyk has written about Stanislaw Lem before (http://mikeeliasz.wordpress.com/2011/07/17/thingsifind-when-looking-up-other-things-stanislaw-lem-1956-przekroj/). So in another bit of cognitive resonance, I find that Google has a Stanislaw Lem Doodle (a rather complex Google Doodle). Now before you scurry off to verify this factoid, be forewarned that here in the USA, we only see a Turkey Doodle. Here is the UK Google Doodle (http://www.google.co.uk/) for Stanislaw Lem.
A Few Articles on the Lem Google Doodle:
The last two are European newspapers, as it is not readily apparent in the USA that Google has done this tribute. You need to visit a Google mirror in Europe to see the Stanislaw Lem Doodle (or click on the first link above). The doodle ends with the message that the art was inspired by the drawings of Daniel Mroz for Lem’s short story collection The Cyberiad, published in 1965. This Google Doodle is interactive, allowing users to participate in a series of games. It marks the 60th anniversary of the publication of Stanislaw Lem’s first book, The Astronauts, in 1951.
Since he is a Polish son, go Googling in the UK today.
Yesterday (20th-November-2011), Stanczyk’s iPhone flagged his attention that his Ancestry.com App had an update available (Version 3.0.1).
I tried it on a new tree with a few people. When I downloaded the tree and used Roots Magic 4.x to display the Gedcom, I still got a tree without the proper family linkages. This bug appeared before iOS5 and still persists. I do not get it on my older pre-iOS5 trees that existed on Ancestry.com (before the bug). This bug is not an Ancestry App bug. So early adopters will not see this bug unless they create a new tree and download the Gedcom file for use in another family tree program. The tree appears just fine on Ancestry.com and also in the Ancestry App. I am not certain what is happening in the GEDCOM format of the file. I can use Roots Magic 4.x on older Ancestry.com trees (downloaded Gedcoms) and the family relationships are fine.
So I am leaning towards this being an Ancestry.com bug (not a Roots Magic bug).
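One way to look inside a downloaded Gedcom for missing family linkages is a small script like the Python sketch below. It relies only on the standard GEDCOM tags INDI, FAM, FAMS, FAMC, HUSB, WIFE and CHIL. The function name `linkage_report` is made up for illustration, and whether this is the kind of breakage Ancestry's export actually produces is an open question, so treat it as a diagnostic aid rather than an explanation of the bug.

```python
import re

# level, optional @xref@, tag, optional value (e.g. "0 @I1@ INDI" or "1 FAMS @F1@")
LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s+(.*))?$")


def linkage_report(gedcom_text):
    """Return (unlinked individuals, one-way family links) found in GEDCOM text."""
    indi_links = {}   # individual xref -> family xrefs it points to (FAMS/FAMC)
    fam_members = {}  # family xref -> individual xrefs it lists (HUSB/WIFE/CHIL)
    current = None    # (tag, xref) of the level-0 record being read
    for raw in gedcom_text.lstrip("\ufeff").splitlines():
        match = LINE.match(raw)
        if match is None:
            continue
        level, xref, tag, value = match.groups()
        if level == "0":
            current = (tag, xref)
            if tag == "INDI" and xref:
                indi_links.setdefault(xref, set())
            elif tag == "FAM" and xref:
                fam_members.setdefault(xref, set())
        elif level == "1" and current and value:
            rec_tag, rec_xref = current
            if rec_tag == "INDI" and tag in ("FAMS", "FAMC"):
                indi_links.setdefault(rec_xref, set()).add(value.strip())
            elif rec_tag == "FAM" and tag in ("HUSB", "WIFE", "CHIL"):
                fam_members.setdefault(rec_xref, set()).add(value.strip())
    # Individuals that point to no family at all.
    unlinked = sorted(i for i, fams in indi_links.items() if not fams)
    # Families that list a person who does not point back to that family.
    one_way = sorted(
        (fam, person)
        for fam, people in fam_members.items()
        for person in people
        if fam not in indi_links.get(person, set())
    )
    return unlinked, one_way
```

Read your exported file into a string (for example with `open("mytree.ged", encoding="utf-8").read()`, where the file name is hypothetical) and pass it in. Running it on an older tree that displays correctly and on a newly created one would show whether the two exports differ in these pointers.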
There was NO mention of whether this makes the Ancestry App iOS5 compatible. It says it requires iOS4 or later to run the App. It is a 15.9MB download, so it takes a bit of time and bandwidth. Still, it is under the 20MB limit that forces a download through iTunes on the computer. Synching works fine in both directions, so you can create or modify your family tree on the web or in the smartphone App and both sides stay in synch. Because you update to 3.0.1, your entire tree will need to be downloaded. If your tree has about 1,000 people, this does take a noticeable amount of time. For trees of 100 people or fewer, the delay is minuscule.
Download the new version. Portable Genealogy is back. But please, Ancestry, can you fix the Gedcom issue, so I do not need to see people complaining on the Ancestry-app-mailing-list any more? Your website should interoperate with other genealogy programs that support the GEDCOM standard, or Ancestry should remove the feature “Supports GEDCOM”.
Besides all the issues I have previously detailed in my last article (iOS5 First Impressions), I have a new issue. This is my sternest recommendation:
DO NOT UPDATE to iOS5 if you use Ancestry APP or CAMERA APP !
On the Mailing List: ANCESTRY-TREE-TO-GO-APP , people have been complaining that Ancestry APP does not work. Now Stanczyk knew it worked and it worked well … But that was BEFORE iOS5 came out.
I confirmed the problem exists on iOS5. It does not download the tree / GEDCOM properly (you get a synch error). If you had previously downloaded a tree (before iOS5), then you can use that tree. Obviously, any changes made on ANCESTRY.com cannot be synched to your iPhone/iPad.
HOWEVER, if you update your family tree on the iOS5 device, then your changes can be synched in that direction, saved on the Internet, and accessed at Ancestry.com. In fact, after you do that you can get around the above problem. But you had to have a tree on your iOS device BEFORE you upgraded for this work-around to be possible. After synching from iPhone to web, I am NOW able to synch in both directions again.
I suspect this is an Ancestry problem and not an Apple problem. However for portable genealogy this is a PROBLEM. This is a case where an early adopter is fine and the person who just got his/her first iOS device and it came with iOS5 is not able to participate in the portable genealogy revolution.
–Stanczyk