Archive for ‘Data’

February 28, 2012

Dying For Diacriticals … Beyond ASCII — #HowTo, #Genealogy, #Polish

Stanczyk mused recently upon a few of the NAMEs in my genealogy:

Bębel, Elijasz, Guła, Leszczyński, Kędzierski, Wątroba, Wleciał, Biechów, Pacanów, Żabiec

If you want to write Elijasz (or any of its variants) you are golden. But each of the other names requires a diacritic (aka diacritical mark). Early on, I had to drop the diacritics, because I did not have computer software to generate these characters (aka glyphs). So my genealogy research and my family tree were recorded in ASCII characters. For the most part that is not a concern, unless you are like John Rys and trying to find all of the possible ways your Slavic name can be spelled, misspelled, or transliterated and eventually recorded in some document or database that you will need to search. Then the import becomes very clear. Also, letters with a diacritic sort differently than letters without one. For years, I thought Żabiec was not in a particular Gazetteer I use, until I realized there was a dot above the Z: the dotted-Z villages came after all of the plain-Z (no dot) villages, and there was Żabiec many pages later! The dot was not recorded in the Ship Manifest, nor in a Declaration of Intent document, so I might not have found the parish that Żabiec belongs to so easily. I hope you are beginning to see the import of recording diacritics in your family tree.
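
For the programmers in the audience, here is a minimal Python sketch of the two problems above: folding a name down to plain ASCII so it can be matched against records that dropped the diacritics, and sorting names the way a Polish gazetteer does (dotted-Z after plain Z). The function names and the sample village list are my own illustrations, not taken from any genealogy program, and the alphabet string covers only Polish letters.

```python
import unicodedata

# The Polish slashed-L has no accent to strip, so it is mapped by hand.
_NO_DECOMPOSITION = str.maketrans({"Ł": "L", "ł": "l"})

def fold_to_ascii(name):
    """Drop diacritics, e.g. 'Żabiec' -> 'Zabiec' (for searching old records)."""
    decomposed = unicodedata.normalize("NFD", name.translate(_NO_DECOMPOSITION))
    # After NFD, each accent is a separate combining character we can skip.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Polish alphabet order, with q, v, x included for foreign words.
POLISH_ALPHABET = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"
_RANK = {ch: i for i, ch in enumerate(POLISH_ALPHABET)}

def polish_sort_key(name):
    """Sort key that places accented letters after their plain versions."""
    return [_RANK.get(ch, len(POLISH_ALPHABET)) for ch in name.lower()]

# Illustrative entries only; two of them are made up for the example.
villages = ["Żabiec", "Zborów", "Zabiec"]
in_gazetteer_order = sorted(villages, key=polish_sort_key)
```

With this key, Żabiec lands after every plain-Z entry, which is exactly why it hid from me for years.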

How?

The rest of my article today teaches you how to do this. Mostly we are in a browser, surfing the ‘net, in all its www glory. After my “liberal indoctrination” (aka #RootsTech 2012), I have switched browsers to Google’s Chrome (from Mozilla Firefox). I did this to await the promised “microdata” technology that will improve my genealogical search experience. I am still waiting, Mr Google !!! But while I am waiting, I did find a new browser extension that I am rather fond of that solves my diacritical problem: Virtual Keyboard Interface 1.45. I just double-click in a text field and a keyboard pops up:

Just double-click on a text field, say at Ancestry.com . Notice the virtual keyboard has a drop down (see “Polski“), so I could have picked Русский (for Russian) if I was entering Cyrillic characters into my family tree.

But I want to keep using my browser … OK! I used to prepare an MS Word document, or maybe a Wordpad document, with just the diacriticals I need (say Polish, Russian, and Hebrew), so that I could cut & paste them from that editor into my browser or computer application as needed. A bit tedious, and how did I create those diacritical characters anyway?

I use Character Map in Windows and Character Palette -or- Keyboard Viewer on the Mac:

Now if I use one of these Apps, then I can forgo the Wordpad document  ( of special chars. ) altogether and just copy / paste from these to generate my diacritical characters.

What I would like to see from web 2.0 pages and websites is what Logan Kleinwaks did on his WONDERFUL GenealogyIndexer.org website. Give us a keyboard widget like Logan’s, please ! What does a near perfect solution look like …

Logan has thoughtfully provided ENglish, HEbrew, POlish, HUngarian, ROmanian, DEutsche (German), Slavic, and RUssian characters. Why is it only nearly perfect? Logan, may I please have a SHIFT (CAPITAL) key on the BKSP / ENTER line for uppercase characters? That’s it [I know it is probably a tedious bit of work to do this].

Beyond ASCII ?

The title said beyond ASCII, and so is everything we have spoken about. ASCII is a standard that is essentially a typewriter keyboard, plus the extra keys (ex. Backspace, Enter, Ctrl-F, etc.) that do special things on a computer. So what is beyond ASCII? Hebrew characters (), Chinese/Japanese glyphs (串), Cyrillic (Я), Polish slashed-L (Ł), or Dingbats (❦ – Floral Heart). You can now enter any of these beyond-ASCII characters (UNICODE) in any program with the above suggestions.

Programmer Jargon – others  proceed with caution …

The above are all UNICODE characters. UTF-8 can encode every one of the roughly 1.1 million possible UNICODE code points in nice and easy 8-bit bytes (called octets; because UTF-8 is a stream of single bytes, it is not concerned with big/little endianness). In fact, UTF-8‘s first 128 characters are an exact 1:1 mapping of ASCII, making any ASCII file a valid UTF-8 file. More than half of all web pages out on the WWW (‘Net) are encoded with UTF-8. Makes sense that our gedcom files are too! UTF-8 can have a byte-order-mark (BOM) at the front of our gedcom or not, and it is still UTF-8. The UNICODE standard neither requires nor recommends a byte order mark for UTF-8 [see Chapter 2 of UNICODE]. So please, FamilySearch, remove the BOM from the GEDCOM standard.
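
To make the byte talk concrete, here is a small Python sketch (the helper names are mine, purely for illustration). It shows that a plain ASCII letter is a single identical byte in UTF-8, that a letter with a diacritic takes two bytes, and how to spot the optional UTF-8 BOM at the front of a file.

```python
import codecs

def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))

def has_utf8_bom(raw):
    """True if the raw bytes begin with the 3-byte UTF-8 byte order mark."""
    return raw.startswith(codecs.BOM_UTF8)

plain_z = utf8_hex("Z")    # one byte, same value as in ASCII
dotted_z = utf8_hex("Ż")   # two bytes
slashed_l = utf8_hex("Ł")  # two bytes
```

Since every byte stands on its own in the stream, there is no byte order to get wrong, which is why the BOM is optional for UTF-8.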

If FamilySearch properly defines the newline character in the gedcom grammar [see Chapter 5, specifically 5.8 of UNICODE], then there is nothing in the HEAD tag that would be unreadable to a program written in, say, Java (whose strings are UTF-16 and so can represent any UNICODE character, using surrogate pairs beyond U+FFFF), unless there is an invalid character, which then makes the gedcom invalid. Every character in the HEAD tag is actually defined within 7-bit ASCII, which is valid UTF-8 as is, and since UTF-8 can encode every UNICODE character you could use any computer language that is at least UTF-8 compliant to read/parse the HEAD tag (which has the CHAR tag and its value that defines the character set). Everything in the HEAD tag, with the exception of the BOM, is within the ASCII character set. Using UTF-8 as a default encoding to read the HEAD will work even if there is a UTF-8 BOM.
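
As a rough sketch of that idea in Python: skip a UTF-8 BOM if one is present, decode the bytes as UTF-8, and walk the HEAD record until the CHAR line turns up. The function name is hypothetical and this is not how any particular GEDCOM parser does it. It also assumes the file is ASCII or UTF-8 at the front; a gedcom saved as UTF-16 would need its own handling.

```python
import codecs

def detect_gedcom_char(raw):
    """Return the value of the CHAR line inside HEAD, or None if absent."""
    # Skip a UTF-8 byte order mark if the file starts with one.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    # HEAD should be plain ASCII, so a lenient UTF-8 decode is enough here.
    text = raw.decode("utf-8", errors="replace")
    in_head = False
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        if level == "0":
            if in_head:
                break  # the next level-0 record ends HEAD
            in_head = tag == "HEAD"
        elif in_head and level == "1" and tag == "CHAR" and len(parts) == 3:
            return parts[2].strip()
    return None
```

Once the CHAR value is known, a program could reopen the file with the matching encoding to read the rest of the tree.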

February 27, 2012

PA Act 110 – Public Records (formerly known as Senate Bill 361)

This bill amends the Act of June 29, 1953 (P.L. 304, No. 66), known as the Vital Statistics Law of 1953, to provide for public access to certain birth and death certificates after a fixed amount of time has passed. This legislation provides that such documents become public records 105 years after the date of birth or 50 years after the date of death.

This is a mixed bag, but at least it’s consistent. I wish it were 72 years (like the census) instead of 105. Also the 50 years after death is way too long. Dead is dead. Maybe you could make a case for 5-10 years. By going beyond 30-35 years you are forcing genealogy research to skip generations, since the current generation would die before gaining access. Genealogists will have to will their research plans to their children in PA.

The indexes (I hate the word indices) are here: Birth Index (1906 — so far that’s it) | Death Index (1906-1961).   By the way, you will need the American Soundex of the last name as this is how the records are sorted:  American Soundex of Surname, followed by alphabetical on FirstName. Use Steve Morse’s Soundex One-Step page.
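
If you would rather compute the code yourself, here is a minimal Python sketch of American Soundex as I understand the rules (first letter kept, three digits, H and W do not separate letters with the same code). It assumes the surname has already been reduced to plain A-Z letters, so strip the diacritics first. Treat it as an illustration and check your result against Steve Morse's page.

```python
# Digit assigned to each coded consonant; vowels, Y, H and W get no digit.
_CODES = {}
for _digit, _letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                         "4": "L", "5": "MN", "6": "R"}.items():
    for _letter in _letters:
        _CODES[_letter] = _digit

def american_soundex(surname):
    """Return the 4-character American Soundex code for a plain A-Z surname."""
    # Anything outside A-Z (spaces, hyphens, accented letters) is ignored.
    letters = [ch for ch in surname.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break up a run of equal codes
        code = _CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeat after it counts
    # Pad with zeros and keep one letter plus three digits.
    return (result + "000")[:4]
```

So to find an Eliasz in the death index, I would look under the code this returns for the surname and then scan alphabetically by first name.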

February 19, 2012

Meme: #RootsTech — #Genealogy, #Technology

A while ago, Stanczyk bemoaned iOS5. Therefore, I owe it an update …

• Portable Genealogy is sound – Ancestry App better than ever
• The Camera App in iOS5 does have a zoom. In fact if you use the familiar “pinch-gesture” you can zoom in/out and the old zoom slider appears too. Also you can use the Volume Up button (on the side of the phone) to take a picture, which is helpful when the camera is rotated.
• Just having the iPhone was very useful during the #RootsTech conference as my note taking device. Until the iPad 2 (or 3) arrives with both WiFi and 3G (LTE), I would have been without blogging capabilities in the Salt Palace convention center whenever its WiFi went down. I utilized the #RootsTech App (for iPhone & there was one for Android too).
• In the library it was my digital  camera.
• In fact the ImageToText App came in handy to OCR an image of text for me
• I used the Ancestry App to enter the transcribed text from the microfilm images right into the evidence (note area) of an individual in the app and attached the iPhone picture too.
• In one case, I was able to get an immediate shaky leaf as a result of my data entry — much to my disbelief (and it was correct). So I could do an immediate on-site analysis and do further microfilm searching as a result.
• I used the Bump App to swap contact info with one genealogist. I cannot wait until all genealogists become mobile-enabled and I can lose my business cards altogether. Hint to RootsTech Vendors: you should use Bumps too to collect user info. Why do I have to drop a business card into a fishbowl??? Do a BUMP, get a tchotchke (swag). Leave the fishbowl for the Luddites.
• Are you a Slavic (Czech, Pole, Russian, etc.) genealogist? Then you must be dying for diacriticals. You could add an international keyboard. But why? In iOS5, just press and hold down the ‘ l ‘ key and up will come a list including the slashed-l. Just slide your finger over onto the slashed-l to enter that. Likewise, for entering ‘S, E, A, Z, C, N, etc.’ too — works upper/lower case. Of course if you have German ancestors, you can get your umlauts too in the same fashion. That trick is a Latin Alphabet data entry trick (sorry Cyrillic or Hebrew readers — try the International Keyboard trick).
February 16, 2012

1940 US Census – Blank Forms — #Genealogy, #US, #Census

Legacy Family Tree has released blank US Census Forms (page1 | page2) for the 1940 US Census. April 2nd is coming, are you prepared? Is Ancestry.com prepared?

At #RootsTech 2012, the 3rd keynote was an Ancestry talking-head panel. They joked about whether the website could withstand the crush on April 2nd. Let’s see how this experiment goes.

This is the first US Census to be released in an all digital format.

February 16, 2012

GEDCOM “RailRoad Tracks” (aka Graphic Syntax Diagram) – #Genealogy, #Technology

The above diagram is what Stanczyk had been jabbering about since the #RootsTech conference. Isn’t that much easier on the eyes and the grey matter than a complex UML diagram? Who even knows what a UML diagram is or if it is correct or not?

What does it say is in a GEDCOM file (ex.  Eliasz.ged)?

A HEAD tag  optionally followed by a SUBmissioN Record followed by 1 or more GEDCOM lines followed by a TRLR tag.

ex. gedcom lines  that can be “traced” along the railroad tracks at the top.

0 HEAD
1 SOUR Stanczyk_Software
1 SUBM @1@
1 GEDC
2 VERS 5.5.1
1 CHAR UNICODE
0 @1@ SUBM
...
0 TRLR

OK, Stanczyk_Software does not exist; it was made up as a fictitious but valid SOURce System Identifier name. The GEDCOM file (*.ged) is a text file and you can view/edit the file with any text editor (vi | NotePad | WordPad | etc.). I do not recommend editing your gedcom outside of your family tree software, but there is certainly nothing stopping you from doing that (DO NOT TRY THIS AT HOME). If you knew gedcom, you could correct those erroneous/buggy gedcom statements that are generated by so many programs and that cause poor Dallan Quass to achieve ONLY 94% compatibility with his GEDCOM parser.

Have you ever downloaded your gedcom from ANCESTRY and then uploaded it to RootsWeb? Then you might see all those crazy _APID  tags.   It is a custom tag (since it begins with an underscore  — GEDCOM rules dear boy/girl).   It really messed up my RootsWeb pages with gobbledygook. I finally decided to edit one gedcom and remove all of the _APID tags before I uploaded the file to RootsWeb. Aaah that is SO much better on the eyes. Oh I probably do not want to re-upload the edited gedcom into ANCESTRY, but at least my RootsWeb pages are so much better!   The _APID is just a custom tag for ANCESTRY (who knows what they do with it) so to appeal to my sense of aesthetics, I just removed them — no impact on the RootsWeb pages, other than improved readability. [If you try this, make a backup copy of the gedcom and edit the backup copy!]
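
For those who would rather not do this by hand in a text editor, here is a minimal Python sketch of the same clean-up. It drops every line carrying a given custom tag, along with any deeper-level lines that hang beneath it. The function name and the sample values are made up for illustration, and the backup advice above still applies.

```python
def strip_custom_tag(lines, tag="_APID"):
    """Return the gedcom lines with `tag` lines (and their sub-lines) removed."""
    kept = []
    skip_below = None  # level of the tag line currently being removed
    for line in lines:
        parts = line.split(maxsplit=2)
        level = int(parts[0]) if parts and parts[0].isdigit() else None
        if skip_below is not None:
            if level is not None and level > skip_below:
                continue  # subordinate line of the removed tag
            skip_below = None
        if level is not None and len(parts) >= 2 and parts[1] == tag:
            skip_below = level
            continue
        kept.append(line)
    return kept

# Hypothetical fragment; the _APID value is invented for the example.
sample = [
    "0 @I1@ INDI",
    "1 NAME Jan /Eliasz/",
    "1 SOUR @S1@",
    "2 _APID 1,1234::5678",
    "2 PAGE 12",
    "0 TRLR",
]
cleaned = strip_custom_tag(sample)
```

Reading the file into a list of lines, running it through this function, and writing the result to a new file leaves the original gedcom untouched.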

Now obviously the above graphic syntax diagram is not complete. It needs to be resolved to a very low level of detail such that all valid GEDCOM lines can be traced. It also requires me/you to add in some definitional things (like exactly what is a level# — you know those numbers at the beginning of each line).
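
To show what I mean by a level#, here is a rough Python sketch that splits a GEDCOM line into its level number, optional cross-reference id (the @1@ part), tag, and value, and then checks that levels never jump by more than one from line to line. The pattern is my simplified reading of the line layout, not the full v5.5.1 grammar, and the names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: str

# level, optional @xref@, tag, optional value (simplified pattern).
_LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@\s]+@)\s+)?(\w+)(?:\s+(.*))?$")

def parse_line(text):
    """Split one GEDCOM line into level, xref, tag and value."""
    match = _LINE.match(text.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value or "")

def levels_are_valid(lines):
    """True if the first level is 0 and no line jumps more than one level deeper."""
    previous = -1
    for text in lines:
        level = parse_line(text).level
        if level > previous + 1:
            return False
        previous = level
    return True
```

Tracing a line with your finger along the railroad tracks is the same exercise: level first, then an optional id, then the tag, then whatever value follows.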

I have a somewhat mid-level graphic syntax diagram that I generated using an Open Source (i.e. free) graphic syntax diagrammer. As I said in one of my comments, I will send it to whoever asks (already sent it to Ryan Heaton & Tamura Jones). You can get a copy of Ryan Heaton’s presentation from RootsTech 2012 and compare it to his UML diagram (an object model). I think you will quickly realize that you cannot see how GEDCOM relates to the UML diagram, and therefore it is difficult to ask questions or make suggestions. A skilled data architect/data modeler or a high-level object-oriented programmer could make the comparison and intuit what FamilySearch is proposing, but a genealogist without those technical skills could NOT.

I am truly asking the question, “Can a genealogist without a computer science degree or job read the above diagram?” and trace with his finger a valid path of correct GEDCOM syntax [assuming a whole set of diagrams were published]. The idea is to see how the GEDCOM LINES (in v5.5.1 parlance FAMILY_RECORD, INDIVIDUAL_RECORD, SOURCE_RECORD, etc.) are defined and whether or not what FamilySearch is proposing is something complete/usable that advances the capabilities of the current generation of software without causing incompatibilities (ruining poor Dallan Quass’s 94% achievement). Will it finally allow us to move the images/audio/video multimedia types along with the textual portion of our family trees and keep those digital objects connected to the correct people when moving between software programs?

GEDCOM files are like pictures of our beloved ancestors. They live on many years beyond those that created them. Let’s not lose any of them OK?

February 13, 2012

Blog Bigos …

Stanczyk added a new Page (Tech Diary) to record my technology doings.

While doing that and reading from my blogroll (and emails), I discovered some history about the “defacto standard GEDCOM” (wiki: GEDCOM ). Now I strongly recommend you start from “defacto” link rather than the wikipedia link.

• RootsTech 2012 – had two GEDCOM presentations by Ryan Heaton (FamilySearch, GEDCOMX project).
• RootsTech 2012 – had one open source GEDCOM parser presentation by Dallan Quass. Dallan was quite remarkable in his efforts to achieve a 94% commonality amongst 7,000 different GEDCOM files. Dallan Quass has a GitHub project for his Open Source GEDCOM parser.
• Modern Software Experience (Tamura Jones) had a couple articles that caused me to write this article. His most recent GEDCOM article that caught my eye was:  BetterGEDCOM (2/2/2012). I also noticed he had a GEDCOMX article from 12/12/2011. These two articles provide a good discussion. I also noticed that the BetterGEDCOM project had their own project blog. [also see his Gentle Introduction to GEDCOM  article].

I believe those provide the most recent current thoughts on GEDCOM (that I have not penned).

• I have been studying GEDCOM v5.5 (the last GEDCOM standard).
• I produced a partial Graphic Syntax Diagram of GEDCOM v5.5 [what I had been calling “Railroad Tracks”] just to demonstrate how I thought this diagram was a better vehicle to communicate the standard [than say UML object models].
• I could not resist making slight tweaks to GEDCOM v5.5 even in my preliminary studies. Mostly so we could discuss GEDCOM in a readable fashion (i.e. whitespace for formatting, and comment lines ) or because the language cries out for consistency (i.e. requiring the HEAD tag to be a zero level, just like the TRLR tag).

My Graphic Syntax Diagram of GEDCOM v5.5 was produced using an open source tool. It is partial and still high level. I did put in a construct so that you can clearly see all 128 standard tags. The Graphic Syntax Diagrammer is an excellent tool. I will have to offer the author a suggestion for the PNG images that it outputs. I need to take my diagram and manually edit it to make the drawing a better fit for 8.5″ x 11.0″ (aka US Letter) paper. I need to graphically wrap the railroad tracks and to add page breaks so that the image is itself usable for viewing/discussions. I will offer this sample drawing to any interested parties, including emailing the edited product to Ryan Heaton and Dallan Quass [who, since they did not request it, can feel free to ignore it].

My goal is to make minor tweaks to  GEDCOM v5.5 via this diagram [not programming] and try and get DallanQ to produce a one-off parser for it (call it, say GEDCOM 5.5.999) and hope that my tweaks will not lower Dallan’s hard work of achieving 94% compatibility. If it turns out to have virtually no effect on Dallan’s 94% compatibility in his Open Source parser, then I can think about  getting some software vendors to utilize the enhancements (via end user requests), since they are trivial, just to move the standard forward and to open an interest in the vendors to looking at how we create a new Open Standard for GEDCOM.

P.S.

Thanks to Tamura Jones, I now know I need to update my diagram to GEDCOM v5.5.1 first

February 7, 2012

Genealogical Finds @RootsTech 2012

LDS Microfilm # 1192352 – Pacanów 1876-1877

Stanczyk’s first find (after some time) was the 1876 Marriage of Walenty Paluch to Magdalena Major. I research mostly: Eliasz, Leszczynski, Solomon, and Wolf (my grandparents’ lines & my wife’s grandparents’ lines). Of necessity, I record affiliated families and siblings in order to break through the brick walls, but mostly I trace direct lineage, with additions for lineages of 2nd/3rd cousins’ lines who are genealogists (since we work collaboratively and I wish to record these genealogists in the tree and preserve the connection to me). Also, since my Social Network Analysis experiment proved out, in my mind, I keep an eye out for the affiliated families now.

Well when I saw a marriage record (Akt Małżeństwa/Брак запись) index in 1876 Pacanów parish (parafia) that names: Paluch & Major — I was very interested to see  who might be involved.

Record (in Russian/Cyrillic)

#15 – Paluch Walenty

Major Maryanna

Pacanów

The names are written in reverse order (fairly common in this parish, but not quite universally done).

On 15 March 1876 (Gregorian date, 2nd date of the double dates) there was a marriage between Walenty Paluch age 20 (born about 1856), born in Beszówa and the son of Jan & Agnieszka Paluch — wait a minute Jan & Agnieszka are my great-grandmother Maryanna Paluch Elijasz ‘s parents, therefore Walenty Paluch is my great-grandmother’s brother.

Ok so now this affiliated family name is of interest to me! Who is Walenty Paluch  marrying ?

Magdalena Major, age 18 (born about 1858), born in Dobrowodzie, but living in Pacanów parish, who is the daughter of Martin & Katarzyna Major ... wait a minute, these are my great-grandmother Aniela Major Leszczyński’s parents. That means that Magdalena Major is my great-grandmother Aniela’s sister! Wow, this is amazing that two of my great-grandparents’ siblings are marrying each other!

OK so this is a marriage between an Elijasz great-grandfather’s affiliated  relative (brother-in-law) and a Leszczyński  great-grandfather’s affiliated relative (sister-in-law). I guess that Social Network Analysis pays another dividend to my research.

So how cool is that? If I had ignored Paluch and/or Major as not in my direct lineage, I would not have found this record and found two previously unknown siblings to my great-grandmothers. I also see that I need to research Beszówa parish for Paluch family data and that Dobrowodz village (I do not know this village or if it is a parish) is a place to go search for Major family data.

— RootsTech 2012 Treasure

February 5, 2012

Google Me Some Shiny New Genealogical Data

Google was at RootsTech 2012. Google was a Keynoter, Google was a Vendor and Google was a presenter. Google was in the house. The tech gear had some Android devices in the audience too.

Only Apple had more technology there. Unfortunately, it was among the users, developers, and presenters. Tim Cook, bring Apple to RootsTech 2013!!! Your customers deserve Apple to give the same presence as Google. As I said in my last article, there were iPads, iPhones, and MacBooks (mostly Pro, but some Air): the attendees were so tech laden you would have thought Ubiquitous Computing had arrived. Isn’t there a recession? Where did all these tech warriors come from? These were users a bit more than developers. Bloggers were numerous; most wore Mardi Gras beaded necklaces so they were recognizable. Then you had secret bloggers such as Stanczyk. Everyone was a genealogist. Users encouraged Vendors/Developers with praise and requests for more/better technology. Oh and make the tech transparent.

But this is about Google. Before the conference I had written the Google tech off as too low brow to bother with. Then Jay Verkler showed up, who is apparently the Steve Jobs of genealogy. He was the Keynoter on day one. Stanczyk is a genealogist and I have been to genealogy conferences before. These are usually staid affairs. Genealogists are … how should I put it … umm, old. It is not unusual to see octogenarians and nonagenarians (90’s). But the energy in the auditorium of 4,200 conference attendees was electric. These were not stodgy Luddites. Notebooks and pens were almost nonexistent!! People were excited and very much anticipating something; what, I do not think we had a clue, but expectations were off the charts.

Jay did not disappoint. He was personable and masterful in his presentation skills. Mr Verkler is a Visionary like Steve Jobs and the audience knew it and responded. It was Jay who wove the vision which everyone now wants ASAP. He brought up Google and my eyes were prepared to glaze over. I did not even record the Google execs’ names [shame on me]. They were good! They had prepared for RootsTech and they showed brand new tech and also microdata. I do not have words to express what I saw, but everyone in the audience wanted it.

Google showed microdata support that would arrive as a Google Chrome plug-in and appear as a widget/icon in the address bar that can do amazing search/exchange tricks in a Web 2.0+ way. It would utilize Historical-Data.org in some unspecified way to do this genealogy magic. It was beyond amazing. Google created a genealogy plug-in!! Google is apparently also coordinating in an API-like way to transfer these search result magics into other websites like FamilySearch, Ancestry, etc., which puts this magic into the beyond amazing realm.

Firefox and Safari take note if you do not want to see a massive shift to Chrome. I am pretty sure all genealogists will use Chrome when the microdata widget arrives.

February 5, 2012

The RootsTech Conference is living up to its name. Everywhere there was a sea of: iPhones/Androids, iPads (in huge numbers), and laptops. Even the very elderly were geared up. Google, Dell, and Microsoft were at RootsTech. Why not Apple, especially since their customers were present in LARGE numbers??? [note to Tim Cook: have Apple sponsor and show up as a vendor.]

According to Ryan Heaton (FamilySearch), “GEDCOM is stale.” He went on to speak about GEDCOMX as the next standard as if GEDCOM were old and/or dead. They were not even going to make GEDCOMX backwards compatible! In a later session I had with Heaton I asked the Million dollar question, “How do I get my GEDCOM into GEDCOMX”? After a moment’s pause he said they’d write some sort of tool to import or convert the existing GEDCOM files. Well that was reassuring??? So they want GEDCOMX to be a standard, but FamilySearch are the only ones working on it and they have not had the ability to reach out to the software vendors yet (I know, I asked).

My suggestion was to publish the language (like HTML, SQL, or GEDCOM). I asked for “railroad tracks“, what we used to call finite state automata, and what Oracle uses to demonstrate SQL syntax, statements that are valid with options denoted and even APIs for embedding SQL into other programming languages. Easy to write a parser or something akin to a validator (like W3C has for HTML).

Dallan Quass took a better tack on GEDCOM. His approach was more evolutionary, rather than revolutionary. He collected some 7,000+ gedcoms and wrote an open source parser for the current GEDCOM standard (v5.5). He analyzed the flaws in the current standard and saw unused tags, tags like ALIA that were always used wrong, custom tags and errors in applying the standard. He also pointed out that the concept of a NAME is not fully defined in the standard and so is left to developers (i.e. vendors) to implement as they want. These were the issues making gedcoms incompatible between vendors. He said his open source parser could achieve 94% round trip from one vendor to another vendor.

[Image: GEDCOM Tags]

Now that made the GEDCOMX guys take notice — here was their possible import/conversion tool.

The users just want true portability of their own gedcoms and the ability to not have to re-enter pics, audio, movies over and over again. RootsTech’s vision of APIs that would allow the use of “authorities” to conform names, places, and sources would also help move genealogy to the utopian future Jay Verkler spoke of at the keynote. APIs would also provide bridges into the GEDCOM for chart/output tools, utilities(merge trees), Web 2.0 sharing across websites / search engines / databases (more utopian vision).

GEDCOM is the obvious path forward. Why not improve what is mostly working and focus on the end users and their needs?

FamilySearch get vendors involved and for God’s sake get Dallan Quass involved. Publish a new GEDCOM spec with RailRoad tracks (aka Graphic Syntax Diagrams) and then educate vendors and Users on the new gedcom/gedcomx.    Create a new gedcom validator and let users run their current gedcoms against it to produce new gedcoms (which should be backward compatible with old gedcom to get at least 94% compliance that Quass can already do)!

Ask users for new “segments” in the railroad tracks to get new features that real users and possibly vendors want in future gedcoms. Let there be an annual RootsTech keynote where all attendees can vote via the RootsTech app on the proposed new gedcom enhancements.

How about that FamilySearch? Is that doable? What do you my readers think? Email me (or comment below).

P.S.
Do Not use UML models to communicate the standard. It is simply not accessible to genealogists. Trust me I am a Data Architect.

January 30, 2012

Genealogy This Week … #Genealogy, #Technology, #Polish, #GroundHog

To Stanczyk, it appears that 2012 has gotten off to a sluggish start (genealogically speaking). How about for you genealogists (email or comment)? Well that is all about to change !   Lisa Kudrow‘s Who Do You Think You Are?, returns this Friday with Martin Sheen as the subject.

RootsTech 2012 kicks off this week too. Did you notice, they have an app (it’s free) for that? Even better, they will STREAM some of the conference for the benefit of all genealogists! Kudos to Roots Tech. All conferences (genealogical or not) should do these two things: offer an app and stream conference proceedings. This should definitely jump start genealogy.

Read these blogs. Yes, I am telling you it’s OK to read other blogs than this one. These people are “official Roots Tech bloggers”.

I discovered that I missed one of my holiday blogs (in my backlog) about the happy married couples in Pacanów parish from 1881. So I will post the names of 40 Happy couples and what record # (Akt #) they are in the Pacanów parish church book.  This is two years after my great-grandparents got married, but there is still a Jozef & Mary who are getting married (Jozef Elijasz). I once had to sort out the two Jozef Elijasz from 1879 and the one from 1881 who all married women named Mary in the village of Pacanów! Genealogy is hard.

Oh and Punxsutawney Phil will make an appearance this week and offer his weather prognostication skills (I really think his predecessor Pete was much better and more alliterative too). I am pretty sure Phil & Pete are German, so you will need a German genealogy site for their lineage. Quaint tradition (Pennsylvania), dragging a Ground Hog from its home to ask him about weather. I think Bill Murray’s movie captured it well. So be careful what you do this week, or you may be repeating it a few times.

January 27, 2012

Pathways & Passages – Journal of PGSCT&NE — #Polish, #Genealogy, #Society

Stanczyk thinks he just got the new issue of Pathways & Passages. I’m not certain because it says 2011 on the cover and in the page footers. But of course, who doesn’t have a hard time writing the new year on their checks?

That aside, their column, “Online Resources” was particularly good this issue (whichever it was, Summer 2011 -or- Summer 2012).

For the PA Polonia …

They had two online resources. The first is for Schuylkill County, PA (moja żona, my wife, has family from that county; in fact I stumbled upon this site a few years ago), so I can state from my own experience that it is a very good one.

They did mention Lackawanna County, PA (but did not give the URL — so off to Google for you). There are marriages: 1885-1995 and an index to wills.

The Next Online Resources …

Passaic County, NJ – Naturalizations. This turned out to be an EXCELLENT find! I found a Jozef Zwolski whose ship manifest I had found before. Now Jozef was a brother of Roman Zwolski and both of these men are sons of Jan Zwolski & Petronella Elijasz! They happen to be from both Biechow & Pacanow parishes. Joseph’s Declaration of Intent was listed and you could view the image (and download a PDF of the document)! So I now have a birthday for Joseph and it matches up well to his ship manifest, and his residences in Russian-Poland match up well too, so I am pretty convinced I have my ancestor.

Joseph apparently served in WWI and is taking advantage of privileges as a citizen soldier to become an American.

Antwerp Police Immigration Index. This last resource given, I would not have thought to look into (not having any Belgians in my direct lineage). But apparently, if you stayed longer than normal before your passage to America (from Antwerp port), you had to register with Antwerp Police. A good many Poles must have fallen into that category. I did not find any of mine, but did find some whose last names match those in my family tree. If you do find your ancestor, you have a name and a village to ascertain that you have the correct person, and you will gain a birthday. This is another nice database from FamilySearch.org.

I am glad I belong to some of the various Polish Genealogical Societies — these little resources sometimes pay big dividends.

January 24, 2012

Genealogy 2012 – State of the Union

If you follow Stanczyk‘s posts, then you know the first 2012 Genealogical Website Ratings were published yesterday. I wanted to follow-up on that article’s meme with yet a further muse.

The ratings show that there was quite a bit of a shuffling around. Overall though, genealogy websites are nascent. That is my meme for today:  The State of Genealogy is Very Good and Is Improving. In a little over a week, RootsTech 2012 conference will happen. The convention shows many of the top web sites are attending: Ancestry.com, Fold3.com, FamilySearch.org, Mocavo.com, LegacyFamilyTree, MyHeritage, RootsMagic, Geni.com, AgesOnline, etc. In the middle of this conference, the “Who Do You Think You Are“, show will debut (3-Feb-2012). Late March brings us PBS’s “FINDING YOUR ROOTS…” So the first quarter looks promising. Do you doubt this jester?

Perhaps the Barron’s Online article, ” ‘Tis the Season For Ancestry.com” will convince you. Bob O’Brien (the author) analyzes the stock performance of Ancestry in light of this convergence. He does not reference RootsTech nor PBS, but this jester does. Also adding to the synergy for 2012 Genealogy is the release of the 1940 US Census on April 2. So 2012 has all the makings for genealogy’s best year ever. Barron’s does mention the 1940 Census too.

Now a successful business climate for genealogy (software, hardware, and services) can only mean many good things will be coming for us genealogists. Let me urge you to greater heights in your research, both by redoubling your own efforts and by collaborating on the Internet. We can all push our own research (and of course that of those distantly related to us) forward and ride the rising tides of the 2012 Genealogy Surge.

For good measure the biennial United Polish Genealogical Societies Conference in late April is also happening this year. So Polish Genealogy should be able to ride the tide of popularity too.

RootsTech looks like it will have its emphasis on the Internet with its evolving collaborative tools (social networks, HTML5, new databases, blogs, developer tools/frameworks/standards to enhance the collaborative/connection making nature of genealogy and provide richer search/match tools/techniques, etc.). Catch this break-out year!

That’s the Meme – The State of Genealogy in 2012 is very promising.

January 23, 2012

2012 1st Quarter – Genealogy Website Rankings — #Genealogy, #Rankings, #Website

Welcome to Stanczyk’s 2012 First Quarter Genealogy Website Rankings. I know I am a week early; c’est la vie! Since my last rankings an array of rank postings [uh, pun partly intended] have appeared. Stanczyk has also received exactly one request for inclusion in his rankings, from Tamura Jones about his website: www.tamurajones.net [#58 on the new Rankings]. He has a worthy Twitter page too. Keep sending in recommendations. I will keep thinking about them, and including them if they are worthy. I liked Tamura’s stuff so MUCH that I added his genealogy page to my blogroll [Modern Software Experience at the right].

I really liked the survey from the Canadian website: Genealogy In Time. I added their magazine/website (#13) to my rankings as well. I found them because they produced an excellent Genealogy Website Ranking (mid January 2012) that included a very thorough discussion of their methodology. They neglected a few Polish Websites that SHOULD have made their list. Also they list Ancestry.com in all of its many global incarnations and this eats up an unnecessary number of the top 125 poll slots. But aside from those minor criticisms, their ranking is very GLOBAL and very good. Who knew there was a Chinese genealogy website (makes sense, considering their billion plus citizens and their excellent genealogical records) or a Finnish website too in the top 125???

OK, Stanczyk will keep his Rankings list, because of the emphasis on Polish / Slavic genealogical websites. Stanczyk also has many in the range 100-125 that are very useful though not popular enough to be in the Genealogy In Time Rankings. However, the Genealogy In Time poll makes a very useful tool in another way. They have graciously included the website links (URLs) of each site, making it rather easy to build a genealogical Favorites/Bookmarks list that is broadly useful. Stanczyk admits to his list being somewhat selective in the lower 1/3 in order to be more valuable to Polish Researchers (in particular to English speaking ones, though not exclusively so). On a personal note, this blog you are reading is in the top 5.8 million (of all websites world-wide) and is #120 on my Website rankings. Come on readers, give me a boost, please!

Needless to say, all the website rankings I read agree on the top 20-40 websites (putting aside the multiple listings of Ancestry.com).

Here is a snippet of the Rankings and the rest are on the Rankings Page:

January 13, 2012

Pacanow Marriage Statistics 1878-1884 – #Polish, #Genealogy

Stanczyk is obsessed with learning and understanding his ancestral villages. To that end, I spent the latter part of December analyzing the marriage records of Pacanów parish. As regular readers may know, Pacanów was in the Russian-Poland partition in the old gubernia (wojewodztwo/woj.) of Kielce, which is north-east of today’s Krakow, Poland. Pacanów is now in the woj. of Świętokrzyskie.

Today I have a graphic of a spreadsheet of the data I collected. Besides providing some demographics by the villages that made up the parish of Pacanów, it also gives you an inkling of the villages that comprise the parish [it may not be an exhaustive list]. You should also be aware that Catholic parish boundaries changed over time, just as they do today. So parishes and dioceses may be different from earlier periods and also from those of the present time.

This was also an excellent exercise in practicing reading, transliterating, and translating Russian/Cyrillic to the Latin-based Polish alphabet. As always, the handwriting of the priest, the quality of the paper/book/ink and even the original scanning of the church records affect your paleographic efforts. So scanning church records for a limited set of proper nouns can improve your paleographic/translating skills. After all, I know the noun has to be a village on the map (some map from that time period) so even difficult paleographic challenges can usually be resolved.

Results of Marriage Statistics

1878-1884 Pacanow Parish Marriage Stats By Village

For indexing/scanning purposes the villages are:

Karsy Duzy, Karsy Maly, Kepa Lubawska, Komorow, Kwasow, Niegoslawcie, Pacanow, Rataje, Slupia, Sroczkow, Szczeglin, Zabiec

I did not include Folwark Dolne as that is a manor house/estate (more so than an actual village).
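The village names above are written without diacritics (Pacanow, Zabiec), which is the form that tends to turn up in indexes. Here is a minimal sketch of how such plain ASCII index keys could be derived from the proper Polish spellings. The function name `ascii_key` is made up for this illustration, and the approach (Unicode decomposition plus a hand mapping for ł) is just one assumed way to do it, not the method used to build the spreadsheet.

```python
import unicodedata

# Polish "ł"/"Ł" have no Unicode decomposition, so they are mapped by hand.
_NO_DECOMPOSITION = str.maketrans({"ł": "l", "Ł": "L"})


def ascii_key(name: str) -> str:
    """Return a diacritic-free spelling of `name` for indexing (hypothetical helper)."""
    # NFKD splits letters such as "ó" into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name.translate(_NO_DECOMPOSITION))
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for village in ("Pacanów", "Żabiec", "Biechów"):
    print(village, "->", ascii_key(village))
```

Keeping both spellings side by side (the true name and its ASCII key) lets you search a database under either form while still recording the diacritics in your tree.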

January 7, 2012

OH – Cleveland/Cuyahoga County Eliasz/Elijasz #Polish, #Genealogy

Yesterday, Stanczyk emailed in to the blog an Ancestry database of note. It is an index of Marriages from Cuyahoga County, OH (the Cleveland area) 1810-1973. Most of these are marriage returns from the officiant and list little more than the bride, groom, marriage date and officiant. Some do in fact list ages of the bridal party or their residences, and two of mine even had the parents’ names.

Now this plays into an earlier blog article of mine about the Cleveland Eliasz/Elijasz, asking for any descendants to write this jester and discuss family trees. [None so far.]

I was hoping for and found the marriage record of Stanislas Hajek and Agnes Eliasz! Of all the Cleveland Eliasz/Elijasz, this marriage was most convincing to me that they are relatives, as both Stanislas and Agnes (Agnieszka) were from Pacanow, which is my grandfather’s birth village. From a Polish Genealogical Society website (genealodzy.pl) email I received from a Baran, whose grandmother was an Eliasz, and from Ship Manifests, I was able to place this Agnes Eliasz in my family tree as a daughter of Jozef Eliasz & Theresa Siwiec (whose direct line descendant a while ago sent me my grandparents’ marriage records, civil and church).

Truly the Internet makes this world a smaller place. So today, I am transcribing the married couples from the Cuyahoga County, OH marriage returns of 1913 on the same page as Stanislas Hajek & Agnes Eliasz (from page 193):

Michael Blatnik & Mary Hocevar August 25th, 1913 [#21537]
John Spisak & Veronika Busoge August 25th, 1913 [#21538]
Joseph Wisniewski & Frances Kotecka August 25th, 1913 [#21539]
Stanislas Hajek & Agnes Eliasz August 25th, 1913 [#21540]
George Csepey & Helen Weiszer August 26th, 1913 [#21541]
Boleslas Zaremba & Alexandra Alicka August 26th, 1913 [#21542]
Louis Rutkowski & Anna Solecka August 26th, 1913 [#21543]
Aloys Salak & Anna Pisek August 26th, 1913 [#21544]

Almost all of them look Slavic and most of those names are Polish. Cleveland, a large Great Lakes city, was an American enclave of Polonia in the early 20th century.

Enjoy!

December 18, 2011

Polish Resources – Cobbled from Ancestry.com/PGSA.org and Family Search – #Polish, #Genealogy

Stanczyk put together a couple of pieces to make a NEW and useful Polish Genealogy database. First off, my email box had a weekly email from Ancestry.com. This week’s Weekly Discovery is a boon for Polish Genealogists …

U.S. and Poland, Catholic Parish Marriage Index, Polish Genealogical Society of America, 1767–1931

Ok, the above link takes you to Ancestry’s newest database index (http://search.ancestry.com/search/db.aspx?dbid=70048&enc=1), which, as the link name suggests, is a Polish Catholic Parish Marriage Index. I was excited until I discovered that it was really just a re-issue of the PGSA.org database: http://www.pgsa.org/CzuchMarAll.php . So if you are not a subscriber to Ancestry, you could just go to PGSA and use their database and get the same results. The PGSA even gives an LDS Microfilm #. So Stanczyk took note of an Anna Eliasz marrying Leon Zielinski in 1910 and the LDS MF#: 1578072. I made a vow to look that record up in the LDS microfilm. So I was in the LDS Library Catalog verifying the microfilm # was correct and LO and BEHOLD (why is it always LO and BEHOLD, and not just BEHOLD?), the Library Catalog says the images are online!!! They even provided a link:

https://www.familysearch.org/search/image/index#uri=https%3A//api.familysearch.org/records/collection/1452409/waypoints

Now thankfully the database did specify 1910 and that the church was St. Stanislaus Kostka and even the Page# 204 was helpful. I used those pieces of info and the Family Search link to go to their web page:

• Illinois, Chicago, Catholic Church Records, 1833-1925

I selected the St. Stanislaus Kostka (Chicago) to go to the web page:

From there,  I picked Marriages, 1910-1915 (you need a free login to use their databases) and browsed the images until I got to page# 204 (which was actually image # 109 of 897) and on the left hand page was Leon Zielinski & at the bottom Anna Eliasz marriage record from the church. I got the actual date and parent names (including mother’s maiden name). See below …

I am not certain whether Anna Eliasz is a relative or not because the record did not provide the parish where Anna was born (and I seriously doubt Anna was born in Chicago in 1882). Her mother’s maiden name gives me hope as that name does appear in my ancestral villages, so now I will have to find an Anna Eliasz birth record (or not) in Biechow/Pacanow parishes with parents Jan Eliasz & Mary Jurek.

The point of today’s article is that by joining the index in PGSA.org (or Ancestry.com) and using the index data with the browseable images from FamilySearch.org I was able to pull a new Church Parish record quite easily without leaving my house. It is the combination of the two resources from two separate websites that makes a new and very useful tool. What do you think?

If you have Chicago ancestors (and in particular Polish ones) then you have an early Christmas or Chanukah Present. Drop me a comment of thanks, will ya?

Merry Christmas & Happy Chanukah and just in case,  Happy Holidays to the rest of my readers.

December 17, 2011

A Little Bit of Blog Bigos … #Genealogy, #Website #Rankings, #SSDI

Stanczyk has a lot of catch-up to do. I blame it on the season and the Blood Red Lunar Eclipse; certainly that must be the cause of the madness this December.

SSDI

So many blogs have written about the Social Security Death Master File and the many related issues. First, millions of records were dropped by the SSA. Next the SSA, and this has probably been going on for months, started redacting the names of the parents on the SS5 Applications, thus eliminating the usefulness of that research tool. Now Congress has bullied the paid genealogy databases (and even Rootsweb) to drop the SS# from their databases on deaths in the last ten years. Rootsweb just dropped their Social Security Database altogether!

Now let me remind the lame (not lame duck) Congress that the Social Security Death Master File is used to inform banks/financials/loan companies/credit card companies etc. that these SS#’s are of the DECEASED and that they should not grant any NEW credit applications with the Social Security Numbers in the Social Security Death Master File! Ergo, having the SS# of a dead person should not avail any criminal and should in fact result in their arrest for fraud, as the aforementioned companies are supposed to check the Social Security Death Master File against credit apps. Therefore, there really is no need to eliminate the SS#’s from Ancestry.com or any other database. By eliminating these numbers you cannot order the SS5 Applications, which is just as well since the SSA has made them much less useful. The result is: genealogists have less data available and the US Government has less MONEY ($) available, since the genealogists now have two reasons not to order the SS5 Applications any longer. So the US Government will now lose another source of income??? Boy, is this CONGRESS the biggest bunch of idiots or what?

Eastmans / Website Rankings

As before, let me remind new genealogists that this Genealogy Website Ranking could be utilized to create or augment your genealogy Bookmarks/Favorites. Obviously, they are valuable since a LOT of genealogists visit them.

MOCAVO

I forgot to mention Mocavo.com (I put it into the newest Genealogy Website Rankings). I have briefly mentioned Mocavo.com before (when I found them in my blog analytics). They are a new search engine, akin to Google. However, they are a Genealogy Search Engine and as such are enhanced to understand GEDCOM, genealogy, dates, places, etc., and their search results are more accurate than, say, what you would get from Google. They also have the ability to search databases and include those in results, as well as GEDCOMs. You have the ability to submit your family tree (GEDCOM) to Mocavo and they can provide you with notices of potential new matches, much like Ancestry.com does for their subscribers. So instead of Googling your Family Tree, try MOCAVOing your Family Tree.

December 6, 2011

An Open Letter to: Jim Delany (Big 10), John D. Swofford (ACC), Larry Scott (PAC 12)

To: Jim Delany (Big 10) John D. Swofford (ACC) Larry Scott (PAC 12)

12/6/2011

Re: BCS Poll

You should immediately quit the BCS. It is rigged against you and your three conferences. If you read my letter then you should see from my analysis, that the “computer polls” are inherently biased (and perhaps worse than the two human polls that make up the other 2/3 of the BCS rankings).

First off, I used the Human Polls (Harris Poll & USA Today/ESPN) as the normative index. If you say this is OK, then you can accept my analysis. If you reject it, then you should be pitting LSU against Oklahoma State in the BCS Championship Bowl Game, because that is what the Computer Polls would have made the result if there were no human polls as a part of the BCS Index.

My analysis clearly shows that the computer polls OVERWHELMINGLY favor the BIG 12 and have a strong bias in favor of the SEC too. At the same time they are OVERWHELMINGLY rigged against the BIG 10 and strongly biased against the ACC and the PAC 12 conferences.

The analysis shows that the Big 10, ACC and the PAC 12 would have to overcome a huge bias by the computer polls via the Human Polls to have any chance to reach the BCS Championship Game. You should realize that by selecting the SEC every year to play in the BCS Championship Game, you keep the bias in the computer polls and it will become a self-fulfilling prophecy each and every year. That means the BIG Money will continue to flow unchecked into the SEC (and also to a lesser degree to the BIG 12) as it is a “virtuous cycle” upwards for these two conferences, who get the best recruits and booster money because they are ALWAYS in the BCS Championship Game.

Now that you have given in to the precedent of two teams from the same conference in the BCS Championship Game (there should be a rule against this) you will see a heavy bias to that year after year, since that is all new recruits will see and the “virtuous cycle” will persist. Also, did you realize that both the computer and the human polls carry the previous year’s results into the next year’s polls, via the pre-season polls?

The root cause, you will see, is two computer polls in particular: Kenneth Massey & Jeff Sagarin strongly overemphasize Big 12 teams and show a strong positive bias toward SEC teams, while at the same time these same two computer polls also demonstrate an under-emphasis of the Big 10 and a strong negative bias against the ACC and PAC 12. The effect is what we have seen for the last few years, culminating in this year’s SEC-only Championship.

If you want to keep the BCS Polls, then you will need to do five things to improve them and their perception as fair:

1. Make a rule that the BCS Championship can NEVER have two teams from the same conference. This should be self evident.
2. Make the remaining computer polls submit their algorithms to an audit before the season starts and a week before/after the final BCS rankings, to ensure that these computer algorithms are “bias free” from human intervention and that the same results are achieved before and after the final rankings (i.e., no tampering, and results are reproducible, i.e., no randomness).
3. You must get rid of one or both of the Kenneth Massey or Jeff Sagarin computer polls. The dual combination skews the biases in favor of BIG12/SEC and against the BIG10/ACC/PAC12. If you only get rid of one, then the initial removal should be Jeff Sagarin. The two computer polls show the same bias and are merely echoes of each other, thus giving them an undue advantage over the other four computer polls. The Jeff Sagarin poll is merely MORE pronounced (in its biases) than the Kenneth Massey poll.
4. No 4 loss or 5 loss TEAM can ever be eligible for a BCS Bowl Game. You need this rule to prevent obvious bias from contaminating the system.
5. No 2 loss TEAM can play in the BCS Championship Bowl Game (substitute the next highest ranked team that does not violate rules 1 & 5).

Mind you, the Anderson & Hester computer poll exhibits some bias too, but at least it is not in COMPLETE lock step with the Kenneth Massey or Jeff Sagarin polls. Otherwise, please dismantle the BCS system and just have 4 super football conferences and take the conference champion from each and have these four teams play a semi-final and a final game to determine the national champion fairly. See the attached spreadsheet data, cut/pasted into the next page, and do your own analysis to validate my findings and see if you reach the same conclusion. Please pay special attention to TEXAS in the final rankings if you wish to be totally disgusted by the computer polls. There is no mathematics that can justify that conclusion by computers, unless there is a BIG12 bias. The computer polls would have made TEXAS, a 7-5 team, the 19th ranked team overall in the whole country and the two offending computer polls would have made TEXAS 13th in the country and eligible for a BCS at Large Bowl Game. Can you imagine? Only TEXAS and AUBURN (BIG12 & SEC) have 5 losses in the BCS Top 25. In fact there are no other 5 loss or any 4 loss teams!

Someone should commend the Richard Billingsley, Colley Matrix and Peter Wolfe computer polls for their ability to keep bias from skewing their rankings.

Anderson & Hester can and should do better in their computer algorithm.

2011 FINAL BCS POLL

Team      Conf  Human Polls  A/H   RB   CM   KM   JS   PW  Comp Polls  Comp Diff  Diff Sum
LSU       SEC             1    0    0    0    0    0    0           1          0         0
BAMA      SEC             2   -1   -1   -1   -1    0    0           3         -1        -4
OKLA St   B12             3    1    1    1    1    0    0           2          1         4
Stanford  P12             4    0    0   -1   -4   -6   -3           5         -1       -14
Oregon    P12             5   -7    0   -3   -5   -4   -1           8         -3       -20
Arkansas  SEC             6   -1   -2   -6    1    2    2           5          1        -4
Boise St  MWC             7   -2    1    0   -6   -6   -1           9         -2       -14
Kans. St  B12             8    3    1    4    4    3    3           4          4        18
SCaro     SEC             9   -1    1   -2    0    1    0          10         -1        -1
Wisc      B10            10   -5    0   -5   -6   -9   -2          14          4       -27
VaTech    ACC            11   -2    0   -2   -3  -10   -6          13          2       -23
Baylor    B12            12    1    2   -5    2    6    5          11         -1        11
UMich     B10            13    2   -3    4   -6   -9   -5          15          2       -17
OKLA      B12            14    8    5    8    7    8    4           7         -7        40
Clemson   ACC            15   -4    0   -3   -5   -2    2          16          1       -12
Georgia   SEC            16    2   -4    0    5    5    2          12         -4        10
Mich St.  B10            17   -3    4   -4   -7   -7   -5          21          4       -22
TCU       MWC            18   -4    4   -1   -5    0    3          17         -1        -3
Houston   CUSA           19    3    0    5   -2   -6    0          18         -1         0
Nebraska  B10            20    3    2    3   -5   -3    0          19         -1         0
So. Miss  CUSA           21   25   -1   -1   25   25    5          24          3        78
Penn St.  B10            22    1    1    2   25   25   -1          23          1        53
West VA   Beast          23   25   25   -1   25   25   25          25          2       124
Texas     B12            24    7   25    2   11   11    0          19         -5        56
Auburn    SEC            25    0    1   25    8   11    4          21         -4        49
Totals                        -7   11   -8  -35  -37   -2                    -17       -78

Skew By Conference

Conf    A/H   RB   CM   KM   JS   PW  Total
ACC      -6    0   -5   -8  -12   -4    -35
B10      -2    4    0  -24  -28  -13    -63
B12      20    7   10   25   28   12    102
PAC12    -7    0   -4   -9  -10   -4    -34
SEC      -1   -5   -9   13   19    8     25

Source: 12/5/2011 Philadelphia Inquirer Final BCS Standings

The bottom five teams were unranked in one or more computer polls making their data unfit for some of the analyses – these were not used in the bottom analysis of Skew By Conference.
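For anyone who wants to validate the Skew By Conference figures, here is a minimal Python sketch of one way to recompute them from the poll data above. It rests on two assumptions that are inferred from the numbers rather than stated in the letter: each entry is read as the human-poll rank minus that computer poll's rank (so a positive number means the computer poll favors the team), and an entry of 25 is treated as a marker for "unranked by that poll" and skipped. The names `ROWS`, `UNRANKED`, `skew_by_conference` and `SKEW` are made up for this illustration.

```python
from collections import defaultdict

POLLS = ("A/H", "RB", "CM", "KM", "JS", "PW")
UNRANKED = 25  # assumption: 25 marks a team left unranked by that poll

# (team, conference, per-poll differences in POLLS order), copied from the table.
ROWS = [
    ("LSU", "SEC", (0, 0, 0, 0, 0, 0)),
    ("BAMA", "SEC", (-1, -1, -1, -1, 0, 0)),
    ("OKLA St", "B12", (1, 1, 1, 1, 0, 0)),
    ("Stanford", "P12", (0, 0, -1, -4, -6, -3)),
    ("Oregon", "P12", (-7, 0, -3, -5, -4, -1)),
    ("Arkansas", "SEC", (-1, -2, -6, 1, 2, 2)),
    ("Boise St", "MWC", (-2, 1, 0, -6, -6, -1)),
    ("Kans. St", "B12", (3, 1, 4, 4, 3, 3)),
    ("SCaro", "SEC", (-1, 1, -2, 0, 1, 0)),
    ("Wisc", "B10", (-5, 0, -5, -6, -9, -2)),
    ("VaTech", "ACC", (-2, 0, -2, -3, -10, -6)),
    ("Baylor", "B12", (1, 2, -5, 2, 6, 5)),
    ("UMich", "B10", (2, -3, 4, -6, -9, -5)),
    ("OKLA", "B12", (8, 5, 8, 7, 8, 4)),
    ("Clemson", "ACC", (-4, 0, -3, -5, -2, 2)),
    ("Georgia", "SEC", (2, -4, 0, 5, 5, 2)),
    ("Mich St.", "B10", (-3, 4, -4, -7, -7, -5)),
    ("TCU", "MWC", (-4, 4, -1, -5, 0, 3)),
    ("Houston", "CUSA", (3, 0, 5, -2, -6, 0)),
    ("Nebraska", "B10", (3, 2, 3, -5, -3, 0)),
    ("So. Miss", "CUSA", (25, -1, -1, 25, 25, 5)),
    ("Penn St.", "B10", (1, 1, 2, 25, 25, -1)),
    ("West VA", "Beast", (25, 25, -1, 25, 25, 25)),
    ("Texas", "B12", (7, 25, 2, 11, 11, 0)),
    ("Auburn", "SEC", (0, 1, 25, 8, 11, 4)),
]


def skew_by_conference(rows):
    """Sum each poll's differences per conference, skipping unranked entries."""
    totals = defaultdict(lambda: [0] * len(POLLS))
    for _team, conf, diffs in rows:
        for i, diff in enumerate(diffs):
            if diff != UNRANKED:
                totals[conf][i] += diff
    # Map conference -> (per-poll sums, overall sum across the six polls).
    return {conf: (sums, sum(sums)) for conf, sums in totals.items()}


SKEW = skew_by_conference(ROWS)
# Print conferences from most disfavored to most favored.
for conf, (sums, total) in sorted(SKEW.items(), key=lambda kv: kv[1][1]):
    print(f"{conf:6}", *(f"{s:4d}" for s in sums), f"{total:5d}")
```

Under these assumptions the sketch agrees with the ACC, Big 10, PAC 12 and SEC rows of the Skew By Conference table. For the Big 12 it gives RB = 9 and a total of 104, where the table lists 7 and 102, so that row is worth a second look when you do your own check.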