Ancestry and DNA Tests

I'm writing this post in response to a number of comments that I see online about using a commercial DNA test to ascertain ancestry.  Quite often, when someone asks how to find out their family history or ancestry, someone will come back with an answer along the lines of "just spit in a vial, send it to Ancestry.com, and they'll tell you".  It's not really that simple, so I'm making this post to explain how an ancestry DNA test can help, or not help, you discover your ancestry.  Nicely dumbed down, I hope, for the beginner.

Traditional Genealogy

Traditional genealogists usually set out to create a genealogy (family history and tree), using interview techniques, artefacts, and oral memories recorded from older relatives.  Artefacts might, for example, include old family medals or photographs.  They then extend the research through documentary evidence, such as birth, death, and marriage certificates, church registers, census records, transcripts, electoral rolls, and military records.  If they are interested in recording all ancestral information, and not merely a single line such as the surname line, then this research can go on for months, years, even decades.

What you cannot do is simply pay a small fee, and have your entire family history drop through the letter box in a brown envelope.  It takes years to research, collate, and verify a good family tree.  Most genealogy enthusiasts don't mind this, because they actually enjoy doing the research itself.  It becomes a hobby, sometimes even a passion.

However, a number of commercial DNA companies may give the general public the impression that you can now simply pay a fee, spit or swab, and your ancestry magically appears for you on a website.  It's big business.  Does it work though?  Exactly what is genetic genealogy?

What is Ancestry and why do we care?

Ancestry can simply be defined as our descent from forebears.  Why do we care who they were?  Which forebears or ancestors?  How many are there?  How far back?

Of course, not everyone does care.  Not everyone cares about history.  But for others, it does matter to how we define ourselves, our communities, and our families.  It tells us who we are, where we came from.  It defines us, gives us grounding.  It gives us identity.  Wars have often been inspired by ancestry.  At the same time, a deeper appreciation of the human family, and its common ancestry, can be used to relate to those elsewhere.  One big family.  Discovering the immense poverty and hardships of our ancestors can help us to appreciate what we have, and to help others in need today.

So what ancestry can we discover?  For those few that merely concentrate on one paternal line, it's quite simple to define - the generations of a surname.  However, beyond that one narrow line of descent, few appreciate exactly how much total ancestry we have.  Let's look at our biological ancestors at each generation:

  • 2 parents
  • 4 grandparents
  • 8 great grandparents
  • 16 g.g grandparents
  • 32 g.g.g grandparents
  • 64 g.g.g.g grandparents
  • 128 g.g.g.g.g grandparents
  • 256 g.g.g.g.g.g grandparents

These are only your 510 most recent direct ancestors, yet those generations will take you back only around 250 years.  Now add all of the recorded children of these direct ancestors, the great great uncles and aunts, to the theoretical family tree.  You're probably going to have a tree of around 1,300 individuals.  That is just for 250 years.  You have a big family.  Go back a few more generations, and it will explode before you reach far.  All of those direct ancestors, though, are a part of your ancestry.  You'll most likely carry some DNA from most of them.  They are, from a biological perspective, who you are.
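
For anyone who likes to check the arithmetic, the doubling can be sketched in a few lines of Python.  This is only a minimal illustration.  The function names are my own, and it counts places in a theoretical tree, ignoring any ancestor who appears in the tree more than once.

```python
def ancestors_in_generation(generations_back: int) -> int:
    """Places in the tree at a given generation (1 = parents, 2 = grandparents)."""
    return 2 ** generations_back


def total_ancestors(generations_back: int) -> int:
    """All places in the tree from the parents back to the given generation."""
    return sum(ancestors_in_generation(g) for g in range(1, generations_back + 1))


# Print the per-generation count and the running total for eight generations.
for g in range(1, 9):
    print(g, ancestors_in_generation(g), total_ancestors(g))
```

Eight generations back gives 256 places in that generation alone, and 510 in total.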

By the way, the number of biological ancestors will not continue to increase indefinitely.  Increasingly, you will find couples within your tree who are distant biological cousins of each other, so the same individuals fill more than one place in the tree.  As this accelerates over thousands of years, it explains how all modern people around the world descend from a very small population around 100,000 years ago.

So before considering what DNA can do for genealogy, we need to consider which ancestors matter to us.  Do you just want to know who your biological parents or grandparents were?  Do you want to know the names, places and social positions of your ancestors over centuries?  Do you want to know in which parts of the world your ancestors lived 500 years ago?  Do you want to know how some of your prehistoric ancestors moved across the globe, thousands of years ago?  Maybe you want to know everything.

Let's now turn to genetics for genealogy, and how DNA tests can answer some of these questions.

There are two main types of DNA tests for ancestry, although they are often incorporated together by commercial companies:

  1. The haplogroups, the Y-DNA and mt-DNA
  2. Autosomal DNA

The Haplogroups

The haplogroups are chains, or markers, that are carried on only two strict lines of descent.  They do not apply to your entire ancestry - just two lines.  As we saw above, we have 256 ancestors at the eighth generation back (fewer if any of them appear in the tree more than once, because cousins reproduced together).  Our haplogroups came from only two of them.  Your haplogroup does not define you.  Yet, it's quite odd, because very quickly, many genetic genealogists do relate to them, rather like a hereditary football club.  They become an identity only if you enthuse over them.

The Y or paternal haplogroup follows the strict paternal line.  From father to son.  Women do not have a Y chromosome, so cannot pass it on.  It has to come from the biological father.  However, within this constraint, Y-DNA is particularly useful to genealogists.  It mutates often, both as STRs and, less often, as SNPs (snips).  Because of these frequent mutations, it is useful for tracing shared descent with others.  It can also be aligned with surname studies.  The champion commercial DNA company for Y-DNA research is Family Tree DNA.

The mt or mitochondrial (maternal) haplogroup follows the strict maternal line.  From mother to children.  Both sons and daughters inherit their mt-DNA haplogroup from their biological mother.  However, only the daughters can pass it down.  There are two drawbacks to mt-DNA for genealogy.  1) The surname frequently changes, traditionally nearly every generation through marriage.  2) It doesn't mutate as frequently as the STRs of Y-DNA.  It is still a useful tool, and can prove descent through the maternal line.  It can also still be used for studies of much deeper, ancient ancestry.

Autosomal DNA

This is the bulk of your DNA.  All of the snips (SNPs) that make up who you are genetically.  You receive approximately 50% from each parent, 25% from each grandparent, and 12.5% from each great grandparent.  This subdivision cannot go on forever, and indeed, once you go back much more than six generations, the approximations start to break down, so that you may have no snips at all from a particular line that joined your family tree over 250 years ago.
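
The halving can be written out as a short Python sketch.  It is only an illustration of the averages quoted above.  The function name is my own, and real inheritance beyond your parents varies around these figures, because recombination is random.

```python
def expected_share(generations_back: int) -> float:
    """Average fraction of autosomal DNA expected from one ancestor.

    1 = a parent, 2 = a grandparent, 3 = a great grandparent, and so on.
    This is an average only; the true amount varies from person to person.
    """
    return 0.5 ** generations_back


# Show how quickly the average contribution of a single ancestor shrinks.
for g in range(1, 9):
    print(f"{g} generation(s) back: {expected_share(g):.4%} per ancestor")
```

By eight generations back, the average share from any one ancestor is 1/256 of your autosomal DNA, which is why some lines can drop out altogether.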

The problem with autosomal DNA is that it can be such a mess.  It recombines randomly with every generation.  Therefore, it is much harder to track ancestry in the same way that we can with the haplogroups.

So how can they be applied for genealogy?

Biological descent

Not everyone knows who their biological parents were, or where they came from.  This is the first use of DNA testing.  It can be used to find, test, or prove recent descent.  The first hurdle of genealogy.  Both haplogroup evidence, and autosomal evidence can be used to prove or determine relationship.

Cousins

Many genetic genealogists, use DNA to find distant, and sometimes not so distant cousins.  The hope is that they can link trees, share knowledge and research, perhaps copies of artefacts.  Therefore an awful lot of genetic genealogy is about tracing genetic relatives, and establishing common ancestry.

There are two main tools:

  • Haplogroup Projects.  The Y haplogroup is favoured for its frequently mutating STRs, and also for its link to surnames.  Family Tree DNA projects track the STR and SNP data of their members, tracking families, relationships, and known mutations.  Project administrators at FTDNA can predict relationships to other members in the project.  Your Y cousins.
  • Shared segments.  Autosomal DNA can be used for finding distant cousins.  23andMe, for example, has Relative Finder.  Alternatively, customers of any commercial DNA company that gives them access to their raw data can upload that data to GEDmatch.  At GEDmatch, they can search for other kits, looking for lengths of shared segments (measured in cM - centimorgans) on the autosomes or X chromosomes.  Longer segments, or a greater number of them, can indicate shared ancestry.
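
To give a feel for what comparing shared segments involves, here is a rough Python sketch.  Everything in it is hypothetical: the SharedSegment class, the summarise_match function, the 7 cM cut-off, and the example lengths are my own inventions for illustration, not the method or output of GEDmatch or any other company.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedSegment:
    """One stretch of DNA shared by two kits (hypothetical structure)."""
    chromosome: str
    length_cm: float  # length in centimorgans


# Hypothetical cut-off: ignore very short segments when summarising a match.
MIN_SEGMENT_CM = 7.0


def summarise_match(segments, min_cm=MIN_SEGMENT_CM):
    """Count the segments that pass the cut-off, and report their total and longest length."""
    kept = [s for s in segments if s.length_cm >= min_cm]
    return {
        "segments": len(kept),
        "total_cm": sum(s.length_cm for s in kept),
        "longest_cm": max((s.length_cm for s in kept), default=0.0),
    }


# Made-up example data for two kits.
example = [
    SharedSegment("1", 23.5),
    SharedSegment("7", 12.0),
    SharedSegment("12", 4.5),  # below the cut-off, so it is ignored
    SharedSegment("X", 9.0),
]
print(summarise_match(example))
```

The more segments that pass the cut-off, and the longer they are, the stronger the hint of shared ancestry.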

It is important to understand that this is not about directly tracing ancestry.  It is only about establishing shared biological ancestry with other researchers, with whom you may be able to share resources.  In the old days of genealogy, we would find distantly related researchers by browsing through annually printed surname interest directories.  Here, the same thing is happening, but we are finding people by comparing DNA.

Ancestry from Autosomes

Most commercial DNA companies providing ancestry information now use their own proprietary calculators to look at the autosomal DNA of their customers for patterns that they can relate to a number of reference populations.  23andMe, for example, uses Ancestry Composition to determine in which parts of the world the ancestors of their customers lived 500 years ago.  From this, they predict ancestry as percentages.

However, it is very much a developing art.  The problem is that genes have been randomly mixing and moving around ever since prehistory.  The customers of these DNA companies want hard facts.  They want their ancestry accurately pinpointed to modern or ancient nation-states, or to historical populations such as the Vikings or Huns.  Ancestral DNA companies are under pressure to provide this deep ancestry.  However, can they?  Ancestral analysis of DNA can be very enlightening.  It can provide some surprises within a family history.  However, its accuracy is exaggerated.  It is increasingly successful at predicting ancestry from a particular corner or end of a particular continent.  But it cannot, for example, tell French, British, and German ancestry apart with any high accuracy.  It can recognise some populations better than others.  It cannot tell anyone if they had Viking ancestry.

Ancient Ancestry

This is a particular value of the haplogroups.  As we accumulate more and more data on more mutations, as we expand the recorded database, and as we relate that to more ancient DNA extracted from referenced and dated ancient human remains, so we will be able to better explore the population genetics not only in history, but deep into prehistory.

However, it is also becoming increasingly realised that patterns of ancient admixture can also be detected within the autosomes.  Although autosomal DNA ancestry calculators claim to reveal relatively recent admixtures over the past 500 years, it is becoming clear that these are being confused by much older patterns of admixture.  These patterns can now be explored and probed on a number of GEDmatch programs.  People can compare their DNA with the kits from ancient DNA, or predict just how much of their ancestry was likely "Western Hunter-Gatherer", or "Early Neolithic Farmer".

In addition, more DNA companies are now measuring for much more ancient admixture with archaic populations such as the Neanderthals.

Conclusion

Genetic Genealogy is fun, great fun.  It is not however, a quick and easy replacement for traditional genealogy.  Unless you get lucky with some comparative Y-DNA in a project, it is not going to directly tell you the names or social status of any ancestors.  It can give you a phylogenetic tree, but not any kind of family tree that you can bore other family members with.

Genetic genealogy can provide some tools to some researchers.  It can test biological relationship.  It can be used to predict some of your ancient history.  For most researchers, particularly those that are able to interview many local family members, search local graveyards, and access archives and records, it has little or no value to the pursuit of collecting ancestors.

I personally love to explore my genetic genealogy. But it is documentary research that provides the names.  Genetic genealogy for myself, is more about the long and ancient journey.

The Three Ages of Genealogy

The above image was made from an opportunistic photocopy of a photograph held by a second cousin.  It is a portrait of Samuel William "Fiddler" Curtis.  He was one of my sixteen great great grandparents, and was born at Hassingham, Norfolk in 1852.  He worked as a teamster - an agricultural labourer who drove a team of horses in the fields.

1. The Past - Record Office Genealogy.

This was how I did genealogy almost exclusively twenty-two years ago.  It still exists as a method.  It is still the most qualitative, and traditional research method.  It could be represented by a pair of white gloves - the sort that many record offices and archives insist that readers wear while handling conserved records.  There is of course a cost.  Some parish registers, for example, will suffer from handling, regardless of the level of care.  Otherwise I would recommend that all present day genealogists should practise it from time to time - in order to refer to the most original documents, or simply for the experience of handling these wonderful links to our ancestors.  I remember reading some parish records that I knew had been personally kept by my parish clerk ancestors.  I visited county record offices in Norfolk, Berkshire, Oxfordshire, Wiltshire, and Glamorgan.  I visited archives and the GRO in London.  Genealogy meant leaving the house and travelling.

Twenty-two years ago, Digital Genealogy was in its infancy.  The "IGI" was on microfiche.  Censuses up to and including 1881 were available on microfilm.  Some parish registers were just starting to appear in the microfilm/fiche room, but for many, I had to produce my reader's card, don the white gloves, and carry a soft lead pencil.  Good times.  But sometimes frustrating.  I had many dead ends.  If an ancestor moved more than a few parishes away, and preceded a census, you had to either spend years looking in so many parishes - or rely on a bit of luck.  You could of course find other researchers with shared interests.  They would advertise these interests in the columns of genealogy magazines, and in printed annual directories.

By the time that my personal interest drifted away from genealogy, things had already changed an awful lot.  Many more records had been photographed onto film or fiche - to protect the original records from the wear caused by a growing interest in family history.  Here in Norfolk, amateur genealogists were encouraged to use the film/fiche reading rooms, rather than access the original documents.  Although some negatives were hard to read, it was much faster than ordering and waiting in a reading room by a ticket system.  People were also increasingly using the Internet as a way of sharing.  The IGI moved online.  We were also using database software programs such as Family Tree Maker, and sharing our .gedcom files online.

I then moved away from genealogy totally, for perhaps 12 years.

2. The Present - Internet Genealogy

My interest in genealogy and family history returned after that long break away.  What had changed?  What do I think of the current scene?  So many documents have been digitally photographed, transcribed, indexed, then fed onto online databases.  It's incredible.  Within a few months, my family tree has grown and grown.  I've picked back up so many old dead ends.  The IGI has evolved into FamilySearch.org, an incredible free online resource.  National archives have growing online collections.  There are commercial online subscription based resources galore competing - Ancestry, FindmyPast, MyHeritage, TheGenealogist, GenesReunited, FamilyLink, Genealogy, etc.  FreeBMD grows.  We can not only browse the England & Wales census online, but since I started researching 22 years ago, we now also have 1891, 1901, and 1911.  With a subscription we can even view them from our homes.

It gets much better though.  So much has been transcribed and indexed - then added to databases.  This means that we can search databases for missing ancestors.  This is the greatest advantage of Internet and database transcriptions - this ability to find them where we might not have looked.  Also to find new details, to flesh out the bones of our ancestors - military records, criminal records, transportations.  In the old days, we would have needed to either visit a number of difficult archives in London, or hire an experienced professional genealogist to do this for us.  This is the sort of stuff that can now be accessed by the amateur from the comfort of the home.  There is a lot that is positive about the Present.

What can be depressing is that the margin for error has increased, not only through badly transcribed indexes, but also through the ease of Internet search and of copying previous research, which duplicates errors.  When I uploaded a skeleton direct ancestral tree to MyHeritage, I was pestered by the website to add other people's work to my tree.  However, when I look at their trees, very often, I don't agree with their conclusions.  I see what I believe to be errors.  Wrong generations married up.  Desperate looking links from parents many miles away - that when I investigate them, I can't verify.  I've very quickly learned to distrust other people's online trees.  I'll use them only as suggestions to investigate.

3.  The Future - Genetic Genealogy

The title of this section is a bit of a tease.  I was a bit of a sceptic of genetic genealogy.  Even now, I feel that people wishing to use DNA evidence for extending family trees should, in most cases, save their cash.  However, I can see that one day in the future, genetic genealogy could be a serious tool.  What it presently lacks, particularly outside of the USA, is data!  It can only work when enough people have recorded and shared enough DNA data online.  Even then, for anything other than measuring quite close relationships up to say, second or third cousin, autosomal DNA does not offer much to the genealogist.  Most of our DNA is autosomal.  Very useful for checking for recent non-paternal events.  Useful, for example, for finding close biological relatives.

What I think will be of more use in the future will be haplogroup DNA: the Y-DNA and mt-DNA, and then only when many, many more people have submitted and recorded their DNA.  Even then, it will not produce a family tree.  It will identify common biological relations between researchers and other submitters.  Y-DNA will increasingly tie to surnames - and also mark the non-paternal events where the haplogroup jumps from one surname to another.  FamilyTreeDNA are the forerunners in that field, with their DNA Projects.  Surname and geographic projects link actual family lines to certain haplogroups, clusters of haplogroups, STR markers, SNPs etc.  It's a great idea, but it's in its infancy.

Imagine a future though, where not only most researchers have registered DNA data, but also that of past generations - parents, grandparents, and even ancient DNA from archaeological sites.  This is where genealogy overlaps with anthropology.  Traditional genealogy traces ancestors from recent centuries.  DNA haplogroups show promise for tracing the general movements, admixtures, and displacements of ancestors from thousands of years ago.  At the moment, genetic genealogy rarely supports traditional genealogy - rather, it complements it with very different material.  In the future though, as we continue to tie more SNPs and STRs to actual family lines, it'll start to mean something more to the historical period.  Actual surnames will start to attach to clusters.  At least that is how I see it.  I'm sure that the shareholders of the DNA testing companies would also like us to see that vision.

Our exotic Y Haplogroup L1b

A Y haplogroup is a genetic marker that is passed down on a paternal line.  From great grandfather, to grandfather, to father, to son, and so on it goes.  The mt-DNA haplogroup on the other hand, is a genetic marker on the maternal line.  Together, they represent only two lines of descent.  The below illustration demonstrates these two markers on our own family pedigree fan chart over recent generations:


What is exciting about these two human haplogroups is that by recording their mutations, and plotting them against both the geographical distributions of present-day populations and archaeological human remains, we can start to paint a picture of past movements and origins in population genetics across thousands of years.  We can start to see how some of our ancestors moved across the World during prehistory.  Haplogroups offer a personal touch.

My recent 23andMe test reported that I have inherited an mt-DNA haplogroup H6a1 from my mother, and a Y haplogroup L2* from my father.

My brother, sisters, and my sisters' children should also share the mt-DNA haplogroup H6a1.  This mt-DNA haplogroup has recently been recognised as originating in Eurasia.  It mutated from earlier haplogroups from Central Asia.  Current thought based on recent evidence (2015) suggests that it was carried into Western Europe during the early Bronze Age, circa 5,000 to 3,500 years ago, by pastoralists that spread out of the Eurasian Steppes north of the Black Sea in the Ukraine and South Russia area.  These Steppe pastoralists have been associated with an archaeological culture known as Yamnaya, and H6a1 has been detected in female human remains there.  Archaeologists suggest that their success was in domesticating strains of horses that they could ride, in order to manage larger herds and flocks of grazing livestock.  Another success may have been their development of wheeled carts that could be horse drawn.  Whatever the factors were, they appear to have been so successful that their descendants spilled out from the Steppes, dominating Bronze Age Europe.  Therefore, based on current evidence and thought, it might seem fair to imagine that we have direct maternal ancestors that 5,500 years ago were women in this Eurasian Steppe Culture.  That is the personal touch of the haplogroup.

But what about the Y haplogroup L2* that we inherited from our father, and our paternal line?  My brother and my son should share this Y haplogroup.  I'm making this post to better understand this heritage.

Y Haplogroup L


Distribution of Y haplogroup L today.  Above image by Crates (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC BY-SA 4.0-3.0-2.5-2.0-1.0 (http://creativecommons.org/licenses/by-sa/4.0-3.0-2.5-2.0-1.0)], via Wikimedia Commons.

There were a number of surprises from my personal 23andMe DNA test results.  However, that my Y haplogroup is L2* was perhaps the biggest shock.  I take back all reservations that I had about DNA testing for ancestral purposes.

The 23andMe introduction that accompanied my reported Y haplogroup suggested "Haplogroup L is found primarily in India, Pakistan and the Middle East. The L1 branch is especially common in India, while L2 and L3 are more common further north."  This is not an English haplogroup.  It is not even a European haplogroup.  It is regarded here as South Asian, spreading down from Afghanistan to Sri Lanka, and across from Iran and into Eastern Turkey.  The above map illustrates the distribution of the Y Haplogroup L as we presently know it.  However, the Y haplogroup L has subgroups that until recently were designated as L1, L2, and L3.  These subgroups were not distributed equally across the above geographic distribution.

"M76 (current L1a1, former L1) is the most common subgroup in India, while M76 and M357 (current L1a2, former L3) have approximately equal weight in Pakistan. M317 (current L1b, former L2) is rare in the Indian subcontinent. Iran seems to have all three major subgroups, while Turkey appears primarily M357. Other papers have found additional markers. For instance, L1b can be divided into two subgroups, M247 and M349. The people who do not belong to L1 have not been studied in academic papers, but only in personal genetic tests. Their ancestry is European, but it is possible that this group is present in the Middle East or Caucasus, where few people have tested". (Marco Cagetti).

My actual 23andMe-assigned (ISOGG 2009) L2* mutation should, using the latest designations, be referred to as L1b, or L-M317.  I am seeing suggestions that L-M317 may have originated as recently as 10,000 years ago, between the Levant and the Iranian plain.  My haplogroup L-M317 appears to be strongest in clusters across Western Asia, between Iran and Turkey, with reports in Iraq, Armenia, Georgia, Anatolia, the Chechen Republic, and the Russian Federation.  It is not South Asian.  Marco Cagetti suggests that it is at very low frequency in Southern Europe, less than 1%.  However, this table might suggest that there are stronger pockets of Y Haplogroup L across Italy.  It has been observed in Portugal, Spain, Italy, and along the Mediterranean.  A sub-clade, L-M317 M349, is found in the Levant, but also clusters in Central Europe including Germany, Austria, the Czech Republic, and Switzerland.  M349 is subsequently believed to have originated in the Levant.

What about in England?  L1b doesn't appear to have been well documented or researched here.  The FTDNA Y Haplogroup L Project has mapped only three L submissions in the UK - including one undisclosed, one M349, and a single L-M317 - this one in the Basingstoke area, not a hundred miles from my surname carriers in South Oxfordshire.

The chances are that my L1b will pan out to belong to the L1b1a M349 sub-clade.  It could relate to the Rhine-Danube cluster recorded in Central Europe at the FTDNA Y Haplogroup L project.

So how did it get here?  Where do the European L1b's come from?  Some researchers suggest that it could actually be quite old in Europe.  It could have spread westwards out of the Levant with the Neolithic Revolution, carried by the first farmers.  If this is the case, then it may have been severely displaced by the arrival of new waves of haplogroups that arrived in Europe later, during the Early Bronze Age, leaving just a few clusters to survive.  My Y could be a remnant of earlier European farmers, that were largely displaced by the same wave of haplogroups from the Steppes that carried my mt-DNA into Europe.

Alternatively, it may have arrived here any time later - during the Later Neolithic, or as is a popular theory, it could have been spread into Europe from the Pontic Greek clusters around the Black Sea, or from elsewhere, via the Roman Empire.  It may have even spread into Europe during the medieval period.  Some people suggest Byzantine movements in Southern Europe as one possible source.  Others claim links between their L1b and Ashkenazi Jews in their ancestry - either known, or suggested by autosomal ancestry composition testing.

It has been suggested that I commission a BIG Y test, but I cannot justify that cost.  I think that it is worthwhile commissioning a Y STR test, in order to examine it and establish its provenance.  Then should future research bring up any new understanding, I'll be able to best place our lineage within it.  I've ordered the FTDNA Y111 test next.