Inconsistent Culling of Matches and Shaky Leaf Hints on AncestryDNA

12/12/2014


I was aware that AncestryDNA would be reducing the number of DNA matches to something more reasonable (see, for example, When Less is More), so I was fully prepared for a significant reduction in DNA matches for myself and my parents. However, I expected some consistency in how this would be done (whether I agreed with it or not), but in my limited experience, that is not what has happened.

As of 19 November 2014, when AncestryDNA made their changes, my parents and I all had a reduction in the number of matches:

My father formerly had 17,527 matches and now has 1,514 matches (91% reduction)
My mother formerly had 12,469 matches and now has 1,441 matches (88% reduction)
I formerly had 13,898 matches and now have 1,720 matches (88% reduction)
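For anyone wanting to check the percentages above against their own before-and-after counts, here is a minimal Python sketch. It simply computes (before - after) / before; the names `counts` and `percent_reduction` are illustrative and not part of any AncestryDNA tool.

```python
# Match counts (before, after) from the 19 November 2014 AncestryDNA change,
# copied from the list above.
counts = {
    "Father": (17527, 1514),
    "Mother": (12469, 1441),
    "Sue": (13898, 1720),
}

def percent_reduction(before: int, after: int) -> float:
    """Return the percentage drop going from `before` matches to `after` matches."""
    return 100.0 * (before - after) / before

for person, (before, after) in counts.items():
    # Round to the nearest whole percent, as in the list above.
    print(f"{person}: {before:,} -> {after:,} ({percent_reduction(before, after):.0f}% reduction)")
```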

All our known ancestors are from the UK, and since the AncestryDNA test isn't yet available in the UK (although it is expected to be available sometime early in 2015), we don't have any shaky leaf hints except with each other (and, in my father's case, with the cousin discussed below). We also don't have any of the new DNA circles, as I selected Family Tree DNA for testing other relatives (about 25 of them to date); only my parents and I have been tested at all of the "Big 3" testing companies for autosomal DNA (23andMe is the third one).

Before the recent change at AncestryDNA, my father and I each had a shaky leaf hint with Deb, a known 4th cousin of mine (3rd cousin x1 removed for Dad), who for both of us was shown as a low confidence match, with an estimated relationship of 5th-8th cousin. Unfortunately, AncestryDNA doesn't provide a chromosome browser or any other tools for us to compare our DNA data on their website. However, because we all uploaded our data to the wonderful FREE website GEDmatch, we were able to see where we had matching DNA with Deb. There is a universal desire across the genetic genealogy community for AncestryDNA to provide information on the matching segments and/or some tools for us to do this ourselves, but they have been totally resistant to providing a chromosome browser or any tools whatsoever. They have purposely provided a "dumbed down" product, as they know that the majority of Ancestry users wouldn't use such tools; however, it is very disappointing that they refuse to provide any tools for those of us who would understand how to use them. See Chromosome Browser War (Roberta Estes, DNAeXplained, 30 Nov 2014) for a detailed explanation of why this is so important – that blog posting also contrasts what AncestryDNA doesn't provide with the useful and essential tools that Family Tree DNA and 23andMe have available.

Using GEDmatch, I was able to do comparisons with the following kits:

My father: GEDmatch kit # A961084
Me (Sue): GEDmatch kit # A229901 and paternal phased kit # PA229901P1
4th cousin Deb: GEDmatch kit # A145432
My Uncle: GEDmatch kit # F315518

The table below shows the matching DNA from one-to-one comparisons on GEDmatch. Dad and I share the identical segment on Chromosome 12, whether or not phased data (see Phasing for an explanation) are compared – we both share 8.5 cM with Deb on Chromosome 12. My paternal uncle, who hasn't been tested at AncestryDNA, shares more DNA with Deb: a total of 30.9 cM across 3 segments (7.4 cM on Chromosome 6, 15.0 cM on Chromosome 9, and the same 8.5 cM segment as Dad and me on Chromosome 12). When doing one-to-one comparisons on GEDmatch (Deb vs. Dad and Deb vs. my phased kit), I also checked using a minimum segment length of 1 cM, rather than the default of 7 cM. This shows that we also have another short matching segment (4.3 cM for Dad and 3.7 cM for me) on Chromosome 9, over the same area as the 15.0 cM segment my uncle has, so it is likely to be "real" (IBD). With the addition of these short segments (we don't have any others), Dad and I share 12.8 cM and 12.2 cM with Deb, respectively, and Deb and my Uncle share >30 cM.

Comparison (GEDmatch #s)	Chromosome	Start Location (bp)	End Location (bp)	cM	SNPs
Deb (A145522) vs. Dad (A961084)	12	128881883	132276195	8.5	1072
Deb (A145522) vs. Sue (A229901)	12	128881883	132276195	8.5	1072
Deb (A145522) vs. Sue (PA229901P1)	12	128881883	132276195	8.5	1070
Deb (A145522) vs. Uncle (F315518)	6	123277300	130092303	7.4	1315
Deb (A145522) vs. Uncle (F315518)	9	74117429	85135617	15.0	2914
Deb (A145522) vs. Uncle (F315518)	12	128881883	132276195	8.5	1073
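The totals quoted above (12.8 cM, 12.2 cM and 30.9 cM) depend on which minimum segment length you count from. A minimal Python sketch of that bookkeeping, using only the segment lengths reported in this post, might look like this; `segments_with_deb` and `total_cm` are illustrative names, not GEDmatch functions.

```python
# Segments shared with Deb as (chromosome, cM), from the GEDmatch comparisons above.
# The short Chromosome 9 segments for Dad and Sue only appear at a 1 cM minimum.
segments_with_deb = {
    "Dad": [(12, 8.5), (9, 4.3)],
    "Sue": [(12, 8.5), (9, 3.7)],
    "Uncle": [(6, 7.4), (9, 15.0), (12, 8.5)],
}

def total_cm(segments, min_cm=1.0):
    """Sum the lengths (cM) of segments at or above `min_cm`, rounded to 0.1 cM."""
    return round(sum(cm for _, cm in segments if cm >= min_cm), 1)

for person, segs in segments_with_deb.items():
    # Compare GEDmatch's default 7 cM minimum with the 1 cM minimum.
    print(person, "7 cM minimum:", total_cm(segs, 7.0), "| 1 cM minimum:", total_cm(segs, 1.0))
```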

It is therefore really odd that although Deb still has a shaky leaf hint with my father (shown as a predicted 5th-8th cousin, now with a confidence of moderate), she has disappeared from my AncestryDNA match list. Since the longest segment my father and I share with Deb is identical and seems to be above AncestryDNA's minimum segment length of ~6 cM for declaring a match, I'm really puzzled why she is no longer on my DNA match list. My parents and I have all been tested at AncestryDNA and linked appropriately on my tree, so AncestryDNA is able to use my fully phased results for identifying appropriate DNA matches. If you have any suggestions for why this has happened, please add a comment below.

The only "logical" explanation I can think of is that maybe AncestryDNA is taking gender into account when determining the length of a segment, although that doesn't make any sense to me when they have my phased data (and I haven't heard any mutterings about this). When I check on the Rutgers Map Interpolator, which uses their smoothed v.2 maps (Build 36), the sex-averaged result for positions 128881883-132276195 on Chromosome 12 is 8.5 cM, with the Female and Male values being 4.9 cM and 13.7 cM, respectively – so measured on the female map, my segment would fall well below 6 cM. Ann Turner, one of my favorite responders on the DNA-NEWBIE message board, provided instructions for how to use this interpolator (if you belong to this group, see Message 38320). Using my example:

On the Rutgers Map Interpolator, select Chromosome 12 from the pull-down list
For Query, select "physical positions (bp) only"
In the box, put "128881883 132276195" (or you can put one position on one line and the other on a separate line)
The output screen will appear as
No. Chr Query_Posn Sex_Ave_cM Female_cM Male_cM
1 12 128881883 167.4464 210.0233 126.4723
2 12 132276195 175.9759 214.8793 140.1783
Subtract the respective values on row 1 from those on row 2 for Sex_Ave, Female, and Male
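The subtraction step above can be sketched in Python as follows. The six numbers are the interpolator output shown above; the 6 cM threshold is the approximate AncestryDNA minimum mentioned earlier in this post, treated here as an assumption rather than a published figure. `segment_lengths` is an illustrative helper, not part of the Rutgers tool.

```python
# Interpolator output (cM) for the two Chromosome 12 positions, copied from above.
START = {"Sex_Ave": 167.4464, "Female": 210.0233, "Male": 126.4723}  # position 128881883
END = {"Sex_Ave": 175.9759, "Female": 214.8793, "Male": 140.1783}    # position 132276195

THRESHOLD_CM = 6.0  # assumed approximate AncestryDNA minimum segment length

def segment_lengths(start, end):
    """Subtract row 1 from row 2 for each map, rounding to 0.1 cM."""
    return {name: round(end[name] - start[name], 1) for name in start}

for name, length in segment_lengths(START, END).items():
    status = "above" if length >= THRESHOLD_CM else "below"
    print(f"{name}: {length} cM ({status} ~{THRESHOLD_CM} cM)")
```

With these values the sex-averaged and male lengths clear the assumed threshold, while the female length does not, which is the comparison described above.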

I was waiting for Deb to "transfer" her AncestryDNA results to Family Tree DNA to see our match lists there before finishing this blog posting. I wasn't surprised to find on FTDNA that Deb and my Uncle are shown as matches, but neither my Dad nor I are. The reason is that, besides requiring a minimum matching segment of ~8 cM to be declared a match, Family Tree DNA's algorithm also requires that the total of the matching segments (down to 1 cM) be at least 20 cM, in order to minimize the chance of false positives. So my issue with AncestryDNA isn't that Deb didn't appear on my match list, but rather why they appear to have handled me differently from my father. But all of this emphasizes the importance of everyone uploading their data to GEDmatch, which allows you to select your own criteria for making comparisons between any two individuals (whether or not they are on the "one-to-many" match list on GEDmatch).
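The two FTDNA conditions described above can be expressed as a small Python check. This is only a sketch using the approximate thresholds given in this post (~8 cM longest segment, at least 20 cM total counting segments down to 1 cM); it is not Family Tree DNA's actual code, and `ftdna_style_match` is a hypothetical name.

```python
# Segment lengths (cM) shared with Deb, from the GEDmatch comparisons above.
DEB_SEGMENTS = {
    "Dad": [8.5, 4.3],
    "Sue": [8.5, 3.7],
    "Uncle": [7.4, 15.0, 8.5],
}

def ftdna_style_match(segments_cm, min_longest=8.0, min_total=20.0, min_segment=1.0):
    """Hypothetical check: longest segment >= min_longest AND total >= min_total,
    counting only segments of at least min_segment cM (approximate thresholds)."""
    counted = [cm for cm in segments_cm if cm >= min_segment]
    if not counted:
        return False
    return max(counted) >= min_longest and sum(counted) >= min_total

for person, segs in DEB_SEGMENTS.items():
    print(person, "match" if ftdna_style_match(segs) else "no match")
```

Under these assumed thresholds, Dad and I both pass the longest-segment test (8.5 cM) but fail the 20 cM total, while my Uncle passes both, which agrees with what FTDNA shows.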

In addition to uploading AncestryDNA results to GEDmatch, I also highly recommend that everyone "transfers" their results to Family Tree DNA (only the results are transferred, not your actual DNA sample), so you are fishing in as many DNA ponds as possible. The cost for doing this has been permanently reduced to $39, so for those in the US, the most cost-effective way to be in 2 DNA databases is to test at AncestryDNA (regular price $99 + shipping, but there is currently a sale until 21 Dec 2014 for $89, and you may be able to find a coupon code for free shipping), then transfer your results to Family Tree DNA (click on FTDNA Autosomal Transfer) for $39 (no sample required, so no shipping). If you have tested at 23andMe, you will be able to upload your results to GEDmatch; however, you won't be able to transfer your results to FTDNA unless you were tested with the V3 chip. The 23andMe V2 test (sold prior to November 24, 2010) and the 23andMe V4 test (sold from November/December 2013) are NOT compatible with FTDNA's Family Finder product.

By the way:

For a limited time, you can still download a list of your former matches before the culling, so especially in view of my experience here, I strongly recommend you do this while you still have the chance, in case you also had any legitimate matches in the past that you can no longer see. Go to your AncestryDNA Home page and click on SETTINGS (to the right of your picture, with the gear icon), then in the Actions box on the right, click on Download v1 DNA Matches.
I've found a way to print off the detailed and technical Matching White Paper and also the "Learn More about DNA Circles" document. Besides the latter being really hidden away if you aren't actually in a circle (see Debbie Kennett's blog posting Improved Cousin matching at AncestryDNA, 20 Dec 2014, for instructions on how to locate these), AncestryDNA also makes it extremely difficult to print more than a page of each document. They don't provide a link to a PDF, which could easily be printed off, and if you try to use [Ctrl + A] to select all on a PC, that doesn't work either. However, if you drag with your mouse to select the whole article, you can then copy everything and paste it into a Word document; just make sure you use the Keep Source Formatting option. Virtually all of the formatting copies over very well, including the figures, equations, etc. Much as I would like to provide an easy link here to the PDF I created myself from the resulting Word document, I didn't want to run the risk of having Ancestry's lawyers coming after me because of copyright issues! [NOTE: Debbie Kennett has posted a much easier way in the Comments section.]

Please leave a comment if you have any idea why AncestryDNA has kicked my 4th cousin off my match list and yet she still is on my father's. Thank you.


9 Comments

Jason Lee

12/12/2014 10:51:14

Ancestry uses its Timber algorithm to create "adjusted cM lengths in a somewhat 'ad hoc' way to determine the genetic relationships between people."

Matches are culled on the basis of what many of us are calling "pile ups." The pile up regions are defined on an individual basis, so that might explain the "inconsistent culling" described above.

http://dnagenealogy.tumblr.com/post/105007101253/timber-chopping-down-matching-segments-at

Sue Griffith

12/12/2014 13:09:03

Hi Jason:

Thanks for taking the time to read and make a comment. I had already read your posting about Ancestry's Timber algorithm (which I think you posted today) and thought it was coincidental that we posted around the same time about a somewhat related topic.

I understand about Ancestry culling "matches" based on regions of pile ups, and presumably this explains the majority of the ~90% reduction of matches with me and my parents. However, I don't think this is a likely explanation for the discrepancy between the way that Ancestry retained cousin Deb on my father's match list and yet removed her (my only shaky leaf hint and known 4th cousin) from mine when I've inherited an identical segment (fully phased) from my father. I had already checked with GEDmatch's DNA Segment Search (Tier 1 tool) to see if this seemed to be an area of "pileup" either for my own kit or for that of my mother (I have a few segments on GEDmatch where there is a tall stack of matches, which I pretty much ignore), but that isn't the case here. Besides Deb, Dad, and my uncle, I have only 2 other matches over this area on Chromosome 12. I have also mapped a much longer segment (45 cM) on my maternal chromosome that overlaps the paternal segment with Deb on Chromosome 12 (see http://www.genealogyjunkie.net/chromosome-mapping-close-family.html, which needs to be updated with matches from several more relatives), and besides 2 known relatives matching her there (a known 1st cousin x1 removed and 3rd cousin), my mother has only 1 other match at this location identified on GEDmatch's DNA Segment Search. So there is no indication that this is a region of pileup, either for my mother or me. And with the ability to fully phase my results (both of my parents have been tested at AncestryDNA), I would hope that there would be a way for Ancestry to differentiate between my legitimate phased segment (IBD) and "pileup". However, unless Ancestry provides segment information or a chromosome browser for us to make our own assessments, relying on faith that Ancestry is doing the right thing (with no data to corroborate) just doesn't wash with the growing number of us who are capable of analyzing the data ourselves.

I would love to be able to recommend AncestryDNA unreservedly as the number 1 company for autosomal DNA testing, due to their far superior trees, but I suspect this will never be the case. Maybe they could bury a chromosome browser within their website as well as they have managed to hide the DNA Circles information from those of us not in a circle, so the majority of users (those for whom the product is "dumbed down") aren't aware of its existence, but I realize this is wishful thinking.

Sue

Jason Lee

12/13/2014 15:39:02

I can't say that you're wrong, Sue. Ancestry doesn't give us enough information to come up with definitive answers. But I think it's essentially impossible to know where your designated "pile up" regions are located. Pile ups are based on tiny windows perhaps only 100 SNPs in size or smaller. You could have multiple pile up windows within a segment that is 8 or 9 cM in size. With multiple pile up windows within a segment that size, the segment could be eliminated. But a larger segment in the same region would remain on the match list if its recalculated size remains above the threshold after the pile up windows are cut from it.
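To make the idea in the comment above concrete, here is a deliberately simplified Python sketch: a segment is treated as a run of small windows, any window flagged as a personal pile-up is cut out, and the remaining length is compared with a threshold. The 0.5 cM window size, which windows are flagged, and the 6 cM threshold are all hypothetical; this is not Ancestry's Timber implementation, which downweights windows based on data we can't see.

```python
from dataclasses import dataclass

@dataclass
class Window:
    cm: float       # genetic length of this small window (hypothetical)
    pile_up: bool   # hypothetical flag: True if this window is a pile-up region for this person

def trimmed_length(windows):
    """Length (cM) remaining after every pile-up window is cut out."""
    return round(sum(w.cm for w in windows if not w.pile_up), 1)

def survives(windows, threshold_cm=6.0):
    """True if the trimmed segment is still at or above the (assumed) match threshold."""
    return trimmed_length(windows) >= threshold_cm

# An 8.5 cM segment (17 windows of 0.5 cM) with 6 pile-up windows inside it...
short_segment = [Window(0.5, i < 6) for i in range(17)]
# ...and a 15 cM segment (30 windows) in the same region with the same 6 pile-up windows.
long_segment = [Window(0.5, i < 6) for i in range(30)]

print(trimmed_length(short_segment), survives(short_segment))  # trimmed to below 6 cM
print(trimmed_length(long_segment), survives(long_segment))    # still well above 6 cM
```

The point of the sketch is only the asymmetry described in the comment: the same pile-up windows can eliminate an 8-9 cM segment while leaving a longer overlapping segment on the match list.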

There's no way to do a similar analysis at GEDmatch. You can't set the SNP count minimum threshold below 500. Even if you could, GEDmatch's database is so much smaller than Ancestry's you certainly wouldn't have the same pile up profile.

I think that highlights a couple of unsettling quirks in Ancestry's matching scheme.

1. A match is not determined only by the DNA that you share with another customer; it's also defined by the DNA you share with everyone else on all parts of your genome. The downweighting of a pile up window is affected by your other pile up windows.

2. A match is not determined only by the DNA that you share with another customer; it's also defined by the people who choose to use Ancestry. I.e., if AncestryDNA suddenly became wildly popular in the Pacific Northwest, the "pile up profile" might change significantly. Likewise if Ancestry opened up to the international market.

I assume the people at Ancestry mean well, but their haphazard approach to DNA matching is nearly risible.

12/12/2014 15:02:58

"I also checked using a minimum segment length of 1 cM, rather than the default of 7 cM, which shows we also have another short matching segment (4.3 cM for Dad and 3.7 cM for me) on Chromosome 9 over the same area as the 15.0 cM segment my uncle has, so this is likely to be "real" (IBD)."

I think you are leading yourself astray here. Blaine has a blog post dedicated to this assertion about small segments, and it is worth reading and understanding his argument.

I for one ignore these small segments (and believe FTDNA is quite wrong in using them), as we all will match somewhere at 1 cM. For example, using only 1 cM and keeping 700 SNPs as the limit, my mother matches all of you on GEDmatch, including your uncle on 3.7 cM and 3.4 cM segments. However, there is no reason to believe you and my mother are related any more recently than the founding of Jamestown.

As for why AncestryDNA kept a match for your father but not you, when it appears you inherited that region of his chromosome whole - good question. I do not remember AncestryDNA stating that they use gender-specific linkages. One possible source of problem is the phasing engine. When there is a "switch error" one can get false positives.

Also, even if you are not aware of a pile-up in the part of chr12 you are looking at, that doesn't mean there is not one in the general population. Among those thousands of previous matches that disappeared, there could well have been many who had been matched to you or your father in this region.

Since the region on which you are concentrating is so small (and remember that AncestryDNA is not using B36 like gedmatch is doing), any small error or adjustment could have made whatever identical region fall below the declarative threshold.

My mother lost a genealogical 4th cousin in the reduction of matches. In my case it is not such a big loss, but since there are a substantial number of genealogical 4th cousins with whom one won't share a statistically significant chromosome region I'm not so sure if the initial "match" was that solid to begin with.

Sue Griffith

12/12/2014 21:40:40

S9: Yes, I've read Blaine's article, as well as several others recently relating to short segments. I only checked down to 1 cM on GEDmatch to see if I could find a reason to explain what has happened, not because I generally take any notice of them. Although in this particular instance, I think Dad's ~4 cM segment on Chr. 9 (with 960 SNPs) may be IBD, as it overlaps his brother's longest (15 cM) segment.

I take your point about pile-up in the general population. With Ancestry's resistance to providing basic segment information, we will never be privy to information on our own pileup areas.

I need to go ahead and phase us all myself. Luckily my father's maternal 1st cousin (and her daughter) happen to be a match at the same location on Chr. 12 too (Deb is a paternal cousin) and I have everyone's raw data.

12/13/2014 12:16:50

Note that I should have written "false negative" rather than "false positive", about phasing engines.

Debbie Kennett

12/12/2014 19:35:46

Sue, Thanks for highlighting this issue. I'm afraid I don't know what the answer is. I've shared your post with the large ISOGG Facebook group because I know that there are some AncestryDNA employees who are in that group and I hope that they will look into the matter. It would indeed help if we had access to the segment data so that we could understand what is going on.

If you want to make a PDF of the Ancestry white papers there is in fact an easy way to do so if you use the PrintFriendly website:

http://www.printfriendly.com/

Ann Turner kindly brought this website to our attention on one of the ISOGG lists.

Sue Griffith

12/12/2014 21:54:36

Thanks, Debbie. I should have thought of trying PrintFriendly. I've added a note in the blog posting.

Mitch Hallam

1/10/2015 10:59:48

Hello Sue,
I've been doing my own research on my family and I came across your hyperlink. I've always wanted to do a DNA test to get true results on my ancestors. I saw that one of the surnames you are looking for is Hellam, and over time surnames may have changed by a letter or so. I am interested in whether mine, Hallam, is in some way associated with your ancestors. Thank you.
