Tuesday, February 19, 2013
Development and Psychopathology
Chris Beam and I have a new paper out at Development and Psychopathology. Link here.
Monday, February 18, 2013
Realism and Gloom
Steve Hsu replied to my blog post. (Almost a week ago! He has had about ten blog posts since then. I have never been able to keep up with the pace of blogging.)
Steve wonders why I am so gloomy about the prospects for an explanatory genetic science. His optimism is based on a "model" linked here.
With all due respect, that isn't much of a model. All it says is that SOMEHOW, something like height or IQ has to be instantiated by all the genes that make up the identical twin correlation. All the main effects, plus all the (unspecified) interactions and nonlinear terms. Well sure, but that isn't saying anything except that identical twins are some kind of existence proof that it is possible for all the information to be added up in an organism to develop into a phenotype. I joked in my last post that we could predict height or IQ if we could grow an identical twin for each of us. That is what organisms are: developmental computations over the near-infinite dimensionality of the gene-and-environment space. The problem is that we can't figure out how to reproduce that process using any finite combination rule on the actual DNA. It's like saying that in theory we ought to be able to predict the weather on the first Tuesday in March 2017, if we just get enough data, and use a model that combines all the linear and non-linear combinations. Except we can't, because a) It's a completely hypothetical argument, and b) There is chaotic non-linearity in between here and there.
Tim Bates also replies, here, mostly in the context of IQ. Some of the post isn't a response to me, but to hard-core environmentalists who believe "twin studies are fatally flawed" and that kind of thing, which has never been me.
The main point of his post is to wonder what is going to happen as sample sizes get bigger and bigger, allowing us to detect statistically significant effects of alleles with smaller and smaller effects. Tim expects that the genes that are identified will cluster in understandable biogenetic pathways, leading to cumulative brain science about intelligence. Maybe, but how does he know this? He cites height, but a quick glance at the BGAnet Facebook group will show that even the claim that height genes make sense is pretty controversial. One thing that isn't going to happen as sample sizes increase: the effect sizes of the SNPs aren't going to go up. We already have an unbiased estimate of that, and it is gloomy.
But here is the real point. Suppose you took the entire research program for IQ: twins and adoptees, on out to GWAS and biochemical pathways, and did it instead for marital status. We already know that marital status is heritable, and given that it is heritable, I don't see any reason that given big enough sample sizes etc etc, we wouldn't find SNPs that exceed 10 minus whatever. (Or is there an alternative, a way for something to be heritable without having significant SNP associations?) Would divorce SNPs cluster in biochemical pathways and lead to a neuroscience of marriage? Genetic reductionists have a choice. Either you have to explain why SOME things (height, IQ) are headed to genetic explanation via twin studies, GWAS, etc, while OTHER things (divorce, how much TV you watch) get the heritability but not the ultimate genetic explanation. OR you have to anticipate a world in which everything is explained by combinations of SNPs. Everything is heritable, so either everything is ultimately explainable in genetic terms, or some heritable things can't be decomposed into genetic molecules.
A serious math problem underlies all this. As sample sizes go up, we increase the power to detect significant effects of smaller and smaller SNPs, with diminishing returns on the total percentage of variance explained. It seems like it ought to be possible to estimate the distribution of SNP effect sizes from existing data, and then calculate how far out in the distribution we would have to go in order to explain, say, half the variance, which is what we can do easily by just predicting from the parents' IQs. My guess is that we would have to get way way the hell out in the distribution of effect sizes, by which time the marginal effects would be so ridiculously tiny that the sample sizes required would not be in the tens of thousands but the billions. As I write this I have the feeling that someone must have already done it.
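To put rough numbers on that guess, here is a minimal sketch of the sample size needed to detect a single SNP as a function of the share of variance it explains. None of this comes from the papers under discussion: the function name `required_sample_size`, the significance threshold of 5e-8, the 80% power target, and the normal approximation to a one-degree-of-freedom association test are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def required_sample_size(variance_explained, alpha=5e-8, power=0.80):
    """Approximate N needed to detect one SNP explaining `variance_explained`.

    Assumptions (not from the post): two-sided test at `alpha`, target
    `power`, and a normal approximation in which the required
    non-centrality is (z_alpha + z_power)^2 and the non-centrality
    supplied by the data is N * q2 / (1 - q2).
    """
    std_normal = NormalDist()
    z_alpha = -std_normal.inv_cdf(alpha / 2)  # critical value for the threshold
    z_power = std_normal.inv_cdf(power)       # extra margin needed for power
    needed_ncp = (z_alpha + z_power) ** 2
    q2 = variance_explained
    return math.ceil(needed_ncp * (1 - q2) / q2)


# Hypothetical per-SNP shares of variance, from large to vanishingly small.
for q2 in (1e-3, 1e-4, 1e-6, 1e-8):
    print(f"variance explained {q2:.0e} -> N of about {required_sample_size(q2):,}")
```

Under these assumptions the required N scales as one over the variance explained, so every tenfold drop in a SNP's effect costs a tenfold larger sample. A SNP explaining one part in a hundred million of the variance would need a sample in the billions. This only covers detecting one SNP at a time; it says nothing about the actual distribution of effect sizes, which is the part that would have to be estimated from data.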
Wednesday, February 13, 2013
I am prompted to dust off my little (well, never) used blog by a paper that was just published in Molecular Psychiatry. I have gotten a bunch of emails about it, mostly from people who seem to think it contradicts my outlook on behavior genetics. Link here, though it is behind a paywall unless your University gets you through it. I don't have any criticisms of the study itself, really. It is timely, well-done and interesting. I just don't think it is revolutionary, or even a harbinger of something revolutionary; it is a new way of demonstrating something we have known for a long time.
The research group put together a large consortium of studies with genome-wide SNP data on samples of children with IQ scores. They then searched for genome-wide significance for the individual SNPs (and didn't find any, although they are getting closer), conducted a gene- (as opposed to SNP) based analysis that identified one gene with a significant association with IQ, used genome-wide complex trait analysis to show that common SNPs jointly account for a substantial proportion of the variation in IQ, and built a multi-SNP predictor based on the SNPs most strongly related to IQ, which predicted 1.2, 3.5 and .5 percent of the variation in IQ in three replication samples.
What does all this mean? To understand it, you have to place it in context: the first of the three assertions in the title, that IQ is heritable, has been perfectly well established by twin and adoption studies for seventy-five years. It's good to show once again without the twins, but it is hardly news. The second assertion, that it is highly polygenic, has been pretty obvious for a long time also, and has become more so recently.
But what of the GCTA and the predictive composite? GCTA is more like a twin study than it is like gene-finding. SNP arrays are used to define pairwise genomic similarity among "unrelated" individuals, and then genomic similarity is compared to phenotypic similarity. So yes, the heritability that was detected via quantitative genetics exists down in the SNPs somewhere, but where else would it have been? When the researchers create composites of actual SNPs, instead of just identifying SNP-based variance, they can account for a weighted mean of 1.7% of the variance, which is a correlation of r=.13. That, to me, is the bottom line: if we were to start a program tomorrow to take SNPs from newborns and predict their intelligence, we would do so at a level much worse than predicting from the parents' income, for example, never mind from their IQ. And this part of the story is not one that we expect to improve as samples get bigger. The 1.7% was based on all the SNPs, not just those reaching some magical level of significance.
What we do expect as samples get bigger is that maybe some individual SNPs will reach that magical level. Steve Hsu predicts so, here. I say so what. Sure, if samples reach into the hundreds of thousands, a few SNPs with truly tiny effect sizes will be significant. Once again: no one sensible thought that maybe SNPs weren't associated with intelligence; the twin studies demonstrate that SNPs have to be associated with intelligence. The real question is whether, short of growing everyone an identical twin, we can figure out the combinatorial rules by which bits of DNA combine, so we can build useful scientific explanations or prediction models. I still see no signs that we are headed in that direction.
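The step from 1.7% of the variance to r=.13 is just a square root, since the proportion of variance explained is the squared correlation. A small sketch of that arithmetic, using only the figures quoted above (the helper name `variance_to_correlation` is made up for illustration):

```python
import math


def variance_to_correlation(r_squared):
    """Convert a proportion of variance explained (R^2) into a correlation r."""
    return math.sqrt(r_squared)


# Figures quoted in the post: three replication samples and the weighted mean.
replication_r2 = [0.012, 0.035, 0.005]
weighted_mean_r2 = 0.017

for r2 in replication_r2:
    print(f"R^2 = {r2:.3f} -> r = {variance_to_correlation(r2):.2f}")

r_composite = variance_to_correlation(weighted_mean_r2)
print(f"weighted mean R^2 = {weighted_mean_r2:.3f} -> r = {r_composite:.2f}")
```

The weights behind the 1.7% figure are not given here, so the sketch takes the weighted mean as reported rather than recomputing it from the three samples.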