One experience I’ve frequently had in Astronomy is that there is no result so obvious that someone won’t claim the exact opposite. Indeed, the more obvious the result, the louder the claim to contradict it.
There is a very obvious acceleration scale in galaxies. It can be seen in several ways. Here I describe a nice way that is completely independent of any statistics or model fitting: no need to argue over how to set priors.
Simple dimensional analysis shows that a galaxy with a flat rotation curve has a characteristic acceleration
g† = 0.8 Vf4/(G Mb)
where Vf is the flat rotation speed, Mb is the baryonic mass, and G is Newton’s constant. The factor 0.8 arises from the disk geometry of rotating galaxies, which are not spherical cows. This is first year grad school material: see Binney & Tremaine. I include it here merely to place the characteristic acceleration g† on the same scale as Milgrom’s acceleration constant a0.
These are all known numbers or measurable quantities. There are no free parameters: nothing to fiddle; nothing to fit. The only slightly tricky quantity is the baryonic mass, which is the sum of stars and gas. For the stars, we measure the light but need the mass, so we must adopt a mass-to-light ratio, M*/L. Here I adopt the simple model used to construct the radial acceleration relation: a constant 0.5 M⊙/L⊙ at 3.6 microns for galaxy disks, and 0.7 M⊙/L⊙ for bulges. This is the best present choice from stellar population models; the basic story does not change with plausible variations.
This is all it takes to compute the characteristic acceleration of galaxies. Here is the resulting histogram for SPARCgalaxies:
Do you see the acceleration scale? It’s right there in the data.
I first employed this method in 2011, where I found <g†> = 1.24 ± 0.14 Å s-2 for a sample of gas rich galaxies that predates and is largely independent of the SPARC data. This is consistent with the SPARC result <g†> = 1.20 ± 0.02 Å s-2. This consistency provides some reassurance that the mass-to-light scale is near to correct since the gas rich galaxies are not sensitive to the choice of M*/L. Indeed, the value of Milgrom’s constant has not changed meaningfully since Begeman, Broeils, & Sanders (1991).
The width of the acceleration histogram is dominated by measurement uncertainties and scatter in M*/L. We have assumed that M*/L is constant here, but this cannot be exactly true. It is a good approximation in the near-infrared, but there must be some variation from galaxy to galaxy, as each galaxy has its own unique star formation history. Intrinsic scatter in M*/L due to population difference broadens the distribution. The intrinsic distribution of characteristic accelerations must be smaller.
I have computed the scatter budget many times. It always comes up the same: known uncertainties and scatter in M*/L gobble up the entire budget. There is very little room left for intrinsic variation in <g†>. The upper limit is < 0.06 dex, an absurdly tiny number by the standards of extragalactic astronomy. The data are consistent with negligible intrinsic scatter, i.e., a universal acceleration scale. Apparently a fundamental acceleration scale is present in galaxies.
The radial acceleration relation connects what we see in visible mass with what we get in galaxy dynamics. This is true in a statistical sense, with remarkably little scatter. The SPARC data are consistent with a single, universal force law in galaxies. One that appears to be sourced by the baryons alone.
This was not expected with dark matter. Indeed, it would be hard to imagine a less natural result. We can only salvage the dark matter picture by tweaking it to make it mimic its chief rival. This is not a healthy situation for a theory.
On the other hand, if these results really do indicate the action of a single universal force law, then it should be possible to fit each individual galaxy. This has been done manytimesbefore, with surprisingly positive results. Does it work for the entirety of SPARC?
For the impatient, the answer is yes. Graduate student Pengfei Li has addressed this issue in a paper in press at A&A. There are some inevitable goofballs; this is astronomy after all. But by and large, it works much better than I expected – the goof rate is only about 10%, and the worst goofs are for the worst data.
Fig. 1 from the paper gives the example of NGC 2841. This case has been historically problematic for MOND, but a good fit falls out of the Bayesian MCMC procedure employed. We marginalize over the nuisance parameters (distance and inclination) in addition to the stellar mass-to-light ratio of disk and bulge. These come out a tad high in this case, but everything is within the uncertainties. A long standing historical problem is easily solved by application of Bayesian statistics.
Another example is provided by the low surface brightness (LSB) dwarf galaxy IC 2574. Note that like all LSB galaxies, it lies at the low acceleration end of the RAR. This is what attracted my attention to the problem a long time ago: the mass discrepancy is large everywhere, so conventionally dark matter dominates. And yet, the luminous matter tells you everything you need to know to predict the rotation curve. This makes no physical sense whatsoever: it is as if the baryonic tail wags the dark matter dog.
In this case, the mass-to-light ratio of the stars comes out a bit low. LSB galaxies like IC 2574 are gas rich; the stellar mass is pretty much an afterthought to the fitting process. That’s good: there is very little freedom; the rotation curve has to follow almost directly from the observed gas distribution. If it doesn’t, there’s nothing to be done to fix it. But it is also bad: since the stars contribute little to the total mass budget, their mass-to-light ratio is not well constrained by the fit – changing it a lot makes little overall difference. This renders the formal uncertainty on the mass-to-light ratio highly dubious. The quoted number is correct for the data as presented, but it does not reflect the inevitable systematic errors that afflict astronomical observations in a variety of subtle ways. In this case, a small change in the innermost velocity measurements (as happens in the THINGS data) could change the mass-to-light ratio by a huge factor (and well outside the stated error) without doing squat to the overall fit.
We can address statistically how [un]reasonable the required fit parameters are. Short answer: they’re pretty darn reasonable. Here is the distribution of 3.6 micron band mass-to-light ratios.
From a stellar population perspective, we expect roughly constant mass-to-light ratios in the near-infrared, with some scatter. The fits to the rotation curves give just that. There is no guarantee that this should work out. It could be a meaningless fit parameter with no connection to stellar astrophysics. Instead, it reproduces the normalization, color dependence, and scatter expected from completely independent stellar population models.
The stellar mass-to-light ratio is practically inaccessible in the context of dark matter fits to rotation curves, as it is horribly degenerate with the parameters of the dark matter halo. That MOND returns reasonable mass-to-light ratios is one of those important details that keeps me wondering. It seems like there must be something to it.
Unsurprisingly, once we fit the mass-to-light ratio and the nuisance parameters, the scatter in the RAR itself practically vanishes. It does not entirely go away, as we fit only one mass-to-light ratio per galaxy (two in the handful of cases with a bulge). The scatter in the individual velocity measurements has been minimized, but some remains. The amount that remains is tiny (0.06 dex) and consistent with what we’d expect from measurement errors and mild asymmetries (non-circular motions).
For those unfamiliar with extragalactic astronomy, it is common for “correlations” to be weak and have enormous intrinsic scatter. Early versions of the Tully-Fisher relation were considered spooky-tight with a mere 0.4 mag. of scatter. In the RAR we have a relation as near to perfect as we’re likely to get. The data are consistent with a single, universal force law – at least in the radial direction in rotating galaxies.
That’s a strong statement. It is hard to understand in the context of dark matter. If you think you do, you are not thinking clearly.
So how strong is this statement? Very. We tried fits allowing additional freedom. None is necessary. One can of course introduce more parameters, but we find that no more are needed. The bare minimum is the mass-to-light ratio (plus the nuisance parameters of distance and inclination); these entirely suffice to describe the data. Allowing more freedom does not meaningfully improve the fits.
For example, I have often seen it asserted that MOND fits require variation in the acceleration constant of the theory. If this were true, I would have zero interest in the theory. So we checked.
Here we learn something important about the role of priors in Bayesian fits. If we allow the critical acceleration g† to vary from galaxy to galaxy with a flat prior, it does indeed do so: it flops around all over the place. Aha! So g† is not constant! MOND is falsified!
Well, no. Flat priors are often problematic, as they have no physical motivation. By allowing for a wide variation in g†, one is inviting covariance with other parameters. As g† goes wild, so too does the mass-to-light ratio. This wrecks the stellar mass Tully-Fisher relation by introducing a lot of unnecessary variation in the mass-to-light ratio: luminosity correlates nicely with rotation speed, but stellar mass picks up a lot of extraneous scatter. Worse, all this variation in both g† and the mass-to-light ratio does very little to improve the fits. It does a tiny bit – χ2 gets infinitesimally better, so the fitting program takes it. But the improvement is not statistically meaningful.
In contrast, with a Gaussian prior, we get essentially the same fits, but with practically zero variation in g†. wee The reduced χ2 actually gets a bit worse thanks to the extra, unnecessary, degree of freedom. This demonstrates that for these data, g† is consistent with a single, universal value. For whatever reason it may occur physically, this number is in the data.
We have made the SPARC data public, so anyone who wants to reproduce these results may easily do so. Just mind your priors, and don’t take every individual error bar too seriously. There is a long tail to high χ2 that persists for any type of model. If you get a bad fit with the RAR, you will almost certainly get a bad fit with your favorite dark matter halo model as well. This is astronomy, fergodssake.
A subject of long-standing interest in extragalactic astronomy is how stars form in galaxies. Some galaxies are “red and dead” – most of their stars formed long ago, and have evolved as stars will: the massive stars live bright but short lives, leaving the less massive ones to linger longer, producing relatively little light until they swell up to become red giants as they too near the end of their lives. Other galaxies, including our own Milky Way, made some stars in the ancient past and are still actively forming stars today. So what’s the difference?
The difference between star forming galaxies and those that are red and dead turns out to be both simple and complicated. For one, star forming galaxies have a supply of cold gas in their interstellar media, the fuel from which stars form. Dead galaxies have very little in the way of cold gas. So that’s simple: star forming galaxies have the fuel to make stars, dead galaxies don’t. But why that difference? That’s a more complicated question I’m not going to begin to touch in this post.
One can see current star formation in galaxies in a variety of ways. These usually relate to the ultraviolet (UV) photons produced by short-lived stars. Only O stars are hot enough to produce the ionizing radiation that powers the emission of HII (pronounced `H-two’) regions – regions of ionized gas that are like cosmic neon lights. O stars power HII regions but live less than 10 million years. That’s a blink of the eye on the cosmic timescale, so if you see HII regions, you know stars have formed recently enough that the short-lived O stars are still around.
Measuring the intensity of the Hα Balmer line emission provides a proxy for the number of UV photons that ionize the gas, which in turn basically counts the number of O stars that produce the ionizing radiation. This number, divided by the short life-spans of O stars, measures the current star formation rate (SFR).
There are many uncertainties in the calibration of this SFR: how many UV photons do O stars emit? Over what time span? How many of these ionizing photons are converted into Hα, and how many are absorbed by dust or manage to escape into intergalactic space? For every O star that comes and goes, how many smaller stars are born along with it? This latter question is especially pernicious, as most stellar mass resides in small stars. The O stars are only the tip of the iceberg; we are using the tip to extrapolate the size of the entire iceberg.
Astronomers have obsessed over these and related questions for a long time. See, for example, the review by Kennicutt & Evans. Suffice it to say we have a surprisingly decent handle on it, and yet the systematic uncertainties remain substantial. Different methods give the same answer to within an order of magnitude, but often differ by a factor of a few. The difference is often in the mass spectrum of stars that is assumed, but even rationalizing that to the same scale, the same data can be interpreted to give different answers, based on how much UV we estimate to be absorbed by dust.
In addition to the current SFR, one can also measure the stellar mass. This follows from the total luminosity measured from starlight. Many of the same concerns apply, but are somewhat less severe because more of the iceberg is being measured. For a long time we weren’t sure we could do better than a factor of two, but this work has advanced to the point where the integrated stellar masses of galaxies can be estimated to ~20% accuracy.
A diagram that has become popular in the last decade or so is the so-called star forming main sequence. This name is made in analogy with the main sequence of stars, the physics of which is well understood. Whether this is an appropriate analogy is debatable, but the terminology seems to have stuck. In the case of galaxies, the main sequence of star forming galaxies is a plot of star formation rate against stellar mass.
The star forming main sequence is shown in the graph below. It is constructed from data from the SINGS survey (red points) and our own work on dwarf low surface brightness (LSB) galaxies (blue points). Each point represents one galaxy. Its stellar mass is determined by adding up the light emitted by all the stars, while the SFR is estimated from the Hα emission that traces the ionizing UV radiation of the O stars.
The data show a nice correlation, albeit with plenty of intrinsic scatter. This is hardly surprising, as the two axes are not physically independent. They are measuring different quantities that trace the same underlying property: star formation over different time scales. The y-axis is a measure of the quasi-instantaneous star formation rate; the x-axis is the SFR integrated over the age of the galaxy.
Since the stellar mass is the time integral of the SFR, one expects the slope of the star forming main sequence (SFMS) to be one. This is illustrated by the diagonal line marked “Hubble time.” A galaxy forming stars at a constant rate for the age of the universe will fall on this line.
The data for LSB galaxies scatter about a line with slope unity. The best-fit line has a normalization a bit less than that of a constant SFR for a Hubble time. This might mean that the galaxies are somewhat younger than the universe (a little must be true, but need not be much), have a slowly declining SFR (an exponential decline with an e-folding time of a Hubble time works well), or it could just be an error in the calibration of one or both axes. The systematic errors discussed above are easily large enough to account for the difference.
To first order, the SFR in LSB galaxies is constant when averaged over billions of years. On the millions of years timescale appropriate to O stars, the instantaneous SFR bounces up and down. Looks pretty stochastic: galaxies form stars at a steady average rate that varies up and down on short timescales.
Short-term fluctuations in the SFR explain the data with current SFR higher than the past average. These are the points that stray into the gray region of the plot, which becomes increasingly forbidden towards the top left. This is because galaxies that form stars so fast for too long will build up their entire stellar mass in the blink of a cosmic eye. This is illustrated by the lines marked as 0.1 and 0.01 of a Hubble time. A galaxy above these lines would make all their stars in < 2 Gyr; it would have had to be born yesterday. No galaxies reside in this part of the diagram. Those that approach it are called “starbursts:” they’re forming stars at a high specific rate (relative to their mass) but this is presumably a brief-lived phenomenon.
Note that the most massive of the SINGS galaxies all fall below the extrapolation of the line fit to the LSB galaxies (dotted line). The are forming a lot of stars in an absolute sense, simply because they are giant galaxies. But the current SFR is lower than the past average, as if they were winding down. This “quenching” seems to be a mass-dependent phenomenon: more massive galaxies evolve faster, burning through their gas supply before dwarfs do. Red and dead galaxies have already completed this process; the massive spirals of today are weary giants that may join the red and dead galaxy population in the future.
One consequence of mass-dependent quenching is that it skews attempts to fit relations to the SFMS. There are very many such attempts in the literature; these usually have a slope less than one. The dashed line in the plot above gives one specific example. There are many others.
If one looks only at the most massive SINGS galaxies, the slope is indeed shallower than one. Selection effects bias galaxy catalogs strongly in favor of the biggest and brightest, so most work has been done on massive galaxies with M* > 1010 M☉. That only covers the top one tenth of the area of this graph. If that’s what you’ve got to work with, you get a shallow slope like the dashed line.
The dashed line does a lousy job of extrapolating to low mass. This is obvious from the dwarf galaxy data. It is also obvious from the simple mathematical considerations outlined above. Low mass galaxies could only fall on the dashed line if they were born yesterday. Otherwise, their high specific star formation rates would over-produce their observed stellar mass.
Despite this simple physical limit, fits to the SFMS that stray into the forbidden zone are ubiquitous in the literature. In addition to selection effects, I suspect the calibrations of both SFR and stellar mass are in part to blame. Galaxies will stray into the forbidden zone if the stellar mass is underestimated or the SFR is overestimated, or some combination of the two. Probably both are going on at some level. I suspect the larger problem is in the SFR. In particular, it appears that many measurements of the SFR have been over-corrected for the effects of dust. Such a correction certainly has to be made, but since extinction corrections are exponential, it is easy to over-do. Indeed, I suspect this is why the dashed line overshoots even the bright galaxies from SINGS.
This brings us back to the terminology of the main sequence. Among stars, the main sequence is defined by low mass stars that evolve slowly. There is a turn-off point, and an associated mass, where stars transition from the main sequence to the sub giant branch. They then ascend the red giant branch as they evolve.
If we project this terminology onto galaxies, the main sequence should be defined by the low mass dwarfs. These are nowhere near to exhausting their gas supplies, so can continue to form stars far into the future. They establish a star forming main sequence of slope unity because that’s what the math says they must do.
Most of the literature on this subject refers to massive star forming galaxies. These are not the main sequence. They are the turn-off population. Massive spirals are near to exhausting their gas supply. Star formation is winding down as the fuel runs out.
Red and dead galaxies are the next stage, once star formation has stopped entirely. I suppose these are the red giants in this strained analogy to individual stars. That is appropriate insofar as most of the light from red and dead galaxies is produced by red giant stars. But is this really they right way to think about it? Or are we letting our terminology get the best of us?
I have not posted here in a while. This is mostly due to the fact that I have a job that is both engaging and demanding. I started this blog as a way to blow off steam, but I realized this mostly meant ranting about those fools at the academy! of whom there are indeed plenty. These are reality based rants, but I’ve got better things to do.
As it happens, I’ve come down with a bug that keeps me at home but leaves just enough energy to read and type, but little else. This is an excellent recipe for inciting a rant. Reading the Washington Post article on delayed gratification in children brings it on.
It is not really the article that gets me, let alone the scholarly paper on which it is based. I have not read the latter, and have no intention of doing so. I hope its author has thought through the interpretation better than is implied by what I see in the WaPo article. That is easy for me to believe; my own experience is that what academics say to the press has little to do with what eventually appears in the press – sometimes even inverting its meaning outright. (At one point I was quoted as saying that dark matter experimentalists should give up, when what I had said was that it was important to pursue these experiments to their logical conclusion, but that we also needed to think about what would constitute a logical conclusion if dark matter remains undetected.)
So I am at pains to say that my ire is not directed at the published academic article. In this case it isn’t even directed at the article in the WaPo, regardless of whether it is a fair representation of the academic work or not. My ire is directed entirely at the interpretation of a single graph, which I am going to eviscerate.
The graph in question shows the delay time measured in psychology experiments over the years. It is an attempt to measure self-control in children. When presented with a marshmallow but told they may have two marshmallows if they wait for it, how long can they hold out? This delayed gratification is thought to be a measure of self-control that correlates positively with all manners of subsequent development. Which may indeed be true. But what can we learn from this particular graph?
The graph plots the time delay measured from different experiments against the date of the experiment. Every point (plotted as a marshmallow – cute! I don’t object to that) represents an average over many children tested at that time. Apparently they have been “corrected” to account for the age of the children (one gets better at delayed gratification as one matures) which is certainly necessary, but it also raises a flag. How was the correction made? Such details can matter.
However, my primary concern is more basic. Do the data, as shown, actually demonstrate a trend?
To answer this question for yourself, the first thing you have to be able to do is mentally remove the line. That big black bold line that so nicely connects the dots. Perhaps it is a legitimate statistical fit of some sort. Or perhaps it is boldface to [mis]guide the eye. Doesn’t matter. Ignore it. Look at the data.
The first thing I notice about the data are the outliers – in this case, 3 points at very high delay times. These do not follow the advertised trend, or any trend. Indeed, they seem in no way related to the other data. It is as if a different experiment had been conducted.
When confronted with outlying data, one has a couple of choices. If we accept that these data are correct and from the same experiment, then there is no trend: the time of delayed gratification could be pretty much anything from a minute to half an hour. However, the rest of the data do clump together, so the other option is that these outliers are not really representing the same thing as the rest of the data, and should be ignored, or at least treated with less weight.
The outliers may be the most striking part of the data set, but they are usually the least important. There are all sorts of statistical measures by which to deal with them. I do not know which, if any, have been applied. There are no error bars, no boxes representing quartiles or some other percentage spanned by the data each point represents. Just marshmallows. Now I’m a little grumpy about the cutesy marshmallows. All marshmallows are portrayed as equal, but are some marshmallows more equal than others? This graph provides no information on this critical point.
In the absence of any knowledge about the accuracy of each marshmallow, one is forced to use one’s brain. This is called judgement. This can be good or bad. It is possible to train the brain to be a good judge of these things – a skill that seems to be in decline these days.
What I see in the data are several clumps of points (disregarding the outliers). In the past decade there are over a dozen points all clumped together around an average of 8 minutes. That seems like a pretty consistent measure of the delayed gratification of the current generation of children.
Before 2007, the data are more sparse. There are a half a dozen points on either side of 1997. These have a similar average of 7 or 8 minutes.
Before that there are very little data. What there is goes back to the sixties. One could choose to see that as two clumps of three points, or one clump of six points. If one does the latter, the mean is around 5 minutes. So we had a “trend” of 5 minutes circa 1970, 7 minutes circa 1997, and 8 minutes circa 2010. That is an increase over time, but it is also a tiny trend – much less persuasive than the heavy solid line in the graph implies.
If we treat the two clumps of three separately – as I think we should, since they sit well apart from each other – then we have to choose which to believe. They aren’t consistent. The delay time in 1968 looks to have an average of two minutes; in 1970 it looks to be 8 minutes. So which is it?
According to the line in the graph, we should believe the 1968 data and not the 1970 data. That is, the 1968 data fall nicely on the line, while the 1970 data fall well off it. In percentage terms, the 1970 data are as far from the trend as the highest 2010 point that we rejected as an outlier.
When fitting a line, the slope of the line can be strongly influence by the points at its ends. In this case, the earliest and the latest data. The latest data seem pretty consistent, but the earliest data are split. So the slope depends entirely on which clump of three early points you choose to believe.
If we choose to believe the 1970 clump, then the “trend” becomes 8 minutes in 1970, 7 minutes in 1997, 8 minutes in 2010. Which is to say, no trend at all. Try disregarding the first three (1968) points and draw your own line on this graph. Without them, it is pretty flat. In the absence of error bars and credible statistics, I would conclude that there is no meaningful trend present in the data at all. Maybe a formal fit gives a non-zero slope, but I find it hard to believe it is meaningfully non-zero.
None of this happens in a vacuum. Lets step back and apply some external knowledge. Have people changed over the 5 decades of my life?
The contention of the WaPo article is that they have. Specifically, contrary to the perception that iPhones and video games have created a generation with a cripplingly short attention span (congrats if you made it this far!), in fact the data show the opposite. The ability of children to delay gratification has improved over the time these experiments have been conducted.
What does the claimed trend imply? If we take it literally, then extrapolating back in time, the delay time goes to zero around 1917. People in the past must have been completely incapable of delaying gratification for even an instant. This was a power our species only developed in the past century.
I hope that sounds implausible. If there is no trend, which is what the data actually show, then children a half century ago were much the same as children a generation ago are much the same as the children of today. So the more conservative interpretation of the graph would be that human nature is rather invariant, at least as indicated by the measure of delayed gratification in children.
Sadly, null results are dull. There well may be a published study reporting no trend, but it doesn’t get picked up by the Washington Post. Imagine the headline: “Children today are much the same as they’ve always been!” Who’s gonna click on that? In this fashion, even reputable news sources contribute to the scourge of misleading science and fake news that currently pollutes our public discourse.
This sort of over-interpretation of weak trends is rife in many fields. My own, for example. This is why I’m good at spotting them. Fortunately, screwing up in Astronomy seldom threatens life and limb.
Then there is Medicine. My mother was a medical librarian; I occasionally browsed their journals when waiting for her at work. Graphs for the efficacy of treatments that looked like the marshmallow graph were very common. Which is to say, no effect was in evidence, but it was often portrayed as a positive trend. They seem to be getting better lately (which is to say, at some point in the not distant past some medical researchers were exposed to basic statistics), but there is an obvious pressure to provide a treatment, even if the effect of the available course of treatment is tiny. Couple that to the aggressive marketing of drugs in the US, and it would not surprise me if many drugs have been prescribed based on efficacy trends weaker than seen in the marshmallow graph. See! There is a line with a positive slope! It must be doing some good!
Another problem with data interpretation is in the corrections applied. In the case of marshmallows, one must correct for the age of the subject: an eight year old can usually hold out longer than a toddler. No doubt there are other corrections. The way these are usually made is to fit some sort of function to whatever trend is seen with age in a particular experiment. While that trend may be real, it also has scatter (I’ve known eight year olds who couldn’t out wait a toddler), which makes it dodgy to apply. Do all experiments see the same trend? It is safe to apply the same correction to all of them? Worse, it is often necessary to extrapolate these corrections beyond where they are constrained by data. This is known to be dangerous, as the correction can become overlarge upon extrapolation.
It would not surprise me if the abnormally low points around 1968 were over-corrected in some way. But then, it was the sixties. Children may have not changed much since then, but the practice of psychology certainly has. Lets consider the implications that has for comparing 1968 data to 2017 data.
The sixties were a good time for psychological research. The field had grown enormously since the time of Freud and was widely respected. However, this was also the time when many experimental psychologists thought psychotropic drugs were a good idea. Influential people praised the virtues of LSD.
My father was a grad student in psychology in the sixties. He worked with swans. One group of hatchlings imprinted on him. When they grew up, they thought they should mate with people – that’s what their mom looked like, after all. So they’d and make aggressive displays towards any person (they could not distinguish human gender) who ventured too close.
He related the anecdote of a colleague who became interested in the effect of LSD on animals. The field was so respected at the time that this chap was able to talk the local zoo into letting him inject an elephant with LSD. What could go wrong?
Perhaps you’ve heard the expression “That would have killed a horse! Fortunately, you’re not a horse.” Well, the fellow in question figured elephants were a lot bigger than people. So he scaled up the dose by the ratio of body mass. Not, say, the ratio of brain size, or whatever aspect of the metabolism deals with LSD.
That’s enough LSD to kill an elephant.
Sad as that was for the elephant, who is reputed to have been struck dead pretty much instantly – no tripping rampage preceded its demise – my point here is that these were the same people conducting the experiments in 1968. Standards were a little different. The difference seen in the graph may have more to do with differences in the field than with differences in the subjects.
That is not to say we should simply disregard old data. The date on which an observation is made has no bearing on its reliability. The practice of the field at that time does.
The 1968 delay times are absurdly low. All three are under four minutes. Such low delay times are not reproduced in any of the subsequent experiments. They would be more credible if the same result were even occasionally reproduced. It ain’t.
Another way to look at this is that there should be a comparable number of outliers on either side of the correct trend. That isn’t necessarily true – sometimes systematic errors push in a single direction – but in the absence of knowledge of such effects, one would expect outliers on both the high side and the low side.
In the marshmallow graph, with the trend as drawn, there are lots of outliers on the high side. There are none on the low side. [By outlier, I mean points well away from the trend, not just scattered a little to one side or the other.]
If instead we draw a flat line at 7 or 8 minutes, then there are three outliers on both sides. The three very high points, and the three very low points, which happen to occur around 1968. It is entirely because the three outliers on the low side happen at the earliest time that we get even the hint of a trend. Spread them out, and they would immediately be dismissed as outliers – which is probably what they are. Without them, there is no significant trend. This would be the more conservative interpretation of the marshmallow graph.
Perhaps those kids in 1968 were different in other ways. The experiments were presumably conducted in psychology departments on university campuses in the late sixties. It was OK to smoke inside back then, and not everybody restricted themselves to tobacco in those days. Who knows how much second hand marijuana smoke was inhaled just to getting to the test site? I jest, but the 1968 numbers might just measure the impact on delayed gratification when the subject gets the munchies.