Does Newton’s Constant Vary?

This title is an example of what has come to be called Betteridge’s law. This is a relatively recent name for an old phenomenon: if a title is posed as a question, the answer is no. This is especially true in science, whether the authors are conscious of it or not.

Pengfei Li completed his Ph.D. recently, fitting all manner of dark matter halos as well as the radial acceleration relation (RAR) to galaxies in the SPARC database. For the RAR, he found that galaxy data were consistent with a single, universal acceleration scale, g†. There is of course scatter in the data, but this appears to us to be consistent with what we expect from variation in the mass-to-light ratios of stars and the various uncertainties in the data.

This conclusion has been controversial despite being painfully obvious. I have my own law for data interpretation in astronomy:

Obvious results provoke opposition. The more obvious the result, the stronger the opposition.

S. McGaugh, 1997

The constancy of the acceleration scale is such a case. Where we do not believe we can distinguish between galaxies, others think they can – using our own data! Here it is worth contemplating all that is involved in building a database like SPARC – we were the ones who did the work, after all. In the case of the photometry, we observed the galaxies, we reduced the data, we cleaned the images of foreground contaminants (stars), we fit isophotes, we built mass models – that's a very short version of what we did in order to be able to estimate the acceleration predicted by Newtonian gravity for the observed distribution of stars. That's one axis of the RAR. The other is the observed acceleration, which comes from rotation curves, which require even more work. I will spare you the work flow; we did some galaxies ourselves, and took others from the literature with a clear sense of what we could and could not believe – a sense we have because we do the same kind of work ourselves. In contrast, the people claiming to find the opposite of what we find obtained the data by downloading it from our website. The only thing they do is the very last step in the analysis, making fits with Bayesian statistics just as we do, but in manifest ignorance of the process by which the data came to be. This leads to an underappreciation of the uncertainty in the uncertainties.

This is another rule of thumb in science: outside groups are unlikely to discover important things that were overlooked by the group that did the original work. An example from about seven years ago was the putative 126 GeV line in Fermi satellite data. This was thought by some at the time to be evidence for dark matter annihilating into gamma rays with energy corresponding to the rest mass of the dark matter particles and their anti-particles. This would be a remarkable, Nobel-winning discovery, if true. Strange then that the claim was not made by the Fermi team themselves. Did outsiders beat them to the punch with their own data? It can happen: sometimes large collaborations can be slow to move on important results, wanting to vet everything carefully or warring internally over its meaning while outside investigators move more swiftly. But it can also be that the vetting shows that the exciting result is not credible.

I recall the 126 GeV line being a big deal. There was an entire session devoted to it at a conference I was scheduled to attend. Our time is valuable: I can't go to every interesting conference, and don't want to spend time on conferences that aren't interesting. I was skeptical, simply because of the rule of thumb. I wrote the organizers, and asked if they really thought that this would still be a thing by the time the conference happened in a few months' time. Some of them certainly thought so, so it went ahead. As it happened, it wasn't. Not a single speaker who was scheduled to talk about the 126 GeV line actually did so. In a few short months, it had gone from an exciting result sure to win a Nobel prize to nada.

What 126 GeV line? Did I say that? I don’t recall saying that.

This happens all the time. Science isn’t as simple as a dry table of numbers and error bars. This is especially true in astronomy, where we are observing objects in the sky. It is never possible to do an ideal experiment in which one controls for all possible systematics: the universe is not a closed box in which we can control the conditions. Heck, we don’t even know what all the unknowns are. It is a big friggin’ universe.

The practical consequence of this is that the uncertainty in any astronomical measurement is almost always larger than its formal error bar. There are effects we can quantify and include appropriately in the error assessment. There are things we can not. We know they’re there, but that doesn’t mean we can put a meaningful number on them.

Indeed, the sociology of this has evolved over the course of my career. Back in the day, everybody understood these things, and took the stated errors with a grain of salt. If it was important to estimate the systematic uncertainty, it was common to estimate a wide band, in effect saying “I’m pretty sure it is in this range.” Nowadays, it has become common to split out terms for random and systematic error. This is helpful to the non-specialist, but it can also be misleading because, so stated, the confidence interval on the systematic looks like a 1 sigma error even though it is not likely to have a Gaussian distribution. Being 3 sigma off of the central value might be a lot more likely than this implies — or a lot less.

People have become more careful in making error estimates, which ironically has made matters worse. People seem to think that they can actually believe the error bars. Sometimes you can, but sometimes not. Many people don’t know how much salt to take it with, or realize that they should take it with a grain of salt at all. Worse, more and more folks come over from particle physics where extraordinary accuracy is the norm. They are completely unprepared to cope with astronomical data, or even fully process that the error bars may not be what they think they are. There is no appreciation for the uncertainties in the uncertainties, which is absolutely fundamental in astrophysics.

Consequently, one gets overly credulous analyses. In the case of the RAR, a number of papers have claimed that the acceleration scale isn’t constant. Not even remotely! Why do they make this claim?

Below is a histogram of raw acceleration scales from SPARC galaxies. In effect, they are claiming that they can tell the difference between galaxies in the tail on one side of the histogram and those on the opposite side. We don't think we can, which is the more conservative claim. The width of the histogram is just the scatter that one expects from astronomical data, so the data are consistent with zero intrinsic scatter. That's not to say that's necessarily what Nature is doing: we can never measure zero scatter, so it is always conceivable that there is some intrinsic variation in the characteristic acceleration scale. All we can say is that if it is there, it is so small that we cannot yet resolve it.

Histogram of the acceleration scale in individual galaxies g† relative to the characteristic value a0.

Posed as a histogram like this, it is easy to see that there is a characteristic value – the peak – with some scatter around it. The entire issue is whether that scatter is due to real variation from galaxy to galaxy, or if it is just noise. One way to check this is to make quality cuts: in the plot above, the gray-striped histogram plots every available galaxy. The solid blue one makes some mild quality cuts, like knowing the distance to better than 20%. That matters, because the acceleration scale is a quantity that depends on distance – a notoriously difficult quantity to measure accurately in astronomy. When this quality cut is imposed, the width of the histogram shrinks. The better data make a tighter histogram – just as one would expect if the scatter is due to noise. If instead the scatter is a real, physical effect, it should, if anything, be more pronounced in the better data.
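If you want to see how such a check works in practice, here is a minimal sketch in Python. The file and column names are placeholders standing in for whatever table of per-galaxy fits you have; they are not the actual SPARC or Li et al. file formats.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder table: one row per galaxy with the best-fit acceleration
# scale and the fractional distance uncertainty. Not the real SPARC files.
data = np.loadtxt("galaxy_fits.txt")        # columns: g_dagger [m/s^2], sigma_D/D
gdagger, dist_err_frac = data[:, 0], data[:, 1]

a0 = 1.2e-10                                # characteristic acceleration [m/s^2]
x = np.log10(gdagger / a0)                  # scale relative to the characteristic value

bins = np.linspace(-1.0, 1.0, 41)
plt.hist(x, bins=bins, histtype="step", hatch="//", label="all galaxies")

# Mild quality cut: distance known to better than 20%.
good = dist_err_frac < 0.2
plt.hist(x[good], bins=bins, alpha=0.6, label="distance better than 20%")

plt.xlabel("log10(g_dagger / a0)")
plt.ylabel("number of galaxies")
plt.legend()
plt.show()

# If the scatter is mostly noise, the better-measured subsample should
# give the narrower histogram, which is what the SPARC data show.
print("scatter (all): ", np.std(x))
print("scatter (good):", np.std(x[good]))
```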

This should not be difficult to understand. And yet – other representations of the data give a different impression, like this one:

Best-fit accelerations from Marra et al. (2020).

This figure tells a very different story. The characteristic acceleration does not just scatter around a universal value. There is a clear correlation from one end of the plot to the other. Indeed, it is a perfectly smooth transition, because “Galaxy” is the number of each galaxy ordered by the value of its acceleration, from lowest to highest. The axes are not independent; they represent identically the same quantity. It is a plot of x against x. Properly projected into a histogram, it would look like the one above.
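The effect is easy to reproduce with a toy example: draw pure noise around a single universal value, sort it, and plot it against its rank. A minimal sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Pure noise around one universal value: no galaxy-to-galaxy variation at all.
n_galaxies = 100
log_g_true = np.log10(1.2e-10)              # one true acceleration scale
scatter = 0.1                               # dex of measurement scatter
log_g_fit = log_g_true + scatter * rng.normal(size=n_galaxies)

# Sort the fitted values and plot them against "Galaxy" number, i.e. the rank.
order = np.argsort(log_g_fit)
plt.errorbar(np.arange(n_galaxies), log_g_fit[order], yerr=scatter, fmt="o", ms=3)
plt.xlabel("Galaxy (ordered by fitted value)")
plt.ylabel("log10 of fitted acceleration scale")
plt.show()

# The result is a smooth ramp that looks like a systematic trend, even
# though the input contained no intrinsic variation whatsoever.
```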

This is a terrible way to plot data. It makes it look like there is a correlation where there is none. Setting this aside, there is a potential issue with the most discrepant galaxies – those at either extreme. There are more points that are roughly 3 sigma from a constant value than there should be for a sample this size. If this is the right assessment of the uncertainty, then there is indeed some variation from galaxy to galaxy. Not much, but the galaxies at the left hand side of the plot are different from those on the right hand side.

But can we believe the formal uncertainties that inform this error analysis? If you’ve read this far, you will anticipate that the answer to this question obeys Betteridge’s law. No.

One of the reasons we can’t just assign confidence intervals and believe them like a common physicist is that there are other factors in the analysis – nuisance parameters in Bayesian verbiage – with which the acceleration scale covaries. That’s a fancy way of saying that if we turn one knob, it affects another. We assign priors to the nuisance parameters (e.g., the distance to each galaxy and its inclination) based on independent measurements. But there is still some room to slop around. The question is really what to believe at the end of the analysis. We don’t think we can distinguish the acceleration scale from one galaxy to another, but this other analysis says we should. So which is it?
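To make the structure of such a fit concrete, here is a deliberately simplified sketch of a maximum a posteriori fit for a single galaxy with Gaussian priors on the distance and inclination. The file name, the column layout, and the way distance and inclination rescale the data are schematic assumptions for illustration, not the actual procedure of either group.

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder single-galaxy data: radius [kpc], observed velocity [km/s],
# its error [km/s], and the baryonic velocity [km/s] for M/L = 1,
# all computed for a fiducial distance D0 and inclination i0.
R, Vobs, eV, Vbar = np.loadtxt("galaxy.txt", unpack=True)
D0, eD = 10.0, 2.0                              # fiducial distance and uncertainty [Mpc]
i0, ei = np.radians(60.0), np.radians(5.0)      # fiducial inclination and uncertainty
KPC_M = 3.086e19                                # meters per kpc

def rar(gbar, gdagger):
    """Radial acceleration relation: gobs as a function of gbar."""
    return gbar / (1.0 - np.exp(-np.sqrt(gbar / gdagger)))

def neg_log_post(theta):
    log_gd, ml, D, inc = theta                  # log10 g†, stellar M/L, distance, inclination
    if not (0.01 < ml < 10 and D > 0 and 0.1 < inc < np.pi / 2):
        return 1e30                             # outside the allowed parameter range
    # Rescale the data for the trial distance and inclination (simplified):
    # radii scale with D; observed velocities with the inclination correction.
    r = R * (D / D0) * KPC_M                    # m
    corr = np.sin(i0) / np.sin(inc)
    vobs, verr = Vobs * corr * 1e3, eV * corr * 1e3   # m/s
    gbar = ml * (Vbar * 1e3) ** 2 * (D / D0) / r      # m/s^2
    vmod = np.sqrt(rar(gbar, 10.0 ** log_gd) * r)     # m/s
    # Gaussian likelihood on the velocities plus Gaussian priors on the
    # nuisance parameters (distance and inclination) from independent data.
    chi2 = np.sum(((vobs - vmod) / verr) ** 2)
    prior = ((D - D0) / eD) ** 2 + ((inc - i0) / ei) ** 2
    return 0.5 * (chi2 + prior)

start = [np.log10(1.2e-10), 0.5, D0, i0]
fit = minimize(neg_log_post, start, method="Nelder-Mead")
print("MAP estimate of log10(g†):", fit.x[0])
```

The point of the sketch is simply that the acceleration scale, the mass-to-light ratio, the distance, and the inclination are all turned at once, so the best-fit value of any one of them depends on how much slop the priors allow the others.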

It is easy at this point to devolve into accusations of picking priors to obtain a preconceived result. I don’t think anyone is doing that. But how to show it?

Pengfei had the brilliant idea to perform the same analysis as Marra et al., but allowing Newton's constant to vary. This is Big G, a universal constant that's been known to be a constant of nature for centuries. It surely does not vary. However, G appears in our equations, so we can test for variation therein. Pengfei did this, following the same procedure as Marra et al., and finds the same kind of graph – now for G instead of g†.

Best fit values of Newton's constant from Li et al. (2021).

You see here the same kind of trend for Newton's constant as one sees above for the acceleration scale. The same data have been analyzed in the same way. They have also been plotted in the same way, giving the impression of a correlation where there is none. The result is also the same: if we believe the formal uncertainties, the best-fit G is different for the galaxies at the left than for those at the right.

I'm pretty sure Newton's constant does not vary this much. I'm entirely sure that the rotation curve data we analyze are not capable of making this determination. It would be absurd to claim so. The same absurdity extends to the acceleration scale g†. If we don't believe the variation in G, there's no reason to believe that in g†.


So what is going on here? It boils down to the errors on the rotation curves not representing the uncertainty in the circular velocity as we would like for them to. There are all sorts of reasons for this: observational, physical, and systematic. I've written about this at great length elsewhere, and I haven't the patience to do so again here. It is turgidly technical to the extent that even the pros don't read it. It comes down to the ancient, forgotten wisdom of astronomy: you have to take the errors with a grain of salt.

Here is the cumulative distribution (CDF) of reduced chi squared for the plot above.

Cumulative distribution of reduced chi-squared for different priors on Newton’s constant.

Two things to notice here. First, the CDF looks the same regardless of whether we let Newton's constant vary or not, or how we assign the Bayesian priors. There's no value added in letting it vary – just as we found for the characteristic acceleration scale in the first place. Second, the reduced chi squared is rarely close to one. It should be! As a goodness of fit measure, one claims to have a good fit when chi squared is equal to one. The majority of these are not good fits! Rather than the gradual slope we see here, the CDF of chi squared should be a nearly straight vertical line. That's nothing like what we see.
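For reference, this is what the CDF of reduced chi squared should look like when a model is correct and the error bars are honest. A quick sketch with scipy, for a few assumed numbers of points per rotation curve (the counts are illustrative, not the actual SPARC values):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# If the model is right and the errors are honest, chi^2 follows a
# chi-squared distribution with N degrees of freedom, so the reduced
# chi^2 (chi^2/N) piles up near 1 with a spread of roughly sqrt(2/N).
x = np.linspace(0.0, 3.0, 500)
for dof in (10, 25, 50):                    # assumed points per rotation curve
    plt.plot(x, stats.chi2.cdf(x * dof, df=dof), label=f"{dof} degrees of freedom")

plt.axvline(1.0, color="k", ls=":")
plt.xlabel("reduced chi-squared")
plt.ylabel("cumulative fraction")
plt.legend()
plt.show()

# For realistic numbers of points the curve is close to a step at 1.
# A long, gradual slope extending to large values means the errors
# (or the model) are not what they are claimed to be.
```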

If one interprets this literally, there are many large chi squared values well in excess of unity. These are bad fits, and the model should be rejected. That’s exactly what Rodrigues et al. (2018) found, rejecting the constancy of the acceleration scale at 10 sigma. By their reasoning, we must also reject the constancy of Newton’s constant with the same high confidence. That’s just silly.

One strange thing: the people complaining that the acceleration scale is not constant are only testing that hypothesis. Their presumption is that if the data reject that, it falsifies MOND. The attitude is that this is an automatic win for dark matter. Is it? They don’t bother checking.

We do. We can do the same exercise with dark matter. We find the same result. The CDF looks the same; there are many galaxies with chi squared that is too large.

CDF of rotation curve fits with various types of dark matter halos. None provide a satisfactory fit (as indicated by chi squared) to all galaxies.

Having found the same result for dark matter halos that we found for the RAR, if we apply the same logic, then all proposed model halos are excluded. There are too many bad fits with overly large chi squared.

We have now ruled out all conceivable models. Dark matter is falsified. MOND is falsified. Nothing works. Look on these data, ye mighty, and despair.

But wait! Should we believe the error bars that lead to the end of all things? What would Betteridge say?

Here is the rotation curve of DDO 170 fit with the RAR. Look first at the left box, with the data (points) and the fit (red line). Then look at the fit parameters in the right box.

RAR fit to the rotation curve of DDO 170 (left) with fit parameters at right.

Looking at the left panel, this is a good fit. The line representing the model provides a reasonable depiction of the data.

Looking at the right panel, this is a terrible fit. The reduced chi squared is 4.9. That’s a lot larger than one! The model is rejected with high confidence.

Well, which is it? Lots of people fall into the trap of blindly trusting statistical tests like chi squared. Statistics can only help your brain. They can’t replace it. Trust your eye-brain. This is a good fit. Chi squared is overly large not because this is a bad model but because the error bars are too small. The absolute amount by which the data “miss” is just a few km/s. This is not much by the standards of galaxies, and could easily be explained by a small departure of the tracer from a purely circular orbit – a physical effect we expect at that level. Or it could simply be that the errors are underestimated. Either way, it isn’t a big deal. It would be incredibly naive to take chi squared at face value.
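A toy calculation (with invented numbers, not the actual DDO 170 measurements) shows how little it takes: if the true per-point uncertainty is about twice the stated error bar, a reduced chi squared near 5 becomes an ordinary, acceptable fit.

```python
import numpy as np

# Invented residuals of a few km/s against stated errors of 1.5 km/s.
# These numbers are for illustration only; they are not the DDO 170 data.
residuals = np.array([2.5, -3.5, 1.5, 4.0, -2.5, 3.5, -1.5, 3.0])  # km/s
stated_err = 1.5                     # formal error bar per point [km/s]
dof = len(residuals) - 2             # two fitted parameters assumed

chi2_red = np.sum((residuals / stated_err) ** 2) / dof
print(f"reduced chi^2 with stated errors:  {chi2_red:.1f}")   # about 4.9

# If the true per-point uncertainty is twice as large (non-circular
# motions, unmodeled systematics), the same residuals are a fine fit.
true_err = 3.0                       # km/s
chi2_red = np.sum((residuals / true_err) ** 2) / dof
print(f"reduced chi^2 with doubled errors: {chi2_red:.1f}")   # about 1.2
```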

If you want to see a dozen plots like this for all the various models fit to each of over a hundred galaxies, see Li et al. (2020). The bottom line is always the same. The same galaxies are poorly fit by any model — dark matter or MOND. Chi squared is too big not because all conceivable models are wrong, but because the formal errors are underestimated in many cases.

This comes as no surprise to anyone with experience working with astronomical data. We can work to improve the data and the error estimation – see, for example, Sellwood et al. (2021). But we can't blindly turn the crank on some statistical black box and expect all the secrets of the universe to tumble out onto a silver platter for our delectation. There's a little more to it than that.

A Stellar Population Mystery in a Low Surface Brightness Galaxy

“Galaxies are made of stars.”

Bob Schommer, quoted by Dave Silva in his dissertation on stellar populations

This tongue-in-cheek quote is a statement of the obvious, at least for the 90+ years since Hubble established that galaxies are stellar systems comparable to and distinct from the Milky Way. There’s interstellar gas and dust too, and I suppose for nearly half that time, people have also thought galaxies to be composed of dark matter. But you can’t see that; the defining characteristic of galaxies is the stars by whose amalgamated light they shine.

The spiral galaxy NGC 7757 (left) and a patch of adjacent sky (right). Both images are 1/4 degree on a side. Most of the sky looks like the patch at right, populated only by scattered stars. You know a galaxy when you see one. These images are based on photographic data obtained using the Oschin Schmidt Telescope on Palomar Mountain as part of the Palomar Observatory Sky Survey-II (POSS-II).

Stellar populations is the term astronomers use to describe the generations of stars that compose the galaxies we observe. The concept was introduced by Walter Baade in a 1944 paper in which he resolved individual stars in Andromeda and companion galaxies, aided by wartime blackouts. He noted that some of the stars he resolved had color-magnitude diagrams (CMDs – see below) that resembled that of the solar neighborhood, while others were more like globular clusters. Thus were born Population I and Population II, the epitome of astronomical terminology.

More generally, one can imagine defining lots of populations by tracing groups of stars with a common origin in space and time to the event in which they formed. From this perspective, the Milky Way is the composite of all the star forming events that built it up. Each group has its own age, composition, and orbital properties, and it would be good to have a map that is more detailed than “Pop I” and “Pop II.” Many projects are working to map out these complex details, including ESA’s Gaia satellite, which is producing many spectacular and fundamental results, like the orbit and acceleration of the sun within the Milky Way.

A simple stellar population is a group of stars that all share the same composition and age: they were born of the same material at the same time. Even such a simple stellar population can be rather complicated, as stars form with a distribution of masses (the IMF, for Initial Mass Function) from tiny to massive. The lowest mass stars are those that just barely cross the threshold for igniting hydrogen fusion in their core, which occurs at about 7% of the mass of the sun. Still lower mass objects are called brown dwarfs, and were once considered a candidate for dark matter. Though they don’t shine from fusion like stars, brown dwarfs do glow with the residual heat of their formation through gravitational contraction, and we can now see that there are nowhere near enough of them to be the dark matter. At the opposite end of the mass spectrum, stars many tens of times the mass of the sun are known, with occasional specimens reaching upwards of 100 solar masses. These massive stars burn bright and exhaust their fuel quickly, exploding as supernovae after a few million years – a mere blink of the cosmic eye. By contrast, the lowest mass stars are so faint that they take practically forever to burn through their fuel, and are expected to continue to shine (albeit feebly) for many tens of Hubble times into the future. There is a strong and continuous relation between stellar mass and lifetime: the sun is expected to persist as-is for about 10 billion years (it is just shy of halfway through its “main sequence” lifetime). After a mundane life fusing hydrogen and helium as a main sequence star, the sun will swell into a red giant, becoming brighter and larger in radius (but not mass). This period is much shorter-lived, as are the complex sequence of events that follow it, ultimately leaving behind the naked core as an earth-sized but roughly half solar mass white dwarf remnant.
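As a rough rule of thumb (an approximation added here for orientation, not a precise result from the stellar models), the main sequence lifetime scales steeply with mass:

```latex
% Luminosity rises steeply with stellar mass (roughly L ~ M^3.5 on the
% main sequence), so lifetime ~ fuel/burn rate ~ M/L falls steeply:
t_{\rm MS} \sim 10\ \mathrm{Gyr}\,\left(\frac{M}{M_\odot}\right)^{-2.5}
```

By this reckoning a 20 solar mass star lasts only a few million years, while a star of a tenth of a solar mass would shine for far longer than the current age of the universe, consistent with the numbers quoted above.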

Matters become more complicated when we consider galaxies composed of multiple generations and different compositions. Nevertheless, we understand well enough the evolution of individual stars – a triumph of twentieth century astronomy – to consider the complex stellar populations of external galaxies. A particular interest of mine is the stellar populations of low surface brightness galaxies. These are late morphological types (often but not always irregular galaxies) that tend to be gas rich and very blue. This requires many young stars, but also implies a low metallicity. This much can be inferred from unresolved observations of galaxies, but the effects of age and composition are often degenerate. The best way to sort this out is to do as Baade did and resolve galaxies into individual stars. This was basically impossible for all but the nearest galaxies before the launch of the Hubble Space Telescope. The resolution of HST allows us to see farther out and deeper into the color-magnitude diagrams of external galaxies.

The low surface brightness galaxy F575-3, as discovered on a POSS-II sky survey plate (left) and as seen by HST (right). Both images are negatives. Only a tiny fraction of the 6.6 degree square POSS-II plate is shown as the image at left covers a mere 1/13 degree on a side. The pink outline shows the still smaller area of sky observed by HST, which visited the object at different roll angles: the differing orientation of the satellite causes the slight twist in the rectangular shape that is imaged. HST resolves individual stars, allowing construction of a color-magnitude diagram. It also resolves background galaxies, which are the majority of the extended objects in this image. Some even shine right through the foreground LSB galaxy!

Collaborator Jim Schombert has long been a leader in studying low surface brightness galaxies, discovering many examples of the class, and leading their study with HST among many stellar contributions. He is one of the unsung heroes without whom the field would be nowhere near where it is today. This post discusses a big puzzle he has identified in the stellar populations of low surface brightness galaxies: the case of the stars with seemingly inexplicable IR excesses. Perhaps he has also solved this puzzle, but first we have to understand what is normal and what is weird in a galaxy's stellar population.

When we resolve a galaxy into stars in more than one filter, the first thing we do is plot a color-magnitude diagram (CMD). The CMD quantifies how bright a star is, and what its color is – a proxy for its surface temperature. Hot stars are blue; cooler ones are red. The CMD is the primary tool by which the evolution of stars was unraveled. Normal features of the CMD include the main sequence (where stars spend the majority of their lives) and the red giant branch (prominent since giant stars are bright if rare). This is what Baade recognized in Populations I and II – stars with CMDs like those near the sun (lots of main sequence stars and some red giants) and those like globular clusters (mostly red giants at bright magnitudes and fainter main sequence stars).
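If you have never made one, a CMD is nothing fancy: plot a magnitude against the difference of two magnitudes. A minimal sketch, with a placeholder photometry catalog and filter names:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder catalog: one row per resolved star, magnitudes in two filters.
# The file name and column order are assumptions for illustration.
m_blue, m_red = np.loadtxt("photometry.txt", usecols=(0, 1), unpack=True)

color = m_blue - m_red     # bluer (hotter) stars to the left, redder to the right
magnitude = m_red          # brighter stars have smaller magnitudes

plt.scatter(color, magnitude, s=2, color="k")
plt.gca().invert_yaxis()   # magnitudes run backwards: bright stars plot at the top
plt.xlabel("color (blue - red)")
plt.ylabel("magnitude")
plt.show()
```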

In actively star forming galaxies like F415-3 below, there are plenty of young, massive, bright stars. These evolve rapidly, traipsing across the CMD from blue to red and back to blue and then red again. We can use what we know about stellar evolution to deduce the star formation history of a galaxy – how many stars formed as a function of time. This works quite well for short time periods as massive stars evolve fast and are easy to see, but it becomes increasingly hard for older stars. A galaxy boasts about its age when it is young but becomes less forthcoming as it gets older.

Color-magnitude diagram (CMD) of the low surface brightness galaxy F415-3 observed by HST (Schombert & McGaugh 2015). Each point is one star. The x-axis is color, with bluer stars to the left and redder stars to the right. The y-axis is magnitude: brighter stars are higher; fainter stars are lower. There are many, many stars fainter than those detected here; these observations only resolve the brightest stars that populate the top of the CMD. The lines demarcate the CMD into regions dominated by stars in various evolutionary phases. Counting stars in each box lets us trace out the recent star formation history, which is found to vary stochastically over the past few tens of millions of years while remaining roughly constant when averaged over the age of the universe (13+ billion years).

Most late type, irregular galaxies have been perking along, forming stars at a modest but fairly steady rate for most of the history of the universe. That's a very broad-brush statement; there are many puzzles in the details. F415-3 seems to be deficient in AGB stars. These are asymptotic giants, the phase of evolution that follows core helium burning, which itself follows the first-ascent red giant branch. This may be challenging the limits of our understanding of the modeling of stellar evolution. The basics are well-understood, but stars are giant, complicated, multifaceted beasts: just as understanding that terrestrial planets are basically metallic cores surrounded by mantles of rocky minerals falls short of describing the Earth, so too does a basic understanding of stellar evolution fall short of explaining every detail of every star. That's what I love about astronomy: there is always something new to learn.

Below is the CMD of F575-3, now in the near infrared filters available on HST rather than the optical filters above. There is not such a rich recent star formation history in this case; indeed, this galaxy has been abnormally quiescent for its class. There are some young stars above the tip of the red giant branch (the horizontal blue line), but no HII regions of ionized gas that point up the hottest, youngest stars (typically < 10 Myr old). Mostly we see a red giant branch (the region dark with points below the line) and some main sequence stars (the cloud of points to the left of the red giant branch). These merge into a large blob at faint magnitudes as the uncertainties smear everything together at the limits of the observation.

Color-magnitude diagram of the stars in F575-3 observed by HST (left) and the surrounding field (right). The typical size of the error bars is shown in the right panel; this causes the data to smear into a blob at fainter magnitudes. One can nevertheless recognize some of the main features, as noted: the main sequence of young stars, the red giant branch below the horizontal line, and a region of rapidly evolving stars above the line (mostly asymptotic giants with some helium burning stars and a few red supergiants). There are also a number of stars to the right of the giant branch, in a region of the CMD that is not explained by models of stellar evolution. There shouldn’t be any stars here, but there are more than can be explained by background contamination. What are they?

One cool thing about F575-3 is that it has the bluest red giants known. All red giants are red, but just how red depends sensitively on their metallicity – the fraction of their composition that isn’t hydrogen or helium. As stars evolve, they synthesize heavy elements that are incorporated into subsequent generations of stars. After a while, you have a comparatively metal-rich composition like that of the sun – which is still not much: the mass of the elements in the sun that are not hydrogen or helium is less than 2% of the total. I know that sounds like a small fraction – it is a small fraction – but it is still rather a lot by the standards of the universe in which we live, which started as three parts hydrogen and one part helium, and nothing heavier than lithium. Stars have had to work hard for generation upon generation to make everything else in the periodic table from carbon on up. Galaxies smaller than the Milky Way haven’t got as far along in this process, so dwarf galaxies are typically low metallicity – often much less than 1% by mass.

F575-3 is especially low metallicity. Or so it appears from the color of its red giant stars. These are the bluest reds currently known. Here are some other dwarfs for comparison, organized in order of increasing metallicity. The right edge of the red giant branch in F575-3 is clearly to the left of everything else.

Color-magnitude diagrams of some of the dwarf galaxies that have been observed by HST. Colored lines illustrate the sequence expected for red giants of different metallicities. These are all well below the solar composition, as measured by [Fe/H], the logarithmic abundance of iron relative to hydrogen, normalized to the solar value: solar [Fe/H] = 0; [Fe/H] = -1 is one tenth of the solar metal abundance. The lines illustrate the locations of giant branches with [Fe/H] = -2.3 (blue), -1.5 (green) and -0.7 (red). That's 0.5%, 3%, and 20% of solar, respectively. Heavy elements make up less than 0.4% of the mass of the stars in these galaxies.
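The conversion between the logarithmic [Fe/H] scale and those linear percentages is just a power of ten (approximating the overall metal content by the iron abundance):

```latex
% [Fe/H] is defined logarithmically relative to the solar value, so the
% linear fraction of the solar abundance is simply 10^[Fe/H]:
\frac{Z}{Z_\odot} \approx 10^{[\mathrm{Fe/H}]}, \qquad
10^{-2.3} \approx 0.5\%, \quad 10^{-1.5} \approx 3\%, \quad 10^{-0.7} \approx 20\%.
```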

But that’s not what I wrote to tell you about. I already knew LSB galaxies were low metallicity; that’s what I did part of my thesis on. That was based on the gas phase abundances, but it makes sense that the stars would share this property – they form out of the interstellar gas, after all. Somebody has to be the bluest of them all. That’s remarkable, but not surprising.

What is surprising is that F575-3 has an excess of stars with an IR-excess – their colors are too red in the infrared part of the spectrum. These are the stars to the right of the red giant branch. We found it basically impossible to populate this portion of the CMD without completely overdoing it. Plausible stellar evolution tracks don’t go there. Nature has no menu option for a sprinkling of high metallicity giant stars but hold the metals everywhere else: once you make those metals, there are ample numbers of high metallicity stars. So what the heck are these things with a near-IR excess?

The CMD of F575-3 in near-IR (left) and optical colors (right). Main sequence stars are blue, rapidly evolving phases like asymptotic giants are red, and most of the black points are red giant stars. There is a population of mystery stars colored purple. These have a near-IR excess: very red colors in the infrared, but normal colors in the optical.

My first thought was that they were bogus. There are always goofy things in astronomical data; outliers are often defects of some sort – in the detector, or the result of cosmic ray strikes. So initially they were easy to ignore. However, this kept nagging at us; it seemed like too much to just dismiss. There are some things like this in the background, but not enough to explain how many we see in the body of the diagram. This argued against things not associated with the galaxy itself, like background galaxies with redshifted colors. When we plotted the distribution of near-IR excess objects, they were clearly associated with the galaxy.

The distribution of sources with a near-IR excess (red) compared to objects of similar apparent magnitude. They’re in the same place as the galaxy that the eye sees in the raw image. Whatever they are, they’re clearly part of F575-3.

The colors make no sense for stars. They aren’t the occasional high metallicity red giant. So our next thought was extinction by interstellar dust. This has the net effect of making things look redder. But Jim did the hard work of matching up individual stars in both the optical and near-IR filters. The optical colors are normal. The population that stands out in the near-IR CMD mixes in evenly with the rest of the stars in the optical CMD. That’s the opposite of what dust does. Dust affects the optical colors more strongly. Here the optical colors are normal, but the near-IR colors are too red – hence an IR-excess.

There, I was stumped. We had convinced ourselves that we couldn't just dismiss the IR-excess population as artifacts. They had the right spatial distribution to be part of the galaxy. They had the right magnitudes to be stars in the galaxy. But they had really weird IR colors that were unexplained by any plausible track of stellar evolution.

Important detail: stellar evolution models track what happens in the star, up to its surface, but not in the environment beyond. Jim thought about it, and came back to me with an idea outside my purview. He remembered a conversation he had had long ago with Karl Rakos while observing high redshift clusters with custom-tailored filters. Rakos had previously worked on Ap and Be stars – peculiar stars. I had heard of these things, but they’re rare and don’t contribute significantly to the integrated light of the stellar population in a galaxy like the Milky Way. They seemed like an oddity of little consequence in a big universe.

Be stars – that’s “B” then “e” for B-type stars (the second hottest spectral classification) with emission lines (hence the e). Stars mostly just have absorption lines; emission lines make them peculiar. But Jim learned from his conversations with Rakos that these stars also frequently had IR-excesses. Some digging into the literature, and sure enough, these types of stars have the right magnitudes and colors to explain the strange population we can’t otherwise understand.

It is still weird. There are a lot of them. Not a lot in an absolute sense, but a lot more than we'd expect from their frequency in the Milky Way. But now that we know to look for them, you can see a similar population in some other dwarfs. Maybe they become more frequent in lower metallicity galaxies. The emission lines and the IR excess come from a disk of hot gas around the star; maybe such disks are more likely to form when there are fewer metals. This makes at least a tiny amount of sense, as B stars have a lot of energy to emit and angular momentum to transport. The mechanisms by which that can happen multiply when there are metals to make dust grains that can absorb and reprocess the abundance of UV photons. In their absence, when the metallicity is low, nature has to find another way. So maybe – maybe – Be stars are more common in lower metallicity environments because the dearth of dust encourages the formation of gas disks. That's entirely speculative (a fun but dangerous aspect of astronomy), so maybe not.

I don’t know if ultimately Be stars are the correct interpretation. It’s the best we’ve come up with. I really don’t know whether metallicity and dust play the role I just speculatively described. But it is a new and unexpected thing – and that’s the cool thing about the never-ending discovery space of astronomy. Even when you know what to expect, the universe can still surprise you – if you pay attention to the data.

25 years a heretic

People seem to like to do retrospectives at year’s end. I take a longer view, but the end of 2020 seems like a fitting time to do that. Below is the text of a paper I wrote in 1995 with collaborators at the Kapteyn Institute of the University of Groningen. The last edit date is from December of that year, so this text (in plain TeX, not LaTeX!) is now a quarter century old. I am just going to cut & paste it as-was; I even managed to recover the original figures and translate them into something web-friendly (postscript to jpeg). This is exactly how it was.

This was my first attempt to express in the scientific literature my concerns for the viability of the dark matter paradigm, and my puzzlement that the only theory to get any genuine predictions right was MOND. It was the hardest admission in my career that this could be even a remote possibility. Nevertheless, intellectual honesty demanded that I report it. To fail to do so would be an act of reality denial antithetical to the foundational principles of science.

It was never published. There were three referees. Initially, one was positive, one was negative, and one insisted that rotation curves weren’t flat. There was one iteration; this is the resubmitted version in which the concerns of the second referee were addressed to his apparent satisfaction by making the third figure a lot more complicated. The third referee persisted that none of this was valid because rotation curves weren’t flat. Seems like he had a problem with something beyond the scope of this paper, but the net result was rejection.

One valid concern that ran through the refereeing process from all sides was “what about everything else?” This is a good question that couldn’t fit into a short letter like this. Thanks to the support of Vera Rubin and a Carnegie Fellowship, I spent the next couple of years looking into everything else. The results were published in 1998 in a series of three long papers: one on dark matter, one on MOND, and one making detailed fits.

This had started from a very different place intellectually with my efforts to write a paper on galaxy formation that would have been similar to contemporaneous papers like Dalcanton, Spergel, & Summers and Mo, Mao, & White. This would have followed from my thesis and from work with Houjun Mo, who was an office mate when we were postdocs at the IoA in Cambridge. (The ideas discussed in Mo, McGaugh, & Bothun have been reborn recently in the galaxy formation literature under the moniker of “assembly bias.”) But I had realized by then that my ideas – and those in the papers cited – were wrong. So I didn’t write a paper that I knew to be wrong. I wrote this one instead.

Nothing substantive has changed since. Reading it afresh, I’m amazed how many of the arguments over the past quarter century were anticipated here. As a scientific community, we are stuck in a rut, and seem to prefer to spin the wheels to dig ourselves in deeper than consider the plain if difficult path out.


Testing hypotheses of dark matter and alternative gravity with low surface density galaxies

The missing mass problem remains one of the most vexing in astrophysics. Observations clearly indicate either the presence of a tremendous amount of as yet unidentified dark matter1,2, or the need to modify the law of gravity3-7. These hypotheses make vastly different predictions as a function of density. Observations of the rotation curves of galaxies of much lower surface brightness than previously studied therefore provide a powerful test for discriminating between them. The dark matter hypothesis requires a surprisingly strong relation between the surface brightness and mass to light ratio8, placing stringent constraints on theories of galaxy formation and evolution. Alternatively, the observed behaviour is predicted4 by one of the hypothesised alterations of gravity known as modified Newtonian dynamics3,5 (MOND).

Spiral galaxies are observed to have asymptotically flat [i.e., V(R) ~ constant for large R] rotation curves that extend well beyond their optical edges. This trend continues for as far (many, sometimes > 10 galaxy scale lengths) as can be probed by gaseous tracers1,2 or by the orbits of satellite galaxies9. Outside a galaxy's optical radius, the gravitational acceleration is aN = GM/R^2 = V^2/R so one expects V(R) ~ R^(-1/2). This Keplerian behaviour is not observed in galaxies.

One approach to this problem is to increase M in the outer parts of galaxies in order to provide the extra gravitational acceleration necessary to keep the rotation curves flat. Indeed, this is the only option within the framework of Newtonian gravity since both V and R are directly measured. The additional mass must be invisible, dominant, and extend well beyond the optical edge of the galaxies.

Postulating the existence of this large amount of dark matter which reveals itself only by its gravitational effects is a radical hypothesis. Yet the kinematic data force it upon us, so much so that the existence of dark matter is generally accepted. Enormous effort has gone into attempting to theoretically predict its nature and experimentally verify its existence, but to date there exists no convincing detection of any hypothesised dark matter candidate, and many plausible candidates have been ruled out10.

Another possible solution is to alter the fundamental equation aN = GM/R^2. Our faith in this simple equation is very well founded on extensive experimental tests of Newtonian gravity. Since it is so fundamental, altering it is an even more radical hypothesis than invoking the existence of large amounts of dark matter of completely unknown constituent components. However, a radical solution is required either way, so both possibilities must be considered and tested.

A phenomenological theory specifically introduced to address the problem of the flat rotation curves is MOND3. It has no other motivation and so far there is no firm physical basis for the theory. It provides no satisfactory cosmology, having yet to be reconciled with General Relativity. However, with the introduction of one new fundamental constant (an acceleration a0), it is empirically quite successful in fitting galaxy rotation curves11-14. It hypothesises that for accelerations a < a0 = 1.2 x 10^-10 m s^-2, the effective acceleration is given by aeff = (aN a0)^1/2. This simple prescription works well with essentially only one free parameter per galaxy, the stellar mass to light ratio, which is subject to independent constraint by stellar evolution theory. More importantly, MOND makes predictions which are distinct and testable. One specific prediction4 is that the asymptotic (flat) value of the rotation velocity, Va, is Va = (GMa0)^1/4. Note that Va does not depend on R, but only on M in the regime of small accelerations (a < a0).

In contrast, Newtonian gravity depends on both M and R. Replacing R with a mass surface density variable S = M(R)/R^2, the Newtonian prediction becomes M S ~ Va^4, which contrasts with the MOND prediction M ~ Va^4. These relations are the theoretical basis in each case for the observed luminosity-linewidth relation L ~ Va^4 (better known as the Tully-Fisher15 relation. Note that the observed value of the exponent is bandpass dependent, but does obtain the theoretical value of 4 in the near infrared16 which is considered the best indicator of the stellar mass. The systematic variation with bandpass is a very small effect compared to the difference between the two gravitational theories, and must be attributed to dust or stars under either theory.) To transform from theory to observation one requires the mass to light ratio Y: Y = M/L = S/s, where s is the surface brightness. Note that in the purely Newtonian case, M and L are very different functions of R, so Y is itself a strong function of R. We define Y to be the mass to light ratio within the optical radius R*, as this is the only radius which can be measured by observation. The global mass to light ratio would be very different (since M ~ R for R > R*, the total masses of dark haloes are not measurable), but the particular choice of definition does not affect the relevant functional dependences, which are all that matter here. The predictions become Y^2 s L ~ Va^4 for Newtonian gravity8,16 and Y L ~ Va^4 for MOND4.

The only sensible17 null hypothesis that can be constructed is that the mass to light ratio be roughly constant from galaxy to galaxy. Clearly distinct predictions thus emerge if galaxies of different surface brightnesses s are examined. In the Newtonian case there should be a family of parallel Tully-Fisher relations for each surface brightness. In the case of MOND, all galaxies should follow the same Tully-Fisher relation irrespective of surface brightness.

Recently it has been shown that extreme objects such as low surface brightness galaxies8,18 (those with central surface brightnesses fainter than s0 = 23 B mag arcsec^-2, corresponding to 40 L☉ pc^-2) obey the same Tully-Fisher relation as do the high surface brightness galaxies (typically with s0 = 21.65 B mag arcsec^-2 or 140 L☉ pc^-2) which originally15 defined it. Fig. 1 shows the luminosity-linewidth plane for galaxies ranging over a factor of 40 in surface brightness. Regardless of surface brightness, galaxies fall on the same Tully-Fisher relation.

The luminosity-linewidth (Tully-Fisher) relation for spiral galaxies over a large range in surface brightness. The B-band relation is shown; the same result is obtained in all bands8,18. Absolute magnitudes are measured from apparent magnitudes assuming H0 = 75 km/s/Mpc. Rotation velocities Va are directly proportional to observed 21 cm linewidths (measured as the full width at 20% of maximum) W20 corrected for inclination [sin^-1(i)]. Open symbols are an independent sample which defines42 the Tully-Fisher relation (solid line). The dotted lines show the expected shift of the Tully-Fisher relation for each step in surface brightness away from the canonical value s0 = 21.5 if the mass to light ratio remains constant. Low surface brightness galaxies are plotted as solid symbols, binned by surface brightness: red triangles: 22 < s0 < 23; green squares: 23 < s0 < 24; blue circles: s0 > 24. One galaxy with two independent measurements is connected by a line. This gives an indication of the typical uncertainty, which is sufficient to explain nearly all the scatter. Contrary to the clear expectation of a readily detectable shift as indicated by the dotted lines, galaxies fall on the same Tully-Fisher relation regardless of surface brightness, as predicted by MOND.

MOND predicts this behaviour in spite of the very different surface densities of low surface brightness galaxies. Understanding this observational fact in the framework of standard Newtonian gravity requires a subtle relation8 between surface brightness and the mass to light ratio to keep the product s Y^2 constant. If we retain normal gravity and the dark matter hypothesis, this result is unavoidable, and the null hypothesis of similar mass to light ratios (which, together with an assumed constancy of surface brightness, is usually invoked to explain the Tully-Fisher relation) is strongly rejected. Instead, the current epoch surface brightness is tightly correlated with the properties of the dark matter halo, placing strict constraints on models of galaxy formation and evolution.

The mass to light ratios computed for both cases are shown as a function of surface brightness in Fig. 2. Fig. 2 is based solely on galaxies with full rotation curves19,20 and surface photometry, so Va and R* are directly measured. The correlation in the Newtonian case is very clear (Fig. 2a), confirming our inference8 from the Tully-Fisher relation. Such tight correlations are very rare in extragalactic astronomy, and the Y-s relation is probably the real cause of an inferred Y-L relation. The latter is much weaker because surface brightness and luminosity are only weakly correlated21-24.

The mass to light ratio Y (in M/L) determined with (a) Newtonian dynamics and (b) MOND, plotted as a function of central surface brightness. The mass determination for Newtonian dynamics is M = V^2 R*/G and for MOND is M = V^4/(G a0). We have adopted as a consistent definition of the optical radius R* four scale lengths of the exponential optical disc. This is where discs tend to have edges, and contains essentially all the light21,22. The definition of R* makes a tremendous difference to the absolute value of the mass to light ratio in the Newtonian case, but makes no difference at all to the functional relation, which will be present regardless of the precise definition. These mass measurements are more sensitive to the inclination corrections than is the Tully-Fisher relation since there is a sin^-2(i) term in the Newtonian case and one of sin^-4(i) for MOND. It is thus very important that the inclination be accurately measured, and we have retained only galaxies which have adequate inclination determinations — error bars are plotted for a nominal uncertainty of 6 degrees. The sensitivity to inclination manifests itself as an increase in the scatter from (a) to (b). The derived mass is also very sensitive to the measured value of the asymptotic velocity itself, so we have used only those galaxies for which this can be taken directly from a full rotation curve19,20,42. We do not employ profile widths; the velocity measurements here are independent of those in Fig. 1. In both cases, we have subtracted off the known atomic gas mass19,20,42, so what remains is essentially only the stars and any dark matter that may exist. A very strong correlation (regression coefficient = 0.85) is apparent in (a): this is the mass to light ratio — surface brightness conspiracy. The slope is consistent (within the errors) with the theoretical expectation s ~ Y^-2 derived from the Tully-Fisher relation8. At the highest surface brightnesses, the mass to light ratio is similar to that expected for the stellar population. At the faintest surface brightnesses, it has increased by a factor of nearly ten, indicating increasing dark matter domination within the optical disc as surface brightness decreases, or a very systematic change in the stellar population, or both. In (b), the mass to light ratio scatters about a constant value of 2. This mean value, and the lack of a trend, is what is expected for stellar populations17,21-24.

The Y-s relation is not predicted by any dark matter theory25,26. It can not be purely an effect of the stellar mass to light ratio, since no other stellar population indicator such as color21-24 or metallicity27,28 is so tightly correlated with surface brightness. In principle it could be an effect of the stellar mass fraction, as the gas mass to light ratio follows a relation very similar to that of total mass to light ratio20. We correct for this in Fig. 2 by subtracting the known atomic gas mass so that Y refers only to the stars and any dark matter. We do not correct for molecular gas, as this has never been detected in low surface brightness galaxies to rather sensitive limits30 so the total mass of such gas is unimportant if current estimates31 of the variation of the CO to H2 conversion factor with metallicity are correct. These corrections have no discernible effect at all in Fig. 2 because the dark mass is totally dominant. It is thus very hard to see how any evolutionary effect in the luminous matter can be relevant.

In the case of MOND, the mass to light ratio directly reflects that of the stellar population once the correction for gas mass fraction is made. There is no trend of Y* with surface brightness (Fig. 2b), a more natural result and one which is consistent with our studies of the stellar populations of low surface brightness galaxies21-23. These suggest that Y* should be roughly constant or slightly declining as surface brightness decreases, with much scatter. The mean value Y* = 2 is also expected from stellar evolutionary theory17, which always gives a number 0 < Y* < 10 and usually gives 0.5 < Y* < 3 for disk galaxies. This is particularly striking since Y* is the only free parameter allowed to MOND, and the observed mean is very close to that directly observed29 in the Milky Way (1.7 ± 0.5 M/L).

The essence of the problem is illustrated by Fig. 3, which shows the rotation curves of two galaxies of essentially the same luminosity but vastly different surface brightnesses. Though the asymptotic velocities are the same (as required by the Tully-Fisher relation), the rotation curve of the low surface brightness galaxy rises less quickly than that of the high surface brightness galaxy as expected if the mass is distributed like the light. Indeed, the ratio of surface brightnesses is correct to explain the ratio of velocities at small radii if both galaxies have similar mass to light ratios. However, if this continues to be the case as R increases, the low surface brightness galaxy should reach a lower asymptotic velocity simply because R* must be larger for the same L. That this does not occur is the problem, and poses very significant systematic constraints on the dark matter distribution.

The rotation curves of two galaxies, one of high surface brightness11 (NGC 2403; open circles) and one of low surface brightness19 (UGC 128; filled circles). The two galaxies have very nearly the same asymptotic velocity, and hence luminosity, as required by the Tully-Fisher relation. However, they have central surface brightnesses which differ by a factor of 13. The lines give the contributions to the rotation curves of the various components. Green: luminous disk. Blue: dark matter halo. Red: luminous disk (stars and gas) with MOND. Solid lines refer to NGC 2403 and dotted lines to UGC 128. The fits for NGC 2403 are taken from ref. 11, for which the stars have Y* = 1.5 M/L. For UGC 128, no specific fit is made: the blue and green dotted lines are simply the NGC 2403 fits scaled by the ratio of disk scale lengths h. This provides a remarkably good description of the UGC 128 rotation curve and illustrates one possible manifestation of the fine tuning problem: if disks have similar Y, the halo parameters p0 and R0 must scale with the disk parameters s0 and h while conspiring to keep the product p0 R0^2 fixed at any given luminosity. Note also that the halo of NGC 2403 gives an adequate fit to the rotation curve of UGC 128. This is another possible manifestation of the fine tuning problem: all galaxies of the same luminosity have the same halo, with Y systematically varying with s0 so that Y* goes to zero as s0 goes to zero. Neither of these is exactly correct because the contribution of the gas can not be set to zero as is mathematically possible with the stars. This causes the resulting fine tuning problems to be even more complex, involving more parameters. Alternatively, the green dotted line is the rotation curve expected by MOND for a galaxy with the observed luminous mass distribution of UGC 128.

Satisfying the Tully-Fisher relation has led to some expectation that haloes all have the same density structure. The simplest such possibility is immediately ruled out. In order to obtain L ~ Va^4 ~ M S, one might suppose that the mass surface density S is constant from galaxy to galaxy, irrespective of the luminous surface density s. This achieves the correct asymptotic velocity Va, but requires that the mass distribution, and hence the complete rotation curve, be essentially identical for all galaxies of the same luminosity. This is obviously not the case (Fig. 3), as the rotation curves of lower surface brightness galaxies rise much more gradually than those of higher surface brightness galaxies (also a prediction4 of MOND). It might be possible to have approximately constant density haloes if the highest surface brightness disks are maximal and the lowest minimal in their contribution to the inner parts of the rotation curves, but this then requires fine tuning of Y*, with this systematically decreasing with surface brightness.

The expected form of the halo mass distribution depends on the dominant form of dark matter. This could exist in three general categories: baryonic (e.g., MACHOs), hot (e.g., neutrinos), and cold exotic particles (e.g., WIMPs). The first two make no specific predictions. Baryonic dark matter candidates are most subject to direct detection, and most plausible candidates have been ruled out10 with remaining suggestions of necessity sounding increasingly contrived32. Hot dark matter is not relevant to the present problem. Even if neutrinos have a small mass, their velocities considerably exceed the escape velocities of the haloes of low mass galaxies where the problem is most severe. Cosmological simulations involving exotic cold dark matter33,34 have advanced to the point where predictions are being made about the density structure of haloes. These take the form33,34 p(R) = pH/[R(R+RH)^b] where pH characterises the halo density and RH its radius, with b ~ 2 to 3. The characteristic density depends on the mean density of the universe at the collapse epoch, and is generally expected to be greater for lower mass galaxies since these collapse first in such scenarios. This goes in the opposite sense of the observations, which show that low mass and low surface brightness galaxies are less, not more, dense. The observed behaviour is actually expected in scenarios which do not smooth on a particular mass scale and hence allow galaxies of the same mass to collapse at a variety of epochs25, but in this case the Tully-Fisher relation should not be universal. Worse, note that at small R < RH, p(R) ~ R^-1. It has already been noted32,35 that such a steep interior density distribution is completely inconsistent with the few (4) analysed observations of dwarf galaxies. Our data19,20 confirm and considerably extend this conclusion for 24 low surface brightness galaxies over a wide range in luminosity.

The failure of the predicted exotic cold dark matter density distribution either rules out this form of dark matter, indicates some failing in the simulations (in spite of wide-spread consensus), or requires some mechanism to redistribute the mass. Feedback from star formation is usually invoked for the last of these, but this can not work for two reasons. First, an objection in principle: a small mass of stars and gas must have a dramatic impact on the distribution of the dominant dark mass, with which they can only interact gravitationally. More mass redistribution is required in less luminous galaxies since they start out denser but end up more diffuse; of course progressively less baryonic material is available to bring this about as luminosity declines. Second, an empirical objection: in this scenario, galaxies explode and gas is lost. However, progressively fainter and lower surface brightness galaxies, which need to suffer more severe explosions, are actually very gas rich.

Observationally, dark matter haloes are inferred to have density distributions1,2,11 with constant density cores, p(R) = p0/[1 + (R/R0)^g]. Here, p0 is the core density and R0 is the core size, with g ~ 2 being required to produce flat rotation curves. For g = 2, the rotation curve resulting from this mass distribution is V(R) = Va [1 - (R0/R) tan^-1(R/R0)]^1/2, where the asymptotic velocity is Va = (4πG p0 R0^2)^1/2. To satisfy the Tully-Fisher relation, Va, and hence the product p0 R0^2, must be the same for all galaxies of the same luminosity. To decrease the rate of rise of the rotation curves as surface brightness decreases, R0 must increase. Together, these two requirements amount to a fine tuning conspiracy: the product p0 R0^2 must stay constant while R0 varies with surface brightness at a given luminosity. Luminosity and surface brightness themselves are only weakly correlated, so there exists a wide range in one parameter at any fixed value of the other. Thus the structural properties of the invisible dark matter halo dictate those of the luminous disk, or vice versa. So, s and L give the essential information about the mass distribution without recourse to kinematic information.
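To make the conspiracy concrete, here is a minimal Python sketch of the cored halo rotation curve quoted above. The two parameter sets are purely illustrative (not fits to any real galaxy): they share the same product p0 R0^2, and hence the same asymptotic velocity, while the larger core makes the rotation curve rise much more slowly.

```python
import numpy as np

G = 4.301e-6  # Newton's constant in kpc (km/s)^2 / Msun

def halo_velocity(R, p0, R0):
    """V(R) = Va * sqrt(1 - (R0/R) * arctan(R/R0)) with Va = sqrt(4 pi G p0 R0^2),
    the rotation curve of the constant-density-core (g = 2) halo described above."""
    Va = np.sqrt(4 * np.pi * G * p0 * R0**2)
    return Va * np.sqrt(1 - (R0 / R) * np.arctan(R / R0))

R = np.linspace(0.5, 30, 60)  # radii in kpc

# Illustrative parameters only: a dense, compact halo and a diffuse one,
# tuned so that p0 * R0^2 (and hence Va) is identical for both.
V_dense   = halo_velocity(R, p0=8.0e7, R0=2.0)   # p0 in Msun/kpc^3, R0 in kpc
V_diffuse = halo_velocity(R, p0=3.2e6, R0=10.0)

Va = np.sqrt(4 * np.pi * G * 8.0e7 * 2.0**2)     # shared asymptotic velocity, ~131 km/s
print(Va, V_dense[-1], V_diffuse[-1])            # the diffuse halo approaches Va more slowly
```

The point is not the particular numbers but the bookkeeping: to keep Va fixed while slowing the rise of the curve, p0 must fall as R0 grows, which is exactly the fine tuning described above.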

A strict s-p0-R0 relation is rigorously obeyed only if the haloes are spherical and dominate throughout. This is probably a good approximation for low surface brightness galaxies but may not be for those of the highest surface brightness. However, a significant non-halo contribution can at best replace one fine tuning problem with another (e.g., surface brightness being strongly correlated with the stellar population mass to light ratio instead of halo core density) and generally causes additional conspiracies.

There are two perspectives for interpreting these relations, with the preferred perspective depending strongly on the philosophical attitude one has towards empirical and theoretical knowledge. One view is that these are real relations which galaxies and their haloes obey. As such, they provide a positive link between models of galaxy formation and evolution and reality.

The other view is that this list of fine tuning requirements makes it rather unattractive to maintain the dark matter hypothesis. MOND provides an empirically more natural explanation for these observations. In addition to the Tully-Fisher relation, MOND correctly predicts the systematics of the shapes of the rotation curves of low surface brightness galaxies19,20 and fits the specific case of UGC 128 (Fig. 3). Low surface brightness galaxies were stipulated4 to be a stringent test of the theory because they should be well into the regime a < a0. This is now observed to be true, and to the limit of observational accuracy the predictions of MOND are confirmed. The critical acceleration scale a0 is apparently universal, so there is a single force law acting in galactic disks for which MOND provides the correct description. The cause of this could be either a particular dark matter distribution36 or a real modification of gravity. The former is difficult to arrange, and a single force law strongly supports the latter hypothesis since in principle the dark matter could have any number of distributions which would give rise to a variety of effective force laws. Even if MOND is not correct, it is essential to understand why it so closely describes the observations. Though the data can not exclude Newtonian dynamics, with a working empirical alternative (really an extension) at hand, we would not hesitate to reject as incomplete any less venerable hypothesis.

Nevertheless, MOND itself remains incomplete as a theory, being more of a Kepler's Law for galaxies. It provides only an empirical description of kinematic data. While successful for disk galaxies, it was thought to fail in clusters of galaxies37. Recently it has been recognized that there exist two missing mass problems in galaxy clusters, one of which is now solved38: most of the luminous matter is in X-ray gas, not galaxies. This vastly improves the consistency of MOND with cluster dynamics39. The problem with the theory remains a reconciliation with Relativity and thereby standard cosmology (which is itself in considerable difficulty38,40), and a lack of any prediction about gravitational lensing41. These are theoretical problems which need to be more widely addressed in light of MOND's empirical success.

ACKNOWLEDGEMENTS. We thank R. Sanders and M. Milgrom for clarifying aspects of a theory with which we were previously unfamiliar. SSM is grateful to the Kapteyn Astronomical Institute for enormous hospitality during visits when much of this work was done. [Note added in 2020: this work was supported by a cooperative grant funded by the EU and would no longer be possible thanks to Brexit.]

REFERENCES

  1. Rubin, V. C. Science 220, 1339-1344 (1983).
  2. Sancisi, R. & van Albada, T. S. in Dark Matter in the Universe, IAU Symp. No. 117, (eds. Knapp, G. & Kormendy, J.) 67-80 (Reidel, Dordrecht, 1987).
  3. Milgrom, M. Astrophys. J. 270, 365-370 (1983).
  4. Milgrom, M. Astrophys. J. 270, 371-383 (1983).
  5. Bekenstein, J. D., & Milgrom, M. Astrophys. J. 286, 7-14 (1984).
  6. Mannheim, P. D., & Kazanas, D. Astrophys. J. 342, 635-651 (1989).
  7. Sanders, R. H. Astron. Astrophys. Rev. 2, 1-28 (1990).
  8. Zwaan, M.A., van der Hulst, J. M., de Blok, W. J. G. & McGaugh, S. S. Mon. Not. R. astr. Soc., 273, L35-L38, (1995).
  9. Zaritsky, D. & White, S. D. M. Astrophys. J. 435, 599-610 (1994).
  10. Carr, B. Ann. Rev. Astr. Astrophys., 32, 531-590 (1994).
  11. Begeman, K. G., Broeils, A. H. & Sanders, R. H. Mon. Not. R. astr. Soc. 249, 523-537 (1991).
  12. Kent, S. M. Astr. J. 93, 816-832 (1987).
  13. Milgrom, M. Astrophys. J. 333, 689-693 (1988).
  14. Milgrom, M. & Braun, E. Astrophys. J. 334, 130-134 (1988).
  15. Tully, R. B., & Fisher, J. R. Astr. Astrophys., 54, 661-673 (1977).
  16. Aaronson, M., Huchra, J., & Mould, J. Astrophys. J. 229, 1-17 (1979).
  17. Larson, R. B. & Tinsley, B. M. Astrophys. J. 219, 48-58 (1978).
  18. Sprayberry, D., Bernstein, G. M., Impey, C. D. & Bothun, G. D. Astrophys. J. 438, 72-82 (1995).
  19. van der Hulst, J. M., Skillman, E. D., Smith, T. R., Bothun, G. D., McGaugh, S. S. & de Blok, W. J. G. Astr. J. 106, 548-559 (1993).
  20. de Blok, W. J. G., McGaugh, S. S., & van der Hulst, J. M. Mon. Not. R. astr. Soc. (submitted).
  21. McGaugh, S. S., & Bothun, G. D. Astr. J. 107, 530-542 (1994).
  22. de Blok, W. J. G., van der Hulst, J. M., & Bothun, G. D. Mon. Not. R. astr. Soc. 274, 235-259 (1995).
  23. Ronnback, J., & Bergvall, N. Astr. Astrophys., 292, 360-378 (1994).
  24. de Jong, R. S. Ph.D. thesis, University of Groningen (1995).
  25. Mo, H. J., McGaugh, S. S. & Bothun, G. D. Mon. Not. R. astr. Soc. 267, 129-140 (1994).
  26. Dalcanton, J. J., Spergel, D. N., Summers, F. J. Astrophys. J., (in press).
  27. McGaugh, S. S. Astrophys. J. 426, 135-149 (1994).
  28. Ronnback, J., & Bergvall, N. Astr. Astrophys., 302, 353-359 (1995).
  29. Kuijken, K. & Gilmore, G. Mon. Not. R. astr. Soc., 239, 605-649 (1989).
  30. Schombert, J. M., Bothun, G. D., Impey, C. D., & Mundy, L. G. Astron. J., 100, 1523-1529 (1990).
  31. Wilson, C. D. Astrophys. J. 448, L97-L100 (1995).
  32. Moore, B. Nature 370, 629-631 (1994).
  33. Navarro, J. F., Frenk, C. S., & White, S. D. M. Mon. Not. R. astr. Soc., 275, 720-728 (1995).
  34. Cole, S. & Lacey, C. Mon. Not. R. astr. Soc., in press.
  35. Flores, R. A. & Primack, J. R. Astrophys. J. 427, 1-4 (1994).
  36. Sanders, R. H., & Begeman, K. G. Mon. Not. R. astr. Soc. 266, 360-366 (1994).
  37. The, L. S., & White, S. D. M. Astron. J., 95, 1642-1651 (1988).
  38. White, S. D. M., Navarro, J. F., Evrard, A. E. & Frenk, C. S. Nature 366, 429-433 (1993).
  39. Sanders, R. H. Astron. Astrophys. 284, L31-L34 (1994).
  40. Bolte, M., & Hogan, C. J. Nature 376, 399-402 (1995).
  41. Bekenstein, J. D. & Sanders, R. H. Astrophys. J. 429, 480-490 (1994).
  42. Broeils, A. H., Ph.D. thesis, Univ. of Groningen (1992).

Statistical detection of the external field effect from large scale structure

Statistical detection of the external field effect from large scale structure

A unique prediction of MOND

One curious aspect of MOND as a theory is the External Field Effect (EFE). The modified force law depends on an absolute acceleration scale, with motion being amplified over the Newtonian expectation when the force per unit mass falls below the critical acceleration scale a0 = 1.2 x 10^-10 m/s/s. Usually we consider a galaxy to be an island universe: it is a system so isolated that we need consider only its own gravity. This is an excellent approximation in most circumstances, but in principle all sources of gravity from all over the universe matter.

The EFE in dwarf satellite galaxies

An example of the EFE is provided by dwarf satellite galaxies – small galaxies orbiting a larger host. It can happen that the stars in such a dwarf feel a stronger acceleration towards the host than to each other – the external field exceeds the internal self-gravity of the dwarf. In this limit, they're more a collection of stars in a common orbit around the larger host than they are a self-gravitating island universe.

A weird consequence of the EFE in MOND is that a dwarf galaxy orbiting a large host will behave differently than it would if it were isolated in the depths of intergalactic space. MOND obeys the Weak Equivalence Principle but does not obey local position invariance for self-gravitating systems. That means it violates the Strong Equivalence Principle while remaining consistent with the Einstein Equivalence Principle, a subtle but important distinction about how gravity self-gravitates.

Nothing like this happens conventionally, with or without dark matter. Gravity is local; it doesn't matter what the rest of the universe is doing. Larger systems don't impact smaller ones except in the extreme of tidal disruption, where the geodesics diverge within the lesser object because it is no longer small compared to the gradient in the gravitational field. An amusing, if extreme, example is spaghettification. The EFE in MOND is a much subtler effect: when near a host, there is an extra source of acceleration, so a dwarf satellite is not as deep in the MOND regime as the equivalent isolated dwarf. Consequently, there is less of a boost from MOND: stars move a little slower, and conventionally one would infer a bit less dark matter.

The importance of the EFE in dwarf satellite galaxies is well documented. It was essential to the a priori prediction of the velocity dispersion in Crater 2 (where MOND correctly anticipated a velocity dispersion of just 2 km/s where the conventional expectation with dark matter was more like 17 km/s) and to the correct prediction of that for NGC 1052-DF2 (13 rather than 20 km/s). Indeed, one can see the difference between isolated and EFE cases in matched pairs of dwarf satellites of Andromeda. Andromeda has enough satellites that one can pick out otherwise indistinguishable dwarfs where one happens to be subject to the EFE while its twin is practically isolated. The speeds of stars in the dwarfs affected by the EFE are consistently lower, as predicted. For example, the relatively isolated dwarf satellite of Andromeda known as And XXVIII has a velocity dispersion of 5 km/s, while its near twin And XVII (which has very nearly the same luminosity and size) is affected by the EFE and consequently has a velocity dispersion of only 3 km/s.

The case of dwarf satellites is the most obvious place where the EFE occurs. In principle, it applies everywhere all the time. It is most obvious in dwarf satellites because the external field can be comparable to or even greater than the internal field. In principle, the EFE also matters even when smaller than the internal field, albeit only a little bit: the extra acceleration causes an object to be not quite as deep in the MOND regime.
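To see roughly how this works in practice, here is a toy sketch using the widely quoted one-dimensional approximation to the EFE and the "simple" interpolation function. Both choices are illustrative assumptions on my part – the published rotation curve fits use a more careful formulation – so treat this as a cartoon of the effect, not the actual method.

```python
import numpy as np

a0 = 1.2e-10  # m/s^2, the critical acceleration scale

def nu(y):
    """'Simple' interpolation function: nu -> 1 for y >> 1 (Newtonian regime),
    nu -> 1/sqrt(y) for y << 1 (deep MOND regime)."""
    return 0.5 + np.sqrt(0.25 + 1.0 / y)

def g_internal(gN, gN_ext=0.0):
    """One-dimensional estimate of the internal acceleration for a Newtonian
    internal field gN embedded in a Newtonian external field gN_ext."""
    total = nu((gN + gN_ext) / a0) * (gN + gN_ext)
    if gN_ext > 0:
        total -= nu(gN_ext / a0) * gN_ext
    return total

gN = 0.05 * a0  # internal field typical of a galaxy's outer regions
print(g_internal(gN) / gN)             # isolated: boost of ~5 over Newton
print(g_internal(gN, 0.1 * a0) / gN)   # external field of 0.1 a0: boost drops to ~2
```

The toy numbers show the sense of the effect: the external field does not change the stars or the gas, but it lifts the system out of the deepest MOND regime and so trims the boost over the Newtonian expectation.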

The EFE from large scale structure

Even in the depths of intergalactic space, there is some non-zero acceleration due to everything else in the universe. This is very reminiscent of Mach's Principle, which Einstein reputedly struggled hard to incorporate into General Relativity. I'm not going to solve that in a blog post, but note that MOND is much more in the spirit of Mach and Lorentz and Einstein than its detractors generally seem to presume.

Here I describe the apparent detection of the subtle effect of a small but non-zero background acceleration field. This is very different from the case of dwarf satellites where the EFE can exceed the internal field. It is just a small tweak to the dominant internal fields of very nearly isolated island universes. It’s like the lapping of waves on their shores: hardly relevant to the existence of the island, but a pleasant feature as you walk along the beach.

The universe has structure; there are places with lots of galaxies (groups, clusters, walls, sheets) and other places with very few (voids). This large scale structure should impose a low-level but non-zero acceleration field that should vary in amplitude from place to place and affect all galaxies in their outskirts. For this reason, we do not expect rotation curves to remain flat forever; even in MOND, there comes an over-under point where the gravity of everything else takes over from any individual object. A test particle at the see-saw balance point between the Milky Way and Andromeda may not know which galaxy to call daddy, but it sure knows they’re both there. The background acceleration field matters to such diverse subjects as groups of galaxies and Lyman alpha absorbers at high redshift.

As an historical aside, Lyman alpha absorbers at high redshift were initially found to deviate from MOND by many orders of magnitude. That was without the EFE. With the EFE, the discrepancy is much smaller, but persists. The amplitude of the EFE at high redshift is very uncertain. I expect it is higher in MOND than estimated because structure forms fast in MOND; this might suffice to solve the problem. Whether or not this is the case, it makes a good example of how a simple calculation can make MOND seem way off when it isn't. If I had a dollar for every time I've seen that happen, I could fly first class.

I made an early estimate of the average intergalactic acceleration field, finding the typical environmental acceleration eenv to be about 2% of a0 (eenv ~ 2.6 x 10^-12 m/s/s, see just before eq. 31). This is highly uncertain and should be location dependent, differing a lot from voids to richer environments. It is hard to find systems that probe much below 10% of a0, and the effect it would cause on the average (non-satellite) galaxy is rather subtle, so I have mostly neglected this background acceleration as, well, pretty negligible.

This changed recently thanks to Kyu-Hyun Chae and Harry Desmond. We met at a conference in Bonn a year ago September. (Remember travel? I used to complain about how much travel work involved. Now I miss it – especially as experience demonstrates that some things really do require in-person interaction.) Kyu thought we should be able to tease out the EFE from SPARC data in a statistical way, and Harry offered to make a map of the environmental acceleration based on the locations of known galaxies. This is a distinct improvement over the crude average of my ancient first estimate as it specifies the EFE that ought to occur at the location of each individual galaxy. The results of this collaboration were recently published open-access in the Astrophysical Journal.

This did not come easily. I think I mentioned that the predicted effect is subtle. We're no longer talking about the effect of a big host on a tiny dwarf up close to it. We're talking about the background of everything on giant galaxies. Space is incomprehensibly vast, so every galaxy is far, far away, and the expected effect is small. So my first reaction was "Sure. Great idea. No way can we do this with current data." I am pleased to report that I was wrong: with lots of hard work, perseverance, and the power of Bayesian statistics, we have obtained a positive detection of the EFE.

One reason for my initial skepticism was the importance of data quality. The rotation curves in SPARC are a heterogeneous lot, being the accumulated work of an entire community of radio astronomers over the course of several decades. Some galaxies are bright and their data stupendous, others… not so much. Having started myself working on low surface brightness galaxies – the least stupendous of them all – and having spent much of the past quarter century working long and hard to improve the data, I tend to be rather skeptical of what can be accomplished.

An example of a galaxy with good data is NGC 5055 (aka M63, aka the Sunflower galaxy, pictured atop as viewed by the Hubble Space Telescope). NGC 5055 happens to reside in a relatively high acceleration environment for a spiral, with eenv ~ 9% of a0. For comparison, the acceleration at the last measured point of its rotation curve is about 15% of a0. So they’re within a factor of two, which is pretty much the strongest effect in the whole sample. This additional bit of acceleration means NGC 5055 is not quite as deep in the MOND regime as it would be all by its lonesome, with the net effect that the rotation curve is predicted to decline a little bit faster than it would in the isolated case, as you can see in the figure below. See that? Or is it too subtle? I think I mentioned the effect was pretty subtle.

The rotation curve of NGC 5055 (velocity in km/s vs. radius in kpc). The blue and green bands are the rotation expected from the observed stars and gas. The red band is the MOND fit with (left) and without (right) the external field effect (EFE) from Chae et al. ΔBIC is a statistical measure that indicates that the fit with the EFE is a meaningful improvement over that without (in technical terms, “way better”).

That this case works well is encouraging. I like to start with a good case: if you can’t see what you’re looking for in the best of the data, stop. But I still didn’t hold out much hope for the rest of the sample. Then Kyu showed that the most isolated galaxies – those subject to the lowest environmental accelerations – showed no effect. That sounds boring, but null results are important. It could happen that the environmental acceleration was a promiscuous free parameter that appears to improve a fit without really adding any value. That it declined to do that in cases where it shouldn’t was intriguing. The galaxies in the most extreme environments show an effect when they should, but don’t when they shouldn’t.

Statistical detection of the EFE

Statistics become useful for interpreting the entirety of the large sample of galaxies. Because of the variability in data quality, we knew some cases would go astray. But we only need to know if the fit for any galaxy is improved relative to the case where the EFE is neglected, so each case sets its own standard. This relative measure is more robust than analyses that require an assessment of the absolute fit quality. All we’re really asking the data is whether the presence of an EFE helps. To my initial and ongoing amazement, it does.
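For readers who want a feel for the sort of relative comparison involved, here is a schematic example using the Bayesian Information Criterion mentioned in the figure caption above. The numbers are invented for illustration; the real analysis is a full Bayesian fit per galaxy, so this only sketches the bookkeeping of whether one extra parameter earns its keep.

```python
import numpy as np

def bic(chi2, n_params, n_points):
    """Bayesian Information Criterion for a fit with Gaussian errors,
    up to an additive constant that cancels when taking differences."""
    return chi2 + n_params * np.log(n_points)

# Invented numbers for a single rotation curve: the EFE fit has one extra
# free parameter (e) and must lower chi^2 enough to pay that penalty.
n_points = 25
bic_without_efe = bic(chi2=48.0, n_params=3, n_points=n_points)
bic_with_efe    = bic(chi2=30.0, n_params=4, n_points=n_points)

delta_bic = bic_without_efe - bic_with_efe  # positive values favour including the EFE
print(delta_bic)  # ~14.8 here, a strong preference on the usual interpretive scales
```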

The environmental acceleration predicted by the distribution of known galaxies, eenv, against the amplitude e of an external field that provides the best-fit to each rotation curve (Fig. 5 of Chae et al).

The figure above shows the amplitude of the EFE that best fits each rotation curve along the x-axis. The median is 5% of a0. This is non-zero at 4.7σ, and our detection of the EFE is comparable in quality to that of the Baryon Acoustic Oscillation or the accelerated expansion of the universe when these were first accepted. Of course, these were widely anticipated effects, while the EFE is expected only in MOND. Personally, I think it is a mistake to obsess over the number of σ, which is not as robust as people like to think. I am more impressed that the peak of the color map (the darkest color in the data density map above) is positive definite and clearly non-zero.

Taken together, the data prefer a small but clearly non-zero EFE. That’s a statistical statement for the whole sample. Of course, the amplitude (e) of the EFE inferred for individual galaxies is uncertain, and is occasionally negative. This is unphysical: it shouldn’t happen. Nevertheless, it is statistically expected given the amount of uncertainty in the data: for error bars this size, some of the data should spill over to e < 0.

I didn’t initially think we could detect the EFE in this way because I expected that the error bars would wash out the effect. That is, I expected the colored blob above would be smeared out enough that the peak would encompass zero. That’s not what happened, me of little faith. I am also encouraged that the distribution skews positive: the error bars scatter points in both directions, and wind up positive more often than negative. That’s an indication that they started from an underlying distribution centered on e > 0, not e = 0.

The y-axis in the figure above is the estimate of the environmental acceleration based on the 2M++ galaxy catalog. This is entirely independent of the best fit e from rotation curves. It is the expected EFE from the distribution of mass that we know about. The median environmental EFE found in this way is 3% of a0. This is pretty close to the 2% I estimated over 20 years ago. Given the uncertainties, it is quite compatible with the median of 5% found from the rotation curve fits.

In an ideal world where all quantities are perfectly known, there would be a correlation between the external field inferred from the best fit to the rotation curves and that of the environment predicted by large scale structure. We are nowhere near to that ideal. I can conceive of improving both measurements, but I find it hard to imagine getting to the point where we can see a correlation between e and eenv. The data quality required on both fronts would be stunning.

Then again, I never thought we could get this far, so I am game to give it a go.

Oh… you don’t want to look in there

Oh… you don’t want to look in there

This post is a recent conversation with David Garofalo for his blog.


Today we talk to Dr. Stacy McGaugh, Chair of the Astronomy Department at Case Western Reserve University.

David: Hi Stacy. You had set out to disprove MOND and instead found evidence to support it. That sounds like the poster child for how science works. Was praise forthcoming?

Stacy: In the late 1980s and into the 1990s, I set out to try to understand low surface brightness galaxies. These are diffuse systems of stars and gas that rotate like the familiar bright spirals, but whose stars are much more spread out. Why? How did these things come to be? Why were they different from brighter galaxies? How could we explain their properties? These were the problems I started out working on that inadvertently set me on a collision course with MOND.

I did not set out to prove or disprove either MOND or dark matter. I was not really even aware of MOND at that time. I had heard of it only on a couple of occasions, but I hadn’t paid any attention, and didn’t really know anything about it. Why would I bother? It was already well established that there had to be dark matter.

I worked to develop our understanding of low surface brightness galaxies in the context of dark matter. Their blue colors, low metallicities, high gas fractions, and overall diffuse nature could be explained if they had formed in dark matter halos that are themselves lower than average density: they occupy the low concentration side of the distribution of dark matter halos at a given mass. I found this interpretation quite satisfactory, so it gave me no cause to doubt dark matter to that point.

This picture made two genuine predictions that had yet to be tested. First, low surface brightness galaxies should be less strongly clustered than brighter galaxies. Second, having their mass spread over a larger area, they should shift off of the Tully-Fisher relation defined by denser galaxies. The first prediction came true, and for a period I was jubilant that we had made an important new contribution to our understanding of both galaxies and dark matter. The second prediction failed badly: low surface brightness galaxies adhere to the same Tully-Fisher relation that other galaxies follow.

I tried desperately to understand the failure of the second prediction in terms of dark matter. I tried what seemed like a thousand ways to explain this, but ultimately they were all tautological: I could only explain it if I assumed the answer from the start. The adherence of low surface brightness galaxies to the Tully-Fisher relation poses a serious fine-tuning problem: the distribution of dark matter must be adjusted to exactly counterbalance that of the visible matter so as not to leave any residuals. This makes no sense, and anyone who claims it does is not thinking clearly.

It was in this crisis of comprehension that I became aware that MOND predicted exactly what I was seeing. No fine-tuning was required. Low surface brightness galaxies followed the same Tully-Fisher relation as other galaxies because the modified force law stipulates that they must. It was only at this point (in the mid-’90s) that I started to take MOND seriously. If it had got this prediction right, what else did it predict?

I was still convinced that the right answer had to be dark matter. There was, after all, so much evidence for it. So this one prediction must be a fluke; surely it would fail the next test. That was not what happened: MOND passed test after test after test, successfully predicting observations both basic and detailed that dark matter theory got wrong or did not even address. It was only after this experience that I realized that what I thought was evidence for dark matter was really just evidence that something was wrong: the data cannot be explained with ordinary gravity without invisible mass. The data – and here I mean ALL the data – were mostly ambiguous: they did not clearly distinguish whether the problem was with mass we couldn’t see or with the underlying equations from which we inferred the need for dark matter.

So to get back to your original question, yes – this is how science should work. I hadn’t set out to test MOND, but I had inadvertently performed exactly the right experiment for that purpose. MOND had its predictions come true where the predictions of other theories did not: both my own theory and those of others who were working in the context of dark matter. We got it wrong while MOND got it right. That led me to change my mind: I had been wrong to be sure the answer had to be dark matter, and to be so quick to dismiss MOND. Admitting this was the most difficult struggle I ever faced in my career.

David: From the perspective of dark matter, how does one understand MOND’s success?

Stacy: One does not.

That the predictions of MOND should come true in a universe dominated by dark matter makes no sense.

Before I became aware of MOND, I spent lots of time trying to come up with dark matter-based explanations for what I was seeing. It didn’t work. Since then, I have continued to search for a viable explanation with dark matter. I have not been successful. Others have claimed such success, but whenever I look at their work, it always seems that what they assert to be a great success is just a specific elaboration of a model I had already considered and rejected as obviously unworkable. The difference boils down to Occam’s razor. If you give dark matter theory enough free parameters, it can be adjusted to “predict” pretty much anything. But the best we can hope to do with dark matter theory is to retroactively explain what MOND successfully predicted in advance. Why should we be impressed by that?

David: Does MOND fail in clusters?

Stacy: Yes and no: there are multiple tests in clusters. MOND passes some and flunks others – as does dark matter.

The most famous test is the baryon fraction. This should be one in MOND – all the mass is normal baryonic matter. With dark matter, it should be the cosmic ratio of normal to dark matter (about 1:5).

MOND fails this test: it explains most of the discrepancy in clusters, but not all of it. The dark matter picture does somewhat better here, as the baryon fraction is close to the cosmic expectation — at least for the richest clusters of galaxies. In smaller clusters and groups of galaxies, the normal matter content falls short of the cosmic value. So both theories suffer a “missing baryon” problem: MOND in rich clusters; dark matter in everything smaller.

Another test is the mass-temperature relation. Both theories predict a relation between the mass of a cluster and the temperature of the gas it contains, but they predict different slopes for this relation. MOND gets the slope right but the amplitude wrong, leading to the missing baryon problem above. Dark matter gets the amplitude right for the most massive clusters, but gets the slope wrong – which leads to it having a missing baryon problem for systems smaller than the largest clusters.

There are other tests. Clusters continue to merge; the collision velocity of merging clusters is predicted to be higher in MOND than with dark matter. For example, the famous bullet cluster, which is often cited as a contradiction to MOND, has a collision speed that is practically impossible with dark matter: there just isn’t enough time for the two components of the bullet to accelerate up to the observed relative speed if they fall together under the influence of normal gravity and the required amount of dark mass. People have argued over the severity of this perplexing problem, but the high collision speed happens quite naturally in MOND as a consequence of its greater effective force of attraction. So, taken at face value, the bullet cluster both confirms and refutes both theories!

I could go on… one expects clusters to form earlier and become more massive in MOND than in dark matter. There are some indications that this is the case – the highest redshift clusters came as a surprise to conventional structure formation theory – but the relative numbers of clusters as a function of mass seem to agree well with current expectations with dark matter. So clusters are a mixed bag.

More generally, there is a widespread myth that MOND fits rotation curves, but gets nothing else right. This is what I expected to find when I started fact checking, but the opposite is true. MOND explains a huge variety of data well. The presumptive superiority of dark matter is just that – a presumption.

David: At a physics colloquium two decades ago, Vera Rubin described how theorists were willing and eager to explain her data to her. At an astronomy colloquium a few years later, you echoed that sentiment in relation to your data on velocity curves. One concludes that theorists are uniquely insightful and generous people. Is there anyone you would like to thank for putting you straight? 
 
Stacy:  So they perceive themselves to be.

MOND has made many successful a priori predictions. This is the gold standard of the scientific method. If there is another explanation for it, I’d like to know what it is.

As your question supposes, many theorists have offered such explanations. At most one of them can be correct. I have yet to hear a satisfactory explanation.


David: What are MOND people working on these days? 
 
Stacy: Any problem that is interesting in extragalactic astronomy is interesting in the context of MOND. Outstanding questions include planes of satellite dwarf galaxies, clusters of galaxies, the formation of large scale structure, and the microwave background. MOND-specific topics include the precise value of the MOND acceleration constant, predicting the velocity dispersions of dwarf galaxies, and the search for the predicted external field effect, which is a unique signature of MOND.

The phrasing of this question raises a sociological issue. I don’t know what a “MOND person” is. Before now, I have only heard it used as a pejorative.

I am a scientist who has worked on many topics. MOND is just one of them. Does that make me a “MOND person”? I have also worked on dark matter, so am I also a “dark matter person”? Are these mutually exclusive?

I have attended conferences where I have heard people say ‘“MOND people” do this’ or ‘“MOND people” fail to do that.’ Never does the speaker of these words specify who they’re talking about: “MOND people” are a nameless Other. In all cases, I am more familiar with the people and the research they pretend to describe, but in no way do I recognize what they’re talking about. It is just a way of saying “Those People” are Bad.

There are many experts on dark matter in the world. I am one of them. There are rather fewer experts on MOND. I am also one of them. Every one of these “MOND people” is also an expert on dark matter. This situation is not reciprocated: many experts on dark matter are shockingly ignorant about MOND. I was once guilty of that myself, but realized that ignorance is not a sound basis for a scientific judgement.

David: Are you tired of getting these types of questions? 
 
Stacy: Yes and no.

No, in that these are interesting questions about fundamental science. That is always fun to talk about.

Yes, in that I find myself having the same arguments over and over again, usually with scientists who remain trapped in the misconceptions I suffered myself a quarter century ago, but whose minds are closed to ideas that threaten their sacred cows. If dark matter is a real, physical substance, then show me a piece already.

Big Trouble in a Deep Void

Big Trouble in a Deep Void

The following is a guest post by Indranil Banik, Moritz Haslbauer, and Pavel Kroupa (bios at end) based on their new paper

Modifying gravity to save cosmology

Cosmology is currently in a major crisis because of many severe tensions, the most serious and well-known being that local observations of how quickly the Universe is expanding (the so-called ‘Hubble constant’) exceed the prediction of the standard cosmological model, ΛCDM. This prediction is based on the cosmic microwave background (CMB), the most ancient light we can observe – which is generally thought to have been emitted about 400,000 years after the Big Bang. For ΛCDM to fit the pattern of fluctuations observed in the CMB by the Planck satellite and other experiments, the Hubble constant must have a particular value of 67.4 ± 0.5 km/s/Mpc. Local measurements are nearly all above this ‘Planck value’, but are consistent with each other. In our paper, we use a local value of 73.8 ± 1.1 km/s/Mpc using a combination of supernovae and gravitationally lensed quasars, two particularly precise yet independent techniques.

This unexpectedly rapid local expansion of the Universe could be due to us residing in a huge underdense region, or void. However, a void wide and deep enough to explain the Hubble tension is not possible in ΛCDM, which is built on Einstein’s theory of gravity, General Relativity. Still, there is quite strong evidence that we are indeed living within a large void with a radius of about 300 Mpc, or one billion light years. This evidence comes from many surveys covering the whole electromagnetic spectrum, from radio to X-rays. The most compelling evidence comes from analysis of galaxy number counts in the near-infrared, giving the void its name of the Keenan-Barger-Cowie (KBC) void. Gravity from the denser matter outside the void pulls outwards on the galaxies within it more strongly than the sparse matter inside pulls them back, making the Universe appear to expand faster than it actually is for an observer inside the void. This ‘Hubble bubble’ scenario (depicted in Figure 1) could solve the Hubble tension, a possibility considered – and rejected – in several previous works (e.g. Kenworthy+ 2019). We will return to their objections against this idea.
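For orientation, the size of the effect can be estimated with a back-of-the-envelope linear-theory relation between the local expansion excess and the void’s density contrast. This is only a crude scaling, not the calculation in our paper, which follows the void’s growth into the non-linear regime.

```python
# Linear-theory estimate: an observer inside a void of density contrast delta (< 0)
# measures a Hubble rate enhanced by roughly -f * delta / 3, with growth rate f.
Omega_m = 0.3
f = Omega_m ** 0.55       # standard approximation to the growth rate

H_cmb   = 67.4            # km/s/Mpc, the Planck-calibrated prediction
H_local = 73.8            # km/s/Mpc, the local value adopted in our paper

excess = H_local / H_cmb - 1.0      # ~9.5% faster local expansion
delta_needed = -3.0 * excess / f    # ~ -0.5: a very deep void on a ~300 Mpc scale
print(excess, delta_needed)
```

A void that deep and that wide is exactly what ΛCDM structure formation cannot produce, which is the crux of the argument below.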

Figure 1: Illustration of the Universe’s large scale structure. The darker regions are voids, and the bright dots represent galaxies. The arrows show how gravity from surrounding denser regions pulls outwards on galaxies in a void. If we were living in such a void (as indicated by the yellow star), the Universe would expand faster locally than it does on average. This could explain the Hubble tension. Credit: Technology Review

One of the main objections seemed to be that since such a large and deep void is incompatible with ΛCDM, it can’t exist. This is a common way of thinking, but the problem with it was clear to us from a very early stage. The first part of this logic is sound – assuming General Relativity, a hot Big Bang, and that the state of the Universe at early times is apparent in the CMB (i.e. it was flat and almost homogeneous then), we are led to the standard flat ΛCDM model. By studying the largest suitable simulation of this model (called MXXL), we found that it should be completely impossible to find ourselves inside a void with the observed size and depth (or fractional underdensity) of the KBC void – this possibility can be rejected with more confidence than the discovery of the Higgs boson when first announced. We therefore applied one of the leading alternative gravity theories called Milgromian Dynamics (MOND), a controversial idea developed in the early 1980s by Israeli physicist Mordehai Milgrom. We used MOND (explained in a simple way here) to evolve a small density fluctuation forwards from early times, studying if 13 billion years later it fits the density and velocity field of the local Universe. Before describing our results, we briefly introduce MOND and explain how to use it in a potentially viable cosmological framework. Astronomers often assume MOND cannot be extended to cosmological scales (typically >10 Mpc), which is probably true without some auxiliary assumptions. This is also the case for General Relativity, though in that case the scale where auxiliary assumptions become crucial is only a few kpc, namely in galaxies.

MOND was originally designed to explain why galaxies rotate faster in their outskirts than they should if one applies General Relativity to their luminous matter distribution. This discrepancy gave rise to the idea of dark matter halos around individual galaxies. For dark matter to cluster on such scales, it would have to be ‘cold’, or equivalently consist of rather heavy particles (above a few thousand eV/c2, or a millionth of a proton mass). Any lighter and the gravity from galaxies could not hold on to the dark matter. MOND assumes these speculative and unexplained cold dark matter haloes do not exist – the need for them is after all dependent on the validity of General Relativity. In MOND once the gravity from any object gets down to a certain very low threshold called a0, it declines more gradually with increasing distance, following an inverse distance law instead of the usual inverse square law. MOND has successfully predicted many galaxy rotation curves, highlighting some remarkable correlations with their visible mass. This is unexpected if they mostly consist of invisible dark matter with quite different properties to visible mass. The Local Group satellite galaxy planes also strongly favour MOND over ΛCDM, as explained using the logic of Figure 2 and in this YouTube video.

Figure 2: the satellite galaxies of the Milky Way and Andromeda mostly lie within thin planes. These are difficult to form unless the galaxies in them are tidal dwarfs born from the interaction of two major galaxies. Since tidal dwarfs should be free of dark matter due to the way they form, the satellites in the satellite planes should have rather weak self-gravity in ΛCDM. This is not the case as measured from their high internal velocity dispersions. So the extra gravity needed to hold galaxies together should not come from dark matter that can in principle be separated from the visible.

To extend MOND to cosmology, we used what we call the νHDM framework (with ν pronounced “nu”), originally proposed by Angus (2009). In this model, the cold dark matter of ΛCDM is replaced by the same total mass in sterile neutrinos with a mass of only 11 eV/c2, almost a billion times lighter than a proton. Their low mass means they would not clump together in galaxies, consistent with the original idea of MOND to explain galaxies with only their visible mass. This makes the extra collisionless matter ‘hot’, hence the name of the model. But this collisionless matter would exist inside galaxy clusters, helping to explain unusual configurations like the Bullet Cluster and the unexpectedly strong gravity (even in MOND) in quieter clusters. Considering the universe as a whole, νHDM has the same overall matter content as ΛCDM. This makes the overall expansion history of the universe very similar in both models, so both can explain the amounts of deuterium and helium produced in the first few minutes after the Big Bang. They should also yield similar fluctuations in the CMB because both models contain the same amount of dark matter. These fluctuations would get somewhat blurred by sterile neutrinos of such a low mass due to their rather fast motion in the early Universe. However, it has been demonstrated that Planck data are consistent with dark matter particles more massive than 10 eV/c2. Crucially, we showed that the density fluctuations evident in the CMB typically yield a gravitational field strength of 21 a0 (correcting an earlier erroneous estimate of 570 a0 in the above paper), making the gravitational physics nearly identical to General Relativity. Clearly, the main lines of early Universe evidence used to argue in favour of ΛCDM are not sufficiently unique to distinguish it from νHDM (Angus 2009).

The models nonetheless behave very differently later on. We estimated that for redshifts below about 50 (when the Universe is older than about 50 million years), the gravity would typically fall below a0 thanks to the expansion of the Universe (the CMB comes from a redshift of 1100). After this ‘MOND moment’, both the ordinary matter and the sterile neutrinos would clump on large scales just like in ΛCDM, but there would also be the extra gravity from MOND. This would cause structures to grow much faster (Figure 3), allowing much wider and deeper voids.
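The quoted numbers hang together with a crude scaling argument (illustrative only, not the full calculation in our paper): for linear fluctuations on a fixed comoving scale in the matter era, the typical peculiar gravitational field grows roughly in proportion to 1+z, so a field of 21 a0 at recombination drops to a0 at a redshift of about 50.

```python
# Crude scaling: during matter domination, delta grows as 1/(1+z) while physical
# lengths shrink as 1/(1+z), so the peculiar field from a fixed comoving structure
# scales roughly as (1+z).
g_recombination = 21.0   # typical field at the CMB epoch, in units of a0 (from the text)
z_recombination = 1100.0

z_mond_moment = (1.0 + z_recombination) / g_recombination - 1.0
print(z_mond_moment)     # ~51, consistent with the 'MOND moment' of z ~ 50 quoted above
```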


Figure 3: Evolution of the density contrast within a 300 co-moving Mpc sphere in different Newtonian (red) and MOND (blue) models, shown as a function of the Universe’s size relative to its present size (this changes almost linearly with time). Notice the much faster structure growth in MOND. The solid blue line uses a time-independent external field on the void, while the dot-dashed blue line shows the effect of a stronger external field in the past. This requires a deeper initial void to match present-day observations.

We used this basic framework to set up a dynamical model of the void. By making various approximations and trying different initial density profiles, we were able to simultaneously fit the apparent local Hubble constant, the observed density profile of the KBC void, and many other observables like the acceleration parameter, which we come to below. We also confirmed previous results that the same observables rule out standard cosmology at 7.09σ significance. This is much more than the typical threshold of 5σ used to claim a discovery in cases like the Higgs boson, where the results agree with prior expectations.

One objection to our model was that a large local void would cause the apparent expansion of the Universe to accelerate at late times. Equivalently, observations that go beyond the void should see a standard Planck cosmology, leading to a step-like behaviour near the void edge. At stake is the so-called acceleration parameter q0 (which we defined oppositely to convention to correct a historical error). In ΛCDM, we expect q0 = 0.55, while in general much higher values are expected in a Hubble bubble scenario. The objection of Kenworthy+ (2019) was that since the observed q0 is close to 0.55, there is no room for a void. However, their data analysis fixed q0 to the ΛCDM expectation, thereby removing any hope of discovering a deviation that might be caused by a local void. Other analyses (e.g. Camarena & Marra 2020b) which do not make such a theory-motivated assumption find q0 = 1.08, which is quite consistent with our best-fitting model (Figure 4). We also discussed other objections to a large local void, for instance the Wu & Huterer (2017) paper which did not consider a sufficiently large void, forcing the authors to consider a much deeper void to try and solve the Hubble tension. This led to some serious observational inconsistencies, but a larger and shallower void like the observed KBC void seems to explain the data nicely. In fact, combining all the constraints we applied to our model, the overall tension is only 2.53σ, meaning the data have a 1.14% chance of arising if ours were the correct model. The actual observations are thus not the most likely consequence of our model, but could plausibly arise if it were correct. Given also the high likelihood that some if not all of the observational errors we took from publications are underestimates, this is actually a very good level of consistency.
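To translate between the σ values and probabilities quoted here (assuming the usual two-sided Gaussian convention):

```python
from scipy.stats import norm

def p_value(sigma):
    """Two-sided Gaussian tail probability for a significance of 'sigma'."""
    return 2.0 * norm.sf(sigma)

print(p_value(2.53))   # ~0.0114, the 1.14% quoted for our best-fitting void model
print(p_value(7.09))   # ~1e-12, the level at which the same observables disfavour LCDM
```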

Figure 4: The predicted local Hubble constant (x-axis) and acceleration parameter (y-axis) as measured with local supernovae (black dot, with red error ellipses). Our best-fitting models with different initial void density profiles (blue symbols) can easily explain the observations. However, there is significant tension with the prediction of ΛCDM based on parameters needed to fit Planck observations of the CMB (green dot). In particular, local observations favour a higher acceleration parameter, suggestive of a local void.

Unlike other attempts to solve the Hubble tension, ours is unique in using an already existing theory (MOND) developed for a different reason (galaxy rotation curves). The use of unseen collisionless matter made of hypothetical sterile neutrinos is still required to explain the properties of galaxy clusters, which otherwise do not sit well with MOND. In addition, these neutrinos provide an easy way to explain the CMB and background expansion history, though recently Skordis & Zlosnik (2020) showed that this is possible in MOND with only ordinary matter. In any case, MOND is a theory of gravity, while dark matter is a hypothesis that more matter exists than meets the eye. The ideas could both be right, and should be tested separately.

A dark matter-MOND hybrid thus appears to be a very promising way to resolve the current crisis in cosmology. Still, more work is required to construct a fully-fledged relativistic MOND theory capable of addressing cosmology. This could build on the theory proposed by Skordis & Zlosnik (2019) in which gravitational waves travel at the speed of light, which was considered to be a major difficulty for MOND. We argued that such a theory would enhance structure formation to the required extent under a wide range of plausible theoretical assumptions, but this needs to be shown explicitly starting from a relativistic MOND theory. Cosmological structure formation simulations are certainly required in this scenario – these are currently under way in Bonn. Further observations would also help greatly, especially of the matter density in the outskirts of the KBC void at distances of about 500 Mpc. This could hold vital clues to how quickly the void has grown, helping to pin down the behaviour of the sought-after MOND theory.

There is now a very real prospect of obtaining a single theory that works across all astronomical scales, from the tiniest dwarf galaxies up to the largest structures in the Universe & its overall expansion rate, and from a few seconds after the birth of the Universe until today. Rather than argue whether this theory looks more like MOND or standard cosmology, what we should really do is combine the best elements of both, paying careful attention to all observations.


Authors

Indranil Banik is a Humboldt postdoctoral fellow in the Helmholtz Institute for Radiation and Nuclear Physics (HISKP) at the University of Bonn, Germany. He did his undergraduate and masters at Trinity College, Cambridge, and his PhD at Saint Andrews under Hongsheng Zhao. His research focuses on testing whether gravity continues to follow the Newtonian inverse square law at the low accelerations typical of galactic outskirts, with MOND being the best-developed alternative.

Moritz Haslbauer is a PhD student at the Max Planck Institute for Radio Astronomy (MPIfR) in Bonn. He obtained his undergraduate degree from the University of Vienna and his masters from the University of Bonn. He works on the formation and evolution of galaxies and their distribution in the local Universe in order to test different cosmological models and gravitational theories. Prof. Pavel Kroupa is his PhD supervisor.

Pavel Kroupa is a professor at the University of Bonn and professorem hospitem at Charles University in Prague. He went to school in Germany and South Africa, studied physics in Perth, Australia, and obtained his PhD at Trinity College, Cambridge, UK. He researches stellar populations and their dynamics as well as the dark matter problem, therewith testing gravitational theories and cosmological models.

Link to the published science paper.

YouTube video on the paper

Contact: ibanik@astro.uni-bonn.de.

Indranil Banik’s YouTube channel.

A lengthy personal experience with experimental searches for WIMPs

A lengthy personal experience with experimental searches for WIMPs

This post is adapted from a web page I wrote in 2008, before starting this blog. It covers some ground that I guess is now historic about things that were known about WIMPs from their beginnings in the 1980s, and the experimental searches for them. In part, I was just trying to keep track of experimental limits, with updates added as noted since the first writing. This is motivated now by some troll on Twitter trying to gaslight people into believing there were no predictions for WIMPs prior to the discovery of the Higgs boson. Contrary to this assertion, the field had already gone through many generations of predictions, with the theorists moving the goal posts every time a prediction was excluded. I have colleagues involved in WIMP searches who have left that field in disgust at having the goal posts moved on them: what good are the experimental searches if, every time they reach the promised land, they’re simply told the promised land is over the next horizon? You experimentalists just keep your noses to the grindstone, and don’t bother the Big Brains with any inconvenient questions!

We were already very far down this path in 2008 – so far down it, I called it the express elevator to hell, since the predicted interaction cross-section kept decreasing to evade experimental limits. Since that time, theorists have added sideways in mass to their evasion tactics, with some advocating for “light” dark matter (less in mass than the 2 GeV Lee-Weinberg limit for the minimum WIMP mass) while others advocate for undetectably high mass WIMPzillas (because there’s a lot of unexplored if unexpected parameter space at high mass to roam around in before hitting the unitarity bound. Theorists love to go free range.)

These evasion tactics had become ridiculous well before the Higgs was discovered in 2012. Many people don’t seem to have memories that long, so let’s review. Text in normal font was written in 2008; later additions are italicized.

Seeking WIMPs in all the wrong places

This article has been updated many times since it was first written in 2008, at which time we were already many years down the path it describes.

The Need for Dark Matter
Extragalactic systems like spiral galaxies and clusters of galaxies exhibit mass discrepancies. The application of Newton’s Law of Gravity to the observed stars and gas fails to explain the rapid observed motions. This leads to the inference that some form of invisible mass – dark matter – dominates the dynamics of the universe.

WIMPs
If asked what the dark matter is, most scientists working in the field will respond honestly that we have no idea. There are many possible candidates. Some, like MACHOs (Massive Compact Halo Objects, perhaps brown dwarfs) have essentially been ruled out. However, in our heart of hearts there is a huge odds-on favorite: the WIMP.

WIMP stands for Weakly Interacting Massive Particle. This is an entire class of new fundamental particles that emerge from supersymmetry. Supersymmetry (SUSY) is a theoretical notion by which known elementary particles have supersymmetric partner particles. This notion is not part of the highly successful Standard Model of particle physics, but might exist provided that the Higgs boson exists. In the so-called Minimal Supersymmetric Standard Model (MSSM), which was hypothesized to explain the hierarchy problem (i.e., why do the elementary particles have the various masses that they do), the lightest stable supersymmetric particle is the neutralino. This is the WIMP that presumably makes up the dark matter.

2020 update: the Higgs does indeed exist. Unfortunately, it is too normal. That is, it fits perfectly well with the Standard Model without any need for SUSY. Indeed, it is so normal that MSSM is pretty much excluded. One can persist with more complicated theories (as always) but to date SUSY has flunked every experimental test, including the “golden test” of the decay of the Bs meson. Never heard of the golden test? The theorists were all about it until SUSY flunked it; now they never seem to mention it.

Cosmology, meet particle physics
There is a confluence of history in the development of previously distinct fields. The need for cosmological dark matter became clear in the 1980s, the same time that MSSM was hypothesized to solve the hierarchy problem in particle physics. Moreover, it was quickly realized that the cosmological dark matter could not be normal (“baryonic”) matter. New fundamental particles therefore seemed a natural fit.

The cosmic craving for CDM
There are two cosmological reasons why we need non-baryonic cold dark matter (CDM):

  1. The measured density of gravitating mass appears to considerably exceed that in normal matter as constrained by Big Bang Nucleosynthesis (BBN): Ωm = 6 Ωb (so Ωnot baryons = 5 Ωbaryons).
  2. Gravity is too weak to grow the presently observed structures (e.g., galaxies, clusters, filaments) from the smooth initial condition observed in the cosmic microwave background (CMB) unless something speeds up the process. Extra mass will do this, but it must not interact with the photons of the CMB the way ordinary matter does.

By themselves, either of these arguments is strong. Together, they were compelling enough to launch the CDM paradigm. (Like most scientists of my generation, I certainly bought into it.)

From the astronomical perspective, all that is required is that the dark matter be non-baryonic and dynamically cold. Non-baryonic so that it does not participate in Big Bang Nucleosynthesis or interact with photons (a good way to remain invisible!), and dynamically cold (i.e., slow moving, not relativistic) so that it can clump and form gravitationally bound structures. Many things might satisfy these astronomical requirements. For example, supermassive black holes fit the bill, though they would somehow have to form in the first second of creation in order not to impact BBN.

The WIMP Miracle
From a particle physics perspective, the early universe was a high energy place where energy and mass could switch from one form to the other freely as enshrined in Einstein’s E = mc2. Pairs of particles and their antiparticles could come and go. However, as the universe expands, it cools. As it cools, it loses the energy necessary to create particle pairs. When this happens for a particular particle depends on the mass of the particle – the more mass, the more energy is required, and the earlier that particle-antiparticle pair “freeze out.” After freeze-out, the remaining particle-antiparticle pairs can mutually annihilate, leaving only energy. To avoid this fate, there must either be some asymmetry (apparently there was about one extra proton for every billion proton-antiproton pairs – an asymmetry on which our existence depends even if we don’t yet understand it) or the “cross section” – the probability for interacting – must be so low that particles and their antiparticles go their separate ways without meeting often enough to annihilate completely. This process leaves some relic density that depends on the properties of the particles.

If one asks what relic density is necessary to make up the cosmic dark matter, the cross section that comes out is about that of the weak nuclear force. A particle that interacts through the weak force but not the electromagnetic force will have about the right relic density. Moreover, it won’t interfere with BBN or the CMB. The WIMPs hypothesized by supersymmetry fit the bill for cosmologists’ CDM. This coincidence of scales – the relic density and the weak force interaction scale – is sometimes referred to as the “WIMP miracle” and was part of the motivation to adopt the WIMP as the leading candidate for cosmological dark matter.
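
The back-of-the-envelope version of this coincidence fits in a few lines. The standard freeze-out estimate scales as Ωχ h² ≈ 3 × 10⁻²⁷ cm³ s⁻¹ / ⟨σv⟩; the sketch below uses that textbook approximation (the numerical prefactor is the usual rough value, so treat the result as order-of-magnitude only):

```python
# Textbook freeze-out scaling: Omega_chi * h^2 ~ 3e-27 cm^3/s / <sigma v>.
# The prefactor is the standard back-of-envelope value; a real calculation
# solves the Boltzmann equation, so this is order-of-magnitude only.

def relic_density_h2(sigma_v_cm3_per_s):
    """Approximate thermal relic density Omega*h^2 for annihilation rate <sigma v>."""
    return 3e-27 / sigma_v_cm3_per_s

weak_scale = 3e-26  # cm^3/s, a typical weak-interaction annihilation rate
print(f"Omega h^2 ~ {relic_density_h2(weak_scale):.2f}")  # ~0.1, close to the observed CDM density
```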

WIMP detection experiments
WIMPs as CDM is a well posed scientific hypothesis subject to experimental verification. From astronomical measurements, we know how much we need in the solar neighborhood – about 0.3 GeV c⁻² cm⁻³. (That means there are a few hundred WIMPs passing through your body at any given moment, depending on the exact mass of the particle.) From particle physics, we know the weak interaction cross section, so we can calculate the probability of a WIMP interacting with normal matter. In this respect, WIMPs are very much like neutrinos – they can pass right through solid matter because they do not experience the electromagnetic interactions that make ordinary matter solid. But once in a very rare while, they may come close enough to an atomic nucleus to interact with it via the weak force. This is the signature that can be sought experimentally.
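
As a sanity check on the “few hundred WIMPs in your body” figure, here is a minimal sketch. The local density is the value quoted above; the 100 GeV WIMP mass and the body volume are illustrative assumptions:

```python
# How many WIMPs occupy a human-sized volume at any instant?
# The local dark matter density is the value quoted in the text;
# the WIMP mass and the body volume are illustrative assumptions.

rho_local   = 0.3    # GeV/c^2 per cm^3, local dark matter density
m_wimp      = 100.0  # GeV/c^2, assumed WIMP mass
body_volume = 7e4    # cm^3, roughly a 70 kg human at about the density of water

number_density = rho_local / m_wimp       # WIMPs per cm^3
print(f"~{number_density * body_volume:.0f} WIMPs in the body at any instant")
```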

There is a Nobel Prize waiting for whoever discovers the dark matter, so there are now many experiments seeking to do so. Generically, these use very pure samples of some element (like Germanium or Argon or Xenon) to act as targets for the WIMPs making up the dark matter component of our Milky Way Galaxy. The sensitivity required is phenomenal, and many mundane background events (cosmic rays, natural radioactivity, clumsy colleagues dropping beer cans) that might mimic WIMPs must be screened out. For this reason, there is a strong desire to perform these experiments in deep mine shafts where the apparatus can be shielded from the cosmic rays that bombard our planet and other practical nuisances.

The technology development involved in the hunt for WIMPs is amazing. The experimentalists have accomplished phenomenal things in the hunt for dark matter. That they have so far failed to detect it should give pause to any thinking person acquainted with the history, techniques, and successes of particle physics. This failure is both a surprise and a disappointment to those who understand modern cosmology. It should not come as a surprise to anyone familiar with the dynamical evidence for – and against – dark matter.

Searches for WIMPs are proceeding apace. The sensitivity of these experiments is increasing at an accelerating rate. They already provide important constraints – see the figure:


Searching for WIMPs

This 2008 graph shows the status of searches for Weakly Interacting Massive Particles (WIMPs). The abscissa is the mass of the putative WIMP particle. For reference, the proton has a mass of about one in these units. The ordinate is a measure of the probability for WIMPs to interact with normal matter. Not much! The shaded regions represent theoretical expectations for WIMPs. The light red region is the original (Ellis et al.) forecast. The blue and green regions are more recent predictions (Trotta et al. 2008). The lines are representative experimental limits. The region above each line is excluded – if WIMPs had existed in that range of mass and interaction probability, they would have been detected already. The top line (from CDMS in 2004) excluded much of the original prediction. More recent work (colored lines, circa 2008) now approaches the currently expected region.

April 2011 update: XENON100 sees nada. Note how the “expected” region continues to retreat downward in cross section as experiments exclude the previous sweet spots in this parameter. This is the express elevator to hell (see below).

September 2011 update: CRESST-II claims a detection. Unfortunately, their positive result violates limits imposed by several other experiments, including XENON100. Somebody is doing their false event rejection wrong.

July 2012 update: XENON100 still seeing squat. Note that the “head” of the most probable (blue) region in the figure above is now excluded.
It is interesting to compare the time sequence of their results: first | run 8 | run 10.

November 2013 update: LUX sees nothing and excludes the various claims for detections of light dark matter (see inset). This exclusion of light dark matter appears to be highly significant, as the very recent prediction was for about a dozen detections per month, which should have added up to an easy detection rather than the observed absence of events in excess of the expected background. Note also that the new exclusion boundary cuts deeply into the region predicted for traditional heavy (~ 100 GeV) WIMPs by Buchmueller et al. as depicted by XENON100. The Buchmueller et al. “prediction” is already a downscaling from the bulk of probability predicted by Trotta et al. (2008 – the blue region in the figure above). This perpetual adjustment of the expectation for the WIMP cross-section is precisely the dodgy moving of the goal posts that prompted me to first write this web page years ago.

May 2014: “Crunch time” for dark matter comes and goes.

July 2016 update: PandaX sees nada.

August 2016 update: LUX continues to see nada. The minimum of their exclusion line now reaches the bottom axis of the 2009 plot (above the line, with the now-excluded blue blob). The “predicted” WIMP (gray area in the plot within this section) appears to have migrated to higher mass in addition to the downward migration of the cross-section. I guess this is the sideways turbolift to evil-Kirk universe hell.


Indeed, the experiments have perhaps been too successful. The original region of cross section-mass parameter space in which WIMPs were expected to reside was excluded years ago. Not easily dissuaded, theorists waved their hands, invoked the Majorana see-saw mechanism, and reduced the interaction probability to safely non-detectable levels. This is the vertical separation of the reddish and blue-green regions in the figure.

To quote a particle physicist, “The most appealing possibility – a weak scale dark matter particle interacting with matter via Z-boson exchange – leads to the cross section of order 10⁻³⁹ cm² which was excluded back in the ’80s by the first round of dark matter experiments. There exists another natural possibility for WIMP dark matter: a particle interacting via Higgs boson exchange. This would lead to the cross section in the 10⁻⁴²–10⁻⁴⁶ cm² ballpark (depending on the Higgs mass and on the coupling of dark matter to the Higgs).”

From this 2011 Resonaances post

Though set back and discouraged by this theoretical sleight of hand (the WIMP “miracle” is now more of a vague coincidence, like seeing an old flame in Grand Central Station but failing to say anything because (a) s/he is way over on another platform and (b) on reflection, you’re not really sure it was him or her after all), experimentalists have been gaining ground on the newly predicted region. If all goes as planned, most of the plausible parameter space will have been explored in a few more years. (I have heard it asserted that “we’ll know what the dark matter is in 5 years” every 5 years for the past two decades. Make that three decades now.)

The express elevator to hell

We’re on an express elevator to hell – going down!

There is a slight problem with the current predictions for WIMPs. While there is a clear focus point where WIMPs most probably reside (the blue blob in the figure), there is also a long tail to low interaction cross section. If we fail to detect WIMPs when experimental sensitivity encompasses the blob, the presumption will be that we’re just unlucky and WIMPs happen to live in the low-probability tail that is not yet excluded. (Low probability regions tend to seem more reasonable as higher probability regions are rejected and we forget about them.) This is the express elevator to hell. No matter how much time, money, and effort we invest in further experimentation, the answer will always be right around the corner. This process can go on forever.
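
To make the logic of that trap concrete, here is a toy numerical illustration (the prior shape, its center, and the sequence of limits are all invented numbers chosen only to show the renormalization effect, not taken from any actual analysis): each time an experiment excludes the high-cross-section part of the prior, the surviving tail is quietly renormalized into the new “most probable” region.

```python
# Toy illustration (not a real likelihood analysis) of the express elevator:
# put a broad prior on log10(cross section), then repeatedly exclude everything
# above an ever-deeper experimental limit and renormalize what survives.
from scipy import stats

# Assumed broad prior on log10(cross section / cm^2): toy mean and width.
prior = stats.norm(loc=-44, scale=2)

for limit in (-42, -44, -46, -48):   # successively deeper exclusion limits (log10 sigma)
    surviving = prior.cdf(limit)     # prior probability remaining below the new limit
    print(f"limit 1e{limit} cm^2: {100 * surviving:4.1f}% of the original prior survives,"
          " yet it renormalizes to 100% of the updated expectation")
```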

Is dark matter a falsifiable hypothesis?

The existence of dark matter is an inference, not an experimental fact. Individual candidates for the dark matter can be tested and falsified. For example, it was once reasonable to imagine that innumerable brown dwarfs could be the dark matter. That is no longer true – were there that many brown dwarfs out there, we would have seen them directly by now. The brown dwarf hypothesis has been falsified. WIMPs are falsifiable dark matter candidates – provided we don’t continually revise their interaction probability. If we keep doing this, the hypothesis ceases to have predictive power and is no longer subject to falsification.

The concept of dark matter is not falsifiable. If we exclude one candidate, we are free to make up another one. After WIMPs, the next obvious candidate is axions. Should those be falsified, we invent something else. (Particle physicists love to do this. The literature is littered with half-baked dark matter candidates invented for dubious reasons, often to explain phenomena with obvious astrophysical causes. The ludicrous uproar over the ATIC and PAMELA cosmic ray experiments is a good example.) (Circa 2008, there was a lot of enthusiasm that certain signals detected by cosmic ray experiments were caused by dark matter. These have gone away.)


September 2011 update: Fermi confirms the PAMELA positron excess. Too well for it to be dark matter: there is no upper threshold energy corresponding to the WIMP mass. Apparently these cosmic rays are astrophysical in origin, which comes as no surprise to high energy astrophysicists.

April 2013 update: AMS makes claims to detect dark matter that are so laughably absurd they do not warrant commentary.

September 2016 update: There is no update. People seem to have given up on claiming that there is any sign of dark matter in cosmic rays. There have been claims of dark matter causing signatures in gamma ray data and separately in X-ray data. These never looked credible and went away on a time scale so short that, on one occasion, an entire session of a 2014 conference had been planned to discuss a gamma ray signal at 126 GeV as dark matter. I asked the organizers a few months in advance if that was even going to be a thing by the time we met. It wasn’t: every speaker scheduled for that session gave some completely unrelated talk.

November 2019 update: XENON1T sees no sign of WIMPs. (There is some hint of an excess of electron recoils. These are completely the wrong energy scale to be the signal that this experiment was designed to detect.)

WIMP prediction and limits. The shaded region marks the prediction of Trotta et al. (2008) for the WIMP mass and interaction cross-section. The lighter shade depicts the 95% confidence limit, the dark region the 68% c.l., and the cross the best fit. The heavy line shows the 90% c.l. exclusion limit from the XENON1T experiment. Everything above the line is excluded, ruling out virtually all the parameter space in which WIMPs had been predicted to reside.

2020 comment: I was present at a meeting in 2009 when the prediction of Trotta et al. (above, in grey, and higher up, in blue and green) was new and fresh. I was, at that point, already feeling like we’d been led down this garden path more than one too many times. So I explicitly asked about the long tail to low cross-section. I was assured that the probability in that tail was < 2%; we would surely detect the WIMP somewhere around the favored value (the X in the gray figure). We did not. Essentially all of that predicted parameter space has been excluded, with only a tiny fraction of the 2% tail extending below current limits. Worse, the top border of the Trotta et al. prediction was based on the knowledge that the parameter space to higher cross section – where the WIMP was originally predicted to reside – had already been experimentally excluded. So the grey region understates the range of parameter space over which WIMPs were reasonably expected to exist. I’m sure there are people who would like to pretend that the right “prediction” for the WIMP is at still lower cross section. That would be an example of how those who are ignorant (or in denial) of history are doomed to repeat it.

I predict that none of the big, expensive WIMP experiments will ever find what they’re looking for. It is past time to admit that the lack of detections is because WIMPs don’t exist. I could be proven wrong by the simple expedient of obtaining a credible WIMP detection. I’m sure there are many bright, ambitious scientists who will take up that challenge. To them I say: after you’ve spent your career at the bottom of a mine shaft with no result to show for it, look up at the sky and remember that I tried to warn you.


Cosmology, then and now

Cosmology, then and now

I have been busy teaching cosmology this semester. When I started on the faculty of the University of Maryland in 1998, there was no advanced course on the subject. This seemed like an obvious hole to fill, so I developed one. I remember with fond bemusement the senior faculty, many of them planetary scientists, sending Mike A’Hearn as a stately ambassador to politely inquire if cosmology had evolved beyond a dodgy subject and was now rigorous enough to be worthy of a 3 credit graduate course.

Back then, we used transparencies or wrote on the board. It was novel to have a course web page. I still have those notes, and marvel at the breadth and depth of work performed by my younger self. Now that I’m teaching it for the first time in a decade, I find it challenging to keep up. Everything has to be adapted to an electronic format, and be delivered remotely during this damnable pandemic. It is a less satisfactory experience, and it has precluded posting much here.

Another thing I notice is that attitudes have evolved along with the subject. The baseline cosmology, LCDM, has not changed much. We’ve tilted the power spectrum and spiked it with extra baryons, but the basic picture is that which emerged from the application of classical observational cosmology – measurements of the Hubble constant, the mass density, the ages of the oldest stars, the abundances of the light elements, number counts of faint galaxies, and a wealth of other observational constraints built up over decades of effort. Here is an example of combining such constraints, an exercise I have students do every time I teach the course:

Observational constraints in the mass density-Hubble constant plane assembled by students in my cosmology course in 2002. The gray area is excluded. The open window is the only space allowed; this is LCDM. The box represents the first WMAP estimate in 2003. CMB estimates have subsequently migrated out of the allowed region to lower H0 and higher mass density, but the other constraints have not changed much, most famously H0, which remains entrenched in the low to mid-70s.
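
One of the simplest constraints of this kind to reproduce is the expansion age: in a flat LCDM model the age of the universe depends only on Ωm and H0, and requiring it to exceed the ages of the oldest stars excludes part of the plane. Here is a minimal sketch of that calculation (the 12 Gyr floor on stellar ages is an illustrative round number, not a quoted limit):

```python
# Expansion age of a flat LCDM universe as a function of (Omega_m, H0),
# compared against an assumed minimum age for the oldest stars.
import numpy as np
from scipy.integrate import quad

def age_gyr(omega_m, h0):
    """Age of a flat matter + Lambda universe, in Gyr (h0 in km/s/Mpc)."""
    omega_l = 1.0 - omega_m
    integrand = lambda a: 1.0 / (a * np.sqrt(omega_m / a**3 + omega_l))
    hubble_time = 977.8 / h0                 # 1/H0 in Gyr
    return hubble_time * quad(integrand, 1e-8, 1.0)[0]

oldest_stars = 12.0  # Gyr, illustrative lower bound on stellar ages (assumed)
for omega_m, h0 in [(0.3, 70.0), (0.5, 80.0), (1.0, 70.0)]:
    t0 = age_gyr(omega_m, h0)
    verdict = "allowed" if t0 > oldest_stars else "excluded"
    print(f"Omega_m = {omega_m}, H0 = {h0}: t0 = {t0:.1f} Gyr -> {verdict}")
```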

These things were known by the mid-90s. Nowadays, people seem to think Type Ia SN discovered Lambda, when really they were just icing on a cake that was already baked. The location of the first peak in the acoustic power spectrum of the microwave background was corroborative of the flat geometry required by the picture that had developed, but trailed the development of LCDM rather than informing its construction. But students entering the field now seem to have been given the impression that these were the only observations that mattered.

Worse, they seem to think these things are Known, as if there’s never been a time that we cosmologists have been sure about something only to find later that we had it quite wrong. This attitude is deleterious to the progress of science, as it precludes us from seeing important clues when they fail to conform to our preconceptions. To give one recent example, everyone seems to have decided that the EDGES observation of 21 cm absorption during the dark ages is wrong. The reason? Because it is impossible in LCDM. There are technical reasons why it might be wrong, but these are subsidiary to Attitude: we can’t believe it’s true, so we don’t. But that’s what makes a result important: something that makes us reexamine how we perceive the universe. If we’re unwilling to do that, we’re no longer doing science.

Second peak bang on

Second peak bang on

At the dawn of the 21st century, we were pretty sure we had solved cosmology. The Lambda Cold Dark Matter (LCDM) model made strong predictions for the power spectrum of the Cosmic Microwave Background (CMB). One was that the flat Robertson-Walker geometry we were assuming for LCDM required the first peak to fall at ℓ = 220. As I discuss in the history of the rehabilitation of Lambda, this was a genuinely novel prediction that was clearly confirmed first by BOOMERanG and subsequently by many other experiments, especially WMAP. As such, it was widely (and rightly) celebrated among cosmologists. The WMAP team has been awarded major prizes, including the Gruber cosmology prize and the Breakthrough prize.

As I discussed in the previous post, the location of the first peak was not relevant to the problem I had become interested in: distinguishing whether dark matter existed or not. Instead, it was the amplitude of the second peak of the acoustic power spectrum relative to the first that promised a clear distinction between LCDM and the no-CDM ansatz inspired by MOND. This was also first tested by BOOMERanG:

The CMB power spectrum observed by BOOMERanG in 2000. The first peak is located exactly where LCDM predicted it to be. The second peak was not detected, but was clearly smaller than expected in LCDM. It was consistent with the prediction of no-CDM.

In a nutshell, LCDM predicted a big second peak while no-CDM predicted a small second peak. Quantitatively, the amplitude ratio A1:2 was predicted to be in the range 1.54 – 1.83 for LCDM, and 2.22 – 2.57 for no-CDM. Note that A1:2 is smaller for LCDM because the second peak is relatively big compared to the first. 

BOOMERanG confirmed the major predictions of both competing theories. The location of the first peak was exactly where it was expected to be for a flat Robertson-Walker geometry. The amplitude of the second peak was that expected in no-CDM. One can have the best of both worlds by building a model with high Lambda and no CDM, but I don’t take that too seriously: Lambda is just a place holder for our ignorance – in either theory.

I had made this prediction in the hopes that cosmologists would experience the same crisis of faith that I had when MOND appeared in my data. Now it was the data that they valued that was misbehaving – in precisely the way I had predicted with a model that was motivated by MOND (albeit not MOND itself). Surely they would see reason?

There is a story that Diogenes once wandered the streets of Athens with a lamp in broad daylight in search of an honest man. I can relate. Exactly one member of the CMB community wrote to me to say “Gee, I was wrong to dismiss you.” [I paraphrase only a little.] When I had the opportunity to point out to them that I had made this prediction, the most common reaction was “no you didn’t.” Exactly one of the people with whom I had this conversation actually bothered to look up the published paper, and that person also wrote to say “Gee, I guess you did.” Everyone else simply ignored it.

The sociology gets worse from here. There developed a counter-narrative that the BOOMERanG data were wrong, therefore my prediction fitting it was wrong. No one asked me about it; I learned of it in a chance conversation a couple of years later in which it was asserted as common knowledge that “the data changed on you.” Let’s examine this statement.

The BOOMERanG data were early, so you expect data to improve. At the time, I noted that the second peak “is only marginally suggested by the data so far”, so I said that “as data accumulate, the second peak should become clear.” It did.

The predicted range quoted above is rather generous. It encompassed the full variation allowed by Big Bang Nucleosynthesis (BBN) at the time (1998/1999). I intentionally considered the broadest range of parameters that were plausible to be fair to both theories. However, developments in BBN were by then disfavoring low-end baryon densities, so the real expectation for the predicted range was narrower. Excluding implausibly low baryon densities, the predicted ranges were 1.6 – 1.83 for LCDM and 2.36 – 2.4 for no-CDM. Note that the prediction of no-CDM is considerably more precise than that of LCDM. This happens because all the plausible models run together in the absence of the forcing term provided by CDM. For hypothesis testing, this is great: the ratio has to be this one value, and only this value.

A few years later, WMAP provided a much more accurate measurement of the peak locations and amplitudes. WMAP measured A1:2 = 2.34 ± 0.09. This is bang on the no-CDM prediction of 2.4.
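
Using only the numbers quoted above (the measured value with its error bar and the two predicted ranges), it is easy to quantify how discriminating this measurement was; a minimal sketch:

```python
# How far does the WMAP measurement of A1:2 sit from each a priori prediction?
# The ranges and the measurement are the values quoted in the text.
a12, a12_err = 2.34, 0.09
lcdm_range   = (1.6, 1.83)   # LCDM, excluding implausibly low baryon densities
no_cdm_range = (2.36, 2.4)   # no-CDM

def tension_sigma(value, err, lo, hi):
    """Distance in sigma from the nearest edge of a predicted range (0 if inside)."""
    if lo <= value <= hi:
        return 0.0
    return (lo - value) / err if value < lo else (value - hi) / err

print(f"LCDM:   {tension_sigma(a12, a12_err, *lcdm_range):.1f} sigma away")   # ~5.7
print(f"no-CDM: {tension_sigma(a12, a12_err, *no_cdm_range):.1f} sigma away") # ~0.2
```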

Peak locations measured by WMAP in 2003 (points) compared to the a priori (1999) predictions of LCDM (red tone lines) and no-CDM (blue tone lines).

The prediction for the amplitude ratio A1:2 that I made over twenty years ago remains correct in the most recent CMB data. The same model did not successfully predict the third peak, but I didn’t necessarily expect it to: the no-CDM ansatz (which is just General Relativity without cold dark matter) had to fail at some point. But that gets ahead of the story: no-CDM made a very precise prediction for the second peak. LCDM did not.

LCDM only survives because people were willing to disregard existing bounds – in this case, on the baryon density. It was easier to abandon the most accurately measured and the only over-constrained pillar of Big Bang cosmology than acknowledge a successful prediction that respected all those things. For a few years, the attitude was “BBN was close, but not quite right.” In time, what appears to be confirmation bias kicked in, and the measured abundances of the light elements migrated towards the “right” value – as specified by CMB fits.

LCDM does give an excellent fit to the power spectrum of the CMB. However, only the location of the first peak was predicted correctly in advance. Everything subsequent to that (at higher ℓ) is the result of a multi-parameter fit with sufficient flexibility to accommodate any physically plausible power spectrum. However, there is no guarantee that the parameters of the fit will agree with independent data. For a long while they did, but now we see the emergence of tensions in not only the baryon density, but also the amplitude of the power spectrum, and most famously, the value of the Hubble constant. Perhaps this is the level of accuracy that is necessary to begin to perceive genuine anomalies. Beyond the need to invoke invisible entities in the first place.

I could say a lot more, and perhaps will in future. For now, I’d just like to emphasize that I made a very precise, completely novel prediction for the amplitude of the second peak. That prediction came true. No one else did that. Heck of a coincidence, if there’s nothing to it.