A surprising and ultimately career-altering result that I encountered while in my first postdoc was that low surface brightness galaxies fell precisely on the Tully-Fisher relation. This surprising result led me to test the limits of the relation in every conceivable way. Are there galaxies that fall off it? How far is it applicable? Often, that has meant pushing the boundaries of known galaxies to ever lower surface brightness, higher gas fraction, and lower mass where galaxies are hard to find because of unavoidable selection biases in galaxy surveys: dim galaxies are hard to see.
I made a summary plot in 2017 to illustrate what we had learned to that point. There is a clear break in the stellar mass Tully-Fisher relation (left panel) that results from neglecting the mass of interstellar gas that becomes increasingly important in lower mass galaxies. The break goes away when you add in the gas mass (right panel). The relation between baryonic mass and rotation speed is continuous down to Leo P, a tiny galaxy just outside the Local Group comparable in mass to a globular cluster and the current record holder for the slowest known rotating galaxy at a mere 15 km/s.
At the high mass end, galaxies aren’t hard to see, but they do become progressively rare: there is an exponential cut off in the intrinsic numbers of galaxies at the high mass end. So it is interesting to see how far up in mass we can go. Ogle et al. set out to do that, looking over a huge volume to identify a number of very massive galaxies, including what they dubbed “super spirals.” These extend the Tully-Fisher relation to higher masses.
Most of the super spirals lie on the top end of the Tully-Fisher relation. However, a half dozen of the most massive cases fall off to the right. Could this be a break in the relation? So it was claimed at the time, but looking at the data, I wasn’t convinced. It looked to me like they were not always getting out to the flat part of the rotation curve, instead measuring the maximum rotation speed.
Bright galaxies tend to have rapidly rising rotation curves that peak early then fall before flattening out. For very bright galaxies – and super spirals are by definition the brightest spirals – the amplitude of the decline can be substantial, several tens of km/s. So if one measures the maximum speed instead of the flat portion of the curve, points will fall to the right of the relation. I decided not to lose any sleep over it, and wait for better data.
Better data have now been provided by Di Teodoro et al. Here is an example from their paper. The morphology of the rotation curve is typical of what we see in massive spiral galaxies. The maximum rotation speed exceeds 300 km/s, but falls to 275 km/s where it flattens out.
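To put a number on how much this matters, here is a minimal sketch using the two speeds quoted for this example. It assumes the Tully-Fisher relation has a slope of 4 (mass scaling as the fourth power of rotation speed), which is an assumption for illustration rather than something stated above; the function name is also made up.

```python
import math

# Assumed slope of the relation: mass scales as V**4.
# This is an illustrative assumption, not a value taken from the text.
ASSUMED_SLOPE = 4.0

def tully_fisher_offset(v_max, v_flat, slope=ASSUMED_SLOPE):
    """Return (offset in dex along the velocity axis, equivalent mass factor)
    incurred by using the peak speed v_max in place of the flat speed v_flat."""
    dex_in_velocity = math.log10(v_max / v_flat)
    mass_factor = (v_max / v_flat) ** slope
    return dex_in_velocity, mass_factor

# Speeds quoted for the example rotation curve, in km/s.
dex, factor = tully_fisher_offset(300.0, 275.0)
print(f"shift to the right: {dex:.3f} dex in velocity")
print(f"equivalent mass mismatch: a factor of {factor:.2f}")
```

Under that assumed slope, a 25 km/s difference looks small on the velocity axis but corresponds to roughly a 40% mismatch in mass, which is plenty to mimic a break in a relation this tight.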
Adding the updated data to the plot, we see that the super spirals now fall on the Tully-Fisher relation, with no hint of a break. There are a couple of outliers, but those are trees. The relation is the forest.
That’s a good plot, but it stops at 10^8 solar masses, so I couldn’t resist adding the super spirals to my plot from 2017. I’ve also included the dwarfs I discussed in the last post. Together, we see that the baryonic Tully-Fisher relation is continuous over six decades in mass – a factor of a million from the smallest to the largest galaxies.
The strength of this correlation continues to amaze me. This never happens in extragalactic astronomy, where correlations are typically weak and have lots of intrinsic scatter. The opposite is true here. This must be telling us something.
The obvious thing that this is telling us is MOND. The initial report that super spirals fell off of the Tully-Fisher relation was widely hailed as a disproof of MOND. I’ve seen this movie many times, so I am not surprised that the answer changed in this fashion. It happens over and over again. Even less surprising is that there is no retraction, no self-examination of whether maybe we jumped to the wrong conclusion.
I get it. I couldn’t believe it myself, to start. I struggled for many years to explain the data conventionally in terms of dark matter. Worked my ass off trying to save the paradigm. Try as I might, nothing worked. Since then, many people have claimed to explain what I could not, but so far all I have seen are variations on models that I had already rejected as obviously unworkable. They either make unsubstantiated assumptions, building a tautology, or simply claim more than they demonstrate. As long as you say what people want to hear, you will be held to a very low standard. If you say what they don’t want to hear, what they are conditioned not to believe, then no standard of proof is high enough.
MOND was the only theory to predict the observed behavior a priori. There are no free parameters in the plots above. We measure the mass and the rotation speed. The data fall on the predicted line. Dark matter models did not predict this, and can at best hope to provide a convoluted, retroactive explanation. Why should I be impressed by that?
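For concreteness, the predicted line can be written down in a few lines. This is a sketch that assumes the deep-MOND relation V^4 = G M a0 for the flat rotation speed, with a0 = 1.2e-10 m/s^2. Neither the formula nor the value of a0 is quoted in this post, so treat both as assumptions here, along with the function name.

```python
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
A0 = 1.2e-10       # assumed MOND acceleration scale, m s^-2
M_SUN = 1.989e30   # solar mass, kg

def mond_baryonic_mass(v_flat_kms, a0=A0):
    """Baryonic mass (solar masses) implied by a flat rotation speed (km/s),
    assuming V^4 = G * M * a0."""
    v = v_flat_kms * 1.0e3  # convert km/s to m/s
    return v**4 / (G * a0) / M_SUN

# From a Leo P-like rotator up to a super spiral.
for v in (15.0, 100.0, 300.0):
    print(f"Vf = {v:5.0f} km/s -> Mb ~ {mond_baryonic_mass(v):.2e} Msun")
```

The only inputs are the measured rotation speed and fixed constants; nothing is adjusted galaxy by galaxy.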
We have a new paper on the arXiv. This is a straightforward empiricist’s paper that provides a reality check on the calibration of the Baryonic Tully-Fisher relation (BTFR) and the distance scale using well-known Local Group galaxies. It also connects observable velocity measures in rotating and pressure supported dwarf galaxies: the flat rotation speed of disks is basically twice the line-of-sight velocity dispersion of dwarf spheroidals.
First, the reality check. Previously we calibrated the BTFR using galaxies with distances measured by reliable methods like Cepheids and the Tip of the Red Giant Branch (TRGB) method. Application of this calibration obtains the Hubble constant H0 = 75.1 +/- 2.3 km/s/Mpc, which is consistent with other local measurements but in tension with the value obtained from fitting the Planck CMB data. All of the calibrator galaxies are nearby (most are within 10 Mpc, which is close by extragalactic standards), but none of them are in the Local Group (galaxies within ~1 Mpc like Andromeda and M33). The distances to Local Group galaxies are pretty well known at this point, so if we got the BTFR calibration right, they had better fall right on it.
They do. From high to low mass, the circles in the plot below are Andromeda, the Milky Way, M33, the LMC, SMC, and NGC 6822. All fall on the externally calibrated BTFR, which extrapolates well to still lower mass dwarf galaxies like WLM, DDO 210, and DDO 216 (and even Leo P, the smallest rotating galaxy known).
The agreement of the BTFR with Local Group rotators is so good that it is tempting to say that there is no way to reconcile this with a low Hubble constant of 67 km/s/Mpc. Doing so would require all of these galaxies to be more distant by the factor 75/67 = 1.11. That doesn’t sound too bad, but applying it means that Andromeda would have to be 875 kpc distant rather than the 785 ± 25 kpc adopted by the source of our M31 data, Chemin et al. There is a long history of distance measurements to M31 so many opinions can be found, but it isn’t just M31 – all of the Local Group galaxy distances would have to be off by this factor. This seems unlikely to the point of absurdity, but as colleague and collaborator Jim Schombert reminds me, we’ve seen such things before with the distance scale.
So that’s the reality check: the BTFR works as it should in the Local Group – at least for the rotating galaxies (circles in the plot above). What about the pressure supported galaxies (the squares)?
Galaxies come in two basic kinematic types: rotating disks or pressure supported ellipticals. Disks are generally thin, with most of the stars orbiting in the same direction in the same plane on nearly circular orbits. Ellipticals are quasi-spherical blobs of stars on rather eccentric orbits oriented all over the place. This is an oversimplification, of course; real galaxies have a mix of orbits, but usually most of the kinetic energy is invested in one or the other, rotation or random motions. We can measure the speeds of stars and gas in these configurations, which provides information about the kinetic energy and corresponding gravitational binding energy. That’s how we get at the gravitational potential and infer the need for dark matter – or at least, the existence of acceleration discrepancies.
We would like to have full 6D phase space information for all stars – their location in 3D configuration space and their momentum in each direction. In practice, usually all we can measure is the Doppler line-of-sight speed. For rotating galaxies, we can [attempt to] correct the observed velocity for the inclination of the disk, and get an idea of the in-plane rotation speed. For ellipticals, we get the velocity dispersion along the line of sight in whatever orientation we happen to get. If the orbits are isotropic, then one direction of view is as good as any other. In general that need not be the case, but it is hard to constrain the anisotropy of orbits, so usually we assume isotropy and call it Close Enough for Astronomy.
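The inclination correction is simple geometry, sketched here. It assumes a thin disk in circular rotation with inclination i measured from face-on (i = 0), so the line-of-sight speed along the major axis is V sin i. The function name and the cutoff for nearly face-on disks are illustrative choices, not anything prescribed above.

```python
import math

def deproject_rotation(v_los_kms, inclination_deg, min_inclination_deg=20.0):
    """Convert an observed line-of-sight rotation speed to the in-plane speed.

    Assumes a thin, circularly rotating disk: V_rot = V_los / sin(i).
    Nearly face-on disks are rejected because 1/sin(i) blows up and small
    errors in the inclination become large errors in the velocity.
    """
    if not (min_inclination_deg <= inclination_deg <= 90.0):
        raise ValueError("inclination outside the usable range")
    return v_los_kms / math.sin(math.radians(inclination_deg))

# The same observed 100 km/s implies very different rotation speeds
# depending on the inclination adopted.
for inc in (30.0, 45.0, 60.0, 90.0):
    print(f"i = {inc:4.0f} deg -> V_rot = {deproject_rotation(100.0, inc):6.1f} km/s")
```

The strong dependence at low inclination is why the correction is only ever an attempt.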
For isotropic orbits, the velocity dispersion σ* is related to the circular velocity Vc of a test particle by Vc = √3 σ*. The square root of three appears because the kinetic energy of isotropic orbits is evenly divided among the three cardinal directions. These quantities depend in a straightforward way on the gravitational potential, which can be computed for the stuff we can see but not for that which we can’t. The stars tend to dominate the potential at small radii in bright galaxies. This is a complication we’ll ignore here by focusing on the outskirts of rotating galaxies where rotation curves are flat and dwarf spheroidals where stars never dominate. In both cases, we are in a limit where we can neglect the details of the stellar distribution: only the dark mass matters, or, in the case of MOND, only the total normal mass but not its detailed distribution (which does matter for the shape of a rotation curve, but not its flat amplitude).
Rather than worry about theory or the gory details of phase space, let’s just ask the data. How do we compare apples with apples? What is the factor βc that makes Vo = βc σ* an equality?
One notices that the data for pressure supported dwarfs nicely parallels that for rotating galaxies. We estimate βc by finding the shift that puts the dwarf spheroidals on the BTFR (on average). We only do this for the dwarfs that are not obviously affected by tidal effects whose velocity dispersions may not reflect the equilibrium gravitational potential. I have discussed this at great length in McGaugh & Wolf, so I refer the reader eager for more details there. Here I merely note that the exercise is meaningful only for those dwarfs that parallel the BTFR; it can’t apply to those that don’t regardless of the reason.
That caveat aside, this works quite well for βc = 2.
The numerically inclined reader will note that 2 > √3. One would expect the latter for isotropic orbits, which we implicitly average over by using the data for all these dwarfs together. So the likely explanation for the larger value of βc is that the outer velocities of rotation curves are measured at larger radii than the velocity dispersions of dwarf spheroidals. The value of βc accounts for the different effective radii of measurement, as illustrated by the rotation curves below.
Once said, this seems obvious. The velocity dispersions of dwarf spheroidals are measured by observing the Doppler shifts of individual member stars. This measurement is necessarily made where the stars are. In contrast, the flat portions of rotation curves are traced by atomic gas at radii that typically extend beyond the edge of the optical disk. So we should expect a difference; βc = 2 quantifies it.
One small caveat is that in order to compare apples with apples, we have to adopt a mass-to-light ratio for the stars in dwarf spheroidals in order to compare them with the combined mass of stars and gas in rotating galaxies. Indeed, the dwarf irregulars that overlap with the dwarf spheroidals in mass are made more of gas than stars, so there is always the risk of some systematic difference between the two mass scales. In the paper, we quantify the variation of βc with the choice of M*/L. If you’re interested in that level of detail, you should read the paper.
I should also note that MOND predicts βc = 2.12. Taken at face value, this implies that MOND prefers an average mass-to-light ratio slightly higher than what we assumed. This is well within the uncertainties, and we already know that MOND is the only theory capable of predicting the velocity dispersions of dwarf spheroidals in advance. We can always explain this after the fact with dark matter, which is what people generally do, often in apparent ignorance that MOND also correctly predicts which dwarfs they’ll have to invoke tidal disruption for. How such models can be considered satisfactory is quite beyond my capacity, but it does save one from the pain of having to critically reassess one’s belief system.
That’s all beyond the scope of the current paper. Here we just provide a nifty empirical result. If you want to make an apples-to-apples comparison of dwarf spheroidals with rotating dwarf irregulars, you will do well to assume Vo = 2σ*.
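As a quick illustration of the conversion, here is a minimal sketch comparing the three values of βc discussed above. The velocity dispersion is a made-up number, and the last line assumes the BTFR scales as the fourth power of velocity, which is an assumption for illustration.

```python
import math

BETA_ISOTROPIC = math.sqrt(3.0)  # isotropic orbits measured at the same radius
BETA_EMPIRICAL = 2.0             # shift that puts dwarf spheroidals on the BTFR
BETA_MOND = 2.12                 # value quoted above for MOND

def equivalent_flat_velocity(sigma_los_kms, beta=BETA_EMPIRICAL):
    """Flat rotation speed equivalent to a line-of-sight velocity dispersion,
    using Vo = beta * sigma."""
    return beta * sigma_los_kms

sigma = 8.0  # km/s; a hypothetical dwarf spheroidal, for illustration only
for label, beta in (("isotropic", BETA_ISOTROPIC),
                    ("empirical", BETA_EMPIRICAL),
                    ("MOND", BETA_MOND)):
    print(f"{label:9s} beta = {beta:.2f} -> Vo = {equivalent_flat_velocity(sigma, beta):.1f} km/s")

# If mass scales as V**4 (assumed), the change from beta = 2 to 2.12
# corresponds to this factor in mass, and hence in mass-to-light ratio.
mass_factor = (BETA_MOND / BETA_EMPIRICAL) ** 4
print(f"mass factor between beta = 2 and 2.12: {mass_factor:.2f}")
```

Under that assumed scaling, the gap between 2 and 2.12 amounts to a modest shift in the adopted mass-to-light ratio.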
Last time, I expressed despondency about the lack of progress due to attitudes that in many ways remain firmly entrenched in the 1980s. Recently a nice result has appeared, so maybe there is some hope.
The radial acceleration relation (RAR) measured in rotationally supported galaxies extends down to an observed acceleration of about gobs = 10^-11 m/s/s, about one part in a trillion (10^12) of the acceleration we feel here on the surface of the Earth. In some extreme dwarfs, we get down below 10^-12 m/s/s. But accelerations this low are hard to find except in the depths of intergalactic space.
Weak lensing data
Brouwer et al have obtained a new constraint down to 10^-12.5 m/s/s using weak gravitational lensing. This technique empowers one to probe the gravitational potential of massive galaxies out to nearly 1 Mpc. (The bulk of the luminous mass is typically confined within a few kpc.) To do this, one looks for the net statistical distortion in galaxies behind a lensing mass like a giant elliptical galaxy. I always found this approach a little scary, because you can’t see the signal directly with your eyes the way you can the velocities in a galaxy measured with a long slit spectrograph. Moreover, one has to bin and stack the data, so the result isn’t for an individual galaxy, but rather the average of galaxies within the bin, however defined. There are further technical issues that make this challenging, but it’s what one has to do to get farther out.
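To get a feel for why one has to go so far out, here is a rough sketch of the centripetal acceleration g = V^2/R at different radii. It assumes a purely hypothetical galaxy whose rotation curve stays flat at 200 km/s all the way out; the speed, the radii, and the function name are illustrative choices, not values from the lensing analysis.

```python
import math

KPC_IN_M = 3.0857e19   # metres per kiloparsec
G_EARTH = 9.81         # surface gravity of the Earth, m/s^2

def centripetal_acceleration(v_kms, r_kpc):
    """Acceleration (m/s^2) needed to keep a circular orbit of speed v at radius r."""
    v = v_kms * 1.0e3
    return v * v / (r_kpc * KPC_IN_M)

V_FLAT = 200.0  # km/s; hypothetical flat rotation speed
for r_kpc in (10.0, 100.0, 1000.0):
    g = centripetal_acceleration(V_FLAT, r_kpc)
    print(f"R = {r_kpc:6.0f} kpc: g = {g:.2e} m/s^2 "
          f"(log10 = {math.log10(g):.2f}, {g / G_EARTH:.1e} of Earth gravity)")
```

For a flat curve the acceleration falls as 1/R, so each extra decade in acceleration costs a decade in radius, which is well beyond where there is any gas or starlight to trace.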
To parse a few of the details: there are two basic results here, one from the GAMA survey (the blue points) and one from KiDS. KiDS is larger so has smaller formal errors, but relies on photometric redshifts (which use lots of colors to guess the best match redshift). That’s probably OK in a statistical sense, but they are not as accurate as the spectroscopic redshifts measured for GAMA. There is a lot of structure in redshift space that gets washed out by photometric redshift estimates. The fact that the two basically agree hopefully means that this doesn’t matter here.
There are two versions of the KiDS data, one using just the stellar mass to estimate gbar, and another that includes an estimate of the coronal gas mass. Many galaxies are surrounded by a hot corona of gas. This is negligible at small radii where the stars dominate, but becomes progressively more important as part of the baryonic mass budget as one moves out. How important? Hard to say. But it certainly matters on scales of a few hundred kpc (this is the CGM in the baryon pie chart, which suggests roughly equal mass in stars, all within a few tens of kpc, and hot coronal gas, mostly out beyond 100 kpc). This corresponds to the orange points; the black points are what happens if we neglect this component (which certainly isn’t zero). So the truth is in there somewhere – this seems to be the dominant systematic uncertainty.
Getting past these pesky details, this result is cool on many levels. First, the RAR appears to persist as a relation. That needn’t have happened. Second, it extends the RAR by a couple of decades to much lower accelerations. Third, it applies to non-rotating as well as rotationally supported galaxies (more on that in a bit). Fourth, the data at very low accelerations follow a straight line with a slope of about 1/2 in this log-log plot. That means gobs ~ gbar^(1/2). That provides a test of theory.
What does it mean?
Empirically, this is a confirmation that a known if widely unexpected relation extends further than previously known. That’s pretty neat in its own right, without any theoretical baggage. We used to be able to appreciate empirical relations better (e.g., the stellar main sequence!) before we understood what they meant. Now we seem to put the cart (theory) before the horse (data). That said, we do want to use data to test theories. Usually I discuss dark matter first, but that is complicated, so let’s start with MOND.
Test of MOND
MOND predicts what we see.
I am tempted to leave it at that, because it’s really that simple. But experience has taught me that no result is so obvious that someone won’t claim exactly the opposite, so let’s explore it a bit more.
There are three tests: whether the relation (i) exists, (ii) has the right slope, and (iii) has the right normalization. Tests (i) and (ii) are an immediate pass. It also looks like (iii) is very nearly correct, but it depends in detail on the baryonic mass-to-light ratio – that of the stars plus any coronal gas.
MOND is represented by the grey line that’s hard to see, but goes through the data at both high and low acceleration. At high accelerations, this particular line is a fitting function I chose for convenience. There’s nothing special about it, nor is it even specific to MOND. That was the point of our 2016 RAR paper: this relation exists in the data whether it is due to MOND or not. Conceivably, the RAR might be a relation that only applies to rotating galaxies for some reason that isn’t MOND. That’s hard to sustain, since the data look like MOND – so much so that the two are impossible to distinguish in this plane.
In terms of MOND, the RAR traces the interpolation function that quantifies the transition from the Newtonian regime where gobs = gbar to the deep MOND regime where gobs ~ gbar^(1/2). MOND does not specify the precise form of the interpolation function, just the asymptotic limits. The data trace that transition, providing an empirical assessment of the shape of the interpolation function around the acceleration scale a0. That’s interesting and will hopefully inform further theory development, but it is not critical to testing MOND.
What MOND does very explicitly predict is the asymptotic behavior gobs ~ gbar^(1/2) in the deep MOND regime of low accelerations (gobs << a0). That the lensing data are well into this regime makes them an excellent test of this strong prediction of MOND. It passes with flying colors: the data have precisely the slope anticipated by Milgrom nearly 40 years ago.
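The two asymptotic limits are easy to check numerically. The sketch below assumes a fitting function of the form gobs = gbar / (1 - exp(-sqrt(gbar/g†))) with g† = 1.2e-10 m/s/s. Neither the form nor the value of g† is given in this post, so treat both as assumptions; any function with the same limits would make the same point.

```python
import math

G_DAGGER = 1.2e-10  # assumed acceleration scale, m/s^2

def rar(gbar, g_dagger=G_DAGGER):
    """Observed acceleration for a given baryonic acceleration (assumed form)."""
    x = math.sqrt(gbar / g_dagger)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return gbar / (-math.expm1(-x))

def log_slope(gbar, step=1e-3):
    """Numerical d(log gobs)/d(log gbar) by central difference in log space."""
    lo, hi = gbar * 10.0 ** (-step), gbar * 10.0 ** step
    return (math.log10(rar(hi)) - math.log10(rar(lo))) / (2.0 * step)

for log_gbar in (-8, -10, -12, -14):
    gbar = 10.0 ** log_gbar
    print(f"log gbar = {log_gbar:4d}: log gobs = {math.log10(rar(gbar)):7.2f}, "
          f"slope = {log_slope(gbar):.3f}")
```

The slope runs from 1 at high acceleration, where gobs = gbar, to 1/2 at low acceleration, where gobs approaches sqrt(gbar g†). Only the shape of the bend in between depends on the choice of function.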
This didn’t have to happen. All sorts of other things might have happened. Indeed, as we discussed in Lelli et al (2017), there were some hints that the relation flattened, saturating at a constant gobs around 10^-11 m/s/s. I was never convinced that this was real, as it only appears in the least certain data, and there were already some weak lensing data to lower accelerations.
Milgrom (2013) analyzed weak lensing data that were available then, obtaining this figure:
The new data corroborate this result. Here is a similar figure from Brouwer et al:
Just looking at these figures, one can see the same type-dependent effect found by Milgrom. However, there is an important difference: Milgrom’s plot leaves the unknown mass-to-light ratio as a free parameter, while the new plot has an estimate of this built-in. So if the adopted M/L is correct, then the red and blue galaxies form parallel RARs that are almost but not quite exactly the same. That would not be consistent with MOND, which should place everything on the same relation. However, this difference is well within the uncertainty of the baryonic mass estimate – not just the M/L of the stars, but also the coronal gas content (i.e., the black vs. orange points in the first plot). MOND predicted this behavior well in advance of the observation, so one would have to bend over backwards, rub one’s belly, and simultaneously punch oneself in the face to portray this as anything short of a fantastic success of MOND.
I say that because I’m sure people will line up to punch themselves in the face in exactly this fashion*. One of the things that persuades me to suspect that there might be something to MOND is the lengths to which people will go to deny even its most obvious successes. At the same time, they are more than willing to cut any amount of slack necessary to save LCDM. An example is provided by Ludlow et al., who claim to explain the RAR ‘naturally’ from simulations – provided they spot themselves a magic factor of two in the stellar mass-to-light ratio. If it were natural, they wouldn’t need that arbitrary factor. By the same token, if you recognize that you might have been that far off about M*/L, you have to extend that same grace to MOND as you do to LCDM. That’s a basic tenet of objectivity, which used to be a value in science. It doesn’t look like a correction as large as a factor of two is necessary here given the uncertainty in the coronal gas. So, preemptively: Get a grip, people.
MOND predicts what we see. No other theory beat it to the punch. The best one can hope to do is to match its success after the fact by coming up with some other theory that looks just like MOND.
Test of LCDM
In order to test LCDM, we have to agree what LCDM predicts. That agreement is lacking. There is no clear prediction. This complicates the discussion, as the best one can hope to do is give a thorough discussion of all the possibilities that people have so far considered, which differ in important ways. That exercise is necessarily incomplete – people can always come up with new and different ideas for how to explain what they didn’t predict. I’ve been down the road of being thorough many times, which gets so complicated that no one reads it. So I will not attempt to be thorough here, and only explore enough examples to give a picture of where we’re currently at.
The tests are the same as above: should the relation (i) exist? (ii) have the observed slope? and (iii) normalization?
The first problem for LCDM is that the relation exists (i). There is no reason to expect this relation to exist. There was (and in some corners, continues to be) a lot of denial that the RAR even exists, because it shouldn’t. It does, and it looks just like what MOND predicts. LCDM is not MOND, and did not anticipate this behavior because there is no reason to do so.
If we persist past this point – and it is not obvious that we should – then we may say, OK, here’s this unexpected relation; how do we explain it? For starters, we do have a prediction for the density profiles of dark matter halos; these fall off as r^-3. That translates to some slope in the RAR plane, but not a unique relation, as the normalization can and should be different for each halo. But it’s not even the right slope. The observed slope corresponds to a logarithmic potential in which the density profile falls off as r^-2. That’s what is required to give a flat rotation curve in Newtonian dynamics, which is why the pseudoisothermal halo was the standard model before simulations gave us the NFW halo with its r^-3 fall off. The lensing data are like a flat rotation curve that extends indefinitely far out; they are not like an NFW halo.
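The difference between the two halo types is easy to see numerically. This sketch compares the Newtonian circular speed of a pseudoisothermal halo (density falling as r^-2 at large radii) with that of an NFW halo (density falling as r^-3). Radii are in units of each halo's own scale radius and the velocity normalization is arbitrary; both are simplifications for illustration, as are the function names.

```python
import math

def v_nfw(x):
    """Circular speed (arbitrary units) of an NFW halo at x = r / r_s.
    Enclosed mass is proportional to ln(1 + x) - x / (1 + x)."""
    return math.sqrt((math.log1p(x) - x / (1.0 + x)) / x)

def v_pseudo_isothermal(x):
    """Circular speed (arbitrary units) of a pseudoisothermal halo at x = r / r_c.
    Enclosed mass is proportional to x - arctan(x)."""
    return math.sqrt(1.0 - math.atan(x) / x)

def velocity_log_slope(v_func, x, step=1e-3):
    """Numerical d(log V)/d(log r) by central difference in log space."""
    lo, hi = x * 10.0 ** (-step), x * 10.0 ** step
    return (math.log10(v_func(hi)) - math.log10(v_func(lo))) / (2.0 * step)

for x in (10.0, 100.0, 1000.0):
    print(f"r = {x:6.0f} scale radii: "
          f"pseudoisothermal slope = {velocity_log_slope(v_pseudo_isothermal, x):+.3f}, "
          f"NFW slope = {velocity_log_slope(v_nfw, x):+.3f}")
```

The pseudoisothermal curve stays flat (slope near zero) however far out one goes, while the NFW curve keeps declining, heading toward the Keplerian slope of -1/2 only logarithmically slowly.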
That’s just stating the obvious. To do more requires building a model. Here is an example from Oman et al. of a model that follows the logic I just outlined, adding some necessary and reasonable assumptions about the baryons:
The model is the orange line. It deviates from the black line that is the prediction of MOND. The data look like MOND, not like the orange line.
One can of course build other models. Brouwer et al discuss some. I will not explore these in detail, and only note that the models are not consistent, so there is no clear prediction from LCDM. To explore just one a little further, this figure appears at the very end of their paper, in appendix C:
The orange line in this case is some extrapolation of the model of Navarro et al. (2017).** This also does not work, though it doesn’t fail by as much as the model of Oman et al. I don’t understand how they make the extrapolation here, as a major prediction of Navarro et al. was that gobs would saturate at 10^-11 m/s/s; the orange line should flatten out near the middle of this plot. Indeed, they argued that we would never observe any lower accelerations, and that
“extending observations to radii well beyond the inner halo regions should lead to systematic deviations from the MDAR.”
– Navarro et al (2017)
This is a reasonable prediction for LCDM, but it isn’t what happened – the RAR continues as predicted by MOND. (The MDAR is equivalent to the RAR.)
The astute reader may notice that many of these theorists are frequently coauthors, so you might expect they’d come up with a self-consistent model and stick to it. Unfortunately, consistency is not a hobgoblin that afflicts galaxy formation theory, and there are as many predictions as there are theorists (more for the prolific ones). They’re all over the map – which is the problem. LCDM makes no prediction to which everyone agrees. This makes it impossible to test the theory. If one model is wrong, that is just because that particular model is wrong, not because the theory is under threat. The theory is never under threat as there always seems to be another modeler who will claim success where others fail, whether they genuinely succeed or not. That they claim success is all that is required. Cognitive dissonance then takes over, people believe what they want to hear, and all anomalies are forgiven and forgotten. There never seems to be a proper prior that everyone would agree falsifies the theory if it fails. Galaxy formation in LCDM has become epicycles on steroids.
I have no idea. Continue to improve the data, of course. But the more important thing that needs to happen is a change in attitude. The attitude is that LCDM as a cosmology must be right so the mass discrepancy must be caused by non-baryonic dark matter so any observation like this must have a conventional explanation, no matter how absurd and convoluted. We’ve been stuck in this rut since before we even put the L in CDM. We refuse to consider alternatives so long as the standard model has not been falsified, but I don’t see how it can be falsified to the satisfaction of all – there’s always a caveat, a rub, some out that we’re willing to accept uncritically, no matter how silly. So in the rut we remain.
A priori predictions are an important part of the scientific method because they can’t be fudged. On the rare occasions when they come true, it is supposed to make us take note – even change our minds. These lensing results are just another of many previous corroborations of a priori predictions by MOND. What people do with that knowledge – build on it, choose to ignore it, or rant in denial – is up to them.
*Bertolt Brecht mocked this attitude amongst the Aristotelian philosophers in his play about Galileo, noting how they were eager to criticize the new dynamics if the heavier rock beat the lighter rock to the ground by so much as a centimeter in the Leaning Tower of Pisa experiment while turning a blind eye to their own prediction being off by a hundred meters.
**I worked hard to salvage dark matter, which included a lot of model building. I recognize the model of Navarro et al as a slight variation on a model I built in 2000 but did not publish because it was obviously wrong. It takes a lot of time to write a scientific paper, so a lot of null results never get reported. In 2000 when I did this, the natural assumption to make was that galaxies all had about the same disk fraction (the ratio of stars to dark matter, e.g., assumption (i) of Mo et al 1998). This predicts far too much scatter in the RAR, which is why I abandoned the model. Since then, this obvious and natural assumption has been replaced by abundance matching, in which the stellar mass fraction is allowed to vary to account for the difference between the predicted halo mass function and the observed galaxy luminosity function. In effect, we replaced a universal constant with a rolling fudge factor***. This has the effect of compressing the range of halo masses for a given range of stellar masses. This in turn reduces the “predicted” scatter in the RAR, just by taking away some of the variance that was naturally there. One could do better still with even more compression, as the data are crudely consistent with all galaxies living in the same dark matter halo. This is of course a consequence of MOND, in which the conventionally inferred dark matter halo is just the “extra” force specified by the interpolation function.
***This is an example of what I’ll call prediction creep for want of a better term. Originally, we thought that galaxies corresponded to balls of gas that had had time to cool and condense. As data accumulated, we realized that the baryon fractions of galaxies were not equal to the cosmic value fb; they were rather less. That meant that only a fraction of the baryons available in a dark matter halo had actually cooled to form the visible disk. So we introduced a parameter md = Mdisk/Mtot (as Mo et al. called it) where the disk is the visible stars and gas and the total includes that and all the dark matter out to the notional edge of the dark matter halo. We could have any md < fb, but they were in the same ballpark for massive galaxies, so it seemed reasonable to think that the disk fraction was a respectable fraction of the baryons – and the same for all galaxies, perhaps with some scatter. This also does not work; low mass galaxies have much lower md than high mass galaxies. Indeed, md becomes ridiculously small for the smallest galaxies, less than 1% of the available fb (a problem I’ve been worried about since the previous century). At each step, there has been a creep in what we “predict.” All the baryons should condense. Well, most of them. OK, fewer in low mass galaxies. Why? Feedback! How does that work? Don’t ask! You don’t want to know. So for a while the baryon fraction of a galaxy was just a random number stochastically generated by chance and feedback. That is reasonable (feedback is chaotic) but it doesn’t work; the variation of the disk fraction is a clear function of mass that has to have little scatter (or it pumps up the scatter in the Tully-Fisher relation). So we gradually backed our way into a paradigm where the disk fraction is a function md(M*). This has been around long enough that we have gotten used to the idea. 
Instead of seeing it for what it is – a rolling fudge factor – we call it natural as if it had been there from the start, as if we expected it all along. This is prediction creep. We did not predict anything of the sort. This is just an expectation built through familiarity with requirements imposed by the data, not genuine predictions made by the theory. It has become common to assert that some unnatural results are natural; this stems in part from assuming part of the answer: any model built on abundance matching is unnatural to start, because abundance matching is unnatural. Necessary, but not remotely what we expected before all the prediction creep. It’s creepy how flexible our predictions can be.
Before we can agree on the interpretation of a set of facts, we have to agree on what those facts are. Even if we agree on the facts, we can differ about their interpretation. It is OK to disagree, and anyone who practices astrophysics is going to be wrong from time to time. It is the inevitable risk we take in trying to understand a universe that is vast beyond human comprehension. Heck, some people have made successful careers out of being wrong. This is OK, so long as we recognize and correct our mistakes. That’s a painful process, and there is an urge in human nature to deny such things, to pretend they never happened, or to assert that what was wrong was right all along.
This happens a lot, and it leads to a lot of weirdness. Beyond the many people in the field whom I already know personally, I tend to meet two kinds of scientists. There are those (usually other astronomers and astrophysicists) who might be familiar with my work on low surface brightness galaxies or galaxy evolution or stellar populations or the gas content of galaxies or the oxygen abundances of extragalactic HII regions or the Tully-Fisher relation or the cusp-core problem or faint blue galaxies or big bang nucleosynthesis or high redshift structure formation or joint constraints on cosmological parameters. These people behave like normal human beings. Then there are those (usually particle physicists) who have only heard of me in the context of MOND. These people often do not behave like normal human beings. They conflate me as a person with a theory that is Milgrom’s. They seem to believe that both are evil and must be destroyed. My presence, even the mere mention of my name, easily destabilizes their surprisingly fragile grasp on sanity.
One of the things that scientists-gone-crazy do is project their insecurities about the dark matter paradigm onto me. People who barely know me frequently attribute to me motivations that I neither have nor recognize. They presume that I have some anti-cosmology, anti-DM, pro-MOND agenda, and are remarkably comfortable asserting to me what it is that I believe. What they never explain, or apparently bother to consider, is why I would be so obtuse. What is my motivation? I certainly don’t enjoy having the same argument over and over again with their ilk, which is the only thing it seems to get me.
The only agenda I have is a pro-science agenda. I want to know how the universe works.
This agenda is not theory-specific. In addition to lots of other astrophysics, I have worked on both dark matter and MOND. I will continue to work on both until we have a better understanding of how the universe works. Right now we’re very far away from obtaining that goal. Anyone who tells you otherwise is fooling themselves – usually by dint of ignoring inconvenient aspects of the evidence. Everyone is susceptible to cognitive dissonance. Scientists are no exception – I struggle with it all the time. What disturbs me is the number of scientists who apparently do not. The field is being overrun with posers who lack the self-awareness to question their own assumptions and biases.
So, I feel like I’m repeating myself here, but let me state my bias. Oh wait. I already did. That’s why it felt like repetition. It is.
The following bit of this post is adapted from an old web page I wrote well over a decade ago. I’ve lost track of exactly when – the file has been through many changes in computer systems, and unix only records the last edit date. For the linked page, that’s 2016, when I added a few comments. The original is much older, and was written while I was at the University of Maryland. Judging from the html style, it was probably early to mid-’00s. Of course, the sentiment is much older, as it shouldn’t need to be said at all.
I will make a few updates as seem appropriate, so check the link if you want to see the changes. I will add new material at the end.
Long standing remarks on intellectual honesty
The debate about MOND often degenerates into something that falls well short of the sober, objective discussion that is supposed to characterize scientific debates. One can tell when voices are raised and baseless ad hominem accusations made. I have, with disturbing frequency, found myself accused of partisanship and intellectual dishonesty, usually by people who are as fair and balanced as Fox News.
Let me state with absolute clarity that intellectual honesty is a bedrock principle of mine. My attitude is summed up well by the quote
When a man lies, he murders some part of the world.
This is a great quote for science, as the intent is clear. We don’t get to pick and choose our facts. Outright lying about them is antithetical to science.
I would extend this to ignoring facts. One should not only be honest, but also as complete as possible. It does not suffice to be truthful while leaving unpleasant or unpopular facts unsaid. This is lying by omission.
I “grew up” believing in dark matter. Specifically, Cold Dark Matter, presumably a WIMP. I didn’t think MOND was wrong so much as I didn’t think about it at all. Barely heard of it; not worth the bother. So I was shocked – and angered – when its predictions came true in my data for low surface brightness galaxies. So I understand when my colleagues have the same reaction.
Nevertheless, Milgrom got the prediction right. I had a prediction, it was wrong. There were other conventional predictions, they were also wrong. Indeed, dark matter based theories generically have a very hard time explaining these data. In a Bayesian sense, given the prior that we live in a ΛCDM universe, the probability that MONDian phenomenology would be observed is practically zero. Yet it is. (This is very well established, and has been for some time.)
So – confronted with an unpopular theory that nevertheless had some important predictions come true, I reported that fact. I could have ignored it, pretended it didn’t happen, covered my eyes and shouted LA LA LA NOT LISTENING. With the benefit of hindsight, that certainly would have been the savvy career move. But it would also be ignoring a fact, and tantamount to a lie.
In short, though it was painful and protracted, I changed my mind. Isn’t that what the scientific method says we’re supposed to do when confronted with experimental evidence?
That was my experience. When confronted with evidence that contradicted my preexisting world view, I was deeply troubled. I tried to reject it. I did an enormous amount of fact-checking. The people who presume I must be wrong have not had this experience, and haven’t bothered to do any fact-checking. Why bother when you already are sure of the answer?
I understand being skeptical about MOND. I understand being more comfortable with dark matter. That’s where I started from myself, so as I said above, I can empathize with people who come to the problem this way. This is a perfectly reasonable place to start.
To give an example of disinformation, I still hear said things like “MOND fits rotation curves but nothing else.” This is not true. The first thing I did was check into exactly that. Years of fact-checking went into McGaugh & de Blok (1998), and I’ve done plenty more since. It came as a great surprise to me that MOND explained the vast majority of the data as well or better than dark matter. Not everything, to be sure, but lots more than “just” rotation curves. Yet this old falsehood still gets repeated as if it were not a misconception that was put to rest in the previous century. We’re stuck in the dark ages by choice.
It is not a defensible choice. There is no excuse to remain ignorant of MOND at this juncture in the progress of astrophysics. It is incredibly biased to point to its failings without contending with its many predictive successes. It is tragi-comically absurd to assume that dark matter provides a better explanation when it cannot make the same predictions in advance. MOND may not be correct in every particular, and makes no pretense to be a complete theory of everything. But it is demonstrably less wrong than dark matter when it comes to predicting the dynamics of systems in the low acceleration regime. Pretending like this means nothing is tantamount to ignoring essential facts.
Even a lie of omission murders a part of the world.
This title is an example of what has come to be called Betteridge’s law. This is a relatively recent name for an old phenomenon: if a title is posed as a question, the answer is no. This is especially true in science, whether the authors are conscious of it or not.
Pengfei Li completed his Ph.D. recently, fitting all manner of dark matter halos as well as the radial acceleration relation (RAR) to galaxies in the SPARC database. For the RAR, he found that galaxy data were consistent with a single, universal acceleration scale, g+. There is of course scatter in the data, but this appears to us to be consistent with what we expect from variation in the mass-to-light ratios of stars and the various uncertainties in the data.
This conclusion has been controversial despite being painfully obvious. I have my own law for data interpretation in astronomy:
Obvious results provoke opposition. The more obvious the result, the stronger the opposition.
The constancy of the acceleration scale is such a case. Where we do not believe we can distinguish between galaxies, others think they can – using our own data! Here it is worth contemplating what all is involved in building a database like SPARC – we were the ones who did the work, after all. In the case of the photometry, we observed the galaxies, we reduced the data, we cleaned the images of foreground contaminants (stars), we fit isophotes, we built mass models – that’s a very short version of what we did in order to be able to estimate the acceleration predicted by Newtonian gravity for the observed distribution of stars. That’s one axis of the RAR. The other is the observed acceleration, which comes from rotation curves, which require even more work. I will spare you the work flow; we did some galaxies ourselves, and took others from the literature in full appreciation of what we could and could not believe — which we have a deep appreciation for because we do the same kind of work ourselves. In contrast, the people claiming to find the opposite of what we find obtained the data by downloading it from our website. The only thing they do is the very last step in the analysis, making fits with Bayesian statistics the same as we do, but in manifest ignorance of the process by which the data came to be. This leads to an underappreciation of the uncertainty in the uncertainties.
This is another rule of thumb in science: outside groups are unlikely to discover important things that were overlooked by the group that did the original work. An example from about seven years ago was the putative 126 GeV line in Fermi satellite data. This was thought by some at the time to be evidence for dark matter annihilating into gamma rays with energy corresponding to the rest mass of the dark matter particles and their anti-particles. This would be a remarkable, Nobel-winning discovery, if true. Strange then that the claim was not made by the Fermi team themselves. Did outsiders beat them to the punch with their own data? It can happen: sometimes large collaborations can be slow to move on important results, wanting to vet everything carefully or warring internally over its meaning while outside investigators move more swiftly. But it can also be that the vetting shows that the exciting result is not credible.
I recall the 126 GeV line being a big deal. There was an entire session devoted to it at a conference I was scheduled to attend. Our time is valuable: I can’t go to every interesting conference, and don’t want to spend time on conferences that aren’t interesting. I was skeptical, simply because of the rule of thumb. I wrote the organizers, and asked if they really thought that this would still be a thing by the time the conference happened in a few months’ time. Some of them certainly thought so, so it went ahead. As it happened, it wasn’t. Not a single speaker who was scheduled to talk about the 126 GeV line actually did so. In a few short months, it had gone from an exciting result sure to win a Nobel prize to nada.
This happens all the time. Science isn’t as simple as a dry table of numbers and error bars. This is especially true in astronomy, where we are observing objects in the sky. It is never possible to do an ideal experiment in which one controls for all possible systematics: the universe is not a closed box in which we can control the conditions. Heck, we don’t even know what all the unknowns are. It is a big friggin’ universe.
The practical consequence of this is that the uncertainty in any astronomical measurement is almost always larger than its formal error bar. There are effects we can quantify and include appropriately in the error assessment. There are things we can not. We know they’re there, but that doesn’t mean we can put a meaningful number on them.
Indeed, the sociology of this has evolved over the course of my career. Back in the day, everybody understood these things, and took the stated errors with a grain of salt. If it was important to estimate the systematic uncertainty, it was common to estimate a wide band, in effect saying “I’m pretty sure it is in this range.” Nowadays, it has become common to split out terms for random and systematic error. This is helpful to the non-specialist, but it can also be misleading because, so stated, the confidence interval on the systematic looks like a 1 sigma error even though it is not likely to have a Gaussian distribution. Being 3 sigma off of the central value might be a lot more likely than this implies — or a lot less.
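To put rough numbers on "a lot more likely, or a lot less," here is a minimal sketch comparing the chance of landing more than 3 sigma from the central value for three distributions with the same standard deviation. The choice of a Laplace (double exponential) and a flat band as stand-ins for a systematic error is purely an assumption for illustration; nobody knows what shape a real systematic has, which is rather the point.

```python
import math

def gaussian_tail(n_sigma):
    """Two-sided probability that a Gaussian deviate exceeds n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2.0))

def laplace_tail(n_sigma):
    """Same, for a Laplace distribution with the same standard deviation.

    A Laplace distribution with scale b has variance 2*b**2, so b = sigma/sqrt(2)
    and P(|x| > t) = exp(-t / b).
    """
    return math.exp(-n_sigma * math.sqrt(2.0))

def uniform_tail(n_sigma):
    """Same, for a flat band with the same standard deviation.

    A uniform distribution of half-width w has standard deviation w/sqrt(3),
    so nothing can lie beyond sqrt(3) (about 1.73) sigma.
    """
    half_width = math.sqrt(3.0)
    return max(0.0, 1.0 - n_sigma / half_width)

for name, tail in [("Gaussian", gaussian_tail),
                   ("Laplace", laplace_tail),
                   ("flat band", uniform_tail)]:
    print(f"{name:10s} P(|x| > 3 sigma) = {tail(3.0):.4f}")
```

In this toy comparison the heavy-tailed case makes a 3 sigma miss several times more likely than the Gaussian value, while the flat band forbids it entirely. The quoted number is the same in all three cases.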
People have become more careful in making error estimates, which ironically has made matters worse. People seem to think that they can actually believe the error bars. Sometimes you can, but sometimes not. Many people don’t know how much salt to take it with, or realize that they should take it with a grain of salt at all. Worse, more and more folks come over from particle physics where extraordinary accuracy is the norm. They are completely unprepared to cope with astronomical data, or even fully process that the error bars may not be what they think they are. There is no appreciation for the uncertainties in the uncertainties, which is absolutely fundamental in astrophysics.
Consequently, one gets overly credulous analyses. In the case of the RAR, a number of papers have claimed that the acceleration scale isn’t constant. Not even remotely! Why do they make this claim?
Below is a histogram of raw acceleration scales from SPARC galaxies. In effect, they are claiming that they can tell the difference between galaxies in the tail on one side of the histogram from those on the opposite side. We don’t think we can, which is the more conservative claim. The width of the histogram is just the scatter that one expects from astronomical data, so the data are consistent with zero intrinsic scatter. That’s not to say that’s necessarily what Nature is doing: we can never measure zero scatter, so it is always conceivable that there is some intrinsic variation in the characteristic acceleration scale. All we can say is that if it is there, it is so small that we cannot yet resolve it.
Posed as a histogram like this, it is easy to see that there is a characteristic value – the peak – with some scatter around it. The entire issue is whether that scatter is due to real variation from galaxy to galaxy, or if it is just noise. One way to check this is to make quality cuts: in the plot above, the gray-striped histogram plots every available galaxy. The solid blue one makes some mild quality cuts, like knowing the distance to better than 20%. That matters, because the acceleration scale is a quantity that depends on distance – a notoriously difficult quantity to measure accurately in astronomy. When this quality cut is imposed, the width of the histogram shrinks. The better data make a tighter histogram – just as one would expect if the scatter is due to noise. If instead the scatter is a real, physical effect, it should, if anything, be more pronounced in the better data.
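The logic of the quality cut can be sketched with a toy simulation. This is not the SPARC analysis: it assumes every galaxy has exactly the same true value, that the measurement noise scales with the fractional distance error, and that distance errors run from 5% to 40% (all hypothetical choices made for illustration).

```python
import random
import statistics

def simulate_sample(n_galaxies, seed=0):
    """Return (fractional_distance_error, measured_offset) pairs.

    Toy assumption: zero intrinsic scatter. Each measured offset from the
    common true value is pure noise whose size tracks the distance error.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n_galaxies):
        dist_err = rng.uniform(0.05, 0.40)   # hypothetical range of distance errors
        offset = rng.gauss(0.0, dist_err)    # noise only, no real variation
        sample.append((dist_err, offset))
    return sample

sample = simulate_sample(2000)
all_offsets = [offset for _, offset in sample]
good_offsets = [offset for err, offset in sample if err < 0.20]  # the quality cut

width_all = statistics.pstdev(all_offsets)
width_good = statistics.pstdev(good_offsets)
print(f"histogram width, all galaxies:        {width_all:.3f}")
print(f"histogram width, distance err < 20%:  {width_good:.3f}")
```

With noise as the only source of scatter, the cut narrows the histogram, which is the behavior described above. A real intrinsic variation would set a floor that no quality cut could get below.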
This should not be difficult to understand. And yet – other representations of the data give a different impression, like this one:
This figure tells a very different story. The characteristic acceleration does not just scatter around a universal value. There is a clear correlation from one end of the plot to the other. Indeed, it is a perfectly smooth transition, because “Galaxy” is the number of each galaxy ordered by the value of its acceleration, from lowest to highest. The axes are not independent, they represent identically the same quantity. It is a plot of x against x. If properly projected into a histogram, it would look like the one above.
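How strong a "trend" does sorting manufacture? A minimal sketch, using pure Gaussian noise about a single value (150 points with a scatter of 0.1, both arbitrary numbers chosen for illustration), compares the correlation with galaxy number before and after ordering by value.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)
values = [rng.gauss(0.0, 0.1) for _ in range(150)]  # noise about one universal value
galaxy_number = list(range(len(values)))

r_unsorted = pearson(galaxy_number, values)          # arbitrary catalog order
r_sorted = pearson(galaxy_number, sorted(values))    # ordered by the value itself

print(f"correlation in catalog order:  {r_unsorted:+.2f}")
print(f"correlation after sorting:     {r_sorted:+.2f}")
```

The sorted version shows a near-perfect correlation even though there is nothing in the input but noise. The ordering put it there.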
This is a terrible way to plot data. It makes it look like there is a correlation where there is none. Setting this aside, there is a potential issue with the most discrepant galaxies – those at either extreme. There are more points that are roughly 3 sigma from a constant value than there should be for a sample this size. If this is the right assessment of the uncertainty, then there is indeed some variation from galaxy to galaxy. Not much, but the galaxies at the left hand side of the plot are different from those on the right hand side.
But can we believe the formal uncertainties that inform this error analysis? If you’ve read this far, you will anticipate that the answer to this question obeys Betteridge’s law. No.
One of the reasons we can’t just assign confidence intervals and believe them like a common physicist is that there are other factors in the analysis – nuisance parameters in Bayesian verbiage – with which the acceleration scale covaries. That’s a fancy way of saying that if we turn one knob, it affects another. We assign priors to the nuisance parameters (e.g., the distance to each galaxy and its inclination) based on independent measurements. But there is still some room to slop around. The question is really what to believe at the end of the analysis. We don’t think we can distinguish the acceleration scale from one galaxy to another, but this other analysis says we should. So which is it?
It is easy at this point to devolve into accusations of picking priors to obtain a preconceived result. I don’t think anyone is doing that. But how to show it?
Pengfei had the brilliant idea to perform the same analysis as Marra et al., but allowing Newton’s constant to vary. This is Big G, a universal constant that’s been known to be a constant of nature for centuries. It surely does not vary. However, G appears in our equations, so we can test for variation therein. Pengfei did this, following the same procedure as Marra et al., and found the same kind of graph – now for G instead of g+.
You see here the same kind of trend for Newton’s constant as one sees above for the acceleration scale. The same data have been analyzed in the same way. It has also been plotted in the same way, giving the impression of a correlation where there is none. The result is also the same: if we believe the formal uncertainties, the best-fit G is different for the galaxies at the left than from those to the right.
I’m pretty sure Newton’s constant does not vary this much. I’m entirely sure that the rotation curve data we analyze are not capable of making this determination. It would be absurd to claim so. The same absurdity extends to the acceleration scale g+. If we don’t believe the variation in G, there’s no reason to believe that in g+.
So what is going on here? It boils down to the errors on the rotation curves not representing the uncertainty in the circular velocity as we would like for them to. There are all sorts of reasons for this, observational, physical, and systematic. I’ve written about this at great length elsewhere, and I haven’t the patience to do so again here. It is turgidly technical to the extent that even the pros don’t read it. It boils down to the ancient, forgotten wisdom of astronomy: you have to take the errors with a grain of salt.
Here is the cumulative distribution (CDF) of reduced chi squared for the plot above.
Two things to notice here. First, the CDF looks the same regardless of whether we let Newton’s constant vary or not, or how we assign the Bayesian priors. There’s no value added in letting it vary – just as we found for the characteristic acceleration scale in the first place. Second, the reduced chi squared is rarely close to one. It should be! As a goodness of fit measure, one claims to have a good fit when chi squared is equal to one. The majority of these are not good fits! Rather than the gradual slope we see here, the CDF of chi squared should be a nearly straight vertical line. That’s nothing like what we see.
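How steep is "nearly vertical"? A toy simulation gives a feel for it, assuming error bars that are exactly right, Gaussian residuals, 30 data points per rotation curve, and no fitted parameters (all simplifying assumptions, not the setup of the actual fits).

```python
import random

def reduced_chi_squared(n_points, rng):
    """Reduced chi squared when residuals scatter by exactly the quoted error."""
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_points))
    return chi2 / n_points

rng = random.Random(42)
values = sorted(reduced_chi_squared(30, rng) for _ in range(1000))

# Read the 16th, 50th, and 84th percentiles straight off the sorted list.
p16 = values[int(0.16 * len(values))]
p50 = values[len(values) // 2]
p84 = values[int(0.84 * len(values))]

print(f"16th percentile: {p16:.2f}")
print(f"median:          {p50:.2f}")
print(f"84th percentile: {p84:.2f}")
```

Under these assumptions the bulk of the distribution sits within a few tenths of one, so the CDF climbs from near zero to near one over a narrow interval. A CDF that ambles out to large values indicates that something in the assumptions, most plausibly the error bars, is off.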
If one interprets this literally, there are many large chi squared values well in excess of unity. These are bad fits, and the model should be rejected. That’s exactly what Rodrigues et al. (2018) found, rejecting the constancy of the acceleration scale at 10 sigma. By their reasoning, we must also reject the constancy of Newton’s constant with the same high confidence. That’s just silly.
One strange thing: the people complaining that the acceleration scale is not constant are only testing that hypothesis. Their presumption is that if the data reject that, it falsifies MOND. The attitude is that this is an automatic win for dark matter. Is it? They don’t bother checking.
We do. We can do the same exercise with dark matter. We find the same result. The CDF looks the same; there are many galaxies with chi squared that is too large.
Having found the same result for dark matter halos that we found for the RAR, if we apply the same logic, then all proposed model halos are excluded. There are too many bad fits with overly large chi squared.
We have now ruled out all conceivable models. Dark matter is falsified. MOND is falsified. Nothing works. Look on these data, ye mighty, and despair.
But wait! Should we believe the error bars that lead to the end of all things? What would Betteridge say?
Here is the rotation curve of DDO 170 fit with the RAR. Look first at the left box, with the data (points) and the fit (red line). Then look at the fit parameters in the right box.
Looking at the left panel, this is a good fit. The line representing the model provides a reasonable depiction of the data.
Looking at the right panel, this is a terrible fit. The reduced chi squared is 4.9. That’s a lot larger than one! The model is rejected with high confidence.
Well, which is it? Lots of people fall into the trap of blindly trusting statistical tests like chi squared. Statistics can only help your brain. They can’t replace it. Trust your eye-brain. This is a good fit. Chi squared is overly large not because this is a bad model but because the error bars are too small. The absolute amount by which the data “miss” is just a few km/s. This is not much by the standards of galaxies, and could easily be explained by a small departure of the tracer from a purely circular orbit – a physical effect we expect at that level. Or it could simply be that the errors are underestimated. Either way, it isn’t a big deal. It would be incredibly naive to take chi squared at face value.
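A quick piece of arithmetic shows how little it takes. Chi squared scales as the inverse square of the error bars, so if every error bar were too small by a common factor (an assumption made only to keep the arithmetic simple), that factor is just the square root of the reduced chi squared.

```python
import math

def error_inflation_factor(reduced_chi2):
    """Common factor by which error bars must grow to bring reduced chi squared to one.

    Assumes all error bars are underestimated by the same factor, so that
    chi squared scales as 1 / factor**2.
    """
    return math.sqrt(reduced_chi2)

factor = error_inflation_factor(4.9)   # the reduced chi squared quoted for DDO 170
print(f"error bars would need to grow by a factor of {factor:.2f}")
```

A factor of a bit over two in the error bars is well within the range of things that routinely go unaccounted for in rotation curve data.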
If you want to see a dozen plots like this for all the various models fit to each of over a hundred galaxies, see Li et al. (2020). The bottom line is always the same. The same galaxies are poorly fit by any model — dark matter or MOND. Chi squared is too big not because all conceivable models are wrong, but because the formal errors are underestimated in many cases.
This comes as no surprise to anyone with experience working with astronomical data. We can work to improve the data and the error estimation – see, for example, Sellwood et al. (2021). But we can’t blindly turn the crank on some statistical black box and expect all the secrets of the universe to tumble out onto a silver platter for our delectation. There’s a little more to it than that.
The distance scale is fundamental to cosmology. How big is the universe? is pretty much the first question we ask when we look at the Big Picture.
The primary yardstick we use to describe the scale of the universe is Hubble’s constant: the H0 in
v = H0 D
that relates the recession velocity (redshift) of a galaxy to its distance. More generally, this is the current expansion rate of the universe. Pick up any book on cosmology and you will find a lengthy disquisition on the importance of this fundamental parameter that encapsulates the size, age, critical density, and potential fate of the cosmos. It is the first of the Big Two numbers in cosmology that expresses the still-amazing fact that the entire universe is expanding.
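As a concrete illustration of the relation, here is a minimal sketch that inverts it to get a distance from a recession velocity. The velocity of 7000 km/s is a made-up example, and the two values of H0 are simply the two that come up later in this post.

```python
def hubble_distance(velocity_kms, h0):
    """Distance in Mpc from v = H0 * D, for velocity in km/s and H0 in km/s/Mpc."""
    return velocity_kms / h0

velocity = 7000.0  # km/s, hypothetical galaxy
for h0 in (67.4, 75.1):
    distance = hubble_distance(velocity, h0)
    print(f"H0 = {h0:4.1f} km/s/Mpc  ->  D = {distance:6.1f} Mpc")
```

The same redshift corresponds to a distance about 10% different depending on which value is adopted, and every quantity that depends on distance inherits that difference.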
Quantifying the distance scale is hard. Throughout my career, I have avoided working on it. There are quite enough, er, personalities on the case already.
No need for me to add to the madness.
Not that I couldn’t. The Tully-Fisher relation has long been used as a distance indicator. It played an important role in breaking the stranglehold that H0 = 50 km/s/Mpc had on the minds of cosmologists, including myself. Tully & Fisher (1977) found that it was approximately 80 km/s/Mpc. Their method continues to provide strong constraints to this day: Kourkchi et al. find H0 = 76.0 ± 1.1 (stat) ± 2.3 (sys) km/s/Mpc. So I’ve been happy to stay out of it.
I am motivated in part by the calibration opportunity provided by gas rich galaxies, in part by the fact that tension in independent approaches to constrain the Hubble constant only seems to be getting worse, and in part by a recent conference experience. (Remember when we traveled?) Less than a year ago, I was at a cosmology conference in which I heard an all-too-typical talk that asserted that the Planck H0 = 67.4 ± 0.5 km/s/Mpc had to be correct and everybody who got something different was a stupid-head. I’ve seen this movie before. It is the same community (often the very same people) who once insisted that H0 had to be 50, dammit. They’re every bit as overconfident as before, suffering just as much from confirmation bias (LCDM! LCDM! LCDM!), and seem every bit as likely to be correct this time around.
So, is it true? We have the data, we’ve just refrained from using it in this particular way because other people were on the case. Let’s check.
The big hassle here is not measuring H0 so much as quantifying the uncertainties. That’s the part that’s really hard. So all credit goes to Jim Schombert, who rolled up his proverbial sleeves and did all the hard work. Federico Lelli and I mostly just played the mother-of-all-jerks referees (I’ve had plenty of role models) by asking about every annoying detail. To make a very long story short, none of the items under our control matter at a level we care about, each making < 1 km/s/Mpc difference to the final answer.
In principle, the Baryonic Tully-Fisher relation (BTFR) helps over the usual luminosity-based version by including the gas, which extends application of the relation to lower mass galaxies that can be quite gas rich. Ignoring this component results in a mess that can only be avoided by restricting attention to bright galaxies. But including it introduces an extra parameter. One has to adopt a stellar mass-to-light ratio to put the stars and the gas on the same footing. I always figured that would make things worse – and for a long time, it did. That is no longer the case. So long as we treat the calibration sample that defines the BTFR and the sample used to measure the Hubble constant self-consistently, plausible choices for the mass-to-light ratio return the same answer for H0. It’s all relative – the calibration changes with different choices, but the application to more distant galaxies changes in the same way. Same for the treatment of molecular gas and metallicity. It all comes out in the wash. Our relative distance scale is very precise. Putting an absolute number on it simply requires a lot of calibrating galaxies with accurate, independently measured distances.
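For readers who want to see the mechanics of turning a calibrated relation into a distance, here is a minimal sketch. It assumes a power-law BTFR of the form M = A V^x and uses the fact that masses inferred from fluxes scale as distance squared. The slope of 4, the normalization of 50 solar masses per (km/s)^4, and all the galaxy numbers are placeholders chosen for illustration, not the calibration obtained in this work.

```python
import math

def btfr_distance(v_flat, mass_at_trial, trial_distance, norm, slope=4.0):
    """Distance at which the observed mass matches the BTFR prediction.

    Masses inferred from fluxes scale as distance squared, so
    M(D) = mass_at_trial * (D / trial_distance)**2. Setting this equal to
    norm * v_flat**slope and solving for D gives the expression below.
    """
    predicted_mass = norm * v_flat ** slope
    return trial_distance * math.sqrt(predicted_mass / mass_at_trial)

# Hypothetical inputs, for illustration only.
NORM = 50.0              # Msun per (km/s)^4, assumed normalization
v_flat = 100.0           # km/s, flat rotation speed
mass_at_10mpc = 1.25e9   # Msun, baryonic mass computed as if D = 10 Mpc
recession_velocity = 1500.0  # km/s

distance = btfr_distance(v_flat, mass_at_10mpc, 10.0, NORM)
h0 = recession_velocity / distance
print(f"BTFR distance: {distance:.1f} Mpc, implied H0: {h0:.1f} km/s/Mpc")
```

Only the ratio of predicted to observed mass enters, which is why a change in the calibration that is applied the same way to the calibrators and to the distant galaxies tends to cancel.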
Here is the absolute calibration of the BTFR that we obtain:
In constructing this calibrated BTFR, we have relied on distance measurements made or compiled by the Extragalactic Distance Database, which represents the cumulative efforts of Tully and many others to map out the local universe in great detail. We have also benefited from the work of Ponomareva et al., which provides new calibrator galaxies not already in our SPARC sample. Critically, they also measure the flat velocity from rotation curves, which is a huge improvement in accuracy over the more readily available linewidths commonly employed in Tully-Fisher work, but is expensive to obtain so remains the primary observational limitation on this procedure.
Still, we’re in pretty good shape. We now have 50 galaxies with well measured distances as well as the necessary ingredients to construct the BTFR: extended, resolved rotation curves, HI fluxes to measure the gas mass, and Spitzer near-IR data to estimate the stellar mass. This is a huge sample for which to have all of these data simultaneously. Measuring distances to individual galaxies remains challenging and time-consuming hard work that has been done by others. We are not about to second-guess their results, but we can note that they are sensible and remarkably consistent.
There are two primary methods by which the distances we use have been measured. One is Cepheids – the same type of variable stars that Hubble used to measure the distance to spiral nebulae to demonstrate their extragalactic nature. The other is the tip of the red giant branch (TRGB) method, which takes advantage of the brightest red giants having nearly the same luminosity. The sample is split nearly 50/50: there are 27 galaxies with a Cepheid distance measurement, and 23 with the TRGB. The two methods (different colored points in the figure) give the same calibration, within the errors, as do the two samples (circles vs. diamonds). There have been plenty of mistakes in the distance scale historically, so this consistency is important. There are many places where things could go wrong: differences between ourselves and Ponomareva, differences between Cepheids and the TRGB as distance indicators, mistakes in the application of either method to individual galaxies… so many opportunities to go wrong, and yet everything is consistent.
Having followed the distance scale problem my entire career, I cannot express how deeply impressive it is that all these different measurements paint a consistent picture. This is a credit to a large community of astronomers who have worked diligently on this problem for what seems like aeons. There is a temptation to dismiss distance scale work as having been wrong in the past, so it can be again. Of course that is true, but it is also true that matters have improved considerably. Forty years ago, it was not surprising when a distance indicator turned out to be wrong, and distances changed by a factor of two. That stopped twenty years ago, thanks in large part to the Hubble Space Telescope, a key goal of which had been to nail down the distance scale. That mission seems largely to have been accomplished, with small differences persisting only at the level that one expects from experimental error. One cannot, for example, make a change to the Cepheid calibration without creating a tension with the TRGB data, or vice-versa: both have to change in concert by the same amount in the same direction. That is unlikely to the point of wishful thinking.
Having nailed down the absolute calibration of the BTFR for galaxies with well-measured distances, we can apply it to other galaxies for which we know the redshift but not the distance. There are nearly 100 suitable galaxies available in the SPARC database. Consistency between them and the calibrator galaxies requires
H0 = 75.1 +/- 2.3 (stat) +/- 1.5 (sys) km/s/Mpc.
This is consistent with the result for the standard luminosity-linewidth version of the Tully-Fisher relation reported by Kourkchi et al. Note also that our statistical (random/experimental) error is larger, but our systematic error is smaller. That’s because we have a much smaller number of galaxies. The method is, in principle, more precise (mostly because rotation curves are more accurate than linewidths), so there is still a lot to be gained by collecting more data.
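To make the logic of the last few paragraphs concrete, here is a minimal sketch of how a calibrated BTFR turns a redshift into a value of H0. It assumes a relation of the form M_b = A V_flat^4, that luminosity-based masses scale as the square of the assumed distance, and that the flat rotation speed does not depend on distance. The function names and the normalization argument `btfr_norm` are placeholders for illustration, not the calibration or code used in the paper.

```python
import math

def btfr_distance(d_assumed_mpc, mbar_at_assumed_msun, v_flat_kms, btfr_norm):
    """Distance (Mpc) that places a galaxy exactly on M_b = btfr_norm * V_flat**4.

    mbar_at_assumed_msun is the baryonic mass computed at d_assumed_mpc.
    Flux-based masses scale as distance squared, so the distance rescales
    by the square root of the mass ratio. The slope of 4 is an assumption.
    """
    mbar_btfr = btfr_norm * v_flat_kms**4
    return d_assumed_mpc * math.sqrt(mbar_btfr / mbar_at_assumed_msun)

def hubble_constant(v_recession_kms, distance_mpc):
    """H0 in km/s/Mpc from a recession (flow-corrected) velocity and a distance."""
    return v_recession_kms / distance_mpc
```

A galaxy whose mass at the assumed distance falls below the relation gets pushed farther away, which lowers the H0 it implies. Averaging over many galaxies is what beats down the statistical error.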
Our measurement is also consistent with many other “local” measurements of the distance scale.
So, where does this leave us? In the past, it was easy to dismiss a tension of this sort as due to some systematic error, because that happened all the time – in the 20th century. That’s not so true anymore. It looks to me like the tension is real.
This Thanksgiving, I’d highlight something positive. Recently, Bob Sanders wrote a paper pointing out that gas rich galaxies are strong tests of MOND. The usual fit parameter, the stellar mass-to-light ratio, is effectively negligible when gas dominates. The MOND prediction follows straight from the gas distribution, for which there is no equivalent freedom. We understand the 21 cm spin-flip transition well enough to relate observed flux directly to gas mass.
In any human endeavor, there are inevitably unsung heroes who carry enormous amounts of water but seem to get no credit for it. Sanders is one of those heroes when it comes to the missing mass problem. He was there at the beginning, and has a valuable perspective on how we got to where we are. I highly recommend his books, The Dark Matter Problem: A Historical Perspective and Deconstructing Cosmology.
In bright spiral galaxies, stars are usually 80% or so of the mass, gas only 20% or less. But in many dwarf galaxies, the mass ratio is reversed. These are often low surface brightness and challenging to observe. But it is a worthwhile endeavor, as their rotation curves are predicted by MOND with extraordinarily little freedom.
Though gas rich galaxies do indeed provide an excellent test of MOND, nothing in astronomy is perfectly clean. The stellar mass-to-light ratio is an irreducible need-to-know parameter. We also need to know the distance to each galaxy, as we do not measure the gas mass directly, but rather the flux of the 21 cm line. The gas mass scales with flux and the square of the distance (see equation 7E7), so to get the gas mass right, we must first get the distance right. We also need to know the inclination of a galaxy as projected on the sky in order to get the rotation to which we’re fitting right, as the observed line of sight Doppler velocity is only sin(i) of the full, in-plane rotation speed. The 1/sin(i) correction becomes increasingly sensitive to errors as i approaches zero (face-on galaxies).
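The two dependencies described above are easy to write down. This sketch uses the standard optically thin conversion from 21 cm flux to atomic hydrogen mass and the usual sin(i) deprojection. The function names are mine, and the HI mass does not include the correction for helium and other elements that a full gas mass needs.

```python
import math

# Standard optically thin conversion: Msun per (Mpc^2 * Jy km/s).
HI_MASS_COEFF = 2.36e5

def hi_mass_msun(flux_jy_kms, distance_mpc):
    """Atomic hydrogen mass from the integrated 21 cm flux; scales as distance squared."""
    return HI_MASS_COEFF * distance_mpc**2 * flux_jy_kms

def deproject_velocity(v_los_kms, inclination_deg):
    """In-plane rotation speed from the line-of-sight velocity (i = 0 is face-on)."""
    return v_los_kms / math.sin(math.radians(inclination_deg))

def velocity_fractional_error(inclination_deg, sigma_i_deg):
    """First-order fractional velocity error from an inclination error: sigma_i / tan(i)."""
    return math.radians(sigma_i_deg) / math.tan(math.radians(inclination_deg))
```

The last function shows why face-on galaxies are troublesome: the same error in inclination costs far more in velocity at i = 20 degrees than at i = 70.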
The mass-to-light ratio is a physical fit parameter that tells us something meaningful about the amount of stellar mass that produces the observed light. In contrast, for our purposes here, distance and inclination are “nuisance” parameters. These nuisance parameters can be, and generally are, measured independently from mass modeling. However, these measurements have their own uncertainties, so one has to be careful about taking these measured values as-is. One of the powerful aspects of Bayesian analysis is the ability to account for these uncertainties to allow for the distance to be a bit off the measured value, so long as it is not too far off, as quantified by the measurement uncertainties. This is what current graduate student Pengfei Li did in Li et al. (2018). The constraints on MOND are so strong in gas rich galaxies that often the nuisance parameters cannot be ignored, even when they’re well measured.
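Schematically, letting the nuisance parameters float within their measured uncertainties amounts to adding a penalty term for each one to the goodness of fit. The sketch below assumes Gaussian priors on distance and inclination. That choice, and all the names, are illustrative assumptions rather than a description of the actual implementation in Li et al. (2018).

```python
def gaussian_log_prior(value, measured, sigma):
    """Log of an unnormalized Gaussian prior centered on the measured value."""
    return -0.5 * ((value - measured) / sigma) ** 2

def log_posterior(chi2, distance, inclination, d_meas, d_err, i_meas, i_err):
    """Log posterior up to a constant: the fit term plus the nuisance penalties.

    chi2 is the rotation curve chi-squared evaluated at the trial distance
    and inclination. The priors keep both parameters near their measurements.
    """
    return (-0.5 * chi2
            + gaussian_log_prior(distance, d_meas, d_err)
            + gaussian_log_prior(inclination, i_meas, i_err))
```

A trial distance one sigma from the measurement costs the same as raising chi-squared by one, so the fit only moves a nuisance parameter when the rotation curve gains more than that.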
To illustrate what I’m talking about, let’s look at one famous example, DDO 154. This galaxy is over 90% gas. The stars (pictured above) just don’t matter much. If the distance and inclination are known, the MOND prediction for the rotation curve follows directly. Here is an example of a MOND fit from a recent paper:
This is terrible! The MOND fit – essentially a parameter-free prediction – misses all of the data. MOND is falsified. If one is inclined to hate MOND, as many seem to be, then one stops here. No need to think further.
If one is familiar with the ups and downs in the history of astronomy, one might not be so quick to dismiss it. Indeed, one might notice that the shape of the MOND prediction closely tracks the shape of the data. There’s just a little difference in scale. That’s kind of amazing for a theory that is wrong, especially when it is amplifying the green line to predict the red one: it needn’t have come anywhere close.
Here is the fit to the same galaxy using the same data [already] published in Li et al.:
Now we have a good fit, using the same data! How can this be so?
I have not checked what Ren et al. did to obtain their MOND fits, but having done this exercise myself many times, I recognize the slight offset they find as a typical consequence of holding the nuisance parameters fixed. What if the measured distance is a little off?
Distance estimates to DDO 154 in the literature range from 3.02 Mpc to 6.17 Mpc. The formally most accurate distance measurement is 4.04 ± 0.08 Mpc. In the fit shown here, we obtained 3.87 ± 0.16 Mpc. The error bars on these distances overlap, so they are the same number, to measurement accuracy. These data do not falsify MOND. They demonstrate that it is sensitive enough to tell the difference between 3.8 and 4.1 Mpc.
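The statement that the two distances agree can be checked with one line of arithmetic. This sketch treats the two error bars as independent Gaussian uncertainties, which is a simplification, and the function name is mine.

```python
import math

def tension_sigma(a, sigma_a, b, sigma_b):
    """Difference between two measurements in units of their combined uncertainty."""
    return abs(a - b) / math.hypot(sigma_a, sigma_b)

# Measured distance to DDO 154 versus the distance returned by the fit (Mpc).
print(round(tension_sigma(4.04, 0.08, 3.87, 0.16), 2))
```

This prints 0.95: the two values differ by less than one combined standard deviation.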
One will never notice this from a dark matter fit. Ren et al. also make fits with self-interacting dark matter (SIDM). The nifty thing about SIDM is that it makes quasi-constant density cores in dark matter halos. Halos of this form are not predicted by “ordinary” cold dark matter (CDM), but often give better fits than either MOND or the NFW halos of dark matter-only CDM simulations. For this galaxy, Ren et al. obtain the following SIDM fit.
This is a great fit. Goes right through the data. That makes it better, right?
Not necessarily. In addition to the mass-to-light ratio (and the nuisance parameters of distance and inclination), dark matter halo fits have [at least] two additional free parameters to describe the dark matter halo, such as its mass and core radius. These parameters are highly degenerate – one can obtain equally good fits for a range of mass-to-light ratios and core radii: one makes up for what the other misses. Parameter degeneracy of this sort is usually a sign that there is too much freedom in the model. In this case, the data are adequately described by one parameter (the MOND fit M*/L, not counting the nuisances in common), so using three (M*/L, Mhalo, Rcore) is just an exercise in fitting a French curve. There is ample freedom to fit the data. As a consequence, you’ll never notice that one of the nuisance parameters might be a tiny bit off.
In other words, you can fool a dark matter fit, but not MOND. Erwin de Blok and I demonstrated this 20 years ago. A common myth at that time was that “MOND is guaranteed to fit rotation curves.” This seemed patently absurd to me, given how it works: once you stipulate the distribution of baryons, the rotation curve follows from a simple formula. If the two don’t match, they don’t match. There is no guarantee that it’ll work. Instead, it can’t be forced.
As an illustration, Erwin and I tried to trick it. We took two galaxies that are identical in the Tully-Fisher plane (NGC 2403 and UGC 128) and swapped their mass distribution and rotation curve. These galaxies have the same total mass and the same flat velocity in the outer part of the rotation curve, but the detailed distribution of their baryons differs. If MOND can be fooled, this closely matched pair ought to do the trick. It does not.
Our failure to trick MOND should not surprise anyone who bothers to look at the math involved. There is a one-to-one relation between the distribution of the baryons and the resulting rotation curve. If there is a mismatch between them, a fit cannot be obtained.
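For anyone who wants to see the math, here is a minimal sketch of the mapping from baryons to rotation curve. It assumes the so-called simple interpolation function and a0 = 1.2e-10 m/s^2, which is about 3700 (km/s)^2/kpc. Both are my choices for illustration, and the algebraic relation is itself an approximation for disk galaxies. The input is the Newtonian rotation speed of the baryons at a given radius.

```python
import math

# a0 = 1.2e-10 m/s^2 expressed in (km/s)^2 per kpc (assumed value).
A0_KPC = 3.7e3

def v_mond(v_newton_kms, r_kpc, a0=A0_KPC):
    """MOND rotation speed from the Newtonian baryonic speed at radius r.

    Uses g = g_N/2 + sqrt(g_N**2/4 + g_N*a0), the form that follows from
    the 'simple' interpolation function mu(x) = x/(1+x).
    """
    g_newton = v_newton_kms**2 / r_kpc
    g = 0.5 * g_newton + math.sqrt(0.25 * g_newton**2 + g_newton * a0)
    return math.sqrt(g * r_kpc)
```

The function is monotonic in the Newtonian speed, so each baryon distribution maps to exactly one rotation curve. Apart from the mass-to-light ratio that scales the stellar contribution, there is nothing to adjust.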
We also attempted to play this same trick on dark matter. The standard dark matter halo fitting function at the time was the pseudo-isothermal halo, which has a constant density core. It is very similar to the halos of SIDM and to the cored dark matter halos produced by baryonic feedback in some simulations. Indeed, that is the point of those efforts: they are trying to capture the success of cored dark matter halos in fitting rotation curve data.
Dark matter halos with a quasi-constant density core do indeed provide good fits to rotation curves. Too good. They are easily fooled, because they have too many degrees of freedom. They will fit pretty much any plausible data that you throw at them. This is why the SIDM fit to DDO 154 failed to flag distance as a potential nuisance. It can’t. You could double (or halve) the distance and still find a good fit.
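For comparison, this is the rotation curve of the pseudo-isothermal halo, written as a sketch with its two free parameters explicit: the central density and the core radius. The formula is the standard one for this profile. The function names and the units are my choices.

```python
import math

G_KPC = 4.30091e-6  # Newton's constant in kpc (km/s)^2 / Msun

def v_pseudo_isothermal(r_kpc, rho0_msun_kpc3, r_core_kpc):
    """Circular speed (km/s) of a pseudo-isothermal halo at radius r."""
    x = r_kpc / r_core_kpc
    v_inf_sq = 4.0 * math.pi * G_KPC * rho0_msun_kpc3 * r_core_kpc**2
    return math.sqrt(v_inf_sq * (1.0 - math.atan(x) / x))

def v_total(v_baryon_kms, r_kpc, rho0_msun_kpc3, r_core_kpc):
    """Total speed: baryonic and halo contributions add in quadrature."""
    return math.hypot(v_baryon_kms,
                      v_pseudo_isothermal(r_kpc, rho0_msun_kpc3, r_core_kpc))
```

Whatever the baryons contribute, the halo term can be dialed up or down to make up the difference, and many combinations of central density and core radius do the job about equally well.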
This is why parameter degeneracy is bad. You get lost in parameter space. Once lost there, it becomes impossible to distinguish between successful, physically meaningful fits and fitting epicycles.
Astronomical data are always subject to improvement. For example, the THINGS project obtained excellent data for a sample of nearby galaxies. I made MOND fits to all the THINGS (and other) data for the MOND review Famaey & McGaugh (2012). Here’s the residual diagram, which has been on my web page for many years:
These are, by and large, good fits. The residuals have a well defined peak centered on zero. DDO 154 was one of the THINGS galaxies; let’s see what happens if we use those data.
The first thing one is likely to notice is that the THINGS data are much better resolved than the previous generation used above. The first thing I noticed was that THINGS had assumed a distance of 4.3 Mpc. This was prior to the measurement of 4.04, so let’s just start over from there. That gives the MOND prediction shown above.
And it is a prediction. I haven’t adjusted any parameters yet. The mass-to-light ratio is set to the mean I expect for a star forming stellar population, 0.5 in solar units in the Spitzer 3.6 micron band. D=4.04 Mpc and i=66 as tabulated by THINGS. The result is pretty good considering that no parameters have been harmed in the making of this plot. Nevertheless, MOND overshoots a bit at large radii.
Constraining the inclinations for gas rich dwarf galaxies like DDO 154 is a bit of a nightmare. Literature values range from 20 to 70 degrees. Seriously. THINGS itself allows the inclination to vary with radius; 66 is just a typical value. Looking at the fit Pengfei obtained, i=61. Let’s try that.
The fit is now satisfactory. One tweak to the inclination, and we’re done. This tweak isn’t even a fit to these data; it was adopted from Pengfei’s fit to the above data. This tweak to the inclination is comfortably within any plausible assessment of the uncertainty in this quantity. The change in sin(i) corresponds to a mere 4% in velocity. I could probably do a tiny bit better with further adjustment – I have left both the distance and the mass-to-light ratio fixed – but that would be a meaningless exercise in statistical masturbation. The result just falls out: no muss, no fuss.
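The size of the inclination tweak is simple to verify. Since the deprojected velocity goes as 1/sin(i), the factor by which the data points move is the ratio of the two sines. The function name is mine.

```python
import math

def inclination_velocity_ratio(i_old_deg, i_new_deg):
    """Factor by which deprojected velocities change when the inclination is revised."""
    return math.sin(math.radians(i_old_deg)) / math.sin(math.radians(i_new_deg))

# Going from the tabulated 66 degrees to 61 degrees.
print(round(inclination_velocity_ratio(66.0, 61.0), 3))
```

This prints 1.045, so the deprojected rotation speeds rise by a little over 4%.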
Hence the point Bob Sanders makes. Given the distribution of gas, the rotation curve follows. And it works, over and over and over, within the bounds of the uncertainties on the nuisance parameters.
One cannot do the same exercise with dark matter. It has ample ability to fit rotation curve data, once those are provided, but zero power to predict it. If all had been well with ΛCDM, the rotation curves of these galaxies would look like NFW halos. Or any number of other permutations that have been discussed over the years. In contrast, MOND makes one unique prediction (that was not at all anticipated in dark matter), and that’s what the data do. Out of the huge parameter space of plausible outcomes from the messy hierarchical formation of galaxies in ΛCDM, Nature picks the one that looks exactly like MOND.
It is a bad sign for a theory when it can only survive by mimicking its alternative. This is the case here: ΛCDM must imitate MOND. There are now many papers asserting that it can do just this, but none of those were written before the data were provided. Indeed, I consider it to be problematic that clever people can come up with ways to imitate MOND with dark matter. What couldn’t it imitate? If the data had all looked like technicolor space donkeys, we could probably find a way to make that so as well.
Cosmologists will rush to say “microwave background!” I have some sympathy for that, because I do not know how to explain the microwave background in a MOND-like theory. At least I don’t pretend to, even if I had more predictive success there than their entire community. But that would be a much longer post.
For now, note that the situation is even worse for dark matter than I have so far made it sound. In many dwarf galaxies, the rotation velocity exceeds that attributable to the baryons (with Newton alone) at practically all radii. By a lot. DDO 154 is a very dark matter dominated galaxy. The baryons should have squat to say about the dynamics. And yet, all you need to know to predict the dynamics is the baryon distribution. The baryonic tail wags the dark matter dog.
But wait, it gets better! If you look closely at the data, you will note a kink at about 1 kpc, another at 2, and yet another around 5 kpc. These kinks are apparent in both the rotation curve and the gas distribution. This is an example of Sancisi’s Law: “For any feature in the luminosity profile there is a corresponding feature in the rotation curve and vice versa.” This is a general rule, as Sancisi observed, but it makes no sense when the dark matter dominates. The features in the baryon distribution should not be reflected in the rotation curve.
The observed baryons orbit in a disk with nearly circular orbits confined to the same plane. The dark matter moves on eccentric orbits oriented every which way to provide pressure support to a quasi-spherical halo. The baryonic and dark matter occupy very different regions of phase space, the six dimensional volume of position and momentum. The two are not strongly coupled, communicating only by the weak force of gravity in the standard CDM paradigm.
One of the first lessons of galaxy dynamics is that galaxy disks are subject to a variety of instabilities that grow bars and spiral arms. These are driven by disk self-gravity. The same features do not appear in elliptical galaxies because they are pressure supported, 3D blobs. They don’t have disks so they don’t have disk self-gravity, much less the features that lead to the bumps and wiggles observed in rotation curves.
Elliptical galaxies are a good visual analog for what dark matter halos are believed to be like. The orbits of dark matter particles are unable to sustain features like those seen in baryonic disks. They are featureless for the same reasons as elliptical galaxies. They don’t have disks. A rotation curve dominated by a spherical dark matter halo should bear no trace of the features that are seen in the disk. And yet they’re there, often enough for Sancisi to have remarked on it as a general rule.
It gets worse still. One of the original motivations for invoking dark matter was to stabilize galactic disks: a purely Newtonian disk of stars is not a stable configuration, yet the universe is chock full of long-lived spiral galaxies. The cure was to place them in dark matter halos.
The problem for dwarfs is that they have too much dark matter. The halo stabilizes disks by suppressing the formation of structures that stem from disk self-gravity. But you need some disk self-gravity to have the observed features. That can be tuned to work in bright spirals, but it fails in dwarfs because the halo is too massive. As a practical matter, there is no disk self-gravity in dwarfs – it is all halo, all the time. And yet, we do see such features. Not as strong as in big, bright spirals, but definitely present. Whenever someone tries to analyze this aspect of the problem, they inevitably come up with a requirement for more disk self-gravity in the form of unphysically high stellar mass-to-light ratios (something I predicted would happen). In contrast, this is entirely natural in MOND (see, e.g., Brada & Milgrom 1999 and Tiret & Combes 2008), where it is all disk self-gravity since there is no dark matter halo.
The net upshot of all this is that it doesn’t suffice to mimic the radial acceleration relation as many simulations now claim to do. That was not a natural part of CDM to begin with, but perhaps it can be done with smooth model galaxies. In most cases, such models lack the resolution to see the features seen in DDO 154 (and in NGC 1560 and in IC 2574, etc.). If they attain such resolution, they had better not show such features, as that would violate some basic considerations. But then they wouldn’t be able to describe this aspect of the data.
Simulators by and large seem to remain sanguine that this will all work out. Perhaps I have become too cynical, but I recall hearing that 20 years ago. And 15. And ten… basically, they’ve always assured me that it will work out even though it never has. Maybe tomorrow will be different. Or would that be the definition of insanity?
I have been wanting to write about dwarf satellites for a while, but there is so much to tell that I didn’t think it would fit in one post. I was correct. Indeed, it was worse than I thought, because my own experience with low surface brightness (LSB) galaxies in the field is a necessary part of the context for my perspective on the dwarf satellites of the Local Group. These are very different beasts – satellites are pressure supported, gas poor objects in orbit around giant hosts, while field LSB galaxies are rotating, gas rich galaxies that are among the most isolated known. However, so far as their dynamics are concerned, they are linked by their low surface density.
Where we left off with the dwarf satellites, circa 2000, Ursa Minor and Draco remained problematic for MOND, but the formal significance of these problems was not great. Fornax, which had seemed more problematic, was actually a predictive success: MOND returned a low mass-to-light ratio for Fornax because it was full of young stars. The other known satellites, Carina, Leo I, Leo II, Sculptor, and Sextans, were all consistent with MOND.
The Sloan Digital Sky Survey resulted in an explosion in the number of satellite galaxies discovered around the Milky Way. These were both fainter and lower surface brightness than the classical dwarfs named above. Indeed, they were often invisible as objects in their own right, being recognized instead as groupings of individual stars that shared the same position in space and – critically – velocity. They weren’t just in the same place, they were orbiting the Milky Way together. To give short shrift to a long story, these came to be known as ultrafaint dwarfs.
Ultrafaint dwarf satellites have fewer than 100,000 stars. That’s tiny for a stellar system. Sometimes they have only a few hundred. Most of those stars are too faint to see directly. Their existence is inferred from a handful of red giants that are actually observed. Where there are a few red giants orbiting together, there must be a source population of fainter stars. This is a good argument, and it is likely true in most cases. But the statistics we usually rely on become dodgy for such small numbers of stars: some of the ultrafaints that have been reported in the literature are probably false positives. I have no strong opinion on how many that might be, but I’d be really surprised if it were zero.
Nevertheless, assuming the ultrafaint dwarfs are self-bound galaxies, we can ask the same questions as before. I was encouraged to do this by Joe Wolf, a clever grad student at UC Irvine. He had a new mass estimator for pressure supported dwarfs that we decided to apply to this problem. We used the Baryonic Tully-Fisher Relation (BTFR) as a reference, and looked at it every which-way. Most of the text is about conventional effects in the dark matter picture, and I encourage everyone to read the full paper. Here I’m gonna skip to the part about MOND, because that part seems to have been overlooked in more recent commentary on the subject.
For starters, we found that the classical dwarfs fall along the extrapolation of the BTFR, but the ultrafaint dwarfs deviate from it.
The deviation is not subtle, at least not in terms of mass. The ultrafaints had characteristic circular velocities typical of systems 100 times their mass! But the BTFR is steep. In terms of velocity, the deviation is the difference between the 8 km/s typically observed, and the ~3 km/s needed to put them on the line. There are a large number of systematic errors that might arise, and all act to inflate the characteristic velocity. See the discussion in the paper if you’re curious about such effects; for our purposes here we will assume that the data cannot simply be dismissed as the result of systematic errors, though one should bear in mind that they probably play a role at some level.
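The steepness is what converts a modest velocity offset into a big mass offset. As a rough sketch, assume the BTFR is a power law with a slope of 4 (an assumption here; the exact slope depends on the calibration). The function names are mine.

```python
def mass_ratio_from_velocities(v_observed, v_on_relation, slope=4.0):
    """Factor by which the mass implied by the BTFR exceeds the actual mass."""
    return (v_observed / v_on_relation) ** slope

def velocity_on_relation(v_observed, mass_ratio, slope=4.0):
    """Velocity an object would need in order to sit on the BTFR at its own mass."""
    return v_observed / mass_ratio ** (1.0 / slope)

print(round(mass_ratio_from_velocities(8.0, 3.0), 1))
print(round(velocity_on_relation(8.0, 100.0), 2))
```

The first line prints 50.6 and the second 2.53. With a slope of 4, a factor of 100 in mass corresponds to a velocity of about 2.5 km/s rather than 8, in line with the round numbers quoted above.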
Taken at face value, the ultrafaint dwarfs are a huge problem for MOND. An isolated system should fall exactly on the BTFR. These are not isolated systems, being very close to the Milky Way, so the external field effect (EFE) can cause deviations from the BTFR. However, these are predicted to make the characteristic internal velocities lower than the isolated case. This may in fact be relevant for the red points that deviate a bit in the plot above, but we’ll return to that at some future point. The ultrafaints all deviate to velocities that are too high, the opposite of what the EFE predicts.
The ultrafaints falsify MOND! When I saw this, all my original confirmation bias came flooding back. I had pursued this stupid theory to ever lower surface brightness and luminosity. Finally, I had found where it broke. I felt like Darth Vader in the original Star Wars:
The first draft of my paper with Joe included a resounding renunciation of MOND. No way could it escape this!
I had this nagging feeling I was missing something. Darth should have looked over his shoulder. Should I?
Surely I had missed nothing. Many people are unaware of the EFE, just as we had been unaware that Fornax contained young stars. But not me! I knew all that. Surely this was it.
Nevertheless, the nagging feeling persisted. One part of it was sociological: if I said MOND was dead, it would be well and truly buried. But did it deserve to be? The scientific part of the nagging feeling was that maybe there had been some paper that addressed this, maybe a decade before… perhaps I’d better double check.
Indeed, Brada & Milgrom (2000) had run numerical simulations of dwarf satellites orbiting around giant hosts. MOND is a nonlinear dynamical theory; not everything can be approximated analytically. When a dwarf satellite is close to its giant host, the external acceleration of the dwarf falling towards its host can exceed the internal acceleration of the stars in the dwarf orbiting each other – hence the EFE. But the EFE is not a static thing; it varies as the dwarf orbits about, becoming stronger on closer approach. At some point, this variation becomes too fast for the dwarf to remain in equilibrium. This is important, because the assumption of dynamical equilibrium underpins all these arguments. Without it, it is hard to know what to expect short of numerically simulating each individual dwarf. There is no reason to expect them to remain on the equilibrium BTFR.
Brada & Milgrom suggested a measure to gauge the extent to which a dwarf might be out of equilibrium. It boils down to a matter of timescales. If the stars inside the dwarf have time to adjust to the changing external field, a quasi-static EFE approximation might suffice. So the figure of merit becomes the ratio of internal orbits per external orbit. If the stars inside a dwarf are swarming around many times for every time it completes an orbit around the host, then they have time to adjust. If the orbit of the dwarf around the host is as quick as the internal motions of the stars within the dwarf, not so much. At some point, a satellite becomes a collection of associated stars orbiting the host rather than a self-bound object in its own right.
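To give a feel for the timescale argument, here is a crude sketch that compares the two orbital times directly. This is not the Brada & Milgrom formula, which I do not reproduce here. It is just the ratio of a size-over-speed time for the host orbit to the same quantity for the stars inside the dwarf, and the function and parameter names are hypothetical.

```python
def orbits_ratio(r_half_pc, sigma_kms, d_host_kpc, v_host_kms):
    """Rough number of internal orbits per orbit around the host.

    Each timescale is taken as size divided by speed, so common factors
    such as 2*pi cancel in the ratio.
    """
    t_internal = (r_half_pc * 1e-3) / sigma_kms   # kpc per (km/s)
    t_external = d_host_kpc / v_host_kms          # kpc per (km/s)
    return t_external / t_internal
```

A compact dwarf far from its host gets a large ratio and plenty of time to adjust. A diffuse dwarf close in gets a small ratio and is more vulnerable to being driven out of equilibrium.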
Brada & Milgrom provide the formula to compute the ratio of orbits, shown in the figure above. The smaller the ratio, the less chance an object has to adjust, and the more subject it is to departures from equilibrium. Remarkably, the amplitude of deviation from the BTFR – the problem I could not understand initially – correlates with the ratio of orbits. The more susceptible a dwarf is to disequilibrium effects, the farther it deviates from the BTFR.
This completely inverted the MOND interpretation. Instead of falsifying MOND, the data now appeared to corroborate the non-equilibrium prediction of Brada & Milgrom. The stronger the external influence, the more a dwarf deviated from the equilibrium expectation. In conventional terms, it appeared that the ultrafaints were subject to tidal stirring: their internal velocities were being pumped up by external influences. Indeed, the originally problematic cases, Draco and Ursa Minor, fall among the ultrafaint dwarfs in these terms. They can’t be in equilibrium in MOND.
If the ultrafaints are out of equilibrium, they might show some independent evidence of this. Stars should leak out, distorting the shape of the dwarf and forming tidal streams. Can we see this?
A definite maybe:
The dwarfs that are more subject to external influence tend to be more elliptical in shape. A pressure supported system in equilibrium need not be perfectly round, but one departing from equilibrium will tend to get stretched out. And indeed, many of the ultrafaints look Messed Up.
I am not convinced that all this requires MOND. But it certainly doesn’t falsify it. Tidal disruption can happen in the dark matter context, but it happens differently. The stars are buried deep inside protective cocoons of dark matter, and do not feel tidal effects much until most of the dark matter is stripped away. There is no reason to expect the MOND measure of external influence to apply (indeed, it should not), much less that it would correlate with indications of tidal disruption as seen above.
This seems to have been missed by more recent papers on the subject. Indeed, Fattahi et al. (2018) have reconstructed very much the chain of thought I describe above. The last sentence of their abstract states “In many cases, the resulting velocity dispersions are inconsistent with the predictions from Modified Newtonian Dynamics, a result that poses a possibly insurmountable challenge to that scenario.” This is exactly what I thought. (I have you now.) I was wrong.
Fattahi et al. are wrong for the same reasons I was wrong. They are applying equilibrium reasoning to a non-equilibrium situation. Ironically, the main point of their paper is that many systems can’t be explained with dark matter, unless they are tidally stripped – i.e., the result of a non-equilibrium process. Oh, come on. If you invoke it in one dynamical theory, you might want to consider it in the other.
To quote the last sentence of our abstract from 2010, “We identify a test to distinguish between the ΛCDM and MOND based on the orbits of the dwarf satellites of the Milky Way and how stars are lost from them.” In ΛCDM, the sub-halos that contain dwarf satellites are expected to be on very eccentric orbits, with all the damage from tidal interactions with the host accruing during pericenter passage. In MOND, substantial damage may accrue along lower eccentricity orbits, leading to the expectation of more continuous disruption.
Gaia is measuring proper motions for stars all over the sky. Some of these stars are in the dwarf satellites. This has made it possible to estimate orbits for the dwarfs, e.g., work by Amina Helmi (et al!) and Josh Simon. So far, the results are definitely mixed. There are more dwarfs on low eccentricity orbits than I had expected in ΛCDM, but there are still plenty that are on high eccentricity orbits, especially among the ultrafaints. Which dwarfs have been tidally affected by interactions with their hosts is far from clear.
In short, reality is messy. It is going to take a long time to sort these matters out. These are early days.
The Milky Way and its nearest giant neighbor Andromeda (M31) are surrounded by a swarm of dwarf satellite galaxies. Aside from relatively large beasties like the Large Magellanic Cloud or M32, the majority of these are the so-called dwarf spheroidals. There are several dozen examples known around each giant host, like the Fornax dwarf pictured above.
Dwarf spheroidal (dSph) galaxies are ellipsoidal blobs devoid of gas that typically contain a million stars, give or take an order of magnitude. Unlike globular clusters, which may have a similar star count, dSphs are diffuse, with characteristic sizes of hundreds of parsecs (vs. a few pc for globulars). This makes them among the lowest surface brightness systems known.
This subject has a long history, and has become a major industry in recent years. In addition to the “classical” dwarfs that have been known for decades, there have also been many comparatively recent discoveries, often of what have come to be called “ultrafaint” dwarfs. These are basically dSphs with luminosities less than 100,000 suns, sometimes comprising as few as a few hundred stars. New discoveries are being made still, and there is reason to hope that the LSST will discover many more. Summed up, the known dwarf satellites are proverbial drops in the bucket compared to their giant hosts, which contain hundreds of billions of stars. Dwarfs could rain in for a Hubble time and not perturb the mass budget of the Milky Way.
Nevertheless, tiny dwarf spheroidals are excellent tests of theories like CDM and MOND. Going back to the beginning, in the early ’80s, Milgrom was already engaged in a discussion about the predictions of his then-new theory (before it was even published) with colleagues at the IAS, where he had developed the idea during a sabbatical visit. They were understandably skeptical, preferring – as many still do – to believe that some unseen mass was the more conservative hypothesis. Dwarf spheroidals came up even then, as their very low surface brightness meant low acceleration in MOND. This in turn meant large mass discrepancies. If you could measure their dynamics, they would have large mass-to-light ratios. Larger than could be explained by stars conventionally, and larger than the discrepancies already observed in bright galaxies like Andromeda.
This prediction of Milgrom’s – there from the very beginning – is important because of how things change (or don’t). At that time, Scott Tremaine summed up the contrasting expectation of the conventional dark matter picture:
“There is no reason to expect that dwarfs will have more dark matter than bright galaxies.” *
This was certainly the picture I had in my head when I first became interested in low surface brightness (LSB) galaxies in the mid-80s. At that time I was ignorant of MOND; my interest was piqued by the argument of Disney that there could be a lot of as-yet undiscovered LSB galaxies out there, combined with my first observing experiences with the then-newfangled CCD cameras which seemed to have a proclivity for making clear otherwise hard-to-see LSB features. At the time, I was interested in finding LSB galaxies. My interest in what made them rotate came later.
The first indication, to my knowledge, that dSph galaxies might have large mass discrepancies was provided by Marc Aaronson in 1983. This tentative discovery was hugely important, but the velocity dispersion of Draco (one of the “classical” dwarfs) was based on only 3 stars, so was hardly definitive. Nevertheless, by the end of the ’90s, it was clear that large mass discrepancies were a defining characteristic of dSphs. Their conventionally computed M/L went up systematically as their luminosity declined. This was not what we had expected in the dark matter picture, but was, at least qualitatively, in agreement with MOND.
My own interests had focused more on LSB galaxies in the field than on dwarf satellites like Draco. Greg Bothun and Jim Schombert had identified enough of these to construct a long list of LSB galaxies that served as targets for my Ph.D. thesis. Unlike the pressure-supported ellipsoidal blobs of stars that are the dSphs, the field LSBs we studied were gas rich, rotationally supported disks – mostly late type galaxies (Sd, Sm, & Irregulars). Regardless of composition, gas or stars, low surface density means that MOND predicts low acceleration. This need not be true conventionally, as the dark matter can do whatever the heck it wants. Though I was blissfully unaware of it at the time, we had constructed the perfect sample for testing MOND.
Having studied the properties of our sample of LSB galaxies, I developed strong ideas about their formation and evolution. Everything we had learned – their blue colors, large gas fractions, and low star formation rates – suggested that they evolved slowly compared to higher surface brightness galaxies. Star formation gradually sputtered along, having a hard time gathering enough material to make stars in their low density interstellar media. Perhaps they even formed late, an idea I took a shining to in the early ’90s. This made two predictions: field LSB galaxies should be less strongly clustered than bright galaxies, and should spin slower at a given mass.
The first prediction follows because the collapse time of dark matter halos correlates with their larger scale environment. Dense things collapse first and tend to live in dense environments. If LSBs were low surface density because they collapsed late, it followed that they should live in less dense environments.
I didn’t know how to test this prediction. Fortunately, fellow postdoc and office mate in Cambridge at the time, Houjun Mo, did. It came true. The LSB galaxies I had been studying were clustered like other galaxies, but not as strongly. This was exactly what I expected, and I thought sure we were on to something. All that remained was to confirm the second prediction.
At the time, we did not have a clear idea of what dark matter halos should be like. NFW halos were still in the future. So it seemed reasonable that late forming halos should have lower densities (lower concentrations in the modern terminology). More importantly, the sum of dark and luminous density was certainly less. Dynamics follow from the distribution of mass as Velocity² ∝ Mass/Radius. For a given mass, low surface brightness galaxies had a larger radius, by construction. Even if the dark matter didn’t play along, the reduction in the concentration of the luminous mass should lower the rotation velocity.
Indeed, the standard explanation of the Tully-Fisher relation was just this. Aaronson, Huchra, & Mould had argued that galaxies obeyed the Tully-Fisher relation because they all had essentially the same surface brightness (Freeman’s law) thereby taking variation in the radius out of the equation: galaxies of the same mass all had the same radius. (If you are a young astronomer who has never heard of Freeman’s law, you’re welcome.) With our LSB galaxies, we had a sample that, by definition, violated Freeman’s law. They had large radii for a given mass. Consequently, they should have lower rotation velocities.
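To make that naive expectation concrete, here is a minimal sketch of the scaling Velocity² ∝ Mass/Radius. The point-mass formula, the mass, and the two radii are illustrative assumptions, not values from our sample; the only thing that matters is the ratio.

```python
import math

# Gravitational constant in kpc (km/s)^2 / Msun.
G = 4.30091e-6

def circular_velocity(mass_msun, radius_kpc):
    """Naive estimate V = sqrt(G M / R) in km/s (point-mass assumption)."""
    return math.sqrt(G * mass_msun / radius_kpc)

# Hypothetical numbers: same mass, but the LSB galaxy is 4x more extended.
mass = 1.0e10          # solar masses (illustrative)
v_hsb = circular_velocity(mass, 3.0)    # compact, high surface brightness
v_lsb = circular_velocity(mass, 12.0)   # diffuse, low surface brightness

# At fixed mass the velocity scales as 1/sqrt(R), so 4x the radius
# should halve the rotation speed.
ratio = v_lsb / v_hsb
print(f"V_LSB / V_HSB = {ratio:.2f}")
```

A factor of four in radius is a factor of two in velocity under this reasoning, which would be an obvious shift off the Tully-Fisher relation.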
Up to that point, I had not taken much interest in rotation curves. In contrast, colleagues at the University of Groningen were all about rotation curves. Working with Thijs van der Hulst, Erwin de Blok, and Martin Zwaan, we set out to quantify where LSB galaxies fell in relation to the Tully-Fisher relation. I confidently predicted that they would shift off of it – an expectation shared by many at the time. They did not.
I was flummoxed. My prediction was wrong. That of Aaronson et al. was wrong. Poking about the literature, everyone who had made a clear prediction in the conventional context was wrong. It made no sense.
I spent months banging my head against the wall. One quick and easy solution was to blame the dark matter. Maybe the rotation velocity was set entirely by the dark matter, and the distribution of luminous mass didn’t come into it. Surely that’s what the flat rotation velocity was telling us? All about the dark matter halo?
Problem is, we measure the velocity where the luminous mass still matters. In galaxies like the Milky Way, it matters quite a lot. It does not work to imagine that the flat rotation velocity is set by some property of the dark matter halo alone. What matters to what we measure is the combination of luminous and dark mass. The luminous mass is important in high surface brightness galaxies, and progressively less so in lower surface brightness galaxies. That should leave some kind of mark on the Tully-Fisher relation, but it doesn’t.
I worked long and hard to understand this in terms of dark matter. Every time I thought I had found the solution, I realized that it was a tautology. Somewhere along the line, I had made an assumption that guaranteed that I got the answer I wanted. It was a hopeless fine-tuning problem. The only way to satisfy the data was to have the dark matter contribution scale up as that of the luminous mass scaled down. The more stretched out the light, the more compact the dark – in exact balance to maintain zero shift in Tully-Fisher.
This made no sense at all. Over twenty years on, I have yet to hear a satisfactory conventional explanation. Most workers seem to assert, in effect, that “dark matter does it” and move along. Perhaps they are wise to do so.
As I was struggling with this issue, I happened to hear a talk by Milgrom. I almost didn’t go. “Modified gravity” was in the title, and I remember thinking, “why waste my time listening to that nonsense?” Nevertheless, against my better judgement, I went. Not knowing that anyone in the audience worked on either LSB galaxies or Tully-Fisher, Milgrom proceeded to derive the MOND prediction:
“The asymptotic circular velocity is determined only by the total mass of the galaxy: $V_f^4 = a_0 G M$.”
In a few lines, he derived rather trivially what I had been struggling to understand for months. The lack of surface brightness dependence in Tully-Fisher was entirely natural in MOND. It falls right out of the modified force law, and had been explicitly predicted over a decade before I struggled with the problem.
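For readers who want to see those few lines, here is the standard textbook version, which is not necessarily the exact presentation given in the talk. It assumes the deep-MOND limit, in which the effective acceleration $g$ is the geometric mean of the Newtonian acceleration $g_N$ and the acceleration scale $a_0$, and a circular orbit far enough out that all the mass $M$ is enclosed.

```latex
\begin{align}
g &= \sqrt{g_N\, a_0}, \qquad g_N = \frac{G M}{r^2} \\
\frac{V^2}{r} &= \sqrt{\frac{G M a_0}{r^2}} = \frac{\sqrt{G M a_0}}{r} \\
V_f^4 &= a_0\, G\, M
\end{align}
```

The radius cancels in the second line. That is the whole point: the asymptotic velocity depends on the mass but not on how spread out it is, so surface brightness drops out.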
I scraped my jaw off the floor, determined to examine this crazy theory more closely. By the time I got back to my office, cognitive dissonance had already started to set in. Couldn’t be true. I had more pressing projects to complete, so I didn’t think about it again for many moons.
When I did, I decided I should start by reading the original MOND papers. I was delighted to find a long list of predictions, many of them specifically to do with surface brightness. We had just collected fresh data on LSB galaxies, which provided a new window on the low acceleration regime. I had the data to finally falsify this stupid theory.
Or so I thought. As I went through the list of predictions, my assumption that MOND had to be wrong was challenged by each item. It was barely an afternoon’s work: check, check, check. Everything I had struggled for months to understand in terms of dark matter tumbled straight out of MOND.
I was faced with a choice. I knew this would be an unpopular result. I could walk away and simply pretend I had never run across it. That’s certainly how it had been up until then: I had been blissfully unaware of MOND and its perniciously successful predictions. No need to admit otherwise.
Had I realized just how unpopular it would prove to be, maybe that would have been the wiser course. But even contemplating such a course felt criminal. I was put in mind of Paul Gerhardt’s admonition for intellectual honesty:
“When a man lies, he murders some part of the world.”
Ignoring what I had learned seemed tantamount to just that. So many predictions coming true couldn’t be an accident. There was a deep clue here; ignoring it wasn’t going to bring us closer to the truth. Actively denying it would be an act of wanton vandalism against the scientific method.
Still, I tried. I looked long and hard for reasons not to report what I had found. Surely there must be some reason this could not be so?
Indeed, the literature provided many papers that claimed to falsify MOND. To my shock, few withstood critical examination. Commonly a straw man representing MOND was falsified, not MOND itself. At a deeper level, it was implicitly assumed that any problem for MOND was an automatic victory for dark matter. This did not obviously follow, so I started re-doing the analyses for both dark matter and MOND. More often than not, I found either that the problems for MOND were greatly exaggerated, or that the genuinely problematic cases were a problem for both theories. Dark matter has more flexibility to explain outliers, but outliers happen in astronomy. All too often the temptation was to refuse to see the forest for a few trees.
The first MOND analysis of the classical dwarf spheroidals provides a good example. Completed only a few years before I encountered the problem, these were low surface brightness systems that were deep in the MOND regime. These were gas poor, pressure supported dSph galaxies, unlike my gas rich, rotating LSB galaxies, but the critical feature was low surface brightness. This was the most directly comparable result. Better yet, the study had been made by two brilliant scientists (Ortwin Gerhard & David Spergel) whom I admire enormously. Surely this work would explain how my result was a mere curiosity.
Indeed, reading their abstract, it was clear that MOND did not work for the dwarf spheroidals. Whew: LSB systems where it doesn’t work. All I had to do was figure out why, so I read the paper.
As I read beyond the abstract, the answer became less and less clear. The results were all over the map. Two dwarfs (Sculptor and Carina) seemed unobjectionable in MOND. Two dwarfs (Draco and Ursa Minor) had mass-to-light ratios that were too high for stars, even in MOND. That is, there still appeared to be a need for dark matter even after MOND had been applied. On the flip side, Fornax had a mass-to-light ratio that was too low for the old stellar populations assumed to dominate dwarf spheroidals. Results all over the map are par for the course in astronomy, especially for a pioneering attempt like this. What were the uncertainties?
Milgrom wrote a rebuttal. By then, there were measured velocity dispersions for two more dwarfs. Of these seven dwarfs, he found that
“within just the quoted errors on the velocity dispersions and the luminosities, the MOND M/L values for all seven dwarfs are perfectly consistent with stellar values, with no need for dark matter.”
Well, he would say that, wouldn’t he? I determined to repeat the analysis and error propagation.
The net result: they were both right. M/L was still too high for Draco and Ursa Minor, and still too low for Fornax. But this was only significant at the 2σ level, if that – hardly enough to condemn a theory. Carina, Leo I, Leo II, Sculptor, and Sextans all had fairly reasonable mass-to-light ratios. The voting is different now. Instead of going 2 for 5 as Gerhard & Spergel found, MOND was now 5 for 8. One could choose to obsess about the outliers, or one could choose to see a more positive pattern. Either a positive or a negative spin could be put on this result. But it was clearly more positive than the first attempt had indicated.
The mass estimator in MOND scales as the fourth power of velocity (or velocity dispersion in the case of isolated dSphs), so the too-high M*/L of Draco and Ursa Minor didn’t disturb me too much. A small overestimation of the velocity dispersion would lead to a large overestimation of the mass-to-light ratio. Just about every systematic uncertainty one can think of pushes in this direction, so it would be surprising if such an overestimate didn’t happen once in a while.
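A quick sketch shows how steep that fourth-power dependence is. The 10% and 20% errors below are illustrative assumptions, not the actual uncertainties on Draco or Ursa Minor.

```python
def mass_overestimate_factor(fractional_velocity_error):
    """Factor by which a mass (and hence M/L) scaling as velocity^4 is
    overestimated when the velocity dispersion is too high by the given
    fraction."""
    return (1.0 + fractional_velocity_error) ** 4

# Hypothetical overestimates of the velocity dispersion.
factor_10 = mass_overestimate_factor(0.10)
factor_20 = mass_overestimate_factor(0.20)

print(f"10% high in velocity -> mass high by a factor of {factor_10:.2f}")
print(f"20% high in velocity -> mass high by a factor of {factor_20:.2f}")
```

A velocity dispersion that is 20% too high roughly doubles the inferred mass-to-light ratio.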
Given this, I was more concerned about the low M*/L of Fornax. That was weird.
Up until that point (1998), we had been assuming that the stars in dSphs were all old, like those in globular clusters. That corresponds to a high M*/L, maybe 3 in solar units in the V-band. Shortly after this time, people started to look closely at the stars in the classical dwarfs with the Hubble. Lo and behold, the stars in Fornax were surprisingly young. That means a low M*/L, 1 or less. In retrospect, MOND was trying to tell us that: it returned a low M*/L for Fornax because the stars there are young. So what was taken to be a failing of the theory was actually a predictive success.
And Gee. This is a long post. There is a lot more to tell, but enough for now.
*I have a long memory, but it is not perfect. I doubt I have the exact wording right, but this does accurately capture the sentiment from the early ’80s when I was an undergraduate at MIT and Scott Tremaine was on the faculty there.
As soon as I wrote it, I realized that the title is much more general than anything that can be fit in a blog post. Bekenstein argued long ago that the missing mass problem should instead be called the acceleration discrepancy, because that’s what it is – a discrepancy that occurs in conventional dynamics at a particular acceleration scale. So in that sense, it is the entire history of dark matter. For that, I recommend the excellent book The Dark Matter Problem: A Historical Perspective by Bob Sanders.
Here I mean more specifically my own attempts to empirically constrain the relation between the mass discrepancy and acceleration. Milgrom introduced MOND in 1983, no doubt after a long period of development and refereeing. He anticipated essentially all of what I’m going to describe. But not everyone is eager to accept MOND as a new fundamental theory, and many suffer from a very human tendency to confuse fact and theory. So I have gone out of my way to demonstrate what is empirically true in the data – facts – irrespective of theoretical interpretation (MOND or otherwise).
What is empirically true, and now observationally established beyond a reasonable doubt, is that the mass discrepancy in rotating galaxies correlates with centripetal acceleration. The lower the acceleration, the more dark matter one appears to need. Or, as Bekenstein might have put it, the amplitude of the acceleration discrepancy grows as the acceleration itself declines.
Bob Sanders made the first empirical demonstration that I am aware of that the mass discrepancy correlates with acceleration. In a wide ranging and still relevant 1990 review, he showed that the amplitude of the mass discrepancy correlated with the acceleration at the last measured point of a rotation curve. It did not correlate with radius.
I was completely unaware of this when I became interested in the problem a few years later. I wound up reinventing the very same term – the mass discrepancy, which I defined as the ratio of dynamically measured mass to that visible in baryons: $D = M_{\rm tot}/M_{\rm bar}$. When there is no dark matter, $M_{\rm tot} = M_{\rm bar}$ and $D = 1$.
My first demonstration of this effect was presented at a conference at Rutgers in 1998. This considered the mass discrepancy at every radius and every acceleration within all the galaxies that were available to me at that time. Though messy, as is often the case in extragalactic astronomy, the correlation was clear. Indeed, this was part of a broader review of galaxy formation; the title, abstract, and much of the substance remain relevant today.
I spent much of the following five years collecting more data, refining the analysis, and sweating the details of uncertainties and systematic instrumental effects. In 2004, I published an extended and improved version, now with over 5 dozen galaxies.
Here I’ve used a population synthesis model to estimate the mass-to-light ratio of the stars. This is the only unknown; everything else is measured. Note that the vast majority of galaxies land on top of each other. There are a few that do not, as you can perceive in the parallel sets of points offset from the main body. But that happens in only a few cases, as expected – no population model is perfect. Indeed, this one was surprisingly good, as the vast majority of the individual galaxies are indistinguishable in the pile that defines the main relation.
I explored how the estimation of the stellar mass-to-light ratio affected this mass discrepancy-acceleration relation in great detail in the 2004 paper. The details differ with the choice of estimator, but the bottom line was that the relation persisted for any plausible choice. The relation exists. It is an empirical fact.
At this juncture, further improvement was no longer limited by rotation curve data, which is what we had been working to expand through the early ’00s. Now it was the stellar mass. The measurement of stellar mass was based on optical measurements of the luminosity distribution of stars in galaxies. These are perfectly fine data, but it is hard to map the starlight that we measured to the stellar mass that we need for this relation. The population synthesis models were good, but they weren’t good enough to avoid the occasional outlier, as can be seen in the figure above.
One thing the models all agreed on (before they didn’t, then they did again) was that the near-infrared would provide a more robust way of mapping stellar mass than the optical bands we had been using up till then. This was the clear way forward, and perhaps the only hope for improving the data further. Fortunately, technology was keeping pace. Around this time, I became involved in helping the effort to develop the NEWFIRM near-infrared camera for the national observatories, and NASA had just launched the Spitzer space telescope. These were the right tools in the right place at the right time. Ultimately, the high accuracy of the deep images obtained from the dark of space by Spitzer at 3.6 microns was to prove most valuable.
Jim Schombert and I spent much of the following decade observing in the near-infrared. Many other observers were doing this as well, filling the Spitzer archive with useful data while we concentrated on our own list of low surface brightness galaxies. This paragraph cannot suffice to convey the long term effort and enormity of this program. But by the mid-teens, we had accumulated data for hundreds of galaxies, including all those for which we also had rotation curves and HI observations. The latter had been obtained over the course of decades by an entire independent community of radio observers, and represent an integrated effort that dwarfs our own.
On top of the observational effort, Jim had been busy building updated stellar population models. We have a sophisticated understanding of how stars work, but things can get complicated when you put billions of them together. Nevertheless, Jim’s work – and that of a number of independent workers – indicated that the relation between Spitzer’s 3.6 micron luminosity measurements and stellar mass should be remarkably simple – basically just a constant conversion factor for nearly all star forming galaxies like those in our sample.
Things came together when Federico Lelli joined Case Western as a postdoc in 2014. He had completed his Ph.D. in the rich tradition of radio astronomy, and was the perfect person to move the project forward. After a couple more years of effort, curating the rotation curve data and building mass models from the Spitzer data, we were in the position to build the relation for over a dozen dozen galaxies. With all the hard work done, making the plot was a matter of running a pre-prepared computer script.
Federico ran his script. The plot appeared on his screen. In a stunned voice, he called me into his office. We had expected an improvement with the Spitzer data – hence the decade of work – but we had also expected there to be a few outliers. There weren’t. Any.
All. the. galaxies. fell. right. on. top. of. each. other.
This plot differs from those above because we had decided to plot the measured acceleration against that predicted by the observed baryons so that the two axes would be independent. The discrepancy, defined as the ratio, depended on both. D is essentially the ratio of the y-axis to the x-axis of this last plot, dividing out the unity slope where D = 1.
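The bookkeeping behind the two axes can be sketched as follows. This is a minimal illustration, not the actual script: it assumes the observed acceleration is the centripetal acceleration V²/R of the measured rotation speed and the baryonic one is the same expression for the speed predicted by the baryonic mass model. The function names and the input numbers are hypothetical.

```python
KPC_IN_M = 3.0857e19   # metres per kiloparsec
KM_IN_M = 1.0e3        # metres per kilometre

def centripetal_acceleration(v_kms, r_kpc):
    """Acceleration V^2 / R in m/s^2 for a speed in km/s at a radius in kpc."""
    v = v_kms * KM_IN_M
    r = r_kpc * KPC_IN_M
    return v * v / r

def mass_discrepancy(v_obs_kms, v_bar_kms, r_kpc):
    """D = g_obs / g_bar, the ratio of the y-axis to the x-axis of the plot."""
    g_obs = centripetal_acceleration(v_obs_kms, r_kpc)
    g_bar = centripetal_acceleration(v_bar_kms, r_kpc)
    return g_obs / g_bar

# Hypothetical point: observed 100 km/s where the baryons predict 50 km/s.
g_obs = centripetal_acceleration(100.0, 10.0)
d = mass_discrepancy(100.0, 50.0, 10.0)
print(f"g_obs = {g_obs:.2e} m/s^2, D = {d:.1f}")
```

Because both accelerations are evaluated at the same radius, the radius cancels in D, which is just the squared ratio of the two velocities.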
This was one of the most satisfactory moments of my long career, in which I have been fortunate to have had many satisfactory moments. It is right up there with the eureka moment I had that finally broke the long-standing loggerhead about the role of selection effects in Freeman’s Law. (Young astronomers – never heard of Freeman’s Law? You’re welcome.) Or the epiphany that, gee, maybe what we’re calling dark matter could be a proxy for something deeper. It was also gratifying that it was quickly recognized as such, with many of the colleagues I first presented it to saying it was the highlight of the conference where it was first unveiled.
Regardless of the ultimate interpretation of the radial acceleration relation, it clearly exists in the data for rotating galaxies. The discrepancy appears at a characteristic acceleration scale, $g_\dagger = 1.2 \times 10^{-10}$ m/s/s. That number is in the data. Why? is a deeply profound question.
It isn’t just that the acceleration scale is somehow fundamental. The amplitude of the discrepancy depends systematically on the acceleration. Above the critical scale, all is well: no need for dark matter. Below it, the amplitude of the discrepancy – the amount of dark matter we infer – increases systematically. The lower the acceleration, the more dark matter one infers.
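To illustrate that behavior, here is a sketch using one interpolating form that has appeared in the literature for this relation, g_obs = g_bar / (1 − exp(−√(g_bar/g†))). The choice of functional form is an assumption made for illustration; nothing above depends on it. Only the acceleration scale of 1.2 × 10⁻¹⁰ m/s/s is taken from the text.

```python
import math

G_DAGGER = 1.2e-10  # characteristic acceleration scale in m/s^2 (from the text)

def g_obs_from_g_bar(g_bar):
    """Assumed interpolating form: g_obs = g_bar / (1 - exp(-sqrt(g_bar/g_dagger)))."""
    return g_bar / (1.0 - math.exp(-math.sqrt(g_bar / G_DAGGER)))

def discrepancy(g_bar):
    """Amplitude of the discrepancy, D = g_obs / g_bar."""
    return g_obs_from_g_bar(g_bar) / g_bar

# Well above the scale the discrepancy vanishes; well below it, it grows.
high = discrepancy(1.0e-8)
low = discrepancy(1.0e-12)

for g_bar in (1.0e-8, 1.0e-9, 1.0e-10, 1.0e-11, 1.0e-12):
    print(f"g_bar = {g_bar:.0e} m/s^2  ->  D = {discrepancy(g_bar):.2f}")
```

With this form, D stays close to one at high acceleration and climbs steadily as the baryonic acceleration falls below the characteristic scale.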
The relation for rotating galaxies has no detectable scatter – it is a near-perfect relation. Whether this persists, and holds for other systems, is the interesting outstanding question. It appears, for example, that dwarf spheroidal galaxies may follow a slightly different relation. However, the emphasis here is on slightly. Very few of these data pass the same quality criteria that the SPARC data plotted above do. It’s like comparing mud pies with diamonds.
Whether the scatter in the radial acceleration relation is zero or merely very tiny is important. That’s the difference between a new fundamental force law (like MOND) and a merely spectacular galaxy scaling relation. For this reason, it seems to be controversial. It shouldn’t be: I was surprised at how tight the relation was myself. But I don’t get to report that there is lots of scatter when there isn’t. To do so would be profoundly unscientific, regardless of the wants of the crowd.
Of course, science is hard. If you don’t do everything right, from the measurements to the mass models to the stellar populations, you’ll find some scatter where perhaps there isn’t any. There are so many creative ways to screw up that I’m sure people will continue to find them. Myself, I prefer to look forward: I see no need to continuously re-establish what has been repeatedly demonstrated in the history briefly outlined above.