Leveling the Playing Field of Dwarf Galaxy Kinematics


We have a new paper on the arXiv. This is a straightforward empiricist’s paper that provides a reality check on the calibration of the Baryonic Tully-Fisher relation (BTFR) and the distance scale using well-known Local Group galaxies. It also connects observable velocity measures in rotating and pressure supported dwarf galaxies: the flat rotation speed of disks is basically twice the line-of-sight velocity dispersion of dwarf spheroidals.

First, the reality check. Previously we calibrated the BTFR using galaxies with distances measured by reliable methods like Cepheids and the Tip of the Red Giant Branch (TRGB) method. Applying this calibration yields the Hubble constant H0 = 75.1 ± 2.3 km/s/Mpc, which is consistent with other local measurements but in tension with the value obtained from fitting the Planck CMB data. All of the calibrator galaxies are nearby (most are within 10 Mpc, which is close by extragalactic standards), but none of them are in the Local Group (galaxies within ~1 Mpc like Andromeda and M33). The distances to Local Group galaxies are pretty well known at this point, so if we got the BTFR calibration right, they had better fall right on it.

They do. From high to low mass, the circles in the plot below are Andromeda, the Milky Way, M33, the LMC, SMC, and NGC 6822. All fall on the externally calibrated BTFR, which extrapolates well to still lower mass dwarf galaxies like WLM, DDO 210, and DDO 216 (and even Leo P, the smallest rotating galaxy known).

The BTFR for Local Group galaxies. Rotationally supported galaxies with measured flat rotation velocities (circles) are in good agreement with the BTFR calibrated independently with fifty galaxies external to the Local Group (solid line; the dashed line is the extrapolation below the lowest mass calibrator). Pressure supported dwarfs (squares) are plotted with their observed velocity dispersions in lieu of a flat rotation speed. Filled squares are color coded by their proximity to M31 (red) or the Milky Way (orange) or neither (green). Open squares are dwarfs whose velocity dispersions may not be reliable tracers of their equilibrium gravitational potential (see McGaugh & Wolf).

The agreement of the BTFR with Local Group rotators is so good that it is tempting to say that there is no way to reconcile this with a low Hubble constant of 67 km/s/Mpc. Doing so would require all of these galaxies to be more distant by the factor 75/67 = 1.11. That doesn’t sound too bad, but applying it means that Andromeda would have to be 875 kpc distant rather than the 785 ± 25 kpc adopted by the source of our M31 data, Chemin et al. There is a long history of distance measurements to M31 so many opinions can be found, but it isn’t just M31 – all of the Local Group galaxy distances would have to be off by this factor. This seems unlikely to the point of absurdity, but as colleague and collaborator Jim Schombert reminds me, we’ve seen such things before with the distance scale.
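To make the arithmetic explicit, here is a minimal sketch of my own (not from the paper); it only restates the numbers above and rounds slightly differently.

```python
# Sketch: how much Local Group distances would have to stretch to accommodate
# H0 = 67 rather than the BTFR-calibrated 75.1 km/s/Mpc.  Illustrative only.
H0_btfr, H0_planck = 75.1, 67.0   # km/s/Mpc
stretch = H0_btfr / H0_planck     # distances scale inversely with H0
d_m31 = 785.0                     # kpc, the adopted M31 distance
print(round(stretch, 3))          # ~1.12
print(round(d_m31 * stretch))     # ~880 kpc (quoted as ~875 kpc above with rounded H0 values)
```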

So that’s the reality check: the BTFR works as it should in the Local Group – at least for the rotating galaxies (circles in the plot above). What about the pressure supported galaxies (the squares)?

Galaxies come in two basic kinematic types: rotating disks or pressure supported ellipticals. Disks are generally thin, with most of the stars orbiting in the same direction in the same plane on nearly circular orbits. Ellipticals are quasi-spherical blobs of stars on rather eccentric orbits oriented all over the place. This is an oversimplification, of course; real galaxies have a mix of orbits, but usually most of the kinetic energy is invested in one or the other, rotation or random motions. We can measure the speeds of stars and gas in these configurations, which provides information about the kinetic energy and corresponding gravitational binding energy. That’s how we get at the gravitational potential and infer the need for dark matter – or at least, the existence of acceleration discrepancies.

The elliptical galaxy M105 (left) and the spiral galaxy NGC 628 (right). Typical orbits are illustrated by the colored lines: predominantly radial (highly eccentric in & out) orbits in the pressure supported elliptical; more nearly circular (low eccentricity, round & round) orbits in rotationally supported disks. (Galaxy images are based on photographic data obtained using the Oschin Schmidt Telescope on Palomar Mountain as part of the Palomar Observatory Sky Survey-II. Digital versions of the scanned photographic plates were obtained for reproduction from the Digitized Sky Survey.)

We would like to have full 6D phase space information for all stars – their location in 3D configuration space and their momentum in each direction. In practice, usually all we can measure is the Doppler line-of-sight speed. For rotating galaxies, we can [attempt to] correct the observed velocity for the inclination of the disk, and get an idea of the in-plane rotation speed. For ellipticals, we get the velocity dispersion along the line of sight in whatever orientation we happen to get. If the orbits are isotropic, then one direction of view is as good as any other. In general that need not be the case, but it is hard to constrain the anisotropy of orbits, so usually we assume isotropy and call it Close Enough for Astronomy.

For isotropic orbits, the velocity dispersion σ* is related to the circular velocity Vc of a test particle by Vc = √3 σ*. The square root of three appears because the kinetic energy of isotropic orbits is evenly divided among the three cardinal directions. These quantities depend in a straightforward way on the gravitational potential, which can be computed for the stuff we can see but not for that which we can’t. The stars tend to dominate the potential at small radii in bright galaxies. This is a complication we’ll ignore here by focusing on the outskirts of rotating galaxies where rotation curves are flat and dwarf spheroidals where stars never dominate. In both cases, we are in a limit where we can neglect the details of the stellar distribution: only the dark mass matters, or, in the case of MOND, only the total normal mass but not its detailed distribution (which does matter for the shape of a rotation curve, but not its flat amplitude).
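As a concrete illustration of the conversion just described, here is a minimal sketch; the dispersion value is made up for illustration.

```python
import numpy as np

# Isotropic-orbit conversion quoted above: Vc = sqrt(3) * sigma.
def circular_velocity_isotropic(sigma_los):
    """Circular velocity of a test particle for an isotropic tracer (same units as input)."""
    return np.sqrt(3.0) * sigma_los

print(circular_velocity_isotropic(9.0))  # ~15.6 km/s for a hypothetical sigma = 9 km/s dwarf
```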

Rather than worry about theory or the gory details of phase space, let’s just ask the data. How do we compare apples with apples? What is the factor βc that makes Vo = βc σ* an equality?

One notices that the data for pressure supported dwarfs nicely parallel those for rotating galaxies. We estimate βc by finding the shift that puts the dwarf spheroidals on the BTFR (on average). We only do this for the dwarfs that are not obviously affected by tidal effects, i.e., excluding those whose velocity dispersions may not reflect the equilibrium gravitational potential. I have discussed this at great length in McGaugh & Wolf, so I refer the reader eager for more details there. Here I merely note that the exercise is meaningful only for those dwarfs that parallel the BTFR; it can’t apply to those that don’t, regardless of the reason.

That caveat aside, this works quite well for βc = 2.

The BTFR plane with the outer velocity of dwarf spheroidals taken to be Vo = 2σ.
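Schematically, the shift-onto-the-BTFR exercise looks like the sketch below. The BTFR slope, intercept, and the dwarf data are placeholders, not the values from the paper; the point is only to show how the average shift is computed.

```python
import numpy as np

# Hypothetical calibrated BTFR: log10(Mb) = s * log10(V) + b (placeholder values).
s, b = 4.0, 1.7

def btfr_velocity(Mb):
    """Velocity the BTFR predicts for a baryonic mass Mb [Msun]."""
    return 10 ** ((np.log10(Mb) - b) / s)

# Made-up dwarf spheroidals: baryonic masses and observed velocity dispersions.
Mb_dsph    = np.array([5e5, 2.5e6, 8e6])   # Msun (illustrative)
sigma_dsph = np.array([5.0, 7.5, 10.0])    # km/s (illustrative)

# beta_c is the average multiplicative shift that places the dwarfs on the relation.
beta_c = 10 ** np.mean(np.log10(btfr_velocity(Mb_dsph) / sigma_dsph))
print(round(beta_c, 2))                    # ~2 with these made-up numbers
```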

The numerically inclined reader will note that 2 > √3. One would expect the latter for isotropic orbits, which we implicitly average over by using the data for all these dwarfs together. So the likely explanation for the larger value of βc is that the outer velocities of rotation curves are measured at larger radii than the velocity dispersions of dwarf spheroidals. The value of βc accounts for the different effective radii of measurement, as illustrated by the rotation curves below.

The rotation curve of the gas rich Local Group dIrr WLM (left, Iorio et al.) and the equivalent circular velocity curve of the pressure supported dSph Leo I (right). The filled point represents the luminosity weighted circular speed Vc = √3 σ* at the 3D half light radius where variation due to anisotropy is minimized (Wolf et al). The dotted lines illustrate how the uncertainty grows away from this point due to the compounding effects of anisotropy. The outer circular speed Vo is marked for both. Note that Vo > √3 σ* simply because of the shape of the circular velocity curve, which has not yet reached the flat plateau where the velocity dispersion is measured.

Once said, this seems obvious. The velocity dispersions of dwarf spheroidals are measured by observing the Doppler shifts of individual member stars. This measurement is necessarily made where the stars are. In contrast, the flat portions of rotation curves are traced by atomic gas at radii that typically extend beyond the edge of the optical disk. So we should expect a difference; βc = 2 quantifies it.

One small caveat is that in order to compare apples with apples, we have to adopt a mass-to-light ratio for the stars in dwarf spheroidals in order to compare them with the combined mass of stars and gas in rotating galaxies. Indeed, the dwarf irregulars that overlap with the dwarf spheroidals in mass are made more of gas than stars, so there is always the risk of some systematic difference between the two mass scales. In the paper, we quantify the variation of βc with the choice of M*/L. If you’re interested in that level of detail, you should read the paper.
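The bookkeeping being described is simple enough to sketch; the mass-to-light ratios and helium correction below are assumptions for illustration, not the paper’s adopted values.

```python
# Baryonic mass = stars (luminosity times an assumed M*/L) plus gas
# (atomic hydrogen scaled up for helium).  All values are illustrative.
def baryonic_mass(L_star, ML_star, M_HI=0.0, helium_factor=1.33):
    return ML_star * L_star + helium_factor * M_HI

print(baryonic_mass(5e6, ML_star=2.0))             # a gas-free dwarf spheroidal: stars only
print(baryonic_mass(1e6, ML_star=0.5, M_HI=6e6))   # a gas-rich dwarf irregular: mostly gas
```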

I should also note that MOND predicts βc = 2.12. Taken at face value, this implies that MOND prefers an average mass-to-light ratio slightly higher than what we assumed. This is well within the uncertainties, and we already know that MOND is the only theory capable of predicting the velocity dispersions of dwarf spheroidals in advance. We can always explain this after the fact with dark matter, which is what people generally do, often in apparent ignorance that MOND also correctly predicts which dwarfs they’ll have to invoke tidal disruption for. How such models can be considered satisfactory is quite beyond my capacity, but it does save one from the pain of having to critically reassess one’s belief system.

That’s all beyond the scope of the current paper. Here we just provide a nifty empirical result. If you want to make an apples-to-apples comparison of dwarf spheroidals with rotating dwarf irregulars, you will do well to assume Vo = 2σ*.

The neutrino mass hierarchy and cosmological limits on their mass


I’ve been busy. There is a lot I’d like to say here, but I’ve been writing the actual science papers. Can’t keep up with myself, let alone everything else. I am prompted to write here now because of a small rant by Maury Goodman in the neutrino newsletter he occasionally sends out. It resonated with me.

First, some context. Neutrinos are particles of the Standard Model of particle physics. They come in three families with corresponding leptons: the electron (νe), muon (νμ), and tau (ντ) neutrinos. Neutrinos only interact through the weak nuclear force, feeling neither the strong force nor electromagnetism. This makes them “ghostly” particles. Their immunity to these forces means they have such a low cross-section for interacting with other matter that they mostly don’t. Zillions are created every second by the nuclear reactions in the sun, and the vast majority of them breeze right through the Earth as if it were no more than a pane of glass. Their existence was first inferred indirectly from the apparent failure of some nuclear decays to conserve energy – the summed energy of the decay products seemed less than what was initially present because the neutrinos were running off with mass-energy without telling anyone about it by interacting with detectors of the time.

Clever people did devise ways to detect neutrinos, if only at the rate of one in a zillion. Neutrinos are the template for WIMP dark matter, which is imagined to be some particle from beyond the Standard Model that is more massive than neutrinos but similarly interacts only through the weak force. That’s how laboratory experiments search for them.

While a great deal of effort has been invested in searching for WIMPs, so far the most interesting new physics is in the neutrinos themselves. They move at practically the speed of light, and for a long time it was believed that like photons, they were pure energy with zero rest mass. Indeed, I’m old enough to have been taught that neutrinos must have zero mass; it would screw everything up if they didn’t. This attitude is summed up by an anecdote about the late, great author of the Standard Model, Steven Weinberg:

A colleague at UT once asked Weinberg if there was neutrino mass in the Standard Model. He told her “not in my Standard Model.”

Steven Weinberg, as related by Maury Goodman

As I’ve related before, in 1984 I heard a talk by Hans Bethe in which he made the case for neutrino dark matter. I was flabbergasted – I had just learned neutrinos couldn’t possibly have mass! But, as he pointed out, there were a lot of them, so it wouldn’t take much – a tiny mass each, well below the experimental limits that existed at the time – and that would suffice to make all the dark matter. So, getting over the theoretical impossibility of this hypothesis, I reckoned that if it turned out that neutrinos did indeed have mass, then surely that would be the solution to the dark matter problem.

Wrong and wrong. Neutrinos do have mass, but not enough to explain the missing mass problem. At least not that of the whole universe, as the modern estimate is that they might have a mass density that is somewhat shy of that of ordinary baryons (see below). They are too lightweight to stick to individual galaxies, which they would boil right out of: even with lots of cold dark matter, there isn’t enough mass to gravitationally bind these relativistic particles. It seems unlikely, but it is at least conceivable that initially fast-moving but heavy neutrinos might by now have slowed down enough to stick to and make up part of some massive clusters of galaxies. While interesting, that is a very far cry from being the dark matter.

We know neutrinos have mass because they have been observed to transition between flavors as they traverse space. This can only happen if there are different quantum states for them to transition between. They can’t all just be the same zero-mass photon-like entity, at least two of them need to have some mass to make for split quantum levels so there is something to oscillate between.

Here’s where it gets really weird. Neutrino mass states do not correspond uniquely to neutrino flavors. We’re used to thinking of particles as having a mass: a proton weighs 0.938272 GeV; a neutron 0.939565 GeV. (The neutron being only 0.1% heavier than the proton is itself pretty weird; this comes up again later in the context of neutrinos if I remember to bring it up.) No, there are three separate mass states, each of which is a fractional, probabilistic combination of the three neutrino flavors. This sounds completely insane, so let’s turn to an illustration:

Neutrino mass states, from Adrián-Martínez et al (2016). There are two possible mass hierarchies for neutrinos, the so-called “normal” (left) and “inverted” (right) hierarchies. There are three mass states – the different bars – that are cleverly named ν1, ν2, and, you guessed it, ν3. The separation between these states is measured from oscillations in solar neutrinos (sol) or atmospheric neutrinos (atm) spawned by cosmic rays. The mass states do not correspond uniquely to neutrino flavors (νe, νμ, and ντ); instead, each mass state is made up of a combination of the three flavors as illustrated by the colored portions of the bars.

So we have three flavors of neutrino, νe, νμ, and ντ, that mix and match to make up the three mass eigenstates, ν1, ν2, and ν3. We would like to know what the masses, m1, m2, and m3, of the mass eigenstates are. We don’t. All that we glean from the solar and atmospheric oscillation data is that there is a transition between these states with a corresponding squared mass difference (e.g., Δm²sol = m2² − m1²). These are now well measured by astronomical standards, with Δm²sol = 0.000075 eV² and Δm²atm = 0.0025 eV², depending a little bit on which hierarchy is correct.

OK, so now we guess. If the hierarchy is normal and m1 = 0, then m2 = √Δm²sol = 0.0087 eV and m3 = √(Δm²atm + m2²) = 0.0507 eV. The first eigenstate mass need not be zero, though I’ve often heard it argued that it should be that or close to it, as the “natural” scale is m ~ √Δm². So maybe we have something like m1 = 0.01 eV and m2 = 0.013 eV, in sorta the same ballpark.

Maybe, but I am underwhelmed by the naturalness of this argument. If we apply this reasoning to the proton and neutron (Ha! I remembered!), then the mass of the proton should be of order 1 MeV not 1 GeV. That’d be interesting because the proton, neutron, and electron would all have a mass within a factor of two of each other (the electron mass is 0.511 MeV). That almost sounds natural. It’d also make for some very different atomic physics, as we’d now have hydrogen atoms that are quasi-binary systems rather than a lightweight electron orbiting a heavy proton. That might make for an interesting universe, but it wouldn’t be the one we live in.

One very useful result of assuming m1 = 0 is that it provides a hard lower limit on the sum of the neutrino masses: ∑mi = m1 + m2 + m3 > 0.059 eV. Here the hierarchy matters, with the lower limit becoming about 0.1 eV in the inverted hierarchy. So we know neutrinos weigh at least that much, maybe more.
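The arithmetic of the preceding paragraphs fits in a few lines; this sketch just reproduces the numbers quoted above from the stated splittings.

```python
import numpy as np

dm2_sol = 7.5e-5   # eV^2, solar splitting quoted above
dm2_atm = 2.5e-3   # eV^2, atmospheric splitting quoted above

# Normal hierarchy with m1 = 0:
m1 = 0.0
m2 = np.sqrt(dm2_sol)            # ~0.0087 eV
m3 = np.sqrt(dm2_atm + m2**2)    # ~0.0507 eV
print(m2, m3, m1 + m2 + m3)      # sum ~0.059 eV, the hard lower limit

# Inverted hierarchy with the lightest state massless (schematically):
print(np.sqrt(dm2_atm) + np.sqrt(dm2_atm + dm2_sol))   # ~0.1 eV lower limit
```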

There are of course efforts to measure the neutrino mass directly. There is a giant experiment called Katrin dedicated to this. It is challenging to measure a mass this close to zero, so all we have so far are upper limits. The first measurement from Katrin placed the 90% confidence limit < 1.1 eV. That’s about a factor of 20 larger than the lower limit, so in there somewhere.

Katrin on the move.

There is a famous result in cosmology concerning the sum of neutrino masses. Particles have a relic abundance that follows from thermodynamics. The cosmic microwave background is the thermal relic of photons. So too there should be a thermal relic of cosmic neutrinos with slightly lower temperature than the photon field. One can work out the relic abundance, so if one knows their mass, then their cosmic mass density is

Ων h² = ∑mi / (93.5 eV)

where h is the Hubble constant in units of 100 km/s/Mpc (e.g., equation 9.31 in my edition of Peacock’s text Cosmological Physics). For the cosmologists’ favorite (but not obviously correct) h=0.67, the lower limit on the neutrino mass translates to a mass density Ων > 0.0014, rather less than the corresponding baryon density, Ωb = 0.049. The experimental upper limit from Katrin yields Ων < 0.026, still a factor of two less than the baryons but in the same ballpark. These are nowhere near the ΩCDM ~ 0.25 needed for cosmic dark matter.
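For concreteness, here is the same relic-density arithmetic as a sketch, using the formula and limits quoted above.

```python
# Omega_nu * h^2 = sum(m_i) / (93.5 eV), as quoted above.
def omega_nu(sum_m_eV, h=0.67):
    return sum_m_eV / (93.5 * h**2)

print(omega_nu(0.059))   # ~0.0014, the oscillation lower limit
print(omega_nu(1.1))     # ~0.026, using the KATRIN-scale limit quoted above
```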

Nevertheless, the neutrino mass potentially plays an important role in structure formation. Where cold dark matter (CDM) clumps easily to facilitate the formation of structure, neutrinos retard the process. They start out relativistic in the early universe, becoming non-relativistic (slow moving) at some redshift that depends on their mass. Early on, they represent a fast-moving component of gravitating mass that counteracts the slow moving CDM. The nascent clumps formed by CDM can capture baryons (this is how galaxies are thought to form), but they are not even speed bumps to the relativistic neutrinos. If the latter have too large a mass, they pull lumps apart rather than help them grow larger. The higher the neutrino mass, the more damage they do. This in turn impacts the shape of the power spectrum by imprinting a free-streaming scale.
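A commonly quoted linear-theory rule of thumb (not from this post, and only approximate) is that on small scales the power is suppressed by roughly ΔP/P ≈ −8 Ων/Ωm. A sketch, with an assumed Ωm:

```python
# Approximate small-scale power suppression from massive neutrinos:
# Delta P / P ~ -8 * f_nu, with f_nu = Omega_nu / Omega_m.  Omega_m is assumed here.
def power_suppression(omega_nu, omega_m=0.31):
    return -8.0 * omega_nu / omega_m

print(power_suppression(0.0014))   # ~ -4%: minimal-mass neutrinos, a subtle effect
print(power_suppression(0.026))    # ~ -67%: a KATRIN-scale mass would wreck the fit
```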

The power spectrum is a key measurement fit by ΛCDM. Indeed, it is arguably its crowning glory. The power spectrum is well fit by ΛCDM assuming zero neutrino mass. If Ων gets too big, it becomes a serious problem.

Consequently, cosmological observations place an indirect limit on the neutrino mass. There are a number of important assumptions that go into this limit, not all of which I am inclined to grant – most especially, the existence of CDM. But that makes it an important test, as the experimentally measured neutrino mass (whenever that happens) better not exceed the cosmological limit. If it does, that falsifies the cosmic structure formation theory based on cold dark matter.

The cosmological limit on neutrino mass obtained assuming ΛCDM structure formation is persistently an order of magnitude tighter than the experimental upper limit. For example, the Dark Energy Survey obtains ∑mi < 0.13 eV at 95% confidence. This is similar to other previous results, and only a factor of two more than the lower limit from neutrino oscillations. The window of allowed space is getting rather narrow. Indeed, it is already close to ruling out the inverted hierarchy for which ∑mi > 0.1 eV – or the assumptions on which the cosmological limit is made.

This brings us finally to Dr. Goodman’s rant, which I quote directly:

In the normal (inverted) mass order, s=m1+m2+m3 > 59 (100) meV. If as DES says, s < 130 meV, degenerate solutions are impossible. But DES “…model(s) massive neutrinos as three degenerate species of equal mass.” It’s been 34 years since we suspected neutrino masses were different and 23 years since that was accepted. Why don’t cosmology “measurements” of neutrino parameters do it right?

Maury Goodman

Here, s = ∑mi and of course 1 eV = 1000 meV. Degenerate solutions are those in which m1=m2=m3. When the absolute mass scale is large – say the neutrino mass were a huge (for it) 100 eV, then the sub-eV splittings between the mass levels illustrated above would be negligible and it would be fair to treat “massive neutrinos as three degenerate species of equal mass.” This is no longer the case when the implied upper limit on the mass is small; there is a clear difference between m1 and m2 and m3.

So why don’t cosmologists do this right? Why do they persist in pretending that m1=m2=m3?

Far be it from me to cut those guys slack, but I suspect there are two answers. One, it probably doesn’t matter (much), and two, habit. By habit, I mean that the tools used to compute the power spectrum were written at a time when degenerate species of equal mass was a perfectly safe assumption. Indeed, in those days, neutrinos were thought not to matter much at all to cosmological structure formation, so their inclusion was admirably forward looking – or, I suspect, a nerdy indulgence: “neutrinos probably don’t matter but I know how to code for them so I’ll do it by making the simplifying assumption that m1=m2=m3.”

So how much does it matter? I don’t know without editing & running the code (e.g., CAMB or CMBEASY), which would be a great project for a grad student if it hasn’t already been done. Nevertheless, the difference between neutrino mass states and the degenerate assumption is presumably small for small differences in mass. To get an idea that is human-friendly, let’s think about the redshift at which neutrinos become non-relativistic. OK, maybe that doesn’t sound too friendly, but it is less likely to make your eyes cross than a discussion of power spectra, Fourier transforms, and free-streaming wave numbers.

Neutrinos are very lightweight, so they start out as relativistic particles in the early universe (high redshift z). As the universe expands it cools, and the neutrinos slow down. At some point, they transition from behaving like a photon field to a non-relativistic gas of particles. This happens at

1+znr ≈ 1987 mν/(1 eV)

(eq. 4 of Agarwal & Feldman 2012; they also discuss the free-streaming scale and power spectra for those of you who want to get into it). For a 0.5 eV neutrino that is comfortably acceptable to the current experimental upper limit, znr = 992. This is right around recombination, and would mess everything up bigly – hence the cosmological limit being much stricter. For a degenerate neutrino of 0.13 eV, znr = 257. So one way to think about the cosmological limit is that we need to delay the impact of neutrinos on the power spectrum for at least this long in order to maintain the good fit to the data.

How late can the impact of neutrinos be delayed? For the minimum masses m1 = 0, m2 = 0.0087, m3 = 0.0507 eV, zero mass neutrinos always remain relativistic, but z2 = 16 and z3 = 100. These redshifts are readily distinguishable, so maybe Dr. Goodman has a valid point. Well, he definitely has a valid point, but these redshifts aren’t probed by the currently available data, so cosmologists probably figure it is OK to stick to degenerate neutrino masses for now.
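Plugging the various masses discussed above into the transition-redshift formula:

```python
# 1 + z_nr ~ 1987 * (m_nu / 1 eV), as quoted above (eq. 4 of Agarwal & Feldman 2012).
def z_nr(m_eV):
    return 1987.0 * m_eV - 1.0

print(z_nr(0.5))      # ~992: around recombination; cosmologically ruinous
print(z_nr(0.13))     # ~257: the degenerate-mass scale of the DES-style limit
print(z_nr(0.0087))   # ~16:  m2 of the minimal normal hierarchy
print(z_nr(0.0507))   # ~100: m3 of the minimal normal hierarchy
```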

The redshifts z2 = 16 and z3 = 100 are coincident with other important events in cosmic history, cosmic dawn and the dark ages, so it is worth considering the potential impact of neutrinos on the power spectra predicted for 21 cm absorption at those redshifts. There are experiments working to detect this, but measurement of the power spectrum is still a ways off. I am not aware of any theoretical consideration of this topic, so let’s consult an expert. Thanks to Avi Loeb for pointing out these (and a lot more!) references on short notice: Pritchard & Pierpaoli (2008), Villaescusa-Navarro et al. (2015), Obuljen et al. (2018). That’s a lot to process, and more than I’m willing to digest on the fly. But it looks like at least some cosmologists are grappling with the issue Dr. Goodman raises.

Any way we slice it, it looks like there are things still to learn. The direct laboratory measurement of the neutrino mass is not guaranteed to be less than the upper limit from cosmology. It would be surprising, but that would make matters a lot more interesting.

Divergence


I read somewhere – I don’t think it was Kuhn himself, but someone analyzing Kuhn – that there came a point in the history of science where there was a divergence between scientists, with different scientists disagreeing about what counts as a theory, what counts as a test of a theory, what even counts as evidence. We have reached that point with the mass discrepancy problem.

For many years, I worried that if the field ever caught up with me, it would zoom past. That hasn’t happened. Instead, it has diverged towards a place that I barely recognize as science. It looks more like the Matrix – a simulation – that is increasingly sophisticated yet self-contained, making only parsimonious contact with observational reality and unable to make predictions that apply to real objects. Scaling relations and statistical properties, sure. Actual galaxies with NGC numbers, not so much. That, to me, is not science.

I have found it increasingly difficult to communicate across the gap built on presumptions buried so deep that they cannot be questioned. One obvious one is the existence of dark matter. This has been fueled by cosmologists who take it for granted and particle physicists eager to discover it who repeat “we know dark matter exists*; we just need to find it” like a religious mantra. This is now ingrained so deeply that it has become difficult to convey even the simple concept that what we call “dark matter” is really just evidence of a discrepancy: we do not know whether it is literally some kind of invisible mass, or a breakdown of the equations that lead us to infer invisible mass.

I try to look at all sides of a problem. I can say nice things about dark matter (and cosmology); I can point out problems with it. I can say nice things about MOND; I can point out problems with it. The more common approach is to presume that any failing of MOND is an automatic win for dark matter. This is a simple-minded logical fallacy: just because MOND gets something wrong doesn’t mean dark matter gets it right. Indeed, my experience has been that cases that don’t make any sense in MOND don’t make any sense in terms of dark matter either. Nevertheless, this attitude persists.

I made this flowchart as a joke in 2012, but it persists in being an uncomfortably fair depiction of how many people who work on dark matter approach the problem.

I don’t know what is right, but I’m pretty sure this attitude is wrong. Indeed, it empowers a form of magical thinking: dark matter has to be correct, so any data that appear to contradict it are either wrong, or can be explained with feedback. Indeed, the usual trajectory has been denial first (that can’t be true!) and explanation later (we knew it all along!). This attitude is an existential threat to the scientific method, and I am despondent in part because I worry we are slipping into a post-scientific reality, where even scientists are little more than priests of a cold, dark religion.


*If we’re sure dark matter exists, it is not obvious that we need to be doing expensive experiments to find it.

Why bother?

The RAR extended by weak lensing


Last time, I expressed despondency about the lack of progress due to attitudes that in many ways remain firmly entrenched in the 1980s. Recently a nice result has appeared, so maybe there is some hope.

The radial acceleration relation (RAR) measured in rotationally supported galaxies extends down to an observed acceleration of about gobs = 10^-11 m/s/s, about one part in 1000000000000 of the acceleration we feel here on the surface of the Earth. In some extreme dwarfs, we get down below 10^-12 m/s/s. But accelerations this low are hard to find except in the depths of intergalactic space.

Weak lensing data

Brouwer et al have obtained a new constraint down to 10^-12.5 m/s/s using weak gravitational lensing. This technique empowers one to probe the gravitational potential of massive galaxies out to nearly 1 Mpc. (The bulk of the luminous mass is typically confined within a few kpc.) To do this, one looks for the net statistical distortion in galaxies behind a lensing mass like a giant elliptical galaxy. I always found this approach a little scary, because you can’t see the signal directly with your eyes the way you can the velocities in a galaxy measured with a long slit spectrograph. Moreover, one has to bin and stack the data, so the result isn’t for an individual galaxy, but rather the average of galaxies within the bin, however defined. There are further technical issues that make this challenging, but it’s what one has to do to get farther out.

Doing all that, Brouwer et al obtained this RAR:

The radial acceleration relation from weak lensing measured by Brouwer et al (2021). The red squares and bluescale at the top right are the RAR from rotating galaxies (McGaugh et al 2016). The blue, black, and orange points are the new weak lensing results.

To parse a few of the details: there are two basic results here, one from the GAMA survey (the blue points) and one from KiDS. KiDS is larger so has smaller formal errors, but relies on photometric redshifts (which use lots of colors to guess the best-match redshift). That’s probably OK in a statistical sense, but they are not as accurate as the spectroscopic redshifts measured for GAMA. There is a lot of structure in redshift space that gets washed out by photometric redshift estimates. The fact that the two basically agree hopefully means that this doesn’t matter here.

There are two versions of the KiDS data, one using just the stellar mass to estimate gbar, and another that includes an estimate of the coronal gas mass. Many galaxies are surrounded by a hot corona of gas. This is negligible at small radii where the stars dominate, but becomes progressively more important as part of the baryonic mass budget as one moves out. How important? Hard to say. But it certainly matters on scales of a few hundred kpc (this is the CGM in the baryon pie chart, which suggests roughly equal mass in stars (all within a few tens of kpc) and hot coronal gas (mostly out beyond 100 kpc)). This corresponds to the orange points; the black points are what happens if we neglect this component (which certainly isn’t zero). So in there somewhere – this seems to be the dominant systematic uncertainty.

Getting past these pesky details, this result is cool on many levels. First, the RAR appears to persist as a relation. That needn’t have happened. Second, it extends the RAR by a couple of decades to much lower accelerations. Third, it applies to non-rotating as well as rotationally supported galaxies (more on that in a bit). Fourth, the data at very low accelerations follow a straight line with a slope of about 1/2 in this log-log plot. That means gobs ~ gbar^(1/2). That provides a test of theory.

What does it mean?

Empirically, this is a confirmation that a known if widely unexpected relation extends further than previously known. That’s pretty neat in its own right, without any theoretical baggage. We used to be able to appreciate empirical relations better (e.g., the stellar main sequence!) before we understood what they meant. Now we seem to put the cart (theory) before the horse (data). That said, we do want to use data to test theories. Usually I discuss dark matter first, but that is complicated, so let’s start with MOND.

Test of MOND

MOND predicts what we see.

I am tempted to leave it at that, because it’s really that simple. But experience has taught me that no result is so obvious that someone won’t claim exactly the opposite, so let’s explore it a bit more.

There are three tests: whether the relation (i) exists, (ii) has the right slope, and (iii) has the right normalization. Tests (i) and (ii) are an immediate pass. It also looks like (iii) is very nearly correct, but it depends in detail on the baryonic mass-to-light ratio – that of the stars plus any coronal gas.

MOND is represented by the grey line that’s hard to see, but goes through the data at both high and low acceleration. At high accelerations, this particular line is a fitting function I chose for convenience. There’s nothing special about it, nor is it even specific to MOND. That was the point of our 2016 RAR paper: this relation exists in the data whether it is due to MOND or not. Conceivably, the RAR might be a relation that only applies to rotating galaxies for some reason that isn’t MOND. That’s hard to sustain, since the data look like MOND – so much so that the two are impossible to distinguish in this plane.

In terms of MOND, the RAR traces the interpolation function that quantifies the transition from the Newtonian regime where gobs = gbar to the deep MOND regime where gobs ~ gbar^(1/2). MOND does not specify the precise form of the interpolation function, just the asymptotic limits. The data trace the transition, providing an empirical assessment of the shape of the interpolation function around the acceleration scale a0. That’s interesting and will hopefully inform further theory development, but it is not critical to testing MOND.
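For reference, the fitting function I believe is being referred to here is the one from McGaugh et al. (2016), gobs = gbar/[1 − exp(−√(gbar/g†))] with g† ≈ 1.2 × 10^-10 m/s/s; a quick numerical check of its two limits:

```python
import numpy as np

# RAR fitting function of McGaugh et al. (2016); g_dagger ~ 1.2e-10 m/s^2.
g_dagger = 1.2e-10  # m/s^2

def g_obs(g_bar):
    return g_bar / (1.0 - np.exp(-np.sqrt(g_bar / g_dagger)))

print(g_obs(1e-8) / 1e-8)                        # ~1: Newtonian limit, g_obs ~ g_bar
print(g_obs(1e-13), np.sqrt(1e-13 * g_dagger))   # deep-MOND limit: g_obs ~ sqrt(g_bar * g_dagger)
```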

What MOND does very explicitly predict is the asymptotic behavior gobs ~ gbar^(1/2) in the deep MOND regime of low accelerations (gobs << a0). That the lensing data are well into this regime makes them an excellent test of this strong prediction of MOND. It passes with flying colors: the data have precisely the slope anticipated by Milgrom nearly 40 years ago.

This didn’t have to happen. All sorts of other things might have happened. Indeed, as we discussed in Lelli et al (2017), there were some hints that the relation flattened, saturating at a constant gobs around 10^-11 m/s/s. I was never convinced that this was real, as it only appears in the least certain data, and there were already some weak lensing data to lower accelerations.

Milgrom (2013) analyzed weak lensing data that were available then, obtaining this figure:

Velocity dispersion-luminosity relation obtained from weak lensing data by Milgrom (2013). Lines are the expectation of MOND for mass-to-light ratios ranging from 1 to 6 in the r’-band, as labeled. The sample is split into red (early type, elliptical) and blue (late type, spiral) galaxies. The early types have a systematically higher M/L, as expected for their older stellar populations.

The new data corroborate this result. Here is a similar figure from Brouwer et al:

The RAR from weak lensing for galaxies split by Sérsic index (left) and color (right).

Just looking at these figures, one can see the same type-dependent effect found by Milgrom. However, there is an important difference: Milgrom’s plot leaves the unknown mass-to-light ratio as a free parameter, while the new plot has an estimate of this built-in. So if the adopted M/L is correct, then the red and blue galaxies form parallel RARs that are almost but not quite exactly the same. That would not be consistent with MOND, which should place everything on the same relation. However, this difference is well within the uncertainty of the baryonic mass estimate – not just the M/L of the stars, but also the coronal gas content (i.e., the black vs. orange points in the first plot). MOND predicted this behavior well in advance of the observation, so one would have to bend over backwards, rub one’s belly, and simultaneously punch oneself in the face to portray this as anything short of a fantastic success of MOND.

The data! Look at the data!

I say that because I’m sure people will line up to punch themselves in the face in exactly this fashion*. One of the things that persuades me to suspect that there might be something to MOND is the lengths to which people will go to deny even its most obvious successes. At the same time, they are more than willing to cut any amount of slack necessary to save LCDM. An example is provided by Ludlow et al., who claim to explain the RAR ‘naturally’ from simulations – provided they spot themselves a magic factor of two in the stellar mass-to-light ratio. If it were natural, they wouldn’t need that arbitrary factor. By the same token, if you recognize that you might have been that far off about M*/L, you have to extend that same grace to MOND as you do to LCDM. That’s a basic tenet of objectivity, which used to be a value in science. It doesn’t look like a correction as large as a factor of two is necessary here given the uncertainty in the coronal gas. So, preemptively: Get a grip, people.

MOND predicts what we see. No other theory beat it to the punch. The best one can hope to do is to match its success after the fact by coming up with some other theory that looks just like MOND.

Test of LCDM

In order to test LCDM, we have to agree what LCDM predicts. That agreement is lacking. There is no clear prediction. This complicates the discussion, as the best one can hope to do is give a thorough discussion of all the possibilities that people have so far considered, which differ in important ways. That exercise is necessarily incomplete – people can always come up with new and different ideas for how to explain what they didn’t predict. I’ve been down the road of being thorough many times, which gets so complicated that no one reads it. So I will not attempt to be thorough here, and only explore enough examples to give a picture of where we’re currently at.

The tests are the same as above: should the relation (i) exist? (ii) have the observed slope? and (iii) normalization?

The first problem for LCDM is that the relation exists (i). There is no reason to expect this relation to exist. There was (and in some corners, continues to be) a lot of denial that the RAR even exists, because it shouldn’t. It does, and it looks just like what MOND predicts. LCDM is not MOND, and did not anticipate this behavior because there is no reason to do so.

If we persist past this point – and it is not obvious that we should – then we may say, OK, here’s this unexpected relation; how do we explain it? For starters, we do have a prediction for the density profiles of dark matter halos; these fall off as r^-3. That translates to some slope in the RAR plane, but not a unique relation, as the normalization can and should be different for each halo. But it’s not even the right slope. The observed slope corresponds to a logarithmic potential in which the density profile falls off as r^-2. That’s what is required to give a flat rotation curve in Newtonian dynamics, which is why the pseudo-isothermal halo was the standard model before simulations gave us the NFW halo with its r^-3 fall-off. The lensing data are like a flat rotation curve that extends indefinitely far out; they are not like an NFW halo.
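The contrast between the two density profiles can be sketched in scaled units (my own illustration, not a fit to the lensing data): for NFW the enclosed mass grows only logarithmically at large radius and the circular speed falls, while an r^-2 halo gives a flat curve.

```python
import numpy as np

def vc_nfw(x):
    """NFW circular speed vs x = r/r_s, in units where 4*pi*G*rho_s*r_s^2 = 1."""
    m_enc = np.log(1.0 + x) - x / (1.0 + x)   # enclosed-mass profile
    return np.sqrt(m_enc / x)

def vc_isothermal(x, v0=1.0):
    """Flat circular speed of a rho ~ r^-2 (isothermal) halo."""
    return v0

for x in (1.0, 10.0, 100.0):   # out to ~100 scale radii, roughly the lensing regime
    print(x, round(vc_nfw(x), 2), vc_isothermal(x))   # NFW declines; isothermal does not
```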

That’s just stating the obvious. To do more requires building a model. Here is an example from Oman et al. of a model that follows the logic I just outlined, adding some necessary and reasonable assumptions about the baryons:

The “slight offset” from the observed RAR mentioned in the caption is the factor of two in stellar mass they spotted themselves in Ludlow et al. (2017).

The model is the orange line. It deviates from the black line that is the prediction of MOND. The data look like MOND, not like the orange line.

One can of course build other models. Brouwer et al discuss some. I will not explore these in detail, and only note that the models are not consistent, so there is no clear prediction from LCDM. To explore just one a little further, this figure appears at the very end of their paper, in appendix C:

The orange line in this case is some extrapolation of the model of Navarro et al. (2017).** This also does not work, though it doesn’t fail by as much as the model of Oman et al. I don’t understand how they make the extrapolation here, as a major prediction of Navarro et al. was that gobs would saturate at 10^-11 m/s/s; the orange line should flatten out near the middle of this plot. Indeed, they argued that we would never observe any lower accelerations, and that

“extending observations to radii well beyond the inner halo regions should lead to systematic deviations from the MDAR.”

– Navarro et al (2017)

This is a reasonable prediction for LCDM, but it isn’t what happened – the RAR continues as predicted by MOND. (The MDAR is equivalent to the RAR).

The astute reader may notice that many of these theorists are frequently coauthors, so you might expect they’d come up with a self-consistent model and stick to it. Unfortunately, consistency is not a hobgoblin that afflicts galaxy formation theory, and there are as many predictions as there are theorists (more for the prolific ones). They’re all over the map – which is the problem. LCDM makes no prediction to which everyone agrees. This makes it impossible to test the theory. If one model is wrong, that is just because that particular model is wrong, not because the theory is under threat. The theory is never under threat as there always seems to be another modeler who will claim success where others fail, whether they genuinely succeed or not. That they claim success is all that is required. Cognitive dissonance then takes over, people believe what they want to hear, and all anomalies are forgiven and forgotten. There never seems to be a proper prior that everyone would agree falsifies the theory if it fails. Galaxy formation in LCDM has become epicycles on steroids.

Whither now?

I have no idea. Continue to improve the data, of course. But the more important thing that needs to happen is a change in attitude. The attitude is that LCDM as a cosmology must be right so the mass discrepancy must be caused by non-baryonic dark matter so any observation like this must have a conventional explanation, no matter how absurd and convoluted. We’ve been stuck in this rut since before we even put the L in CDM. We refuse to consider alternatives so long as the standard model has not been falsified, but I don’t see how it can be falsified to the satisfaction of all – there’s always a caveat, a rub, some out that we’re willing to accept uncritically, no matter how silly. So in the rut we remain.

A priori predictions are an important part of the scientific method because they can’t be fudged. On the rare occasions when they come true, it is supposed to make us take note – even change our minds. These lensing results are just another of many previous corroborations of a priori predictions by MOND. What people do with that knowledge – build on it, choose to ignore it, or rant in denial – is up to them.


*Bertolt Brecht mocked this attitude amongst the Aristotelian philosophers in his play about Galileo, noting how they were eager to criticize the new dynamics if the heavier rock beat the lighter rock to the ground by so much as a centimeter in the Leaning Tower of Pisa experiment while turning a blind eye to their own prediction being off by a hundred meters.

**I worked hard to salvage dark matter, which included a lot of model building. I recognize the model of Navarro et al as a slight variation on a model I built in 2000 but did not publish because it was obviously wrong. It takes a lot of time to write a scientific paper, so a lot of null results never get reported. In 2000 when I did this, the natural assumption to make was that galaxies all had about the same disk fraction (the ratio of stars to dark matter, e.g., assumption (i) of Mo et al 1998). This predicts far too much scatter in the RAR, which is why I abandoned the model. Since then, this obvious and natural assumption has been replaced by abundance matching, in which the stellar mass fraction is allowed to vary to account for the difference between the predicted halo mass function and the observed galaxy luminosity function. In effect, we replaced a universal constant with a rolling fudge factor***. This has the effect of compressing the range of halo masses for a given range of stellar masses. This in turn reduces the “predicted” scatter in the RAR, just by taking away some of the variance that was naturally there. One could do better still with even more compression, as the data are crudely consistent with all galaxies living in the same dark matter halo. This is of course a consequence of MOND, in which the conventionally inferred dark matter halo is just the “extra” force specified by the interpolation function.

***This is an example of what I’ll call prediction creep for want of a better term. Originally, we thought that galaxies corresponded to balls of gas that had had time to cool and condense. As data accumulated, we realized that the baryon fractions of galaxies were not equal to the cosmic value fb; they were rather less. That meant that only a fraction of the baryons available in a dark matter halo had actually cooled to form the visible disk. So we introduced a parameter md = Mdisk/Mtot (as Mo et al. called it) where the disk is the visible stars and gas and the total includes that and all the dark matter out to the notional edge of the dark matter halo. We could have any md < fb, but they were in the same ballpark for massive galaxies, so it seemed reasonable to think that the disk fraction was a respectable fraction of the baryons – and the same for all galaxies, perhaps with some scatter. This also does not work; low mass galaxies have much lower md than high mass galaxies. Indeed, md becomes ridiculously small for the smallest galaxies, less than 1% of the available fb (a problem I’ve been worried about since the previous century). At each step, there has been a creep in what we “predict.” All the baryons should condense. Well, most of them. OK, fewer in low mass galaxies. Why? Feedback! How does that work? Don’t ask! You don’t want to know. So for a while the baryon fraction of a galaxy was just a random number stochastically generated by chance and feedback. That is reasonable (feedback is chaotic) but it doesn’t work; the variation of the disk fraction is a clear function of mass that has to have little scatter (or it pumps up the scatter in the Tully-Fisher relation). So we gradually backed our way into a paradigm where the disk fraction is a function md(M*). This has been around long enough that we have gotten used to the idea. Instead of seeing it for what it is – a rolling fudge factor – we call it natural as if it had been there from the start, as if we expected it all along. This is prediction creep. We did not predict anything of the sort. This is just an expectation built through familiarity with requirements imposed by the data, not genuine predictions made by the theory. It has become common to assert that some unnatural results are natural; this stems in part from assuming part of the answer: any model built on abundance matching is unnatural to start, because abundance matching is unnatural. Necessary, but not remotely what we expected before all the prediction creep. It’s creepy how flexible our predictions can be.

Despondency


I have become despondent for the progress of science.

Despite enormous progress both observational and computational, we have made little progress in solving the missing mass problem. The issue is not one of technical progress. It is psychological.

Words matter. We are hung up on missing mass as literal dark matter. As Bekenstein pointed out, a less misleading name would have been the acceleration discrepancy, because the problem only appears at low accelerations. But that sounds awkward. We humans like our simple catchphrases, and often cling to them no matter what. We called it dark matter, so it must be dark matter!

Vera Rubin succinctly stated the appropriately conservative attitude of most scientists in 1982 during the discussion at IAU 100:

To highlight the end of her quote:

I believe most of us would rather alter Newtonian gravitational theory only as a last resort.

Rubin, V.C. 1983, in the proceedings of IAU Symposium 100: Internal Kinematics and Dynamics of Galaxies, p. 10.

Exactly.

In 1982, this was exactly the right attitude. It had been clearly established that there was a discrepancy between what you see and what you get. But that was about it. So, we could add a little mass that’s hard to see, or we could change a fundamental law of nature. Easy call.

By this time, the evidence for a discrepancy was clear, but the hypothesized solutions were still in development. This was before the publication of the suggestion of Peebles and separately by Steigman & Turner of cold dark matter. This was before the publication of Milgrom’s first papers on MOND. (Note that these ideas took years to develop, so much of this work was simultaneous and not done in a vacuum.) All that was clear was that something extra was needed. It wasn’t even clear how much – a factor of two in mass sufficed for many of the early observations. At that time, it was easy to imagine that amount to be lurking in low mass stars. No need for new physics, either gravitational or particle.

The situation quickly snowballed. From a factor of two, we soon needed a factor of ten. Whatever was doing the gravitating, it exceeded the mass density allowed in normal matter by big bang nucleosynthesis. By the time I was a grad student in the late ’80s, it was obvious that there had to be some kind of dark mass, and it had to be non-baryonic. That meant new particle physics (e.g., a WIMP). The cold dark matter paradigm took root.

Like a fifty year mortgage, we are basically still stuck with this decision we made in the ’80s. It made sense then, given what was then known. Does it still? At what point have we reached the last resort? More importantly, apparently, how do we persuade ourselves that we have reached this point?

Peebles provides a nice recent summary of all the ways in which LCDM is a good approximation to cosmologically relevant observations. There are a lot, and I don’t disagree with him. The basic argument is that it is very unlikely that these things all agree unless LCDM is basically correct.

Trouble is, the exact same argument applies for MOND. I’m not going to justify this here – it should be obvious. If it isn’t, you haven’t been paying attention. It is unlikely to the point of absurdity that a wholly false theory should succeed in making so many predictions of such diversity and precision as MOND has.

These are both examples of what philosophers of science call a No Miracles Argument. The problem is that it cuts both ways. I will refrain from editorializing here on which would be the bigger miracle, and simply note that the obvious thing to do is try to combine the successes of both, especially given that they don’t overlap much. And yet, the Venn diagram of scientists working to satisfy both ends is vanishingly small. Not zero, but the vast majority of the community remains stuck in the ’80s: it has to be cold dark matter. I remember having this attitude, and how hard it was to realize that it might be wrong. The intellectual blinders imposed by this attitude are more opaque than a brick wall. This psychological hangup is the primary barrier to real scientific progress (as opposed to incremental progress in the sense used by Kuhn).

Unfortunately, both CDM and MOND rely on a tooth fairy. In CDM, it is the conceit that non-baryonic dark matter actually exists. This requires new physics beyond the Standard Model of particle physics. All the successes of LCDM follow if and only if dark matter actually exists. This we do not know (contrary to many assertions to this effect); all we really know is that there are discrepancies. Whether the discrepancies are due to literal dark matter or a change in the force law is maddeningly ambiguous. Of course, the conceit in MOND is not just that there is a modified force law, but that there must be a physical mechanism by which it occurs. The first part is the well-established discrepancy. The last part remains wanting.

When we think we know, we cease to learn.

Dr. Radhakrishnan

The best scientists are always in doubt. As well as enumerating its successes, Peebles also discusses some of the ways in which LCDM might be better. Should massive galaxies appear as they do? (Not really.) Should the voids really be so empty? (MOND predicted that one.) I seldom hear these concerns from other cosmologists. That’s because they’re not in doubt. The attitude is that dark matter has to exist, and any contrary evidence is simply a square peg that can be made to fit the round hole if we pound hard enough.

And so, we’re stuck still pounding the ideas of the ’80s into the heads of innocent students, creating a closed ecosystem of stagnant ideas self-perpetuated by the echo chamber effect. I see no good way out of this; indeed, the quality of debate is palpably lower now than it was in the previous century.

So I have become despondent for the progress of science.

Bias all the way down


It often happens that data are ambiguous and open to multiple interpretations. The evidence for dark matter is an obvious example. I frequently hear permutations on the statement

We know dark matter exists; we just need to find it.

This is said in all earnestness by serious scientists who clearly believe what they say. They mean it. Unfortunately, meaning something in all seriousness, indeed, believing it with the intensity of religious fervor, does not guarantee that it is so.

The way the statement above is phrased is a dangerous half-truth. What the data show beyond any dispute is that there is a discrepancy between what we observe in extragalactic systems (including cosmology) and the predictions of Newton & Einstein as applied to the visible mass. If we assume that the equations Newton & Einstein taught us are correct, then we inevitably infer the need for invisible mass. That seems like a very reasonable assumption, but it is just that: an assumption. Moreover, it is an assumption that is only tested on the relevant scales by the data that show a discrepancy. One could instead infer that theory fails this test – it does not work to predict observed motions when applied to the observed mass. From this perspective, it could just as legitimately be said that

A more general theory of dynamics must exist; we just need to figure out what it is.

That puts an entirely different complexion on exactly the same problem. The data are the same; they are not to blame. The difference is how we interpret them.

Neither of these statements is correct: they are both half-truths; two sides of the same coin. As such, one risks being wildly misled. If one only hears one, the other gets discounted. That’s pretty much where the field is now, and it has been stuck there for a long time.

That’s certainly where I got my start. I was a firm believer in the standard dark matter interpretation. The evidence was obvious and overwhelming. Not only did there need to be invisible mass, it had to be some new kind of particle, like a WIMP. Almost certainly a WIMP. Any other interpretation (like MACHOs) was obviously stupid, as it violated some strong constraint, like Big Bang Nucleosynthesis (BBN). It had to be non-baryonic cold dark matter. HAD. TO. BE. I was sure of this. We were all sure of this.

What gets us in trouble is not what we don’t know. It’s what we know for sure that just ain’t so.

Josh Billings

I realized in the 1990s that the above reasoning was not airtight. Indeed, it has a gaping hole: we were not even considering modifications of dynamical laws (gravity and inertia). That this was a possibility, even a remote one, came as a profound and deep shock to me. It took me ages of struggle to admit it might be possible, during which I worked hard to save the standard picture. I could not. So it pains me to watch the entire community repeat the same struggle, repeat the same failures, and pretend like it is a success. That last step follows from the zeal of religious conviction: the outcome is predetermined. The answer still HAS TO BE dark matter.

So I asked myself – what if we’re wrong? How could we tell? Once one has accepted that the universe is filled with invisible mass that can’t be detected by any means available to us, how can we disabuse ourselves of this notion should it happen to be wrong?

One approach that occurred to me was a test in the power spectrum of the cosmic microwave background. Before any of the peaks had been measured, the only clear difference one expected was a bigger second peak with dark matter, and a smaller one without it for the same absolute density of baryons as set by BBN. I’ve written about the lead up to this prediction before, and won’t repeat it here. Rather, I’ll discuss some of the immediate fall out – some of which I’ve only recently pieced together myself.

The first experiment to provide a test of the prediction for the second peak was Boomerang. The second was Maxima-1. I of course checked the new data when they became available. Maxima-1 showed what I expected. So much so that it barely warranted comment. One is only supposed to write a scientific paper when one has something genuinely new to say. This didn’t rise to that level. It was more like checking a tick box. Besides, lots more data were coming; I couldn’t write a new paper every time someone tacked on an extra data point.

There was one difference. The Maxima-1 data had a somewhat higher normalization. The shape of the power spectrum was consistent with that of Boomerang, but the overall amplitude was a bit higher. The latter mattered not at all to my prediction, which was for the relative amplitude of the first to second peaks.

Systematic errors, especially in the amplitude, were likely in early experiments. That’s like rule one of observing the sky. After examining both data sets and the model expectations, I decided the Maxima-1 amplitude was more likely to be correct, so I asked what offset was necessary to reconcile the two. About 14% in temperature. This was, to me, no big deal – it was not relevant to my prediction, and it is exactly the sort of thing one expects to happen in the early days of a new kind of observation. It did seem worth remarking on, if not writing a full blown paper about, so I put it in a conference presentation (McGaugh 2000), which was published in a journal (IJMPA, 16, 1031) as part of the conference proceedings. This correctly anticipated the subsequent recalibration of Boomerang.
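To see why the calibration offset did not matter for the test, here is a minimal sketch with made-up peak amplitudes (not the actual Boomerang or Maxima-1 band powers): an overall temperature calibration rescales the power spectrum but cancels in the first-to-second peak ratio.

```python
# Made-up peak amplitudes, for illustration only - not real Boomerang/Maxima-1 values.
T_peak1, T_peak2 = 70.0, 35.0   # hypothetical peak temperatures (microKelvin)
calibration = 1.14              # the ~14% temperature offset discussed above

T1_cal, T2_cal = calibration * T_peak1, calibration * T_peak2

print(f"power rescales by {calibration**2:.2f}")   # ~1.3, since power goes as T^2
print(f"peak ratio before: {T_peak1 / T_peak2:.2f}, after: {T1_cal / T2_cal:.2f}")
# The predicted quantity - the relative amplitude of the first two peaks -
# is unchanged by the overall normalization of either experiment.
```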

The figure from McGaugh (2000) is below. Basically, I said “gee, looks like the Boomerang calibration needs to be adjusted upwards a bit.” This has been done in the figure. The amplitude of the second peak remained consistent with the prediction for a universe devoid of dark matter. In fact, it got better (see Table 4 of McGaugh 2004).

Plot from McGaugh (2000): The predictions of LCDM (left) and no-CDM (right) compared to Maxima-1 data (open points) and Boomerang data (filled points, corrected in normalization). The LCDM model shown is the most favorable prediction that could be made prior to observation of the first two peaks; other then-viable choices of cosmic parameters predicted a higher second peak. The no-CDM got the relative amplitude right a priori, and remains consistent with subsequent data from WMAP and Planck.

This much was trivial. There was nothing new to see, at least as far as the test I had proposed was concerned. New data were pouring in, but there wasn’t really anything worth commenting on until WMAP data appeared several years later, which persisted in corroborating the peak ratio prediction. By this time, the cosmological community had decided that despite persistent corroborations, my prediction was wrong.

That’s right. I got it right, but then right turned into wrong according to the scuttlebutt of cosmic gossip. This was a falsehood, but it took root, and seems to have become one of the things that cosmologists know for sure that just ain’t so.

How did this come to pass? I don’t know. People never asked me. My first inkling came in 2003, when it came up in a chance conversation with Marv Leventhal (then chair of Maryland Astronomy), who opined “too bad the data changed on you.” This shocked me. Nothing relevant in the data had changed, yet here was someone asserting that it had as if it were common knowledge. Which I suppose it was by then, just not to me.

Over the years, I’ve had the occasional weird conversation on the subject. In retrospect, I think the weirdness stemmed from a divergence of assumed knowledge. They knew I was right then wrong. I knew the second peak prediction had come true and remained true in all subsequent data, but the third peak was a different matter. So there were many opportunities for confusion. In retrospect, I think many of these people were laboring under the mistaken impression that I had been wrong about the second peak.

I now suspect this started with the discrepancy between the calibration of Boomerang and Maxima-1. People seemed to be aware that my prediction was consistent with the Boomerang data. Then they seem to have confused the prediction with those data. So when the data changed – i.e., Maxima-1 was somewhat different in amplitude – it seemed to follow that the prediction had now failed.

This is wrong on many levels. The prediction is independent of the data that test it. It is incredibly sloppy thinking to confuse the two. More importantly, the prediction, as phrased, was not sensitive to this aspect of the data. If one had bothered to measure the ratio in the Maxima-1 data, one would have found a number consistent with the no-CDM prediction. This should be obvious from casual inspection of the figure above. Apparently no one bothered to check. They didn’t even bother to understand the prediction.

Understanding a prediction before dismissing it is not a hard ask. Unless, of course, you already know the answer. Then laziness is not only justified, but the preferred course of action. This sloppy thinking compounds a number of well known cognitive biases (anchoring bias, belief bias, confirmation bias, to name a few).

I mistakenly assumed that other people were seeing the same thing in the data that I saw. It was pretty obvious, after all. (Again, see the figure above.) It did not occur to me back then that other scientists would fail to see the obvious. I fully expected them to complain and try and wriggle out of it, but I could not imagine such complete reality denial.

The reality denial was twofold: clearly, people were looking for any excuse to ignore anything associated with MOND, however indirectly. But they also had no clear prior for LCDM, which I did establish as a point of comparison. A theory is only as good as its prior, and all LCDM models made before these CMB data showed the same thing: a bigger second peak than was observed. This can be fudged: there are ample free parameters, so it can be made to fit; one just had to violate BBN (as it was then known) by three or four sigma.

In retrospect, I think the very first time I had this alternate-reality conversation was at a conference at the University of Chicago in 2001. Andrey Kravtsov had just joined the faculty there, and organized a conference to get things going. He had done some early work on the cusp-core problem, which was still very much a debated thing at the time. So he asked me to come address that topic. I remember being on the plane – a short ride from Cleveland – when I looked at the program. Nearly did a spit take when I saw that I was to give the first talk. There wasn’t a lot of time to organize my transparencies (we still used overhead projectors in those days) but I’d given the talk many times before, so it was enough.

I only talked about the rotation curves of low surface brightness galaxies in the context of the cusp-core problem. That was the mandate. I didn’t talk about MOND or the CMB. There’s only so much you can address in a half hour talk. [This is a recurring problem. No matter what I say, there always seems to be someone who asks “why didn’t you address X?” where X is usually that person’s pet topic. Usually I could do so, but not in the time allotted.]

About halfway through this talk on the cusp-core problem, I guess it became clear that I wasn’t going to talk about things that I hadn’t been asked to talk about, and I was interrupted by Mike Turner, who did want to talk about the CMB. Or rather, extract a confession from me that I had been wrong about it. I forget how he phrased it exactly, but it was the academic equivalent of “Have you stopped beating your wife lately?” Say yes, and you admit to having done so in the past. Say no, and you’re still doing it. What I do clearly remember was him prefacing it with “As a test of your intellectual honesty” as he interrupted to ask a dishonest and intentionally misleading question that was completely off-topic.

Of course, the pretext for his attack question was the Maxima-1 result. He phrased it in a way that I had to agree that those disproved my prediction, or be branded a liar. Now, at the time, there were rumors swirling that the experiment – some of the people who worked on it were there – had detected the third peak, so I thought that was what he was alluding to. Those data had not yet been published and I certainly had not seen them, so I could hardly answer that question. Instead, I answered the “intellectual honesty” affront by pointing to a case where I had said I was wrong. At one point, I thought low surface brightness galaxies might explain the faint blue galaxy problem. On closer examination, it became clear that they could not provide a complete explanation, so I said so. Intellectual honesty is really important to me, and should be to all scientists. I have no problem admitting when I’m wrong. But I do have a problem with demands to admit that I’m wrong when I’m not.

To me, it was obvious that the Maxima-1 data were consistent with the second peak. The plot above was already published by then. So it never occurred to me that he thought the Maxima-1 data were in conflict with what I had predicted – it was already known that it was not. Only to him, it was already known that it was. Or so I gather – I have no way to know what others were thinking. But it appears that this was the juncture in which the field suffered a psychotic break. We are not operating on the same set of basic facts. There has been a divergence in personal realities ever since.

Arthur Kosowsky gave the summary talk at the end of the conference. He told me that he wanted to address the elephant in the room: MOND. I did not think the assembled crowd of luminary cosmologists were mature enough for that, so advised against going there. He did, and was incredibly careful in what he said: empirical, factual, posing questions rather than making assertions. Why does MOND work as well as it does?

The room dissolved into chaotic shouting. Every participant was vying to say something wrong more loudly than the person next to him. (Yes, everyone shouting was male.) Joel Primack managed to say something loudly enough for it to stick with me, asserting that gravitational lensing contradicted MOND in a way that I had already shown it did not. It was just one of dozens of superficial falsehoods that people take for granted to be true if they align with one’s confirmation bias.

The uproar settled down, the conference was over, and we started to disperse. I wanted to offer Arthur my condolences, having been in that position many times. Anatoly Klypin was still giving it to him, keeping up a steady stream of invective as everyone else moved on. I couldn’t get a word in edgewise, and had a plane home to catch. So when I briefly caught Arthur’s eye, I just said “told you” and moved on. Anatoly paused briefly, apparently fathoming that his behavior, like that of the assembled crowd, was entirely predictable. Then the moment of awkward self-awareness passed, and he resumed haranguing Arthur.

Divergence

Reality check

Before we can agree on the interpretation of a set of facts, we have to agree on what those facts are. Even if we agree on the facts, we can differ about their interpretation. It is OK to disagree, and anyone who practices astrophysics is going to be wrong from time to time. It is the inevitable risk we take in trying to understand a universe that is vast beyond human comprehension. Heck, some people have made successful careers out of being wrong. This is OK, so long as we recognize and correct our mistakes. That’s a painful process, and there is an urge in human nature to deny such things, to pretend they never happened, or to assert that what was wrong was right all along.

This happens a lot, and it leads to a lot of weirdness. Beyond the many people in the field whom I already know personally, I tend to meet two kinds of scientists. There are those (usually other astronomers and astrophysicists) who might be familiar with my work on low surface brightness galaxies or galaxy evolution or stellar populations or the gas content of galaxies or the oxygen abundances of extragalactic HII regions or the Tully-Fisher relation or the cusp-core problem or faint blue galaxies or big bang nucleosynthesis or high redshift structure formation or joint constraints on cosmological parameters. These people behave like normal human beings. Then there are those (usually particle physicists) who have only heard of me in the context of MOND. These people often do not behave like normal human beings. They conflate me as a person with a theory that is Milgrom’s. They seem to believe that both are evil and must be destroyed. My presence, even the mere mention of my name, easily destabilizes their surprisingly fragile grasp on sanity.

One of the things that scientists-gone-crazy do is project their insecurities about the dark matter paradigm onto me. People who barely know me frequently attribute to me motivations that I neither have nor recognize. They presume that I have some anti-cosmology, anti-DM, pro-MOND agenda, and are remarkably comfortable asserting to me what it is that I believe. What they never explain, or apparently bother to consider, is why I would be so obtuse. What is my motivation? I certainly don’t enjoy having the same argument over and over again with their ilk, which is the only thing it seems to get me.

The only agenda I have is a pro-science agenda. I want to know how the universe works.

This agenda is not theory-specific. In addition to lots of other astrophysics, I have worked on both dark matter and MOND. I will continue to work on both until we have a better understanding of how the universe works. Right now we’re very far away from obtaining that goal. Anyone who tells you otherwise is fooling themselves – usually by dint of ignoring inconvenient aspects of the evidence. Everyone is susceptible to cognitive dissonance. Scientists are no exception – I struggle with it all the time. What disturbs me is the number of scientists who apparently do not. The field is being overrun with posers who lack the self-awareness to question their own assumptions and biases.

So, I feel like I’m repeating myself here, but let me state my bias. Oh wait. I already did. That’s why it felt like repetition. It is.

The following bit of this post is adapted from an old web page I wrote well over a decade ago. I’ve lost track of exactly when – the file has been through many changes in computer systems, and unix only records the last edit date. For the linked page, that’s 2016, when I added a few comments. The original is much older, and was written while I was at the University of Maryland. Judging from the html style, it was probably early to mid-’00s. Of course, the sentiment is much older, as it shouldn’t need to be said at all.

I will make a few updates as seem appropriate, so check the link if you want to see the changes. I will add new material at the end.


Long-standing remarks on intellectual honesty

The debate about MOND often degenerates into something that falls well short of the sober, objective discussion that is supposed to characterize scientific debates. One can tell when voices are raised and baseless ad hominem accusations are made. I have, with disturbing frequency, found myself accused of partisanship and intellectual dishonesty, usually by people who are as fair and balanced as Fox News.

Let me state with absolute clarity that intellectual honesty is a bedrock principle of mine. My attitude is summed up well by the quote

When a man lies, he murders some part of the world.

Paul Gerhardt

I first heard this spoken by the character Merlin in the movie Excalibur (1981 version). Others may have heard it in a song by Metallica. As best I can tell, it is originally attributable to the 17th century cleric Paul Gerhardt.

This is a great quote for science, as the intent is clear. We don’t get to pick and choose our facts. Outright lying about them is antithetical to science.

I would extend this to ignoring facts. One should not only be honest, but also as complete as possible. It does not suffice to be truthful while leaving unpleasant or unpopular facts unsaid. This is lying by omission.

I “grew up” believing in dark matter. Specifically, Cold Dark Matter, presumably a WIMP. I didn’t think MOND was wrong so much as I didn’t think about it at all. Barely heard of it; not worth the bother. So I was shocked – and angered – when its predictions came true in my data for low surface brightness galaxies. So I understand when my colleagues have the same reaction.

Nevertheless, Milgrom got the prediction right. I had a prediction, it was wrong. There were other conventional predictions, they were also wrong. Indeed, dark matter based theories generically have a very hard time explaining these data. In a Bayesian sense, given the prior that we live in a ΛCDM universe, the probability that MONDian phenomenology would be observed is practically zero. Yet it is. (This is very well established, and has been for some time.)

So – confronted with an unpopular theory that nevertheless had some important predictions come true, I reported that fact. I could have ignored it, pretended it didn’t happen, covered my eyes and shouted LA LA LA NOT LISTENING. With the benefit of hindsight, that certainly would have been the savvy career move. But it would also be ignoring a fact, and tantamount to a lie.

In short, though it was painful and protracted, I changed my mind. Isn’t that what the scientific method says we’re supposed to do when confronted with experimental evidence?

That was my experience. When confronted with evidence that contradicted my preexisting world view, I was deeply troubled. I tried to reject it. I did an enormous amount of fact-checking. The people who presume I must be wrong have not had this experience, and haven’t bothered to do any fact-checking. Why bother when you already are sure of the answer?


Willful Ignorance

I understand being skeptical about MOND. I understand being more comfortable with dark matter. That’s where I started from myself, so as I said above, I can empathize with people who come to the problem this way. This is a perfectly reasonable place to start.

For me, that was over a quarter century ago. I can understand there being some time lag. That is not what is going on. There has been ample time to process and assimilate this information. Instead, most physicists have chosen to remain ignorant. Worse, many persist in spreading what can only be described as misinformation. I don’t think they are liars; rather, it seems that they believe their own bullshit.

To give an example of disinformation, I still hear things like “MOND fits rotation curves but nothing else.” This is not true. The first thing I did was check into exactly that. Years of fact-checking went into McGaugh & de Blok (1998), and I’ve done plenty more since. It came as a great surprise to me that MOND explained the vast majority of the data as well or better than dark matter. Not everything, to be sure, but lots more than “just” rotation curves. Yet this old falsehood still gets repeated as if it were not a misconception that was put to rest in the previous century. We’re stuck in the dark ages by choice.

It is not a defensible choice. There is no excuse to remain ignorant of MOND at this juncture in the progress of astrophysics. It is incredibly biased to point to its failings without contending with its many predictive successes. It is tragi-comically absurd to assume that dark matter provides a better explanation when it cannot make the same predictions in advance. MOND may not be correct in every particular, and makes no pretense to be a complete theory of everything. But it is demonstrably less wrong than dark matter when it comes to predicting the dynamics of systems in the low acceleration regime. Pretending like this means nothing is tantamount to ignoring essential facts.

Even a lie of omission murders a part of the world.

Galaxy Stellar and Halo Masses: tension between abundance matching and kinematics

Mass is a basic quantity. How much stuff does an astronomical object contain? For a galaxy, mass can mean many different things: that of its stars, stellar remnants (e.g., white dwarfs, neutron stars), atomic gas, molecular clouds, plasma (ionized gas), dust, Bok globules, black holes, habitable planets, biomass, intelligent life, very small rocks… these are all very different numbers for the same galaxy, because galaxies contain lots of different things. Two things that many scientists have settled on as Very Important are a galaxy’s stellar mass and its dark matter halo mass.

The mass of a galaxy’s dark matter halo is not well known. Most measurements provide only lower limits, as tracers fade out before any clear end is reached. Consequently, the “total” mass is a rather notional quantity. So we’ve adopted as a convention the mass M200 contained within an over-density of 200 times the critical density of the universe. This is a choice motivated by an ex-theory that would take an entire post to explain unsatisfactorily, so do not question the convention: all choices are bad, so we stick with it.
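To spell out the convention, here is a minimal sketch assuming an illustrative H0 = 70 km/s/Mpc: the critical density follows from the Hubble constant, and M200 is the mass inside the radius R200 within which the mean density is 200 times that critical value.

```python
import math

G = 4.301e-9    # Newton's constant in Mpc (km/s)^2 / Msun
H0 = 70.0       # Hubble constant in km/s/Mpc, an assumed value for illustration

# Critical density of the universe: rho_crit = 3 H0^2 / (8 pi G)
rho_crit = 3 * H0**2 / (8 * math.pi * G)   # ~1.4e11 Msun / Mpc^3

def m200(R200_Mpc):
    """Mass within R200, the radius enclosing a mean density of 200 * rho_crit."""
    return 200 * rho_crit * (4.0 / 3.0) * math.pi * R200_Mpc**3

# e.g., a halo with R200 = 0.2 Mpc (about 200 kpc) is roughly a 10^12 Msun halo:
print(f"rho_crit = {rho_crit:.2e} Msun/Mpc^3")
print(f"M200(R200 = 0.2 Mpc) = {m200(0.2):.1e} Msun")
```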

One of the long-standing problems the cold dark matter paradigm has is that the galaxy luminosity function should be steep but is observed to be shallow. This sketch shows the basic issue. The number density of dark matter halos as a function of mass is expected to be a power law – one that is well specified once the cosmology is known and a convention for the mass is adopted. The obvious expectation is that the galaxy luminosity function should just be a downshifted version of the halo mass function: one galaxy per halo, with the stellar mass proportional to the halo mass. This was such an obvious assumption [being provision (i) of canonical galaxy formation in LCDM] that it was not seriously questioned for over a decade. (Minor point: a turn down at the high mass end could be attributed to gas cooling times: the universe didn’t have time to cool and assemble a galaxy above some threshold mass, but smaller things had plenty of time for gas to cool and form stars.)

The number density of galaxies (blue) and dark matter halos (red) as a function of their mass. Our original expectation is on the left: the galaxy mass function should be a down-shifted version of the halo mass function, up to a gas cooling limit. Dashed grey lines illustrate the correspondence of galaxies with dark matter halos of proportional mass: M* = md M200. On the right is the current picture of abundance matching with the grey lines connecting galaxies with dark matter halos of equal cosmic density in which they are supposed to reside. In effect, we make the proportionality factor md a rolling, mass-dependent fudge factor.

The galaxy luminosity function does not look like a shifted version of the halo mass function. It has the wrong slope at the faint end. At no point is the size of the shift equal to what one would expect from the mass of available baryons. The proportionality factor md is too small; this is sometimes called the over-cooling problem, in that a lot more baryons should have cooled to form stars than apparently did so. So, aside from the shape and the normalization, it’s a great match.

We obsessed about this problem all through the ’90s. At one point, I thought I had solved it. Low surface brightness galaxies were under-represented in galaxy surveys. They weren’t missed entirely, but their masses could be systematically underestimated. This might matter a lot because the associated volume corrections are huge. A small systematic in mass would get magnified into a big one in density. Sadly, after a brief period of optimism, it became clear that this could not work to solve the entire problem, which persists.

Circa 2000, a local version of the problem became known as the missing satellites problem. This is a down-shifted version of the mismatch between the galaxy luminosity function and the halo mass function that pervades the entire universe: few small galaxies are observed where many are predicted. To give visual life to the numbers we’re talking about, here is an image of the dark matter in a simulation of a Milky Way size galaxy:

Dark Matter in the Via Lactea simulation (Diemand et al. 2008). The central region is the main dark matter halo which would contain a large galaxy like the Milky Way. All the lesser blobs are subhalos. A typical galaxy-sized dark matter halo should contain many, many subhalos. Naively, we expect each subhalo to contain a dwarf satellite galaxy. Structure is scale-free in CDM, so major galaxies should look like miniature clusters of galaxies.

In contrast, real galaxies have rather fewer satellites than meet the eye:

NGC 6946 and environs. The points are foreground stars, ignore them. The neighborhood of NGC 6946 appears to be pretty empty – there is no swarm of satellite galaxies as in the simulation above. I know of two dwarf satellite galaxies in this image, both of low surface brightness. The sharp-eyed may find the brighter one (KK98-250) between the bright stars at top right. The fainter one (KK98-251) is near KK98-250, a bit down and to the left of it; good luck seeing it on this image from the Digital Sky Survey. That’s it. There are no other satellite galaxies visible here. There can of course be more that are too low in surface brightness to detect. The obvious assumption of a one-to-one relation between stellar and halo mass cannot be sustained; there must instead be a highly non-linear relation between mass and light so that subhalos only contain dwarfs of extraordinarily low surface brightness.

By 2010, we’d thrown in the towel, and decided to just accept that this aspect of the universe was too complicated to predict. The story now is that feedback changes the shape of the luminosity function at both the faint and the bright ends. Exactly how depends on who you ask, but the predicted halo mass function is sacrosanct so there must be physical processes that make it so. (This is an example of the Frenk Principle in action.)

Lacking a predictive theory, theorists instead came up with a clever trick to relate galaxies to their dark matter halos. This has come to be known as abundance matching. We measure the number density of galaxies as a function of stellar mass. We know, from theory, what the number density of dark matter halos should be as a function of halo mass. Then we match them up: galaxies of a given density live in halos of the corresponding density, as illustrated by the horizontal gray lines in the right panel of the figure above.
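Here is a minimal sketch of the matching step itself, using invented functional forms (the power-law halo mass function and Schechter-like galaxy mass function below are placeholders, not published fits): rank both populations by cumulative number density and pair them off.

```python
import numpy as np

# Toy abundance matching: pair stellar masses with halo masses at equal
# cumulative number density, n_gal(>M*) = n_halo(>M200).
# All functional forms and normalizations are invented for illustration.

def n_halo(M200):                      # cumulative halo number density (Mpc^-3)
    return 1e-2 * (M200 / 1e12) ** -0.9

def n_gal(Mstar):                      # cumulative galaxy number density (Mpc^-3)
    x = Mstar / 5e10
    return 3e-3 * x ** -0.4 * np.exp(-x)

M200_grid = np.logspace(10, 15, 500)   # halo masses in Msun
Mstar_grid = np.logspace(7, 12, 500)   # stellar masses in Msun

# Both cumulative functions decrease with mass; reverse them so np.interp
# sees increasing abscissae, then match each halo density to a stellar mass.
Mstar_matched = np.interp(n_halo(M200_grid)[::-1],
                          n_gal(Mstar_grid)[::-1],
                          Mstar_grid[::-1])[::-1]

for M200, Mstar in zip(M200_grid[::125], Mstar_matched[::125]):
    print(f"M200 = {M200:.1e} Msun  ->  M* = {Mstar:.1e} Msun")
```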

There have now been a number of efforts to quantify this. Four examples are given in the figure below (see this paper for references), together with kinematic mass estimates.

The ratio of stellar to halo mass as a function of dark matter halo mass. Lines represent the abundance matching relations derived by assigning galaxies to dark matter halos based on their cosmic abundance. Points are independent halo mass estimates based on kinematics (McGaugh et al. 2010). The horizontal dashed line represents the maximum stellar mass that would result if all available baryons were turned into stars. (Mathematically, this happens when md equals the cosmic baryon fraction, about 15%.)

The abundance matching relations have a peak around a halo mass of 10^12 M☉ and fall off to either side. This corresponds to the knee in the galaxy luminosity function. For whatever reason, halos of this mass seem to be most efficient at converting their available baryons into stars. The shape of these relations means that there is a non-linear relation between stellar mass and halo mass. At the low mass end, a big range in stellar mass is compressed into a small range in halo mass. The opposite happens at high mass, where the most massive galaxies are generally presumed to be the “central” galaxy of a cluster of galaxies. We assign the most massive halos to big galaxies understanding that they may be surrounded by many subhalos, each containing a cluster galaxy.

Around the same time, I made a similar plot, but using kinematic measurements to estimate halo masses. Both methods are fraught with potential systematics, but they seem to agree reasonably well – at least over the range illustrated above. It gets dodgy above and below that. The agreement is particularly good for lower mass galaxies. There seems to be a departure for the most massive individual galaxies, but why worry about that when the glass is 3/4 full?

Skip ahead a decade, and some people think we’ve solved the missing satellite problem. One key ingredient of that solution is that the Milky Way resides in a halo that is on the lower end of the mass range that has traditionally been estimated for it (1 to 2 x 10^12 M☉). This helps because the number of subhalos scales with mass: clusters are big halos with lots of galaxy-size halos; the Milky Way is a galaxy-sized halo with lots of smaller subhalos. Reality does not look like that, but having a lower mass means fewer subhalos, so that helps. It does not suffice. We must invoke feedback effects to make the relation between light and mass nonlinear. Then the lowest mass satellites may be too dim to detect: selection effects have to do a lot of work. It also helps to assume the distribution of satellites is isotropic, which looks to be true in the simulation, but not so much in reality where known dwarf satellites occupy a planar distribution. We also need to somehow fudge the too-big-to-fail problem, in which the more massive subhalos appear not to be occupied by luminous galaxies at all. Given all that, we can kinda sorta get in the right ballpark. Kinda, sorta, provided that we live in a galaxy whose halo mass is closer to 10^12 M☉ than to 2 x 10^12 M☉.

At an IAU meeting in Shanghai (in July 2019, before travel restrictions), the subject of the mass of the Milky Way was discussed at length. It being our home galaxy, there are many ways in which to constrain the mass, some of which take advantage of tracers that go out to greater distances than we can obtain elsewhere. Speaker after speaker used different methods to come to a similar conclusion, with the consensus hedging on the low side (roughly 1 – 1.5 x 10^12 M☉). A nice consequence would be that the missing satellite problem may no longer be a problem.

Galaxies in general and the Milky Way in particular are different and largely distinct subfields. Different data studied by different people with distinctive cultures. In the discussion at the end of the session, Pieter van Dokkum pointed out that from the perspective of other galaxies, the halo mass ought to follow from abundance matching, which for a galaxy like the Milky Way would be more like 3 x 10^12 M☉, considerably more than anyone had suggested, but hard to exclude because most of that mass could be at distances beyond the reach of the available tracers.

This was not well received.

The session was followed by a coffee break, and I happened to find myself standing in line next to Pieter. I was still processing his comment, and decided he was right – from a certain point of view. So we got to talking about it, and wound up making the plot below, which appears in a short research note. (For those who know the field, it might be assumed that Pieter and I hate each other. This is not true, but we do frequently disagree, so the fact that we do agree about this is itself worthy of note.)

The Local Group and its two most massive galaxies, the Milky Way and Andromeda (M31), in the stellar mass-halo mass plane. Lines are the abundance matching relations from above. See McGaugh & van Dokkum for further details. The remaining galaxies of the Local Group all fall off the edge of this plot, and do not add up to anything close to either the Milky Way or Andromeda alone.

The Milky Way and Andromeda are the 10^12 M☉ gorillas of the Local Group. There are many dozens of dwarf galaxies, but none of them are comparable in mass, even with the boost provided by the non-linear relation between mass and luminosity. To astronomical accuracy, in terms of mass, the Milky Way plus Andromeda are the Local Group. There are many distinct constraints, on each galaxy as an individual, and on the Local Group as a whole. Any way we slice it, all three entities lie well off the relation expected from abundance matching.

There are several ways one could take it from here. One might suppose that abundance matching is correct, and we have underestimated the mass with other measurements. This happens all the time with rotation curves, which typically do not extend far enough out into the halo to give a good constraint on the total mass. This is hard to maintain for the Local Group, where we have lots of tracers in the form of dwarf satellites, and there are constraints on the motions of galaxies on still larger scales. Moreover, a high mass would be tragic for the missing satellite problem.

One might instead imagine that there is some scatter in the abundance matching relation, and we just happen to live in a galaxy that has a somewhat low mass for its luminosity. This is almost reasonable for the Milky Way, as there is some overlap between kinematic mass estimates and the expectations of abundance matching. But the missing satellite problem bites again unless we are pretty far off the central value of the abundance matching relation. Other Milky Way-like galaxies ought to fall on the other end of the spectrum, with more mass and more satellites. A lot of work is going on to look for satellites around other spirals, which is hard work (see NGC 6946 above). There is certainly scatter in the number of satellites from system to system, but whether this is theoretically sensible or enough to explain our Milky Way is not yet apparent.

There is a tendency in the literature to invoke scatter when and where needed. Here, it is important to bear in mind that there is little scatter in the Tully-Fisher relation. This is a relation between stellar mass and rotation velocity, with the latter supposedly set by the halo mass. We can’t have it both ways. Lots of scatter in the stellar mass-halo mass relation ought to cause a corresponding amount of scatter in Tully-Fisher. This is not observed. It is a much stronger constraint than most people seem to appreciate, as even subtle effects are readily perceptible. Consequently, I think it unlikely that we can nuance the relation between halo mass and observed rotation speed to satisfy both relations without a lot of fine-tuning, which is usually a sign that something is wrong.
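Here is a minimal Monte Carlo sketch of that argument, under toy assumptions: lognormal scatter in halo mass at fixed stellar mass, and a rotation speed that tracks the halo as V ∝ M200^(1/3). The scatter values are illustrative, not fits to data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy model: at fixed stellar mass, scatter the halo mass lognormally and let
# the rotation speed track the halo as V ~ M200^(1/3) (a crude assumption).
n = 100_000
log_M200_mean = 12.0                   # dex, a fiducial halo mass

for scatter_dex in (0.05, 0.2, 0.4):   # assumed scatter in halo mass at fixed M*
    log_M200 = rng.normal(log_M200_mean, scatter_dex, n)
    log_V = log_M200 / 3.0             # V proportional to M200^(1/3), up to a constant
    print(f"{scatter_dex:.2f} dex of halo-mass scatter -> "
          f"{log_V.std():.3f} dex of scatter in velocity at fixed stellar mass")

# Any sizeable scatter in the stellar mass-halo mass relation would thus show
# up as scatter in Tully-Fisher, which is observed to be very tight.
```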

There are a lot of moving parts in modern galaxy formation simulations that need to be fine-tuned: the effects of halo mass, merging, dissipation, [non]adiabatic compression, angular momentum transport, gas cooling, on-going accretion of gas from the intergalactic medium, expulsion of gas in galactic winds, re-accretion of expelled gas via galactic fountains, star formation and the ensuing feedback from radiation pressure, stellar winds, supernovae, X-rays from stellar remnants, active galactic nuclei, and undoubtedly other effects I don’t recall off the top of my head. Visualization from the Dr. Seuss suite of simulations.

A lot of effort has been put into beating down the missing satellite problem around the Milky Way. Matters are worse for Andromeda. Kinematic halo mass estimates are typically in the same ballpark as the Milky Way. Some are a bit bigger, some are lower. Lower is a surprise, because the stellar mass of M31 is clearly bigger than that of the Milky Way, placing it above the turnover where the efficiency of star formation is maximized. In this regime, a little stellar mass goes a long way in terms of halo mass. Abundance matching predicts that a galaxy of Andromeda’s stellar mass should reside in a dark matter halo of at least 10^13 M☉. That’s quite a bit more than 1 or 2 x 10^12 M☉, even by astronomical standards. Put another way, according to abundance matching, the Local Group should have the Milky Way as its most massive occupant. Just the Milky Way. Not the Milky Way plus Andromeda. Despite this, the Local Group is not anomalous among similar groups.

Words matter. A lot boils down to what we consider to be “close enough” to call similar. I do not consider the Milky Way and Andromeda to be all that similar. They are both giant spirals, yes, but galaxies are all individuals. Being composed of hundreds of billions of stars, give or take, leaves a lot of room for differences. In this case, the Milky Way and Andromeda are easily distinguished in the Tully-Fisher plane. Andromeda is about twice the baryonic mass of the Milky Way. It also rotates faster. The error bars on these quantities do not come close to overlapping – that would be one criterion for considering them to be similar – a criterion they do not meet. Even then, there could be other features that might be readily distinguished, but let’s say a rough equality in the Tully-Fisher plane would indicate stellar and halo masses that are “close enough” for our present discussion. They aren’t: to me, the Milky Way and M31 are clearly different galaxies.

I spent a fair amount of time reading the recent literature on satellite searches, and I was struck by the ubiquity with which people make the opposite assumption, treating the Milky Way and Andromeda as interchangeable galaxies of similar mass. Why would they do this? If one looks at the kinematic halo mass as the defining characteristic of a galaxy, they’re both close to 10^12 M☉, with overlapping error bars on M200. By that standard, it seems fair. Is it?

Luminosity is observable. Rotation speed is observable. There are arguments to be had about how to convert luminosity into stellar mass, and what rotation speed measure is “best.” These are sometimes big arguments, but they are tiny in scale compared to estimating notional quantities like the halo mass. The mass M200 is not an observable quantity. As such, we have no business using it as a defining characteristic of a galaxy. You know a galaxy when you see it. The same cannot be said of a dark matter halo. Literally.

If, for some theoretically motivated reason, we want to use halo mass as a standard then we need to at least use a consistent method to assess its value from directly observable quantities. The methods we use for the Milky Way and M31 are not applicable beyond the Local Group. Nowhere else in the universe do we have such an intimate picture of the kinematic mass from a wide array of independent methods with tracers extending to such large radii. There are other standards we could apply, like the Tully-Fisher relation. That we can do outside the Local Group, but by that standard we would not infer that M31 and the Milky Way are the same. Other observables we can fairly apply to other galaxies are their luminosities (stellar masses) and cosmic number densities (abundance matching). From that perspective, what we know from all the other galaxies in the universe is that the factor of ~2 difference in stellar mass between Andromeda and the Milky Way should be huge in terms of halo mass. If it were anywhere else in the universe, we wouldn’t treat these two galaxies as interchangeably equal. This is the essence of Pieter’s insight: abundance matching is all about the abundance of dark matter halos, so that would seem to be the appropriate metric by which to predict the expected number of satellites, not the kinematic halo mass that we can’t measure in the same way anywhere else in the universe.

That isn’t to say we don’t have some handle on kinematic halo masses; it’s just that most of that information comes from rotation curves that don’t typically extend as far as the tracers that we have in the Local Group. Some rotation curves are more extended than others, so one has to account for that variation. Typically, we can only put a lower limit on the halo mass, but if we assume a profile like NFW – the standard thing to do in LCDM – then we can sometimes exclude halos that are too massive.

Abundance matching has become important enough to LCDM that we included it as a prior in fitting dark matter halo models to rotation curves. For example:

The stellar mass-halo mass relation from rotation curve fits (Li et al 2020). Each point is one galaxy; the expected abundance matching relation (line) is not recovered (left) unless it is imposed as a prior (right). The data are generally OK with this because the amount of mass at radii beyond the end of the rotation curve is not strongly constrained. Still, there are some limits on how crazy this can get.

NFW halos are self-similar: low mass halos look very much like high mass halos over the range that is constrained by data. Consequently, if you have some idea what the total mass of the halo should be, as abundance matching provides, and you impose that as a prior, the fits for most galaxies say “OK.” The data covering the visible galaxy have little power to constrain what is going on with the dark matter halo at much larger radii, so the fits literally fall into line when told to do so, as seen in Pengfei‘s work.

That we can impose abundance matching as a prior does not necessarily mean the result is reasonable. The highest halo masses that abundance matching wants in the plot above are crazy talk from a kinematic perspective. I didn’t put too much stock in this, as the NFW halo itself, the go-to standard of LCDM, provides the worst description of the data among all the dozen or so halo models that we considered. Still, we did notice that even with abundance matching imposed as a prior, there are a lot more points above the line than below it at the high mass end (above the bend in the figure above). The rotation curves are sometimes pushing back against the imposed prior; they often don’t want such a high halo mass. This was explored in some detail by Posti et al., who found a similar effect.

I decided to turn the question around. Can we use abundance matching to predict the halo and hence rotation curve of a massive galaxy? The largest spiral in the local universe, UGC 2885, has one of the most extended rotation curves known, meaning that it does provide some constraint on the halo mass. This galaxy has been known as an important case since Vera Rubin’s work in the ’70s. With a modern distance scale, its rotation curve extends out 80 kpc. That’s over a quarter million light-years – a damn long way, even by the standards of galaxies. It also rotates remarkably fast, just shy of 300 km/s. It is big and massive.

(As an aside, Vera once offered a prize for anyone who found a disk that rotated faster than 300 km/s. Throughout her years of looking at hundreds of galaxies, UGC 2885 remained the record holder, with 300 seeming to be a threshold that spirals did not exceed. She told me that she did pay out, but on a technicality: someone showed her a gas disk around a supermassive black hole in Keplerian rotation that went up to 500 km/s at its peak. She lamented that she had been imprecise in her language, as that was nothing like what she meant, which was the flat rotation speed of a spiral galaxy.)

That aside aside, if we take abundance matching at face value, then the stellar mass of a galaxy predicts the mass of its dark matter halo. Using the most conservative (in that it returns the lowest halo mass) of the various abundance matching relations indicates that with a stellar mass of about 2 x 10^11 M☉, UGC 2885 should have a halo mass of 3 x 10^13 M☉. Combining this with a well-known relation between halo concentration and mass for NFW halos, we then know what the rotation curve should be. Doing this for UGC 2885 yields a tragic result:

The extended rotation curve of UGC 2885 (points). The declining dotted line is the rotation curve predicted by the observed stars and gas. The rising dashed line is the halo predicted by abundance matching. Combining this halo with the observed stars and gas should result in the solid line. This greatly exceeds the data. UGC 2885 does not reside in an NFW halo that is anywhere near as massive as predicted by abundance matching.
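Here is a minimal sketch of the bookkeeping behind that predicted curve: an NFW halo with the abundance-matching mass, a rough concentration-mass scaling, and a placeholder baryonic contribution added in quadrature. The inputs below are illustrative assumptions rather than the actual UGC 2885 data, but they show how a 3 x 10^13 M☉ NFW halo overshoots a rotation curve that tops out near 300 km/s.

```python
import numpy as np

G = 4.301e-6          # Newton's constant in kpc (km/s)^2 / Msun
rho_crit = 136.0      # critical density in Msun / kpc^3 (for H0 ~ 70 km/s/Mpc)

M200 = 3e13                          # Msun: the abundance-matching halo mass quoted above
c = 10.0 * (M200 / 1e12) ** -0.1     # a rough concentration-mass scaling (assumed)
R200 = (3 * M200 / (4 * np.pi * 200 * rho_crit)) ** (1.0 / 3.0)   # kpc

def v_nfw(r_kpc):
    """Circular velocity of the NFW halo at radius r (kpc)."""
    x = c * r_kpc / R200
    mass_frac = (np.log(1 + x) - x / (1 + x)) / (np.log(1 + c) - c / (1 + c))
    return np.sqrt(G * M200 * mass_frac / r_kpc)

def v_baryon(r_kpc):
    """Placeholder for the observed stars+gas contribution (km/s); not real data."""
    return 250.0 * np.sqrt(r_kpc / (r_kpc + 5.0))

for r in (10.0, 20.0, 40.0, 80.0):
    v_total = np.hypot(v_nfw(r), v_baryon(r))   # add halo and baryons in quadrature
    print(f"r = {r:5.1f} kpc : predicted V = {v_total:5.0f} km/s")
```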

The data do not allow for the predicted amount of dark matter. If we fit the rotation curve, we obtain a “mere” M200 = 5 x 10^12 M☉. Note that this means that UGC 2885 is basically the Milky Way and Andromeda added together in terms of both stellar mass and halo mass – if added to the M*-M200 plot above, it would land very close to the open circle representing the more massive halo estimate for the combination of MW+M31, and be just as discrepant from the abundance matching relations. We get the same result regardless of which direction we look at it from.

Objectively, 5 x 10^12 M☉ is a huge dark matter halo for a single galaxy. It’s just not the yet-more massive halo that is predicted by abundance matching. In this context, UGC 2885 apparently has a serious missing satellites problem, as it does not appear to be swimming in a sea of satellite galaxies the way we’d expect for the central galaxy of such a high mass halo.

UGC 2885 appears to be pretty lonely in this image from the DSS. I see a few candidate satellite galaxies amidst the numerous foreground stars, but nothing like what you’d expect for dark matter subhalos from a simulation like the Via Lactea. This impression does not change when imaged in more detail with HST.

It is tempting to write this off as a curious anecdote. Another outlier. Sure, that’s always possible, but this is more than a bit ridiculous. Anyone who wants to go this route I refer to Snoop Dog.

I spent much of my early career obsessed with selection effects. These preclude us from seeing low surface brightness galaxies as readily as brighter ones. However, it isn’t binary – a galaxy has to be extraordinarily low surface brightness before it becomes effectively invisible. The selection effect is a bias – and a very strong one – but not an absolute screen that prevents us from finding low surface brightness galaxies. That makes it very hard to sustain the popular notion that there are lots of subhalos that simply contain ultradiffuse galaxies that cannot currently be seen. I’ve been down this road many times as an optimist in favor of this interpretation. It hasn’t worked out. Selection effects are huge, but still nowhere near big enough to overcome the required deficit.

Having the satellite galaxies that inhabit subhalos be low in surface brightness is a necessary but not sufficient criterion. It is also necessary to have a highly non-linear stellar mass-halo mass relation at low mass. In effect, luminosity and halo mass become decoupled: satellite galaxies spanning a vast range in luminosity must live in dark matter halos that cover only a tiny range. This means that it should not be possible to predict stellar motions in these galaxies from their luminosity. The relation between mass and light has just become too weak and messy.

And yet, we can do exactly that. Over and over again. This simply should not be possible in LCDM.

The Fat One – a test of structure formation with the most massive cluster of galaxies

A common objection to MOND is that it does not entirely reconcile the mass discrepancy in clusters of galaxies. This can be seen as an offset in the acceleration scale between individual galaxies and clusters. This is widely seen as definitive proof of dark matter, but that is just defaulting to our confirmation bias without checking whether LCDM really does any better: just because MOND does something wrong doesn’t automatically mean that LCDM gets it right.

The characteristic acceleration (in units of Milgrom’s constant a0) of extragalactic objects as a function of their baryonic mass, ranging from tiny dwarf galaxies to giant clusters of galaxies. Clusters are offset from individual galaxies, implying a residual missing mass problem for MOND. From Famaey & McGaugh (2012).

I do see clusters as a problem for MOND, and there are some aspects of clusters that make good sense in LCDM. Unlike galaxies, cluster mass profiles are generally consistent with the predicted NFW halos (modulo their own core problem). That’s not a contradiction to MOND, which should do the same thing as Newton in the Newtonian regime. But rich clusters also have baryon fractions close to that expected from cosmology. From that perspective, it looks pretty reasonable. This success does not extend to lower mass clusters; in the plot above, the low mass green triangles should be higher than the higher mass gray triangles in order for all clusters to have the cosmic baryon fraction. They should not parallel the prediction of MOND. Within individual clusters, baryons are not as well mixed with dark matter as expected: they tend to have too much unseen mass at small radius, which is basically the same problem encountered by MOND.

There are other tests, one of which is the growth of clusters. Structure is predicted to form hierarchically in LCDM: small objects form first, and pile on to make bigger ones, with the largest clusters being the last to form. So there is a test in how massive a cluster can get as a function of redshift. This is something for which LCDM makes a clear prediction. In MOND, my expectation is that structure forms faster so that massive objects are in place at higher redshift than expected in LCDM. This post is mostly about clusters in LCDM, so henceforth all masses will be conventional masses, including the putative dark matter.

Like so many things, there is a long history to this. For example, in the late ’90s, Megan Donahue reported a high temperature of ~ 12 keV for the intracluster gas in the cluster MS1054-0321. This meant that it was massive for its redshift: 7.4 x 10^14 h^-1 M☉ (dark matter and all) at z = 0.829, when the universe was only about half its current age. (Little h is the Hubble constant in units of 100 km/s/Mpc. Since we’re now pretty sure h < 1, the true mass is higher, more like 10^15 M☉.) That’s a lot of solar masses to assemble in the available time. In 1997, this was another nail in the coffin of SCDM, which was already a zombie theory by then. But the loss of Ωm = 1 was still raw for some people, I guess, because she got a lot of grief for it. Can’t be true! Clusters don’t get that big that early! At least they shouldn’t. In SCDM.
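For the record, the little-h bookkeeping is just a division; a quick sketch assuming h = 0.7:

```python
# Converting a mass quoted in h^-1 Msun to physical solar masses, assuming h = 0.7.
M_in_h_units = 7.4e14     # the reported cluster mass in h^-1 Msun
h = 0.7                   # Hubble constant / (100 km/s/Mpc), an assumed value
print(f"{M_in_h_units / h:.1e} Msun")   # ~1.1e15 Msun, i.e. "more like 10^15"
```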

Structure formation in SCDM was elegant in that it continues perpetually: as the universe expands, bigger and bigger structures continue to form; statistically, later epochs look like scaled-up versions of earlier epochs. In LCDM, this symmetry is broken by the decline in density as the universe expands. Consequently, structure forms earlier in LCDM: the action has to happen when there is still some density to work with, and the accelerated expansion provides some extra time (what’s a few billion years among cosmologists?) for mass to get together. As a result, MS1054-0321 is not problematic in LCDM.

The attitude persisted, however. In the mid-’00s, Jim Schombert and I started using the wide field near-IR camera NEWFIRM to study high redshift clusters. Jim had a clever way of identifying them, which turned out not to be particularly hard, e.g., MS 1426.9+1052 at z = 1.83. This is about 10 Gyr ago, and made the theorists squirm. That didn’t leave enough time for a cluster to form. On multiple occasions I had the following conversation with different theorists:

me: Hey, look at this cluster at z = 1.8.

theorist: That isn’t a cluster.

me: Sure it is. There’s the central galaxy, which contains a bright radio source (QSO). You can see lots of other galaxies around it. That’s what a cluster looks like.

theorist: Must be a chance projection.

me: There are spectra for many of the surrounding galaxies; they’re all at the same redshift.

theorist: …

me: So… a cluster at z = 1.8. Pretty cool, huh?

theorist: That isn’t a cluster.

This work became part of Jay Franck’s thesis. He found evidence for more structure at even higher redshift. A lot of this apparent clustering probably is not real… the statistics get worse as you push farther out: fewer galaxies, worse data. But there were still a surprising number of objects in apparent association out to and beyond z = 5. That’s pretty much all of time, leaving a mere Gyr to go from the nearly homogeneous universe that we see in the CMB at z = 1090 to the first stars around z ~ 20 to the first galaxies to big galaxies to protoclusters – or whatever we want to call these associations of many galaxies in the same place on the sky at the same redshift.

Jay did a lot of work to estimate the rate of false positives. Long story short, we expect about 1/3 of the protoclusters he identified to be real structures. That’s both bad and good – lots of chaff, but some wheat too. One thing Jay did was to analyze the Millennium simulation in the same way as the data. This allows us to quantify what we would see if the universe looked like an LCDM simulation.

The plot below shows the characteristic brightness of galaxies at various redshifts. For the pros, this is the knee in the Schechter function fit to the luminosity distribution of galaxies in redshift bins. We saw the same thing in protoclusters and in the field: galaxies were brighter than anticipated in the simulation. Between redshifts 3 < z < 4, the characteristic magnitude is expected to be 23. That’s pretty faint. In the data, it’s more like 21. That’s also faint, but about a factor of 6 brighter than they should be. That’s a lot of stars that have formed before they’re supposed to, in galaxies that are bigger than they should yet be, with some of them already clustering together ahead of their time.

The characteristic magnitude of galaxies in the Spitzer 4.5 micron band as a function of redshift in the Millennium simulation (black squares) and in reality (circles). This is a classic backwards astronomical plot in which larger magnitudes are fainter sources. At high redshift, simulations predict that galaxies should not yet have grown to become as bright as they are observed to be. From Franck (2017).
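The factor of 6 follows directly from the magnitude scale; a quick check:

```python
# A difference of 2 magnitudes corresponds to a brightness ratio of 10^(0.4 * dm).
dm = 23 - 21
print(10 ** (0.4 * dm))   # ~6.3, the "factor of 6" quoted above
```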

This has been the observer’s experience. Donahue wasn’t the first, and Franck won’t be the last. Every time we look, we see more structure in place sooner than had been expected before it was seen. I don’t hear people complaining about our clusters at z = 1.8 anymore; those have been seen enough to become normalized. Perhaps they have even been explained satisfactorily. But they sure weren’t expected, much less predicted.

So, just how big can a cluster get? Mortonson et al. (2011) set out to answer this question. The graph below shows the upper limit they predict for the most massive cluster in the universe as a function of redshift. This declines as redshift increases because we’re looking back in time; high redshift clusters haven’t had time to assemble more mass than the uppermost line allows. They project this into what would be discovered in an all-sky survey, and in more realistic surveys of finite size. Basically nothing should exist above these lines.

The predicted maximum mass of galaxy clusters as a function of redshift from Mortonson et al. (2011). Each line is the predicted upper limit for the corresponding amount of sky surveyed. The green line illustrates the area of the sky in which El Gordo was discovered. The points show independent mass estimates for El Gordo from Menanteau et al. (2012) and Jee et al. (2014). These are significantly above the predicted upper limit.

Their prediction was almost immediately put to the test by the discovery of El Gordo, a big fat cluster at z = 0.87 reported by Menanteau et al. (2012), who published the X-ray image above. It is currently the record holder for the most massive known object that is thought to be gravitationally bound, weighing in at 2 or 3 × 10^15 M☉, depending on who you ask. That’s about a thousand Milky Ways, plus a few hundred Andromedas. Give or take.

El Gordo straddles the uppermost line in the graph above. A naive reading of the first mass estimate suggests that it’s roughly a 50/50 proposition whether the entire observable universe should contain exactly one El Gordo. However, El Gordo was discovered in something less than a full sky survey. The appropriate comparison is to the green line, which El Gordo clearly exceeds. This holds for both of the illustrated mass estimates, since the higher mass point also has a larger error bar: each exceeds the green line by a hair less than 3 sigma. Formally, this means that the chance of finding El Gordo in our universe is only a few percent.

A few percent is not good. Neither is it terrible – I’ve often commented here on how the uncertainties are larger than they seem. This is especially true of the tails of the distribution. So maybe a few percent is pessimistic; sometimes that’s how the dice roll. On the other hand, the odds aren’t better than 10%: El Gordo is not likely to exist however we slice the uncertainties. Whether we should be worried about it is just a matter of how surprising it is. A similar situation arises with the collision velocity of the Bullet cluster, which is either absurdly unlikely (about 1 chance in 10 billion) or merely unusual (maybe 1 in 10). So I made the above plot by adding El Gordo to the predictions of Mortonson et al., and filed it away under…


Recently, Elena Asencio, Indranil Banik, and Pavel Kroupa have made a more thorough study. They have their own blog post, so I won’t repeat the technical description. Basically, they sift through a really big LCDM simulation to find objects that could be (or become) like El Gordo.

The short answer is that it doesn’t happen, similar to big voids. They estimate that the odds of El Gordo existing are a bit less than one in a billion. I’m sure one can quibble with details, but we’re not going to save LCDM with factors of two in a probability that starts this low. El Gordo just shouldn’t exist.

The probability is lower than in the graph above because it isn’t just a matter of mass. It is also the mass ratio of the merging clumps (both huge clusters in their own right), their collision speed, impact parameter, and morphology. As they are aware, one must be careful not to demand a perfect match, since there is only one reality. But neither is it just a matter of assembling mass; that understates the severity of the problem. This is where simulations are genuinely helpful: one can ask how often does this happen? If the answer is never, one can refine the query to be more lenient. The bottom line here is that you can’t be lenient enough to get something like El Gordo.

Here is their money plot. To be like El Gordo, an object would have to be up on the red line. That’s well above 5 sigma, which is the threshold where we traditionally stop quibbling about percentiles and just say Nope. Not an accident.

Logarithmic mass as a function of expansion factor [a measure of how big the universe is, inversely related to redshift: a = 1/(1+z)]. The color scale gives the number density of objects of a given mass as a function of how far the universe has expanded. The solid lines show the corresponding odds (in sigma) of finding such a thing in a large LCDM simulation. Figure from Asencio et al. (2020).
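For calibration, it helps to remember what those sigmas mean as probabilities. The quick conversion below uses one-sided Gaussian tails only; the published numbers fold in more than this, but it shows why “well above 5 sigma” and “a bit less than one in a billion” are the same statement.

```python
# Rough translation between "sigma" significance and tail probability.
# One-sided Gaussian tails only; a guide to the quoted numbers, not the
# full calculation in the paper.
from scipy.stats import norm

for sigma in (5, 6):
    p = norm.sf(sigma)          # probability of a fluctuation at least this extreme
    print(f"{sigma} sigma  ->  p ~ {p:.1e}")
# 5 sigma -> ~2.9e-7; 6 sigma -> ~1.0e-9, i.e. about one in a billion.
```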

In principle, this one object falsifies the LCDM structure formation paradigm. We are reluctant to put too much emphasis on a single object (unless it is the Bullet cluster and we have clickbait to sell) as it’s a big universe, so there can always be one unicorn or magnetic monopole somewhere. Asencio et al. note that a similar constraint follows for the Bullet cluster itself, which also should not exist, albeit at a lower significance. That’s two unicorns: we can’t pretend that this is a one-off occurrence. The joint probability of living in a universe with both El Gordo and the Bullet cluster is even lower than either alone.

Looking at Asencio’s figure, it strikes me as odd not only that we find huge things at high redshift, but also that we don’t see still bigger objects at low redshift. There were already these huge clusters ramming into each other when the universe had only expanded to half its present size. This process should continue to build still bigger clusters, as indicated by the lines in the plot. The sweet spot for finding really massive clusters should be about z = 0.5, by which time they could have reached a mass of nearly 10^16 M☉ as readily (or not!) as El Gordo could reach its mass by its observed redshift. (The lines turn down for the largest expansion factors/lowest redshifts because surveys cover a fixed area on the sky, which is a conical volume in 3D. We reside at the point of the cone, and need to see a ways out before a volume large enough to contain a giant cluster has been covered.)
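The volume effect is easy to quantify. A minimal sketch with astropy’s Planck 2018 cosmology, for a purely illustrative 1000 square degree footprint:

```python
# Comoving volume enclosed by a fixed patch of sky out to various redshifts.
# Assumes astropy's Planck18 cosmology; the 1000 deg^2 footprint is illustrative.
from astropy.cosmology import Planck18
import astropy.units as u

area = 1000 * u.deg**2
sky_fraction = float(area / (41253 * u.deg**2))    # full sky is ~41,253 deg^2

for z in (0.1, 0.5, 0.87, 2.0):
    vol = Planck18.comoving_volume(z) * sky_fraction
    print(f"z < {z}: V ~ {vol.to(u.Gpc**3):.2f}")
# Very little volume is enclosed nearby; most of the searchable volume lies
# at moderate redshift, which is why the survey lines turn down at low z.
```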

I have never heard a report of a cluster anywhere near to 10^16 M☉. A big cluster is 10^15 M☉. While multiple examples of clusters this big are known, to the best of my knowledge, El Gordo is the record-holding Fat One at twice or thrice that. The nearest challenger I can readily find is RX J1347.5-1145 at z=0.451 (close to the survey sweet spot) weighing in at 2 × 10^15 M☉. Clusters just don’t seem to get bigger than that. This mass is OK at low redshift, but at higher z we shouldn’t see things as big as El Gordo. Given that we do see them at z = 0.87 (a = 0.535), why don’t we see still bigger ones at lower redshift? Perhaps structure formation saturates, but that’s not what LCDM predicts. If we can somehow explain El Gordo at high z, we are implicitly predicting still bigger clusters at lower redshift – objects we have yet to discover, if they exist, which they shouldn’t.

Which is the point.


The image featured at top is an X-ray image of the hot gas in the intracluster medium of El Gordo from NASA/CXC/Rutgers/J. Hughes et al.

Does Newton’s Constant Vary?

Does Newton’s Constant Vary?

This title is an example of what has come to be called Betteridge’s law. This is a relatively recent name for an old phenomenon: if a title is posed as a question, the answer is no. This is especially true in science, whether the authors are conscious of it or not.

Pengfei Li completed his Ph.D. recently, fitting all manner of dark matter halos as well as the radial acceleration relation (RAR) to galaxies in the SPARC database. For the RAR, he found that galaxy data were consistent with a single, universal acceleration scale, g†. There is of course scatter in the data, but this appears to us to be consistent with what we expect from variation in the mass-to-light ratios of stars and the various uncertainties in the data.

This conclusion has been controversial despite being painfully obvious. I have my own law for data interpretation in astronomy:

Obvious results provoke opposition. The more obvious the result, the stronger the opposition.

S. McGaugh, 1997

The constancy of the acceleration scale is such a case. Where we do not believe we can distinguish between galaxies, others think they can – using our own data! Here it is worth contemplating what all is involved in building a database like SPARC – we were the ones who did the work, after all. In the case of the photometry, we observed the galaxies, we reduced the data, we cleaned the images of foreground contaminants (stars), we fit isophotes, we built mass models – that’s a very short version of what we did in order to be able to estimate the acceleration predicted by Newtonian gravity for the observed distribution of stars. That’s one axis of the RAR. The other is the observed acceleration, which comes from rotation curves, which require even more work. I will spare you the work flow; we did some galaxies ourselves, and took others from the literature in full appreciation of what we could and could not believe — an appreciation that comes from doing the same kind of work ourselves. In contrast, the people claiming to find the opposite of what we find obtained the data by downloading it from our website. The only thing they do is the very last step in the analysis, making fits with Bayesian statistics, the same as we do, but in manifest ignorance of the process by which the data came to be. This leads to an underappreciation of the uncertainty in the uncertainties.

This is another rule of thumb in science: outside groups are unlikely to discover important things that were overlooked by the group that did the original work. An example from about seven years ago was the putative 126 GeV line in Fermi satellite data. This was thought by some at the time to be evidence for dark matter annihilating into gamma rays with energy corresponding to the rest mass of the dark matter particles and their anti-particles. This would be a remarkable, Nobel-winning discovery, if true. Strange then that the claim was not made by the Fermi team themselves. Did outsiders beat them to the punch with their own data? It can happen: sometimes large collaborations can be slow to move on important results, wanting to vet everything carefully or warring internally over its meaning while outside investigators move more swiftly. But it can also be that the vetting shows that the exciting result is not credible.

I recall the 126 GeV line being a big deal. There was an entire session devoted to it at a conference I was scheduled to attend. Our time is valuable: I can’t go to every interesting conference, and don’t want to spend time on conferences that aren’t interesting. I was skeptical, simply because of the rule of thumb. I wrote the organizers, and asked if they really thought that this would still be a thing by the time the conference happened in a few months’ time. Some of them certainly thought so, so it went ahead. As it happened, it wasn’t. Not a single speaker who was scheduled to talk about the 126 GeV line actually did so. In a few short months, it had gone from an exciting result sure to win a Nobel prize to nada.

What 126 GeV line? Did I say that? I don’t recall saying that.

This happens all the time. Science isn’t as simple as a dry table of numbers and error bars. This is especially true in astronomy, where we are observing objects in the sky. It is never possible to do an ideal experiment in which one controls for all possible systematics: the universe is not a closed box in which we can control the conditions. Heck, we don’t even know what all the unknowns are. It is a big friggin’ universe.

The practical consequence of this is that the uncertainty in any astronomical measurement is almost always larger than its formal error bar. There are effects we can quantify and include appropriately in the error assessment. There are things we can not. We know they’re there, but that doesn’t mean we can put a meaningful number on them.

Indeed, the sociology of this has evolved over the course of my career. Back in the day, everybody understood these things, and took the stated errors with a grain of salt. If it was important to estimate the systematic uncertainty, it was common to estimate a wide band, in effect saying “I’m pretty sure it is in this range.” Nowadays, it has become common to split out terms for random and systematic error. This is helpful to the non-specialist, but it can also be misleading because, so stated, the confidence interval on the systematic looks like a 1 sigma error even though it is not likely to have a Gaussian distribution. Being 3 sigma off of the central value might be a lot more likely than this implies — or a lot less.

People have become more careful in making error estimates, which ironically has made matters worse. People seem to think that they can actually believe the error bars. Sometimes you can, but sometimes not. Many people don’t know how much salt to take it with, or realize that they should take it with a grain of salt at all. Worse, more and more folks come over from particle physics where extraordinary accuracy is the norm. They are completely unprepared to cope with astronomical data, or even fully process that the error bars may not be what they think they are. There is no appreciation for the uncertainties in the uncertainties, which is absolutely fundamental in astrophysics.

Consequently, one gets overly credulous analyses. In the case of the RAR, a number of papers have claimed that the acceleration scale isn’t constant. Not even remotely! Why do they make this claim?

Below is a histogram of raw acceleration scales from SPARC galaxies. In effect, they are claiming that they can distinguish galaxies in the tail on one side of the histogram from those on the opposite side. We don’t think we can, which is the more conservative claim. The width of the histogram is just the scatter that one expects from astronomical data, so the data are consistent with zero intrinsic scatter. That’s not to say that’s necessarily what Nature is doing: we can never measure zero scatter, so it is always conceivable that there is some intrinsic variation in the characteristic acceleration scale. All we can say is that if it is there, it is so small that we cannot yet resolve it.

Histogram of the acceleration scale g† in individual galaxies relative to the characteristic value a0.

Posed as a histogram like this, it is easy to see that there is a characteristic value – the peak – with some scatter around it. The entire issue is whether that scatter is due to real variation from galaxy to galaxy, or if it is just noise. One way to check this is to make quality cuts: in the plot above, the gray-striped histogram plots every available galaxy. The solid blue one makes some mild quality cuts, like knowing the distance to better than 20%. That matters, because the acceleration scale is a quantity that depends on distance – a notoriously difficult quantity to measure accurately in astronomy. When this quality cut is imposed, the width of the histogram shrinks. The better data make a tighter histogram – just as one would expect if the scatter is due to noise. If instead the scatter is a real, physical effect, it should, if anything, be more pronounced in the better data.
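The logic can be checked with a toy simulation: assume a single universal value with zero intrinsic scatter, add per-galaxy measurement errors, and apply a quality cut. This is only a cartoon of the argument, not the actual SPARC analysis; the numbers (150 galaxies, the error range, the cut threshold) are made up for illustration.

```python
# Toy model: one universal acceleration scale, zero intrinsic scatter,
# observed with per-galaxy errors. Numbers are illustrative, not SPARC.
import numpy as np

rng = np.random.default_rng(42)
n_gal = 150
log_g_true = -9.92                         # log10 of ~1.2e-10 m/s^2, the same for all

logg_err = rng.uniform(0.03, 0.30, n_gal)  # per-galaxy uncertainty in log(g), dex
log_g_obs = log_g_true + rng.normal(0.0, logg_err)

good = logg_err < 0.12                     # stand-in for a mild quality cut
print("scatter, all galaxies:", round(float(np.std(log_g_obs)), 3), "dex")
print("scatter, after cut   :", round(float(np.std(log_g_obs[good])), 3), "dex")
# The better-measured subsample yields a visibly tighter histogram, exactly
# as expected when the width is noise rather than intrinsic variation.
```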

This should not be difficult to understand. And yet – other representations of the data give a different impression, like this one:

Best-fit accelerations from Marra et al. (2020).

This figure tells a very different story. The characteristic acceleration does not just scatter around a universal value. There is a clear correlation from one end of the plot to the other. Indeed, it is a perfectly smooth transition, because “Galaxy” is the number of each galaxy ordered by the value of its acceleration, from lowest to highest. The axes are not independent; they represent identically the same quantity. It is a plot of x against x. If properly projected into a histogram, it would look like the one above.
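You can manufacture the same impression from pure noise: draw values scattered about a constant, sort them, and plot them against their rank. A minimal demonstration (generic random numbers, nothing to do with the actual galaxy data):

```python
# Sorting scatter about a constant and plotting it against rank
# manufactures an apparent "trend" out of pure noise.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
values = rng.normal(loc=0.0, scale=1.0, size=150)   # noise around a constant

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(values, bins=20)
ax1.set_title("histogram: scatter about a constant")

ax2.errorbar(np.arange(values.size), np.sort(values), yerr=1.0, fmt="o", ms=3)
ax2.set_xlabel("rank")
ax2.set_title("same numbers, sorted by value")

fig.tight_layout()
plt.show()
```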

This is a terrible way to plot data. It makes it look like there is a correlation where there is none. Setting this aside, there is a potential issue with the most discrepant galaxies – those at either extreme. There are more points that are roughly 3 sigma from a constant value than there should be for a sample this size. If this is the right assessment of the uncertainty, then there is indeed some variation from galaxy to galaxy. Not much, but the galaxies at the left hand side of the plot are different from those on the right hand side.

But can we believe the formal uncertainties that inform this error analysis? If you’ve read this far, you will anticipate that the answer to this question obeys Betteridge’s law. No.

One of the reasons we can’t just assign confidence intervals and believe them like a common physicist is that there are other factors in the analysis – nuisance parameters in Bayesian verbiage – with which the acceleration scale covaries. That’s a fancy way of saying that if we turn one knob, it affects another. We assign priors to the nuisance parameters (e.g., the distance to each galaxy and its inclination) based on independent measurements. But there is still some room to slop around. The question is really what to believe at the end of the analysis. We don’t think we can distinguish the acceleration scale from one galaxy to another, but this other analysis says we should. So which is it?

It is easy at this point to devolve into accusations of picking priors to obtain a preconceived result. I don’t think anyone is doing that. But how to show it?

Pengfei had the brilliant idea to perform the same analysis as Marra et al., but allowing Newton’s constant to vary. This is Big G, a universal constant that’s been known to be a constant of nature for centuries. It surely does not vary. However, G appears in our equations, so we can test for variation therein. Pengfei did this, following the same procedure as Marra et al., and finds the same kind of graph – now for G instead of g†.

Best-fit values of Newton’s constant from Li et al. (2021).

You see here the same kind of trend for Newton’s constant as one sees above for the acceleration scale. The same data have been analyzed in the same way. It has also been plotted in the same way, giving the impression of a correlation where there is none. The result is also the same: if we believe the formal uncertainties, the best-fit G is different for the galaxies at the left than from those to the right.

I’m pretty sure Newton’s constant does not vary this much. I’m entirely sure that the rotation curve data we analyze are not capable of making this determination. It would be absurd to claim so. The same absurdity extends to the acceleration scale g†. If we don’t believe the variation in G, there’s no reason to believe that in g†.
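For concreteness, here is roughly what “letting Newton’s constant vary” amounts to in such a fit. This is only a sketch with mock data for a single galaxy, assuming the usual RAR functional form; the actual analysis in Li et al. (2021) is a full Bayesian fit with priors on distance, inclination, and mass-to-light ratio.

```python
# Sketch of a RAR fit in which Newton's constant is rescaled by a free
# factor f_G. Mock data for one galaxy; not the actual Li et al. pipeline.
import numpy as np
from scipy.optimize import curve_fit

G_DAGGER = 1.2e-10                           # characteristic acceleration, m/s^2

def rar(gbar, f_G):
    """Predicted g_obs given the baryonic g_bar, with G rescaled by f_G."""
    gb = f_G * gbar                          # g_bar scales linearly with G
    return gb / (1.0 - np.exp(-np.sqrt(gb / G_DAGGER)))

rng = np.random.default_rng(1)
gbar = np.logspace(-11.5, -9.5, 20)          # mock baryonic accelerations, m/s^2
g_true = rar(gbar, 1.0)                      # generated with the standard G
g_obs = g_true * (1.0 + rng.normal(0.0, 0.05, gbar.size))   # 5% "observational" noise

fit, cov = curve_fit(rar, gbar, g_obs, p0=[1.0], sigma=0.05 * g_true)
print(f"best-fit G / G_Newton = {fit[0]:.3f} +/- {np.sqrt(cov[0, 0]):.3f}")
# The machinery happily returns a "best-fit G" for every galaxy; whether the
# galaxy-to-galaxy spread in that number means anything is the whole question.
```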


So what is going on here? It boils down to the errors on the rotation curves not representing the uncertainty in the circular velocity as we would like for them to. There are all sorts of reasons for this, observational, physical, and systematic. I’ve written about this at great length elsewhere, and I haven’t the patience to do so again here. It is turgidly technical to the extent that even the pros don’t read it. It boils down to the ancient, forgotten wisdom of astronomy: you have to take the errors with a grain of salt.

Here is the cumulative distribution (CDF) of reduced chi squared for the plot above.

Cumulative distribution of reduced chi-squared for different priors on Newton’s constant.

Two things to notice here. First, the CDF looks the same regardless of whether we let Newton’s constant vary or not, or how we assign the Bayesian priors. There’s no value added in letting it vary – just as we found for the characteristic acceleration scale in the first place. Second, the reduced chi squared is rarely close to one. It should be! As a goodness of fit measure, one claims to have a good fit when the reduced chi squared is equal to one. The majority of these are not good fits! Rather than the gradual slope we see here, the CDF of chi squared should be a nearly vertical line at one. That’s nothing like what we see.
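If the error bars were honest Gaussian uncertainties, the distribution of reduced chi squared would be tightly confined around one. A quick sketch (assuming, for illustration, about 20 degrees of freedom per galaxy):

```python
# Expected distribution of reduced chi-squared for honest Gaussian errors.
# The 20 degrees of freedom per galaxy is an illustrative assumption.
from scipy.stats import chi2

dof = 20
for val in (0.5, 1.0, 1.5, 2.0, 3.0):
    p = chi2.cdf(val * dof, df=dof)          # P(reduced chi-squared < val)
    print(f"P(chi2_red < {val}) = {p:.4f}")
# The CDF climbs steeply around 1: a reduced chi-squared of 2 should already
# be roughly a 1-in-200 event, and values near 3 essentially never occur.
```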

If one interprets this literally, there are many large chi squared values well in excess of unity. These are bad fits, and the model should be rejected. That’s exactly what Rodrigues et al. (2018) found, rejecting the constancy of the acceleration scale at 10 sigma. By their reasoning, we must also reject the constancy of Newton’s constant with the same high confidence. That’s just silly.

One strange thing: the people complaining that the acceleration scale is not constant are only testing that hypothesis. Their presumption is that if the data reject that, it falsifies MOND. The attitude is that this is an automatic win for dark matter. Is it? They don’t bother checking.

We do. We can do the same exercise with dark matter. We find the same result. The CDF looks the same; there are many galaxies with chi squared that is too large.

CDF of rotation curve fits with various types of dark matter halos. None provide a satisfactory fit (as indicated by chi squared) to all galaxies.

Having found the same result for dark matter halos that we found for the RAR, if we apply the same logic, then all proposed model halos are excluded. There are too many bad fits with overly large chi squared.

We have now ruled out all conceivable models. Dark matter is falsified. MOND is falsified. Nothing works. Look on these data, ye mighty, and despair.

But wait! Should we believe the error bars that lead to the end of all things? What would Betteridge say?

Here is the rotation curve of DDO 170 fit with the RAR. Look first at the left box, with the data (points) and the fit (red line). Then look at the fit parameters in the right box.

RAR fit to the rotation curve of DDO 170 (left) with fit parameters at right.

Looking at the left panel, this is a good fit. The line representing the model provides a reasonable depiction of the data.

Looking at the right panel, this is a terrible fit. The reduced chi squared is 4.9. That’s a lot larger than one! The model is rejected with high confidence.

Well, which is it? Lots of people fall into the trap of blindly trusting statistical tests like chi squared. Statistics can only help your brain. They can’t replace it. Trust your eye-brain. This is a good fit. Chi squared is overly large not because this is a bad model but because the error bars are too small. The absolute amount by which the data “miss” is just a few km/s. This is not much by the standards of galaxies, and could easily be explained by a small departure of the tracer from a purely circular orbit – a physical effect we expect at that level. Or it could simply be that the errors are underestimated. Either way, it isn’t a big deal. It would be incredibly naive to take chi squared at face value.
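The arithmetic behind that statement is simple: chi squared scales as the inverse square of the error bars, so modestly underestimated errors inflate it enormously. A toy illustration with made-up numbers (not the actual DDO 170 data), just to show the scaling:

```python
# How underestimated error bars inflate reduced chi-squared.
# Made-up residuals of a few km/s, not the actual DDO 170 data.
import numpy as np

residuals = np.array([1.5, -2.0, 3.0, -3.5, 2.5, -1.0, 4.0, -2.5])   # km/s misses
n_params = 2                                   # free parameters in the fit

def chi2_red(err_kms):
    return np.sum((residuals / err_kms) ** 2) / (residuals.size - n_params)

print("reduced chi2, reported errors of 1.5 km/s:", round(chi2_red(1.5), 2))   # ~4.2
print("reduced chi2, errors doubled to 3 km/s  :", round(chi2_red(3.0), 2))    # ~1.1
# Doubling the error bars, or allowing a couple km/s of non-circular motion,
# turns a formally "terrible" fit into a perfectly acceptable one.
```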

If you want to see a dozen plots like the DDO 170 fit above for all the various models fit to each of over a hundred galaxies, see Li et al. (2020). The bottom line is always the same. The same galaxies are poorly fit by any model — dark matter or MOND. Chi squared is too big not because all conceivable models are wrong, but because the formal errors are underestimated in many cases.

This comes as no surprise to anyone with experience working with astronomical data. We can work to improve the data and the error estimation – see, for example, Sellwood et al. (2021). But we can’t blindly turn the crank on some statistical black box and expect all the secrets of the universe to tumble out onto a silver platter for our delectation. There’s a little more to it than that.