The Deuterium-Lithium tension in Big Bang Nucleosynthesis

There are many tensions in the era of precision cosmology. The most prominent, at present, is the Hubble tension – the difference between traditional measurements, which consistently obtain H₀ = 73 km/s/Mpc, and the best fit* to the acoustic power spectrum of the cosmic microwave background (CMB) observed by Planck, H₀ = 67 km/s/Mpc. There are others of varying severity that are less widely discussed. In this post, I want to talk about a persistent tension in the baryon density implied by the measured primordial abundances of deuterium and lithium⁺. Unlike the tension in H₀, this problem is not nearly as widely discussed as it should be.

Framing

Part of the reason that this problem is not seen as an important tension has to do with the way in which it is commonly framed. In most discussions, it is simply the primordial lithium problem. Deuterium agrees with the CMB, so those must be right and lithium must be wrong. Once framed that way, it becomes a trivial matter specific to one untrustworthy (to cosmologists) observation. It’s a problem for specialists to sort out what went wrong with lithium: the “right” answer is otherwise known, so this tension is not real, making it unworthy of wider discussion. However, as we shall see, this might not be the right way to look at it.

It’s a bit like calling the acceleration discrepancy the dark matter problem. Once we frame it this way, it biases how we see the entire problem. Solving this problem becomes a matter of finding the dark matter. It precludes consideration of the logical possibility that the observed discrepancies occur because the force law changes on the relevant scales. This is the mental block I struggled mightily with when MOND first cropped up in my data; this experience makes it easy to see when other scientists succumb to it sans struggle.

Big Bang Nucleosynthesis (BBN)

I’ve talked about the cosmic baryon density here a lot, but I’ve never given an overview of BBN itself. That’s because it is well-established, and has been for a long time – I assume you, the reader, already know about it or are competent to look it up. There are many good resources for that, so I’ll give only as much of a sketch as the subsequent narrative requires – a sketch that will be both too little for the experts and too much for the subsequent narrative that most experts are unaware of.

Primordial nucleosynthesis occurs in the first few minutes after the Big Bang when the universe is the right temperature and density to be one big fusion reactor. The protons and available neutrons fuse to form helium and other isotopes of the light elements. Neutrons are slightly more massive and less numerous than protons to begin with. In addition, free neutrons decay with a half-life of roughly ten minutes, so are outnumbered by protons when nucleosynthesis happens. The vast majority of the available neutrons pair up with protons and wind up in ⁴He while most of the protons remain on their own as the most common isotope of hydrogen, ¹H. The resulting abundance ratio is one alpha particle for every dozen protons, or in terms of mass fractions^&, X_p = 3/4 hydrogen and Y_p = 1/4 helium. That is the basic composition with which the universe starts; heavy elements are produced subsequently in stars and supernova explosions.
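The bookkeeping in that last step is easy to check. The following is a minimal sketch, assuming protons and neutrons weigh the same and ignoring nuclear binding energy; the function names are made up for illustration. It takes the quoted ratio of one alpha particle per dozen protons and recovers both the mass fractions and the neutron-to-proton ratio that ratio implies if every neutron ends up in ⁴He.

```python
from fractions import Fraction

def mass_fractions(alphas: int, protons: int) -> tuple[Fraction, Fraction]:
    """Return (X, Y): hydrogen and helium mass fractions by nucleon count.

    Assumes equal proton and neutron masses and no binding energy.
    """
    helium_nucleons = 4 * alphas      # each 4He holds 2 protons + 2 neutrons
    hydrogen_nucleons = protons       # each 1H is a single proton
    total = helium_nucleons + hydrogen_nucleons
    return Fraction(hydrogen_nucleons, total), Fraction(helium_nucleons, total)

def neutron_to_proton(alphas: int, protons: int) -> Fraction:
    """Neutron-to-proton ratio before fusion, if all neutrons end up in 4He."""
    neutrons = 2 * alphas
    all_protons = 2 * alphas + protons
    return Fraction(neutrons, all_protons)

X_p, Y_p = mass_fractions(alphas=1, protons=12)
print(X_p, Y_p)                     # 3/4 1/4
print(neutron_to_proton(1, 12))     # 1/7
```

So one alpha per twelve protons is the same statement as X_p = 3/4, Y_p = 1/4, and it corresponds to about one neutron for every seven protons at the time of fusion.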

Though ¹H and ⁴He are by far the most common products of BBN, there are traces of other isotopes that emerge from BBN:

The time evolution of the relative numbers of light element isotopes through BBN. As the universe expands, nuclear reactions “freeze-out” and establish primordial abundances for the indicated species. The precise outcome depends on the baryon density, Ω_b. This plot illustrates a particular choice of *Ω_b*; different *Ω_b* result in observationally distinguishable abundances. (Figures like this are so ubiquitous in discussions of the early universe that I have not been able to identify the original citation for this particular version.)

After hydrogen and helium, the next most common isotope to emerge from BBN is deuterium, ²H. It is the first thing made (one proton plus one neutron) but most of it gets processed into ⁴He, so after a brief peak, its abundance declines. How much it declines is very sensitive to Ω_b: the higher the baryon density, the more deuterium gets gobbled up by helium before freeze-out. The following figure illustrates how the abundance of each isotope depends on Ω_b:

“Schramm diagram” adapted from Cyburt et al (2003) showing the abundance of ⁴He by mass fraction (top) and the number relative to hydrogen of deuterium (D = ²H), helium-3, and lithium as a function of the baryon-to-photon ratio. We measure the photon density in the CMB, so this translates directly to the baryon density^$ *Ω_b*h² (top axis).

If we can go out and measure the primordial abundances of these various isotopes, we can constrain the baryon density.
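The translation between the baryon-to-photon ratio and Ω_bh² mentioned in the Schramm diagram caption can be sketched numerically. This is a rough illustration, not a precision calculation: it assumes a present-day CMB temperature of 2.7255 K, uses rounded CGS constants, and treats every baryon as having the proton mass (which ignores helium binding energy at the sub-percent level). All names are illustrative.

```python
import math

# CGS constants, rounded
K_B = 1.380649e-16       # Boltzmann constant, erg/K
HBAR = 1.054572e-27      # reduced Planck constant, erg s
C = 2.99792458e10        # speed of light, cm/s
G = 6.6743e-8            # gravitational constant, cm^3 g^-1 s^-2
M_P = 1.67262e-24        # proton mass, g
MPC_CM = 3.0857e24       # one megaparsec in cm
ZETA3 = 1.2020569        # Riemann zeta(3)
T_CMB = 2.7255           # assumed present-day CMB temperature, K

def photon_density(t_kelvin: float = T_CMB) -> float:
    """Blackbody photon number density in cm^-3."""
    return (2.0 * ZETA3 / math.pi**2) * (K_B * t_kelvin / (HBAR * C)) ** 3

def critical_density_h1() -> float:
    """Critical density for h = 1 (H0 = 100 km/s/Mpc), in g/cm^3."""
    h0 = 100.0 * 1.0e5 / MPC_CM          # Hubble constant in s^-1
    return 3.0 * h0**2 / (8.0 * math.pi * G)

def eta_from_omega_b_h2(omega_b_h2: float) -> float:
    """Baryon-to-photon ratio, approximating each baryon as one proton mass."""
    n_baryon = omega_b_h2 * critical_density_h1() / M_P   # baryons per cm^3
    return n_baryon / photon_density()

for omega_b_h2 in (0.0125, 0.0224):
    print(f"Omega_b h^2 = {omega_b_h2}: eta = {eta_from_omega_b_h2(omega_b_h2):.2e}")
```

With these inputs there are roughly 400 photons per cubic centimeter, and the baryon-to-photon ratio comes out at a few times 10⁻¹⁰ for the baryon densities under discussion. Because the photon density is fixed by the CMB temperature, the two axes of the Schramm diagram differ only by a constant factor.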

The Baryon Density

It works! Each isotope provides an independent estimate of Ω_bh², and they agree pretty well. This was the first and for a long time the only over-constrained quantity in cosmology. So while I am going to quibble about the exact value of Ω_bh², I don’t doubt that the basic picture is correct. There are too many details we have to get right in the complex nuclear reaction chains coupled to the decreasing temperature of a universe expanding at the rate required during radiation domination for this to be an accident. It is an exquisite success of the standard Hot Big Bang cosmology, albeit not one specific to LCDM.

Getting at primordial, rather than current, abundances is an interesting observational challenge too involved to go into much detail here. Suffice it to say that it can be done, albeit to varying degrees of satisfaction. We can then compare the measured abundances to the theoretical BBN abundance predictions to infer the baryon density.

The Schramm diagram with measured abundances (orange boxes) for the isotopes of the light elements. The thickness of the box illustrates the uncertainty: tiny for deuterium and large for *⁴He* because of the large zoom on the axis scale. The lithium abundance could correspond to either low or high baryon density. ³He is omitted because its uncertainty is too large to provide a useful constraint.

Deuterium is considered the best baryometer because its relic abundance is very sensitive to Ω_bh²: a small change in baryon density corresponds to a large change in D/H. In contrast, ⁴He is a great confirmation of the basic picture – the primordial mass fraction has to come in very close to 1/4 – but the precise value is not very sensitive to Ω_bh². Most of the neutrons end up in helium no matter what, so it is hard to distinguish^# a few more from a few less. (Note the huge zoom on the linear scale for ⁴He. If we plotted it logarithmically with decades of range as we do the other isotopes, it would be a nearly flat line.) Lithium is annoying for being double-valued right around the interesting baryon density so that the observed lithium abundance can correspond to two values of Ω_bh². This behavior stems from the trade-off with ⁷Be which is produced at a higher rate but decays to ⁷Li after a few months. For this discussion the double-valued ambiguity of lithium doesn’t matter, as the problem is that the deuterium abundance indicates an Ω_bh² that is even higher than the higher branch of lithium.
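To see why sensitivity makes a good baryometer, suppose each abundance follows a local power law in the baryon density. The sketch below propagates a fractional abundance error into a fractional density error. The logarithmic slopes are assumed round numbers chosen for illustration (steep and negative for D/H, nearly flat for the helium mass fraction); they are not taken from this post or read off the figure.

```python
def implied_density_error(frac_abundance_error: float, log_slope: float) -> float:
    """Fractional baryon density error implied by a fractional abundance error.

    Assumes a local power law, abundance ∝ density**log_slope, so that
    d(ln density) = d(ln abundance) / log_slope.
    """
    return frac_abundance_error / abs(log_slope)

# Illustrative logarithmic slopes near the interesting density (assumptions)
SLOPE_DEUTERIUM = -1.6   # D/H falls steeply as the baryon density rises
SLOPE_HELIUM = 0.04      # Y_p barely moves

for name, slope in [("D/H", SLOPE_DEUTERIUM), ("Y_p", SLOPE_HELIUM)]:
    err = implied_density_error(0.05, slope)
    print(f"5% error on {name} -> {100 * err:.0f}% error on the baryon density")
```

Under these assumed slopes, the same 5% measurement error pins down the baryon density to a few percent with deuterium but leaves it uncertain by more than a factor of two with helium. That is the sense in which helium confirms the picture without measuring Ω_bh² precisely.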

BBN pre-CMB

The diagrams above and below show the situation in the 1990s before CMB estimates became available. Consideration of all the available data in the review of Walker et al. led to the value Ω_bh² = 0.0125 ± 0.0025. This value** was so famous that it was Known. It formed the basis of my predictions for the CMB for both LCDM and no-CDM. This prediction hinged on BBN being correct, and that we understood the experimental bounds on the baryon density. A few years after Walker’s work, Copi et al. provided the estimate⁺⁺ 0.009 < Ω_bh² < 0.02. Those were the extreme limits of the time, as illustrated by the green box below:

The baryon density as it was known before detailed observations of the acoustic power spectrum of the CMB. BBN was a mature subject before 1990; the massive reviews of Walker et al. and Copi et al. creak with the authority of a solved problem. The controversial tension at the time was between the high and low deuterium measurements from Hogan and Tytler, which were at the extreme ends of the ranges indicated by the bulk of the data in the reviews.

Up until this point, the constraints on BBN had come mostly from helium observations in nearby galaxies and lithium measurements in metal poor stars. It was only just then becoming possible to obtain high quality spectra of sufficiently high redshift quasars to see weak deuterium lines associated with strongly damped primary hydrogen absorption in intergalactic gas along the line of sight. This is great: deuterium is the most sensitive baryometer, the redshifts were high enough to be early in the history of the universe close to primordial times, and the gas was in the middle of intergalactic nowhere so shouldn’t be altered by astrophysical processes. These are ideal conditions, at least in principle.

First results were binary. Craig Hogan obtained a high deuterium abundance, corresponding to a low baryon density. Really low. From my Walker et al.-informed confirmation bias, too low. It was a brand new result, so promising but probably wrong. Then Tytler and his collaborators came up with the opposite result: low deuterium abundance corresponding to a high baryon density: Ω_bh² = 0.019 ± 0.001. That seemed pretty high at the time, but at least it was within the bound Ω_bh² < 0.02 set by Copi et al. There was a debate between these high/low deuterium camps that ended in a rare act of intellectual honesty by a cosmologist when Hogan^&& conceded. We seemed to have settled on the high end of the allowed range, just under Ω_bh² = 0.02.

Enter the CMB

CMB data started to be useful for constraining the baryon density in 2000 and improved rapidly. By that point, LCDM was already well-established, and I had published predictions for both LCDM and no-CDM. In the absence of cold dark matter, one expects a damping spectrum, with each peak lower than the one before it. For the narrow (factor of two) Known range of possible baryon densities, all the no-CDM models run together to essentially the same first-to-second peak ratio.

Peak locations measured by WMAP in 2003 (points) compared to the a priori (1999) predictions of LCDM (red tone lines) and no-CDM (blue tone lines). Models are normalized in amplitude around the first peak.

Adding CDM into the mix adds a driver to the oscillations. This fights the baryonic damping: the CDM is like a parent pushing a swing while the baryons are the kid dragging his feet. This combination makes just about any pattern of peaks possible. Not all free parameters are made equal: the addition of a single free parameter, Ω_CDM, makes it possible to fit any plausible pattern of peaks. Without it (no-CDM means Ω_CDM = 0), only the damping spectrum is allowed.

For BBN as it was known at the time, the clear difference was in the relative amplitude^$$ of the first and second peaks. As can be seen above, the prediction for no-CDM was correct and that for LCDM was not. So we were done, right?

Of course not. To the CMB community, the only thing that mattered was the fit to the CMB power spectrum, not some obscure prediction based on BBN. Whatever the fit said was True; too bad for BBN if it didn’t agree.

The way to fit the unexpectedly small^## second peak was to crank up the baryon density. To do that, Tegmark & Zaldarriaga (2000) needed 0.022 < Ω_bh² < 0.040. That’s the first blue point below. This was the first time that I heard it suggested that the baryon density could be so high.

The baryon density from deuterium (red triangles) before and after (dotted vertical line) estimates from the CMB (blue points). The horizontal dotted line is the pre-CMB upper limit of *Copi et al.*

The astute reader will note that the CMB-fit 0.022 < Ω_bh² < 0.040 sits entirely outside the BBN bounds 0.009 < Ω_bh² < 0.02. So we’re done, right? Well, no – the community simply ignored the successful a priori prediction of the no-CDM scenario. That was certainly easier than wrestling with its implications, and no one seems to have paused to contemplate why the observed peak ratio came in exactly at the one unique value that it could obtain in the case of no-CDM.

For a few years, the attitude seemed to be that BBN was close but not quite right. As the CMB data improved, the baryon density came down, ultimately settling on Ω_bh² = 0.0224 ± 0.0001. Part of the reason for this decline from the high initial estimate is covariance. In this case, the tilt plays a role: the baryon density declined as n_s = 1 → 0.965 ± 0.004. Getting the second peak amplitude right takes a combination of both.

Now we’re back in the ballpark, almost: Ω_bh² = 0.0224 is not ridiculously far above the BBN limit Ω_bh² < 0.02. Close enough for Spergel et al. (2003) to say “The remarkable agreement between the baryon density inferred from D/H values and our [WMAP] measurements is an important triumph for the basic big bang model.” This was certainly true given the size of the error bars on both deuterium and the CMB at the time. It also elides^*** any mention of either helium or lithium or the fact that the new Known was not consistent with the previous Known. Ω_bh² = 0.0224 was always the ally; Ω_bh² = 0.0125 was always the enemy.

Note, however, that deuterium made a leap from below Ω_bh² = 0.02 to above 0.02 exactly when the CMB indicated that it should do so. They iterated to better agreement and pretty much stayed there. Hopefully that is the correct answer, but given the history of the field, I can’t help worrying about confirmation bias. I don’t know if that is what’s going on, but if it were, this convergence over time is what it would look like.

Lithium does not concur

Taking the deuterium results at face value, there really is excellent agreement with the LCDM fit to the CMB, so I have some sympathy for the desire to stop there. Deuterium is the best baryometer, after all. Helium is hard to get right at a precise enough level to provide a comparable constraint, and lithium, well, lithium is measured in stars. Stars are tiny, much smaller than galaxies, and we know those are too puny to simulate.

Spite & Spite (1982) [those are names, pronounced “speet”; we’re not talking about spiteful stars] discovered what is now known as the Spite plateau, a level of constant lithium abundance in metal poor stars, apparently indicative of the primordial lithium abundance. Lithium is a fragile nucleus; it can be destroyed in stellar interiors. It can also be formed as the fragmentation product of cosmic ray collisions with heavier nuclei. Both of these things go on in nature, making some people distrustful of any lithium abundance. However, the Spite plateau is a sort of safe zone where neither effect appears to dominate. The abundance of lithium observed there is indeed very much in the right ballpark to be a primordial abundance, so that’s the most obvious interpretation.

Lithium indicates a lowish baryon density. Modern estimates are in the same range as BBN of old; they have not varied systematically with time. There is no tension between lithium and pre-CMB deuterium, but it disagrees with LCDM fits to the CMB and with post-CMB deuterium. This tension is both persistent and statistically significant (Fields 2011 describes it as “4–5σ”).

*The baryon density from lithium (yellow symbols) over time. Stars are measurements in groups of stars on the Spite plateau; the square represents the approximate value from the ISM of the SMC.*

I’ve seen many models that attempt to fix the lithium abundance, e.g., by invoking enhanced convective mixing via <<mumble mumble>> so that lithium on the surface of stars is subject to destruction deep in the stellar interior in a previously unexpected way. This isn’t exactly satisfactory – it should result in a mess, not a well-defined plateau – and other attempts I’ve seen to explain away the problem do so with at least as much contrivance. All of these models appeared after lithium became a problem; they’re clearly motivated by the assumption bias that the CMB is correct so the discrepancy is specific to lithium so there must be something weird about stars that explains it.

Another way to illustrate the tension is to use Ω_bh² from the Planck fit to predict what the primordial lithium abundance should be. The Planck-predicted band is clearly higher than and offset from the stars of the Spite plateau. There should be a plateau, sure, but it’s in the wrong place.

The lithium abundance in metal poor stars (points), the interstellar medium of the Small Magellanic Cloud (green band), and the primordial lithium abundance expected for the best-fit Planck LCDM. For reference, *[Fe/H] = -3* means an iron abundance that is one one-thousandth that of the sun.

An important recent observation is that a similar lithium abundance is obtained in the metal poor interstellar gas of the Small Magellanic Cloud. That would seem to obviate any explanation based on stellar physics.

The Schramm diagram with the Planck CMB-LCDM value added (vertical line). This agrees well with deuterium measurements made after CMB data became available, but not with those before, nor with the measured abundance of lithium.

We can also illustrate the tension on the Schramm diagram. This version adds the best-fit CMB value and the modern deuterium abundance. These are indeed in excellent agreement, but they don’t intersect with lithium. The deuterium-lithium tension appears to be real, and comparable in significance to the H₀ tension.

So what’s the answer?

I don’t know. The logical options are

A systematic error in the primordial lithium abundance
A systematic error in the primordial deuterium abundance
Physics beyond standard BBN

I don’t like any of these solutions. The data for both lithium and deuterium are what they are. As astronomical observations, both are subject to the potential for systematic errors and/or physical effects that complicate their interpretation. I am also extremely reluctant to consider modifications to BBN. There are occasional suggestions to this effect, but it is a lot easier to break than it is to fix, especially for what is a fairly small disagreement in the absolute value of Ω_bh².

I have left the CMB off the list because it isn’t part of BBN: its constraint on the baryon density is real, but involves completely different physics. It also involves different assumptions, i.e., the LCDM model and all its invisible baggage, while BBN is just what happens to ordinary nucleons during radiation domination in the early universe. CMB fits are corroborative of deuterium only if we assume LCDM, which I am not inclined to accept: deuterium disagreed with the subsequent CMB data before it agreed. Whether that’s just progress or a sign of confirmation bias, I also don’t know. But I do know confirmation bias has bedeviled the history of cosmology, and as the H₀ debate shows, we clearly have not outgrown it.

The appearance of confirmation bias is augmented by the response time of each measured elemental abundance. Deuterium is measured using high redshift quasars; the community that does that work is necessarily tightly coupled to cosmology. Its response was practically instantaneous: as soon as the CMB suggested that the baryon density needed to be higher, conforming D/H measurements appeared. Indeed, I recall when that first high red triangle appeared in the literature, a colleague snarked to me “we can do that too!” In those days, those of us who had been paying attention were all shocked at how quickly Ω_bh² = 0.0125 ± 0.0025 was abandoned for literally double that value, Ω_bh² = 0.025 ± 0.001. That’s 4.6 sigma for those keeping score.
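For those keeping score at home, the 4.6 sigma figure is reproduced if the two quoted uncertainties are treated as independent Gaussian errors and combined in quadrature. That combination rule is an assumption of this sketch, as is the function name.

```python
import math

def tension_sigma(value_a: float, err_a: float, value_b: float, err_b: float) -> float:
    """Difference between two measurements in units of their combined error.

    Assumes independent Gaussian uncertainties added in quadrature.
    """
    return abs(value_a - value_b) / math.hypot(err_a, err_b)

sigma = tension_sigma(0.0125, 0.0025, 0.025, 0.001)
print(f"{sigma:.1f} sigma")   # 4.6 sigma
```

The older, larger error bar dominates the denominator, so the significance is set almost entirely by how well the pre-CMB value was thought to be known.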

The primordial helium abundance is measured in nearby dwarf galaxies. That community is aware of cosmology, but not as strongly coupled to it. Estimates of the primordial helium abundance have drifted upwards over time, corresponding to higher implied baryon densities. It’s as if confirmation bias is driving things towards the same result, but on a timescale that depends on the sociological pressure of the CMB imperative.

**Fig. 8** from Steigman (2012) *showing the history of primordial helium mass fraction (Y_P) determinations as a function of time.*

I am not accusing anyone of trying to obtain a particular result. Confirmation bias can be a lot more subtle than that. There is an entire field of study of it in psychology. We “humans actively sample evidence to support prior beliefs” – none of us are immune to it.

In this case, how we sample evidence depends on the field we’re active in. Lithium is measured in stars. One can have a productive career in stellar physics while entirely ignoring cosmology; it is the least likely to be perturbed by edicts from the CMB community. The inferred primordial lithium abundance has not budged over time.

What’s your confirmation bias?

I try not to succumb to confirmation bias, but I know that’s impossible. The best I can do is change my mind when confronted with new evidence. This is why I went from being sure that non-baryonic dark matter had to exist to taking seriously MOND as the theory that predicted what I observed.

I do try to look at things from all perspectives. Here, the CMB has been a roller coaster. Putting on an LCDM hat, the location of the first peak came in exactly where it was predicted: this was strong corroboration of a flat FLRW geometry. What does it mean in MOND? No idea – MOND doesn’t make a prediction about that. The amplitude of the second peak came in precisely as predicted for the case of no-CDM. This was corroboration of the ansatz inspired by MOND, and the strongest possible CMB-based hint that we might be barking up the wrong tree with LCDM.

As an exercise, I went back and maxed out the baryon density as it was known before the second peak was observed. We already thought we knew LCDM parameters well enough to do this. We couldn’t. The amplitude of the second peak came as a huge surprise to LCDM; everyone acknowledged that at the time (if pressed; many simply ignored it). Nowadays this is forgotten, or people have gaslit themselves into believing this was expected all along. It was not.

**Fig. 45** from Famaey & McGaugh (2012): *WMAP data are shown with the a priori prediction of no-CDM (blue line) and the most favorable prediction that could have been made ahead of time for LCDM (red line).*

From the perspective of no-CDM, we don’t really care whether deuterium or lithium hits closer to the right baryon density. All plausible baryon densities predict essentially the same A_1:2 amplitude ratio. Once we admit CDM as a possibility, then the second peak amplitude becomes very sensitive to the mix of CDM and baryons. From this perspective, the lithium-indicated baryon density is unacceptable. That’s why it is important to have a test that is independent of the CMB. Both deuterium and lithium provide that, but they disagree about the answer.

Once we broke BBN to fit the second peak in LCDM, we were admitting (if not to ourselves) that the a priori prediction of LCDM had failed. Everything after that is a fitting exercise. There are enough free parameters in LCDM to fit any plausible power spectrum. Cosmologists are fond of saying there are thousands of independent multipoles, but that overstates the case: it doesn’t matter how finely we sample the wave pattern, it matters what the wave pattern is. That is not as over-constrained as it is made to sound. LCDM is, nevertheless, an excellent fit to the CMB data; the test then is whether the parameters of this fit are consistent with independent measurements. It was until it wasn’t; that’s why we face all these tensions now.

Despite the success of the prediction of the second peak, no-CDM gets the third peak wrong. It does so in a way that is impossible to fix short of invoking new physics. We knew that had to happen at some level; empirically that level occurs at L = 600. After that, it becomes a fitting exercise, just as it is in LCDM – only now, one has to invent a new theory of gravity in which to make the fit. That seems like a lot to ask, so while it remained as a logical possibility, LCDM seemed the more plausible explanation for the CMB if not dynamical data. From this perspective, that A_1:2 came out bang on the value predicted by no-CDM must just be one heck of a cosmic fluke. That’s easy to accept if you were unaware of the prediction or scornful of its motivation; less so if you were the one who made it.

Either way, the CMB is now beyond our ability to predict. It has become a fitting exercise, the chief issue being what paradigm in which to fit it. In LCDM, the fit follows easily enough; the question is whether the result agrees with other data: are these tensions mere hiccups in the great tradition of observational cosmology? Or are they real, demanding some new physics?

The widespread attitude among cosmologists is that it will be impossible to fit the CMB in any way other than LCDM. That is a comforting thought (it has to be CDM!) and for a long time seemed reasonable. However, it has been contradicted by the success of Skordis & Zlosnik (2021) using AeST, which can fit the CMB as well as LCDM.

*CMB power spectrum observed by Planck fit by AeST (Skordis & Zlosnik 2021).*

AeST is a very important demonstration that one does not need dark matter to fit the CMB. One does need other fields⁺⁺⁺, so now the reality of those has to be examined. Where this show stops, nobody knows.

I’ll close by noting that the uniqueness claimed by the LCDM fit to the CMB is a property more correctly attributed to MOND in galaxies. It is less obvious that this is true because it is always possible to fit a dark matter model to data once presented with the data. That’s not science, that’s fitting French curves. To succeed, a dark matter model must “look like” MOND. It obviously shouldn’t do that, so modelers refuse to go there, and we continue to spin our wheels and dig the rut of our field deeper.

Note added in proof, as it were: I’ve been meaning to write about this subject for a long time, but hadn’t, in part because I knew it would be long and arduous. Being deeply interested in the subject, I had to slap myself repeatedly to refrain from spending even more time updating the plots with publication date as an axis: nothing has changed, so that would serve only to feed my OCD. Even so, it has taken a long time to write, which I mention because I had completed the vast majority of this post before the IAU announced on May 15 that Cooke & Pettini have been awarded the Gruber prize for their precision deuterium abundance. This is excellent work (it is one of the deuterium points in the relevant plot above), and I’m glad to see this kind of hard, real-astronomy work recognized.

The award of a prize is a recognition of meritorious work but is not a guarantee that it is correct. So this does not alter any of the concerns that I express here, concerns that I’ve expressed for a long time. It does make my OCD feel obliged to comment at least a little on the relevant observations, which is itself considerably involved, but I will tack on some brief discussion below, after the footnotes.

*These methods were in agreement before they were in tension, e.g., Spergel et al. (2003) state: “The agreement between the HST Key Project value and our [WMAP CMB] value, h = 0.72 ±0.05, is striking, given that the two methods rely on different observables, different underlying physics, and different model assumptions.”

⁺Here I mean the abundance of the primary isotope of lithium, ⁷Li. There is a different problem involving the apparent overabundance of ⁶Li. I’m not talking about that here; I’m talking about the different baryon densities inferred separately from the abundances of D/H and ⁷Li/H.

^&By convention, X, Y, and Z are the mass fractions of hydrogen, helium, and everything else. Since the universe starts from a primordial abundance of X_p = 3/4 and Y_p = 1/4, and stars are seen to have approximately that composition plus a small sprinkling of everything else (for the sun, Z ≈ 0.02), and since iron lines are commonly measured in stars to trace Z, astronomers fell into the habit of calling Z the metallicity even though oxygen is the third most common element in the universe today (by both number and mass). Since everything in the periodic table that isn’t hydrogen and helium is a small fraction of the mass, all the heavier elements are often referred to collectively as metals despite the unintentional offense to chemistry.

^$The factor of h² appears because of the definition of the critical density ρ_c = (3H₀²)/(8πG): Ω_b = ρ_b/ρ_c. The physics cares about the actual density ρ_b but Ω_bh² = 0.02 is a lot more convenient to write than ρ_b,now = 3.75 × 10⁻³¹ g/cm³.
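The conversion in this footnote can be sketched in a few lines. It uses rounded CGS values for G and the megaparsec, and evaluates the critical density at H₀ = 100 km/s/Mpc so that multiplying by Ω_bh² gives the physical baryon density directly. The function names are illustrative.

```python
import math

G = 6.6743e-8            # gravitational constant, cm^3 g^-1 s^-2
MPC_CM = 3.0857e24       # one megaparsec in cm

def critical_density(h: float = 1.0) -> float:
    """Critical density rho_c = 3 H0^2 / (8 pi G) in g/cm^3, H0 = 100h km/s/Mpc."""
    h0 = 100.0 * h * 1.0e5 / MPC_CM      # Hubble constant in s^-1
    return 3.0 * h0**2 / (8.0 * math.pi * G)

def baryon_density(omega_b_h2: float) -> float:
    """Physical baryon density today in g/cm^3; the h^2 cancels against rho_c."""
    return omega_b_h2 * critical_density(h=1.0)

print(f"rho_c (h = 1)   = {critical_density():.3e} g/cm^3")
print(f"rho_b for 0.02  = {baryon_density(0.02):.2e} g/cm^3")
```

With these rounded constants the result lands within about half a percent of the value quoted above. Note that no value of h is needed to get ρ_b from Ω_bh², which is why the combination is so convenient.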

^#I’ve worked on helium myself, but was never able to do better than Y_p = 0.25 ± 0.01. This corroborates the basic BBN picture, but does not suffice as a precise measure of the baryon density. To do that, one must obtain a result accurate to the third place of decimals, as discussed in the exquisite works of Kris Davidson, Bernie Pagel, Evan Skillman, and their collaborators. It’s hard to do for both observational reasons and because a wealth of subtle atomic physics effects come into play at that level of precision – helium has multiple lines; their parent population levels depend on the ionization mechanism, the plasma temperature, its density, and fluorescence effects as well as abundance.

**The value reported by Walker et al. was phrased as Ω_bh₅₀² = 0.05 ± 0.01, where h₅₀ = H₀/(50 km/s/Mpc); translating this to the more conventional h = H₀/(100 km/s/Mpc) decreases these numbers by a factor of four and leads to the impression of more significant digits than were claimed. It is interesting to consider the psychological effect of this numerology. For example, the modern CMB best-fit value in this phrasing is Ω_bh₅₀² = 0.09, four sigma higher than the value Known from the combined assessment of the light isotope abundances. That seems like a tension – not just involving lithium, but the CMB vs. all of BBN. Amusingly, the higher baryon density needed to obtain a CMB fit assuming LCDM is close to the threshold where we might have gotten away without the dynamical need (Ω_m > Ω_b) for non-baryonic dark matter that motivated non-baryonic dark matter in the first place. (For further perspective at a critical juncture in the development of the field, see Peebles 1999).

The use of h₅₀ itself is an example of the confirmation bias I’ve mentioned before as prevalent at the time, that Ω_m = 1 and H₀ = 50 km/s/Mpc. I would love to be able to do the experiment of sending the older cosmologists who are now certain of LCDM back in time to share the news with their younger selves who were then equally certain of SCDM. I suspect their younger selves would ask their older selves at what age they went insane, if they didn’t simply beat themselves up.
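The h₅₀ arithmetic above is simple enough to sketch. Since h₅₀ = 2h, any quantity quoted as Ω_bh₅₀² is four times the same quantity quoted as Ω_bh². The significance computed here uses only the Walker et al. uncertainty, on the assumption that the CMB error bar is negligible by comparison; the function names are illustrative.

```python
def h50_to_h100(value: float) -> float:
    """Convert Omega_b*h50^2 to Omega_b*h^2. Since h50 = 2h, divide by 4."""
    return value / 4.0

def h100_to_h50(value: float) -> float:
    """Convert Omega_b*h^2 to Omega_b*h50^2. Since h50 = 2h, multiply by 4."""
    return value * 4.0

walker_h50, walker_err_h50 = 0.05, 0.01
print(f"{h50_to_h100(walker_h50):.4f} +/- {h50_to_h100(walker_err_h50):.4f}")  # 0.0125 +/- 0.0025

cmb_h50 = h100_to_h50(0.0224)
n_sigma = (cmb_h50 - walker_h50) / walker_err_h50
print(f"CMB value in h50 units: {cmb_h50:.2f}, {n_sigma:.0f} sigma above Walker et al.")
```

The factor of four rescales the value and its error together, so the significance is the same in either convention. Only the apparent number of significant digits changes, which is the psychological effect noted above.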

⁺⁺Craig Copi is a colleague here at CWRU, so I’ve asked him about the history of this. He seemed almost apologetic, since the current “right” baryon density from the CMB now is higher than his upper limit, but that’s what the data said at the time. The CMB gives a more accurate value only once you assume LCDM, so perhaps BBN was correct in the first place.

^&&Or succumbed to peer pressure, as that does happen. I didn’t witness it myself, so don’t know.

^$$The absolute amplitude of the no-CDM model is too high in a transparent universe. Part of the prediction of MOND is that reionization happens early, causing the universe to be a tiny bit opaque. This combination came out just right for τ = 0.17, which was the original WMAP measurement. It also happens to be consistent with the EDGES cosmic dawn signal and the growing body of evidence from JWST.

^##The second peak was unexpectedly small from the perspective of CDM; it was both natural and expected in no-CDM. At the time, it was computationally expensive to calculate power spectra, so people had pre-computed coarse grids within which to hunt for best fits. The range covered by the grids was informed by extant knowledge, of which BBN was only one element. From a dynamical perspective, Ω_m > 0.2 was adopted as a hard limit that imposed an edge in the grids of the time. There was no possibility of finding no-CDM as the best fit because it had been excluded as a possibility from the start.

***Spergel et al. (2003) also say “the best-fit Ω_bh² value for our fits is relatively insensitive to cosmological model and dataset combination as it depends primarily on the ratio of the first to second peak heights (Page et al. 2003b)” which is of course the basis of the prediction I made using the baryon density as it was Known at the time. They make no attempt to test that prediction, nor do they cite it.

⁺⁺⁺I’ve heard some people assert that this is dark matter by a different name, so is a success of the traditional dark matter picture rather than of modified gravity. That’s not at all correct. It’s just stage three in the list of reactions to surprising results identified by Louis Agassiz.

All of the figures below are from Cooke & Pettini (2018), which I employ here to briefly illustrate how D/H is measured. This is the level of detail I didn’t want to get into for either deuterium or helium or lithium, which are comparably involved.

First, here is a spectrum of the quasar they observe, Q1243+307. The quasar itself is not the object of interest here, though quasars are certainly interesting! Instead, we’re looking at the absorption lines along the line of sight; the quasar is being used as a spotlight to illuminate the gas between it and us.

**Figure 1.** Final combined and flux-calibrated spectrum of Q1243+307 (black histogram) shown with the corresponding error spectrum (blue histogram) and zero level (green dashed line). The red tick marks above the spectrum indicate the locations of the Lyman series absorption lines of the sub-DLA at redshift z_abs = 2.52564. Note the exquisite signal-to-noise ratio (S/N) of the combined spectrum, which varies from S/N ≃ 80 near the Lyα absorption line of the sub-DLA (∼4300 Å) to S/N ≃ 25 at the Lyman limit of the sub-DLA, near 3215 Å in the observed frame.

The big hump around 4330 Å is Lyman α emission from the quasar itself. Lyα is the n = 2 to 1 transition of hydrogen, Lyβ is the n = 3 to 1 transition, and so on. The rest frame wavelength of Lyα is far into the ultraviolet at 1216 Å; we see it here redshifted by z = 2.558. The rest of the spectrum is continuum and emission lines from the quasar with absorption lines from stuff along the line of sight. Note that the red end of the spectrum at wavelengths longer than 4400 Å is mostly smooth with only the occasional absorption line. Blueward of 4300 Å, there is a huge jumble. This is not noise; this is the Lyα forest. Each of those lines is absorption from hydrogen in clouds at different distances, hence different redshifts, along the line of sight.
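
The wavelengths quoted here and in the caption follow from λ_obs = λ_rest (1 + z). A minimal sketch, assuming standard rest wavelengths for Lyα (1215.67 Å) and the Lyman limit (911.75 Å); the function name is just for illustration.

```python
# Observed wavelength of a line emitted or absorbed at redshift z:
#   lambda_obs = lambda_rest * (1 + z)
LYA_REST = 1215.67       # Angstrom, hydrogen Lyman-alpha (rounded to 1216 in the text)
LYLIMIT_REST = 911.75    # Angstrom, hydrogen Lyman limit

def observed_wavelength(rest_angstrom, z):
    """Redshift a rest-frame wavelength into the observed frame."""
    return rest_angstrom * (1.0 + z)

z_qso = 2.558       # quasar emission redshift quoted in the text
z_abs = 2.52564     # sub-DLA absorption redshift from the Figure 1 caption

lya_emission = observed_wavelength(LYA_REST, z_qso)      # the big emission hump
lya_absorption = observed_wavelength(LYA_REST, z_abs)    # center of the damped line
lyman_limit = observed_wavelength(LYLIMIT_REST, z_abs)   # Lyman limit of the sub-DLA

print(f"Lya emission:    {lya_emission:.0f} A")
print(f"Lya absorption:  {lya_absorption:.0f} A")
print(f"Lyman limit:     {lyman_limit:.0f} A")
```

The absorber sits at slightly lower redshift than the quasar, so its Lyα line falls just blueward of the emission hump, and its Lyman limit lands near the 3215 Å quoted in the caption.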

Most of the clouds in the Lyα forest are ephemeral. The cross section for Lyα is huge so it takes very little hydrogen to gobble it up. Most of these lines represent very low column densities of neutral hydrogen gas. Once in a while though, one encounters a higher column density cloud that has enough hydrogen to be completely opaque to Lyα. These are damped Lyα systems. In damped systems, one can often spot the higher order Lyman lines (these are marked in red in the figure). It also means that there is enough hydrogen present to have a shot at detecting the slightly shifted version of Lyα of deuterium. This is where the abundance ratio D/H is measured.

To measure D/H, one has not only to detect the lines, but also to model and subtract the continuum. This is a tricky business in the best of times, but here its importance is magnified by the huge difference between the primary Lyα line which is so strong that it is completely black and the deuterium Lyα line which is incredibly weak. A small error in the continuum placement will not matter to the measurement of the absorption by the primary line, but it could make a huge difference to that of the weak line. I won’t even venture to discuss the nonlinear difference between these limits due to the curve of growth.

**Figure 2.** Lyα profile of the absorption system at *z_abs = 2.52564* toward the quasar Q1243+307 (black histogram) overlaid with the best-fitting model profile (red line), continuum (long dashed blue line), and zero-level (short dashed green line). The top panels show the raw, extracted counts scaled to the maximum value of the best-fitting continuum model. The bottom panels show the continuum normalized flux spectrum. The label provided in the top left corner of every panel indicates the source of the data. The blue points below each spectrum show the normalized fit residuals, (data–model)/error, of all pixels used in the analysis, and the gray band represents a confidence interval of ±2σ. The S/N is comparable between the two data sets at this wavelength range, but it is markedly different near the high order Lyman series lines (see Figures 4 and 5). The red tick marks above the spectra in the bottom panels show the absorption components associated with the main gas cloud (Components 2, 3, 4, 5, 6, 8, and 10 in Table 2), while the blue tick marks indicate the fitted blends. Note that some blends are also detected in Lyβ–Lyε.

The above examples look pretty good. The authors make the necessary correction for the varying spectral sensitivity of the instrument, and take great care to simultaneously fit the emission of the quasar and the absorption. I don’t think they’ve done anything wrong; indeed, it looks like they did everything right – just as the people measuring lithium in stars have.

Still, as an experienced spectroscopist, there are some subtle details that make me queasy. There are two independent observations, which is awesome, and the data look almost exactly the same, a triumph of repeatability. The fitted models are nearly identical, but if you look closely, you can see the model cuts slightly differently along the left edge of the damped absorption around 4278 Å in the two versions of the spectrum, and again along the continuum towards the right edge.

These differences are small, so hopefully don’t matter. But what is the continuum, really? The model line goes through the data, because what else could one possibly do? But there is so much Lyα absorption, is that really continuum? Should the continuum perhaps trace the upper envelope of the data? A physical effect that I worry about is that weak Lyα is so ubiquitous, we never see the true continuum but rather continuum minus a tiny bit of extraordinarily weak (Gunn-Peterson) absorption. If the true continuum from the quasar is just a little higher, then the primary hydrogen absorption is unaffected but the weak deuterium absorption would go up a little. That means slightly higher D/H, which means lower Ω_bh², which is the direction in which the measurement would need to move to come into closer agreement with lithium.
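
To see why the continuum placement matters so much more for a weak line than for a black one, here is a toy illustration. All the numbers are made up (a 1% continuum error, a line absorbing 5% of the continuum); nothing here is taken from the paper, and it ignores the curve of growth entirely.

```python
# Toy illustration only: how a small continuum misplacement propagates into
# the inferred depth of a saturated line versus a weak line.
# All numbers are hypothetical, chosen for illustration.

def inferred_depth(observed_flux, assumed_continuum):
    """Fractional absorption depth, 1 - F/C, for an assumed continuum level."""
    return 1.0 - observed_flux / assumed_continuum

def fractional_depth_error(observed_flux, true_continuum, fitted_continuum):
    """Relative error in the inferred depth caused by a misplaced continuum."""
    true_depth = inferred_depth(observed_flux, true_continuum)
    fitted_depth = inferred_depth(observed_flux, fitted_continuum)
    return (fitted_depth - true_depth) / true_depth

TRUE_CONTINUUM = 1.00
FITTED_CONTINUUM = 0.99   # hypothetical: continuum placed 1% too low

# Saturated (black) line core: observed flux is zero.
strong_error = fractional_depth_error(0.00, TRUE_CONTINUUM, FITTED_CONTINUUM)
# Weak line absorbing 5% of the true continuum.
weak_error = fractional_depth_error(0.95, TRUE_CONTINUUM, FITTED_CONTINUUM)

print(f"strong line: {strong_error:+.1%}")
print(f"weak line:   {weak_error:+.1%}")
```

In this toy case the black line is untouched while the weak line's depth is underestimated by roughly a fifth. Raising the continuum to its true level raises the weak absorption, which is the direction described above.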

Is the D/H measurement in error? I don’t know. I certainly hope not, and I see no reason to think it is. I do worry that it could be. The continuum level is one thing that could go wrong; there are others. My point is merely that we shouldn’t assume it has to be lithium that is in error.

An important check is whether the measured D/H ratio depends on metallicity or column density. It does not. There is no variation with metallicity as measured by the logarithmic oxygen abundance relative to solar (left panel below). Nor does it appear to depend on the amount of hydrogen in the absorbing cloud (right panel). In the early days of this kind of work there appeared to be a correlation, raising the specter of a systematic. That is not indicated here.

**Figure 6.** Our sample of seven high precision D/H measures (symbols with error bars); the green symbol represents the new measure that we report here. The weighted mean value of these seven measures is shown by the red dashed and dotted lines, which represent the 68% and 95% confidence levels, respectively. The left and right panels show the dependence of D/H on the oxygen abundance and neutral hydrogen column density, respectively. Assuming the Standard Model of cosmology and particle physics, the right vertical axis of each panel shows the conversion from D/H to the universal baryon density. This conversion uses the Marcucci et al. (2016) theoretical determination of the d(p,γ)³He cross-section. The dark and light shaded bands correspond to the 68% and 95% confidence bounds on the baryon density derived from the CMB (Planck Collaboration et al. 2016).

I’ll close by noting that Ω_bh² from this D/H measurement is indeed in very good agreement with the best-fit Planck CMB value. The question remains whether the physics assumed by that fit, baryons+non-baryonic cold dark matter+dark energy in a strictly FLRW cosmology, is the correct assumption to make.

On the timescale for galaxy formation

I’ve been wanting to expand on the previous post ever since I wrote it, which is over a month ago now. It has been a busy end to the semester. Plus, there’s a lot to say – nothing that hasn’t been said before, somewhere, somehow, yet still a lot to cobble together into a coherent story – if that’s even possible. This will be a long post, and there will be more after to narrate the story of our big paper in the ApJ. My sole ambition here is to express the predictions of galaxy formation theory in LCDM and MOND in the broadest strokes.

A theory is only as good as its prior. We can always fudge things after the fact, so what matters most is what we predict in advance. What do we expect for the timescale of galaxy formation? To tell you what I’m going to tell you, it takes a long time to build a massive galaxy in LCDM, but it happens much faster in MOND.

Basic Considerations

What does it take to make a galaxy? A typical giant elliptical galaxy has a stellar mass of 9 x 10¹⁰ M_☉. That’s a bit more than our own Milky Way, which has a stellar mass of 5 or 6 x 10¹⁰ M_☉ (depending who you ask) with another 10¹⁰ M_☉ or so in gas. So, in classic astronomy/cosmology style, let’s round off and say a big galaxy is about 10¹¹ M_☉. That’s a hundred billion stars, give or take.

How much of the universe does it take to make one big galaxy? The critical density of the universe is the over/under point for whether an expanding universe expands forever, or has enough self-gravity to halt the expansion and ultimately recollapse. Numerically, this quantity is ρ_crit = 3H₀²/(8πG), which for H₀ = 73 km/s/Mpc works out to 10⁻²⁹ g/cm³ or 1.5 x 10⁻⁷ M_☉/pc³. This is a very small number, but provides the benchmark against which we measure densities in cosmology. The density parameter of any substance X is Ω_X = ρ_X/ρ_crit. The stars and gas in galaxies are made of baryons, and we know the baryon density pretty well from Big Bang Nucleosynthesis: Ω_b = 0.04. That means the average density of normal matter is very low, only about 4 x 10⁻³¹ g/cm³. That’s less than one hydrogen atom per cubic meter – most of space is an excellent vacuum!
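
These numbers are straightforward to reproduce. A minimal sketch in cgs units, assuming rounded textbook values for the physical constants; the names are illustrative.

```python
import math

# Rounded physical constants in cgs units (assumed textbook values).
G = 6.674e-8            # cm^3 g^-1 s^-2
MPC_CM = 3.0857e24      # cm per Mpc
PC_CM = 3.0857e18       # cm per pc
MSUN_G = 1.989e33       # g per solar mass
M_H_G = 1.6735e-24      # g, mass of a hydrogen atom

def critical_density_cgs(H0_km_s_Mpc):
    """rho_crit = 3 H0^2 / (8 pi G), in g/cm^3."""
    H0 = H0_km_s_Mpc * 1.0e5 / MPC_CM   # convert km/s/Mpc to 1/s
    return 3.0 * H0**2 / (8.0 * math.pi * G)

rho_crit = critical_density_cgs(73.0)
rho_crit_msun_pc3 = rho_crit * PC_CM**3 / MSUN_G

omega_b = 0.04                             # baryon density parameter from the text
rho_b = omega_b * rho_crit                 # g/cm^3
atoms_per_m3 = rho_b / M_H_G * 1.0e6       # hydrogen atoms per cubic meter

print(f"rho_crit = {rho_crit:.2e} g/cm^3 = {rho_crit_msun_pc3:.2e} Msun/pc^3")
print(f"rho_b    = {rho_b:.1e} g/cm^3, about {atoms_per_m3:.2f} H atoms per m^3")
```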

This being the case, we need to scoop up a large volume to make a big galaxy. Going through the math, to gather up enough mass to make a 10¹¹ M_☉ galaxy, we need a sphere with a radius of 1.6 Mpc. That’s in today’s universe; in the past the universe was denser by (1+z)³, so at z = 10 that’s “only” 140 kpc. Still, modern galaxies are much smaller than that; the effective edge of the disk of the Milky Way is at a radius of about 20 kpc, and most of the baryonic mass is concentrated well inside that: the typical half-light radius of a 10¹¹ M_☉ galaxy is around 6 kpc. That’s a long way to collapse.
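
The math in question is just the volume of a sphere at the mean baryon density. A minimal sketch, assuming the critical density quoted above expressed per cubic megaparsec (taken here as 1.48 x 10¹¹ M_☉/Mpc³); function names are illustrative.

```python
import math

# Critical density for H0 = 73 km/s/Mpc, about 1.5e-7 Msun/pc^3 as quoted in
# the text, expressed per cubic Mpc (assumed value: 1.48e11 Msun/Mpc^3).
RHO_CRIT_MSUN_MPC3 = 1.48e11

def radius_today_mpc(mass_msun, omega_b=0.04, rho_crit=RHO_CRIT_MSUN_MPC3):
    """Radius of the sphere holding mass_msun of baryons at today's mean density."""
    rho_b = omega_b * rho_crit
    volume = mass_msun / rho_b
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)

def radius_at_redshift_kpc(mass_msun, z):
    """Same sphere at redshift z, when the density was higher by (1+z)^3."""
    return 1000.0 * radius_today_mpc(mass_msun) / (1.0 + z)

r_today = radius_today_mpc(1.0e11)
r_z10 = radius_at_redshift_kpc(1.0e11, 10.0)

print(f"today:  {r_today:.2f} Mpc")
print(f"z = 10: {r_z10:.0f} kpc")
```

Because density scales as (1+z)³, the radius shrinks by only one power of (1+z), which is why even at z = 10 the region is far larger than a finished galaxy.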

Monolithic Galaxy Formation

Given this much information, an early concept was monolithic galaxy formation. We have a big ball of gas in the early universe that collapses to form a galaxy. Why and how this got started was fuzzy. But we knew how much mass we needed and the volume it had to come from, so we can consider what happens as the gas collapses to create a galaxy.

Here we hit a big astrophysical reality check. Just how does the gas collapse? It has to dissipate energy to do so, and cool to form stars. Once stars form, they may feed energy back into the surrounding gas, reheating it and potentially preventing the formation of more stars. These processes are nontrivial to compute ab initio, and attempting to do so obsesses much of the community. We don’t agree on how these things work, so they are the knobs theorists can turn to change an answer they don’t like.

Even if we don’t understand star formation in detail, we do observe that stars have formed, and can estimate how many. Moreover, we do understand pretty well how stars evolve once formed. Hence a common approach is to build stellar population models with some prescribed star formation history and see what works. Spiral galaxies like the Milky Way formed a lot of stars in the past, and continue to do so today. To make 5 x 10¹⁰ M_☉ of stars in 13 Gyr requires an average star formation rate of 4 M_☉/yr. The current measured star formation rate of the Milky Way is estimated to be 2 ± 0.7 M_☉/yr, so the star formation rate has been nearly constant (averaging over stochastic variations) over time, perhaps with a gradual decline. Giant elliptical galaxies, in contrast, are “red and dead”: they have no current star formation and appear to have made most of their stars long ago. Rather than a roughly constant rate of star formation, they peaked early and declined rapidly. The cessation of star formation is also called quenching.

A common way to formulate the star formation rate in galaxies as a whole is the exponential star formation rate, SFR(t) = SFR₀ e^(-t/τ). A spiral galaxy has a low baseline star formation rate SFR₀ and a long burn time τ ~ 10 Gyr while an elliptical galaxy has a high initial star formation rate and a short e-folding time like τ ~ 1 Gyr. Many variations on this theme are possible, and are of great interest astronomically, but this basic distinction suffices for our discussion here. From the perspective of the observed mass and stellar populations of local galaxies, the standard picture for a giant elliptical was a large, monolithic island universe that formed the vast majority of its stars early on then quenched with a short e-folding timescale.
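
To make the distinction concrete, here is a minimal sketch that integrates the exponential model. It assumes a 13 Gyr age for both galaxies and ignores the mass returned to the gas by dying stars, so the masses are stars ever formed rather than stars surviving today; the function names are illustrative.

```python
import math

def sfr0_for_total_mass(total_mass_msun, age_gyr, tau_gyr):
    """Initial SFR (Msun/yr) needed to form total_mass_msun within age_gyr.

    Uses the integral of SFR0 * exp(-t/tau) from 0 to age:
    M = SFR0 * tau * (1 - exp(-age/tau)).
    """
    return total_mass_msun / (1.0e9 * tau_gyr * (1.0 - math.exp(-age_gyr / tau_gyr)))

def sfr_at(t_gyr, sfr0, tau_gyr):
    """Star formation rate (Msun/yr) at time t."""
    return sfr0 * math.exp(-t_gyr / tau_gyr)

def half_mass_time(age_gyr, tau_gyr):
    """Time (Gyr) by which half of the stars formed by age_gyr are in place."""
    total_fraction = 1.0 - math.exp(-age_gyr / tau_gyr)
    return -tau_gyr * math.log(1.0 - 0.5 * total_fraction)

AGE = 13.0  # Gyr, assumed age for both examples

# Spiral-like: 5e10 Msun with tau = 10 Gyr.
spiral_sfr0 = sfr0_for_total_mass(5.0e10, AGE, 10.0)
spiral_sfr_now = sfr_at(AGE, spiral_sfr0, 10.0)
spiral_t_half = half_mass_time(AGE, 10.0)

# Elliptical-like: 1e11 Msun with tau = 1 Gyr.
elliptical_sfr0 = sfr0_for_total_mass(1.0e11, AGE, 1.0)
elliptical_sfr_now = sfr_at(AGE, elliptical_sfr0, 1.0)
elliptical_t_half = half_mass_time(AGE, 1.0)

print(f"spiral:     SFR0 = {spiral_sfr0:.1f}, SFR now = {spiral_sfr_now:.1f} Msun/yr, "
      f"half mass at {spiral_t_half:.1f} Gyr")
print(f"elliptical: SFR0 = {elliptical_sfr0:.0f}, SFR now = {elliptical_sfr_now:.1e} Msun/yr, "
      f"half mass at {elliptical_t_half:.2f} Gyr")
```

With these assumptions the spiral-like model ends up with a present-day rate of about 2 M_☉/yr and takes several Gyr to form half its stars, while the elliptical-like model starts near 100 M_☉/yr and has half its stars in place after about τ ln 2 ≈ 0.7 Gyr.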

Galaxies as Island Universes

The density parameter Ω provides another useful way to think about galaxy formation. As cosmologists, we obsess about the global value of Ω because it determines the expansion history and ultimate fate of the universe. Here it has a more modest application. We can think of the region in the early universe that will ultimately become a galaxy as its own little closed universe. With a density parameter Ω > 1, it is destined to recollapse.

A fun and funny fact of the Friedmann equation is that the matter density parameter Ω_m → 1 at early times, so the early universe when galaxies form is matter dominated. It is also very uniform (more on that below). So any subset that is a bit more dense than average will have Ω > 1 just because the average is very close to Ω = 1. We can then treat this region as its own little universe (a “top-hat overdensity”) and use the Friedmann equation to solve for its evolution, as in this sketch:

*The expansion of the early universe a(t) (blue line). A locally overdense region may behave as a closed universe, recollapsing in a finite time (red line) to potentially form a galaxy.*

That’s great, right? We have a simple, analytic solution derived from first principles that explains how a galaxy forms. We can plug in the numbers to find how long it takes to form our basic, big 10¹¹ M_☉ galaxy and… immediately encounter a problem. We need to know how overdense our protogalaxy starts out. Is its effective initial Ω_m = 2? 10? What value, at what time? The higher it is, the faster the evolution from initially expanding along with the rest of the universe to decoupling from the Hubble flow to collapsing. We know the math but we still need to know the initial condition.

Annoying Initial Conditions

The initial condition for galaxy formation is observed in the cosmic microwave background (CMB) at z = 1090. Where today’s universe is remarkably lumpy, the early universe is incredibly uniform. It is so smooth that it is homogeneous and isotropic to one part in a hundred thousand. This is annoyingly smooth, in fact. It would help to have some lumps – primordial seeds with Ω > 1 – from which structure can grow. The observed seeds are too tiny; the typical initial amplitude is 10⁻⁵ so Ω_m = 1.00001. That takes forever to decouple and recollapse; it hasn’t yet had time to happen.

The cosmic microwave background as observed by ESA’s Planck satellite. This is an all-sky picture of the relic radiation field – essentially a snapshot of the universe when it was just a few hundred thousand years old. The variations in color are variations in temperature which correspond to variations in density. These variations are tiny, only about one part in 100,000. The early universe was very uniform; the real picture is a boring blank grayscale. We have to crank the contrast way up to see these minute variations.

We would like to know how the big galaxies of today – enormous agglomerations of stars and gas and dust separated by inconceivably vast distances – came to be. How can this happen starting from such homogeneous initial conditions, where all the mass is equally distributed? Gravity is an attractive force that makes the rich get richer, so it will grow the slight initial differences in density, but it is also weak and slow to act. A basic result in gravitational perturbation theory is that overdensities grow at the same rate the universe expands, which is inversely related to redshift. So if we see tiny fluctuations in density with amplitude 10⁻⁵ at z = 1000, they should have only grown by a factor of 1000 and still be small today (10⁻² at z = 0). But we see structures of much higher contrast than that. You can’t get here from there.
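
The size of the shortfall can be sketched with the standard top-hat result. This assumes a purely matter-dominated (Einstein-de Sitter) universe with only the growing mode present, in which a spherical overdensity collapses when its linearly extrapolated amplitude reaches (3/20)(12π)^(2/3) ≈ 1.686. It is a back-of-the-envelope check with illustrative function names, not a structure formation calculation.

```python
import math

# Linearly extrapolated overdensity at which a top-hat collapses in a
# matter-dominated (Einstein-de Sitter) universe.
DELTA_COLLAPSE = 0.15 * (12.0 * math.pi) ** (2.0 / 3.0)

def linear_overdensity(delta_i, z_i, z):
    """Growing-mode linear overdensity, which scales as 1/(1+z)."""
    return delta_i * (1.0 + z_i) / (1.0 + z)

def required_initial_overdensity(z_collapse, z_i):
    """Amplitude needed at z_i for a top-hat to collapse by z_collapse."""
    return DELTA_COLLAPSE * (1.0 + z_collapse) / (1.0 + z_i)

# Observed amplitude 1e-5 at z = 1000, grown linearly to today.
delta_today = linear_overdensity(1.0e-5, 1000.0, 0.0)

# Amplitudes that would have been needed at z = 1000 instead.
needed_for_today = required_initial_overdensity(0.0, 1000.0)
needed_for_z10 = required_initial_overdensity(10.0, 1000.0)

print(f"collapse threshold:          {DELTA_COLLAPSE:.3f}")
print(f"grown amplitude today:       {delta_today:.3f}")
print(f"needed at z=1000 (by z=0):   {needed_for_today:.1e}")
print(f"needed at z=1000 (by z=10):  {needed_for_z10:.1e}")
```

Under these assumptions the observed seeds fall short of the collapse threshold by more than two orders of magnitude even today, and by more than three for anything that is supposed to have collapsed by z = 10.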

The rich large scale structure we see today is impossible starting from the smooth observed initial conditions. Yet here we are, so we have to do something to goose the process. This is one of the original motivations for invoking cold dark matter (CDM). If there is a substance that does not interact with photons, it can start to clump up early without leaving too large a mark on the relic radiation field. In effect, the initial fluctuations in mass are larger, just in the invisible substance. (That’s not to say the CDM doesn’t leave a mark on the CMB; it does, but it is subtle and entirely another story.) So the idea is that dark matter forms gravitational structures first, and the baryons fall in later to make galaxies.

An illustration of the linear growth of overdensities. Structure can grow in the dark matter (long dashed lines) with the baryons catching up only after decoupling (short dashed line). In effect, the dark matter gives structure formation a head start, nicely explaining the apparently impossible growth factor. This has been the standard picture for what seems like forever (illustration from Schramm 1992).

With the right amount of CDM – and it has to be just the right amount of a dynamically cold form of non-baryonic dark matter (stuff we still don’t know actually exists) – we can explain how the growth factor is 10⁵ since recombination instead of a mere 10³. The dark matter got a head start over the stuff we can see; it looks like 10⁵ because the normal matter lagged behind, being entangled with the radiation field in a way the dark matter was not.

This has been the imperative need in structure formation theory for so long that it has become undisputed lore; an element of the belief system so deeply embedded that it is practically impossible to question. I risk getting ahead of the story, but it is important to point out that, like the interpretation of so much of the relevant astrophysical data, this belief assumes that gravity is normal. This assumption dictates the growth rate of structure, which in turn dictates the need to invoke CDM to allow structure to form in the available time. If we drop this assumption, then we have to work out what happens in each and every alternative that we might consider. That definitely gets ahead of the story, so first let’s understand what we should expect in LCDM.

Hierarchical Galaxy Formation in LCDM

LCDM predicts some things remarkably well but others not so much. The dark matter is well-behaved, responding only to gravity. Baryons, on the other hand, are messy – one has to worry about hydrodynamics in the gas, star formation, feedback, dust, and probably even magnetic fields. In a nutshell, LCDM simulations are very good at predicting the assembly of dark mass, but converting that into observational predictions relies on our incomplete knowledge of messy astrophysics. We know what the mass should be doing, but we don’t know so well how that translates to what we see. Mass good, light bad.

Starting with the assembly of mass, the first thing we learn is that the story of monolithic galaxy formation outlined above has to be wrong. Early density fluctuations start out tiny, even in dark matter. God didn’t plunk down island universes of galaxy mass then say “let there be galaxies!” The annoying initial conditions mean that little dark matter halos form first. These subsequently merge hierarchically to make ever bigger halos. Rather than top-down monolithic galaxy formation, we have the bottom-up hierarchical formation of dark matter halos.

The hierarchical agglomeration of dark matter halos into ever larger objects is often depicted as a merger tree. Here are four examples from the high resolution Illustris TNG50 simulation (Pillepich et al. 2019; Nelson et al. 2019).

Examples of merger trees from the TNG50-1 simulation (Pillepich et al. 2019; Nelson et al. 2019). Objects have been selected to have very nearly the same stellar mass at z=0. Mass is built up through a series of mergers. One large dark matter halo today (at top) has many antecedents (small halos at bottom). These merge hierarchically as illustrated by the connecting lines. *The size of the symbol is proportional to the halo mass.* I have added redshift *and the corresponding age of the universe for vanilla LCDM* in a more legible font. *The color bar illustrates the specific star formation rate*: the top row has objects that are still actively star forming like spirals; those in the bottom row are “red and dead” – things that have stopped forming stars, like giant elliptical galaxies. In all cases, there is a lot of merging and a modest rate of growth, with the typical object taking about half a Hubble time (~7 Gyr) to assemble half of its final stellar mass.

The hierarchical assembly of mass is generic in CDM. Indeed, it is one of its most robust predictions. Dark matter halos start small, and grow larger by a succession of many mergers. This gradual agglomeration is slow: note how tiny the dark matter halos at z = 10 are.

Strictly speaking, it isn’t even meaningful to talk about a single galaxy over the span of a Hubble time. It is hard to avoid this mental trap: surely the Milky Way has always been the Milky Way? so one imagines its evolution over time. This is monolithic thinking. Hierarchically, “the galaxy” refers at best to the largest progenitor, the object that traces the left edge of the merger trees above. But the other protogalactic chunks that eventually merge together are as much part of the final galaxy as the progenitor that happens to be largest.

This complicated picture is complicated further by what we can see being stars, not mass. The luminosity we observe forms through a combination of in situ growth (star formation in the largest progenitor) and ex situ growth through merging. There is no reason for some preferred set of protogalaxies to form stars faster than the others (though of course there is some scatter about the mean), so presumably the light traces the mass of stars formed, which in turn traces the underlying dark mass. Presumably.

That we should see lots of little protogalaxies at high redshift is nicely illustrated by this lookback cone from Yung et al. (2022). Here the color and size of each point corresponds to the stellar mass. Massive objects are common at low redshift but become progressively rarer at high redshift, petering out at z > 4 and basically absent at z = 10. This realization of the observable stellar mass tracks the assembly of dark mass seen in merger trees.

This is what we expect to see in LCDM: lots of small protogalaxies at high redshift; the building blocks of later galaxies that had not yet merged. The observation of galaxies much brighter than this at high redshift by JWST poses a fundamental challenge to the paradigm: mass appears not to be subdivided as expected. So it is entirely justifiable that people have been freaking out that what we see are bright galaxies that are apparently already massive. That shouldn’t happen; it wasn’t predicted to happen; how can this be happening?

That’s all background that is assumed knowledge for our ApJ paper, so we’re only now getting to its Figure 1. This combines one of the merger trees above with its stellar mass evolution. The left panel shows the assembly of dark mass; the right panel shows the growth of stellar mass in the largest progenitor. This is what we expect to see in observations.

**Fig. 1** from McGaugh et al. (2024): A merger tree for a model galaxy from the TNG50-1 simulation (Pillepich et al. 2019; Nelson et al. 2019, left panel) selected to have M_∗ ≈ 9 × 10¹⁰ M_⊙ at z = 0; i.e., the stellar mass of a local L^∗ giant elliptical galaxy (Driver et al. 2022). Mass assembles hierarchically, starting from small halos at high redshift (bottom edge) with the largest progenitor traced along the left edge of the merger tree. The growth of stellar mass of the largest progenitor is shown in the right panel. This example (jagged line) is close to the median (dashed line) of comparable mass objects (Rodriguez-Gomez et al. 2016), and within the range of the scatter (the shaded band shows the 16th – 84th percentiles). A monolithic model that forms at z_f = 10 and evolves with an exponentially declining star formation rate with τ = 1 Gyr (purple line) is shown for comparison. The latter model forms most of its stars earlier than occurs in the simulation.

For comparison, we also show the stellar mass growth of a monolithic model for a giant elliptical galaxy. This is the classic picture we had for such galaxies before we realized that galaxy formation had to be hierarchical. This particular monolithic model forms at z_f = 10 and follows an exponential star formation rate with τ = 1 Gyr. It is one of the models published by Franck & McGaugh (2017). It is, in fact, the first model I asked Jay to construct when he started the project. Not because we expected it to best describe the data, as it turns out to do, but because the simple exponential model is a touchstone of stellar population modeling. It was a starter model: do this basic thing first to make sure you’re doing it right. We chose τ = 1 Gyr because that was the typical number bandied about for elliptical galaxies, and z_f = 10 because that seemed ridiculously early for a massive galaxy to form. At the time we built the model, it was ludicrously early to imagine a massive galaxy would form, from an LCDM perspective. A formation redshift z_f = 10 was, less than a decade ago, practically indistinguishable from the beginning of time, so we expected it to provide a limit that the data would not possibly approach.

In a remarkably short period, JWST has transformed z = 10 from inconceivable to run of the mill. I’m not going to go into the data yet – this all-theory post is already a lot – but to offer one spoiler: the data are consistent with this monolithic model. If we want to “fix” LCDM, we have to make the red line into the purple line for enough objects to explain the data. That proves to be challenging. But that’s moving the goalposts; the prediction was that we should see little protogalaxies at high redshift, not massive, monolith-style objects. Just look at the merger trees at z = 10!

Accelerated Structure Formation in MOND

In order to address these issues in MOND, we have to go back to the beginning. What is the evolution of a spherical region (a top-hat overdensity) that might collapse to form a galaxy? How does a spherical region under the influence of MOND evolve within an expanding universe?

The solution to this problem was first found by Felten (1984), who was trying to play the Newtonian cosmology trick in MOND. In conventional dynamics, one can solve the equation of motion for a point on the surface of a uniform sphere that is initially expanding and recover the essence of the Friedmann equation. It was reasonable to check if cosmology might be that simple in MOND. It was not. The appearance of a₀ as a physical scale makes the solution scale-dependent: there is no general solution that one can imagine applies to the universe as a whole.

Felten reasonably saw this as a failure. There were, however, some appealing aspects of his solution. For one, there was no such thing as a critical density. All MOND universes would eventually recollapse irrespective of their density (in the absence of the repulsion provided by a cosmological constant). It could take a very long time, which depended on the density, but the ultimate fate was always the same. There was no special value of Ω, and hence no flatness problem. The latter obsessed people at the time, so I’m somewhat surprised that no one seems to have made this connection. Too soon*, I guess.

There it sat for many years, an obscure solution for an obscure theory to which no one gave credence. When I became interested in the problem a decade later, I started methodically checking all the classic results. I was surprised to find how many things we needed dark matter to explain were just as well (or better) explained by MOND. My exact quote was “surprised the bejeepers out of us.” So, what about galaxy formation?

I started with the top-hat overdensity, and had the epiphany that Felten had already obtained the solution. He had been trying to solve all of cosmology, which didn’t work. But he had solved the evolution of a spherical region that starts out expanding with the rest of the universe but subsequently collapses under the influence of MOND. The overdensity didn’t need to be large, it just needed to be in the low acceleration regime. Something like the red cycloidal line in the second plot above could happen in a finite time. But how much?

The solution depends on scale and needs to be solved numerically. I am not the greatest programmer, and I had a lot else on my plate at the time. I was in no rush, as I figured I was the only one working on it. This is usually a good assumption with MOND, but not in this case. Bob Sanders had had the same epiphany around the same time, which I discovered when I received his manuscript to referee. So all credit is due to Bob: he said these things first.

First, he noted that galaxy formation in MOND is still hierarchical. Small things form first. Crudely speaking, structure formation is very similar to the conventional case, but now the goose comes from the change in the force law rather than extra dark mass. MOND is nonlinear, so the whole process gets accelerated. To compare with the linear growth of CDM:

A sketch of how structures grow over time under the influence of cold dark matter (left, from Schramm 1992, same as above) and MOND (right, from Sanders & McGaugh 2002; see also this further discussion and previous post). The slow linear growth of CDM (long-dashed line, left panel) is replaced by a rapid, nonlinear growth in MOND (solid lines at right; numbers correspond to different scales). Nonlinear growth moderates after cosmic expansion begins to accelerate (dashed vertical line in right panel).

The net effect is the same. A cosmic web of large scale structure emerges. They look qualitatively similar, but everything happens faster in MOND. This is why observations have persistently revealed structures that are more massive and were in place earlier than expected in contemporaneous LCDM models.

*Simulated structure formation in ΛCDM (top) and MOND (bottom) showing the more rapid emergence of similar structures in MOND (note the redshift of each panel). From McGaugh (2015).*

In MOND, small objects like globular clusters form first, but galaxies of a range of masses all collapse on a relatively short cosmic timescale. How short? Let’s consider our typical 10¹¹ M_☉ galaxy. Solving Felten’s equation for the evolution of a sphere numerically, peak expansion is reached after 300 Myr and collapse happens in a similar time. The whole galaxy is in place speedy quick, and the initial conditions don’t really matter: a uniform, initially expanding sphere in the low acceleration regime will behave this way. From our distant vantage point thirteen billion years later, the whole process looks almost monolithic (the purple line above) even though it is a chaotic hierarchical mess for the first few hundred million years (z > 14). In particular, it is easy to form half of the stellar mass early on: the mass is already assembled.

This is what JWST sees: galaxies that are already massive when the universe is just half a billion years old. I’m sure I should say more but I’m exhausted now and you may be too, so I’m gonna stop here by noting that in 1998, when Bob Sanders predicted that “Objects of galaxy mass are the first virialized objects to form (by z=10),” the contemporaneous prediction of LCDM was that “present-day disc [galaxies] were assembled recently (at z<=1)” and “there is nothing above redshift 7.” One of these predictions has been realized. It is rare in science that such a clear a priori prediction comes true, let alone one that seemed so unreasonable at the time, and which took a quarter century to corroborate.

*I am not quite this old: I was still an undergraduate in 1984. I hadn’t even decided to be an astronomer at that point; I certainly hadn’t started following the literature. The first time I heard of MOND was in a graduate course taught by Doug Richstone in 1988. He only mentioned it in passing while talking about dark matter, writing the equation on the board and saying maybe it could be this. I recall staring at it for a long few seconds, then shaking my head and muttering “no way.” I then completely forgot about it, not thinking about it again until it came up in our data for low surface brightness galaxies. I expect most other professionals have the same initial reaction, which is fair. The test of character comes when it crops up in their data, as it is doing now for the high redshift galaxy community.

What if we never find dark matter?

Some people have asked me to comment on the Scientific American article What if We Never Find Dark Matter? by Slatyer & Tait. For the most part, I find it unobjectionable – from a certain point of view. It is revealing to examine this point of view, starting with the title, which frames the subject in a way that gives us permission to believe in dark matter while never finding it. This framing is profoundly unscientific, as it invites a form of magical thinking that could usher in a thousand years of dark epicycles (feedback being the modern epicycle) on top of the decades it has already sustained.

The article does recognize that a modification of gravity is at least a logical possibility. The mere mention of this is progress, if grudging and slow. They can’t bring themselves to name a specific theory: they never say MOND and only allude obliquely to a single relativistic theory as if saying its name out loud would bring a curse^% upon their house.

Of course, they mention modified gravity merely to dismiss it:

A universe without dark matter would require striking modifications to the laws of gravity… [which] seems exceptionally difficult.

Yes it is. But it has also proven exceptionally difficult to detect dark matter. That hasn’t stopped people from making valiant efforts to do so. So the argument is that we should try really hard to accomplish the exceptionally difficult task of detecting dark matter, but we shouldn’t bother trying to modify gravity because doing so would be exceptionally difficult.

This speaks to motivations – is one idea better motivated? In the 1980s, cold dark matter was motivated by both astronomical observations and physical theory. Absent the radical thought of modifying gravity, we had a clear need for unseen mass. Some of that unseen mass could simply have been undetected normal matter, but most of it needed to be some form of non-baryonic dark matter that exceeded the baryon density allowed by Big Bang Nucleosynthesis and did not interact directly with photons. That meant entirely new physics from beyond the Standard Model of particle physics: no particle in the known stable of particles suffices. This new physics was seen as a good thing, because particle physicists already had the feeling that there should be something more than the Standard Model. There was a desire for Grand Unified Theories (GUTs) and supersymmetry (SUSY). SUSY naturally provides a home for particles that could be the dark matter, in particular the Weakly Interacting Massive Particles (WIMPs) that are the prime target for the vast majority of experiments that are working to achieve the exceptionally difficult task of detecting them. So there was a confluence of reasons from very different perspectives to make the search for WIMPs very well motivated.

That was then. Fast forward a few decades, and the search for WIMPs has failed. Repeatedly. Continuing to pursue it is an example of the sunk cost fallacy. We keep doing it because we’ve already done so much of it that surely we should keep going. So I feel the need to comment on this seemingly innocuous remark:

although many versions of supersymmetry predict WIMP dark matter, the converse isn’t true; WIMPs are viable dark matter candidates even in a universe without supersymmetry.

Strictly speaking, this is correct. It is also weak sauce. The neutrino is an example of a weakly interacting particle that has some mass. We know neutrinos exist, and they reside in the Standard Model – no need for supersymmetry. We also know that they cannot be the dark matter, so it would be disingenuous to conflate the two. Beyond that, it is possible to imagine a practically infinite variety of particles that are weakly interacting but not part of supersymmetry. That’s just throwing mud at the wall. SUSY WIMPs were extraordinarily well motivated, with the WIMP miracle being the beautiful argument that launched a thousand experiments. But lacking SUSY – which seems practically dead at this juncture – WIMPs as originally motivated are dead along with it. The motivation for more generic WIMPs is lacking, so the above statement is nothing more than an assertion that runs interference for the fact that we no longer have good reason to expect WIMPs at all.

There is also an element of disciplinary-centric thinking: if you’re a particle physicist, you can build a dark matter detector and maybe make a major discovery or at least get great gobs of grants in the effort to do so. If instead what is going on is really a modification of gravity, then your expertise is irrelevant and there is no reason to keep shoveling money into your field. Worse, a career spent at the bottom of a mine shaft working on dark matter detectors is a waste of effort. I can understand why people don’t want to hear that message, but that just brings us back to the sunk cost fallacy.

Speaking of money, I occasionally get scientists who come up to me Big Mad that grant money gets spent on MOND research, as that would be a waste of taxpayer money. I can assure them that no government dollars have been harmed in the pursuit of MOND research. Certainly not in the U.S., at any rate. But lots and lots of tax dollars have been burned in the search for dark matter, and the article we’re discussing advocates spending a whole lot more to search for dark matter candidates that are nowhere near as well motivated as WIMPs were. That’s why I keep asking: how do we know when to stop? I don’t expect other scientists to agree with my interpretation of the data, but I do expect them to have a criterion whereby they would concede that dark matter is incorrect. If we lack any notion of how we could figure out that we are wrong, then we’ve made the leap from science to religion. So far, such criteria are sadly lacking, and I see precious little evidence of people rising to the challenge. Indeed, I frequently get the opposite, as other scientists have frequently asserted to me that they would only consider MOND as a last resort. OK, when does that happen? There’s always another particle we can think up, so the answer seems to be “never.”

I wrote long ago that “After WIMPs, the next obvious candidate is axions.” Sure enough, this article spills a lot of ink discussing axions. Rather than dwell on this different doomed idea for dark matter, let’s take a gander at the remarkable art made to accompany the article, because we are visual animals and graphical representations are important.

Where to start? Right in the center is a scroll of an old-timey star chart. On top of that are several depictions of what I guess are meant to be galaxies*. Around those is an ethereal dragon representing the unknown dark matter. The depiction of dark matter as an unfathomable monster is at once both spot on and weirdly anthropomorphic. Is this a fabled beast the adventurous hero is supposed to seek out and slay? or befriend? or maybe it is a tale in which he grows during the journey to realize he has been on the wrong path the whole time? I love the dragon as art, but as a representation of a scientific subject it imparts an aura of teleological biology to something that is literally out of this world, residing in a dark sector that is not part of our daily experience and may be entirely inaccessible to our terrestrial experimentation. Off the edge of the map and on into extra dimensions: here there be monsters.

The representations here are fantastic. There is the coffee mug and the candle to represent the hard work of those of us who burn the candle at both ends wrestling with the dark matter problem. There’s a magnifying glass to represent how hard the experimentalists have looked for the dark matter. Scattered around are various totems, like the Polaroid-style picture at right depicting the gravitational lensing around a black hole. This is cool, but has squat to do with the missing mass problem. It’s more a nod to General Relativity and the Faith we have therein, albeit in a regime many orders of magnitude removed from the one that concerns us here. On the left is an old newspaper article about WIMPs, complete with a sketch of a Feynman diagram that depicts how we might detect them. And at the top, peeking out of a book, as if it were a thought made long ago now seeking new relevance, a note saying Axions!

I can save everyone a lot of time, effort, and expense. It ain’t WIMPs and it ain’t axions. Nor is the dark matter any of the plethora of other ideas illustrated in the eye-watering depiction of the landscape of particle possibilities in the article. These simply add mass while providing no explanation of the observed MOND phenomenology. This phenomenology is fundamental to the problem, so any approach that ignores it is doomed to failure. I’m happy to consider explanations based on dark matter, but these need to have a direct connection to baryons baked-in to be viable. None of the ideas they discuss meet this minimum criterion.

Of course it could be that MOND – either as modified gravity or modified inertia, an important possibility that usually gets overlooked – is essentially correct and that’s why it keeps having predictions come true. That’s what motivates considering it now: repeated and sustained predictive success, particularly for phenomena that dark matter does not provide a satisfactory explanation for.

Of course, this article advocating dark matter is at pains to dismiss modified gravity as a possibility:

The changes [of modified gravity] would have to mimic the effects of dark matter in astrophysical systems ranging from giant clusters of galaxies to the Milky Way’s smallest satellite galaxies. In other words, they would need to apply across an enormous range of scales in distance and time, without contradicting the host of other precise measurements we’ve gathered about how gravity works. The modifications would also need to explain why, if dark matter is just a modification to gravity—which is universally associated with all matter—not all galaxies and clusters appear to contain dark matter. Moreover, the most sophisticated attempts to formulate self-consistent theories of modified gravity to explain away dark matter end up invoking a type of dark matter anyway, to match the ripples we observe in the cosmic microwave background, leftover light from the big bang.

That’s a lot, so let’s break it down. First, that modified gravity “would have to mimic the effects of dark matter” gets it exactly backwards. It is dark matter that has to mimic the effects of MOND. That’s an easy call: dark matter plus baryons could combine in a large variety of ways that might bear no resemblance to MOND. Indeed, they should do that: the obvious prediction of LCDM-like theories is an exponential disk in an NFW halo. In contrast, there is one and only one thing that can happen in MOND since there is a single effective force law that connects the dynamics to the observed distribution of baryons. Galaxies didn’t have to do that, shouldn’t do that, but remarkably they do. The uniqueness of this relation poses a problem for dark matter that has been known since the previous century:

*Reluctant conclusions from McGaugh & de Blok (1998). As we said at the time, “This result surprised the bejeepers out of us, too.”*

This basic conclusion has not changed over the years, only gotten stronger. The equation coupling dark to luminous matter I wrote down in all generality in McGaugh (2004) and again in McGaugh et al. (2016). The latter paper is published in Physical Review Letters, arguably the most prominent physics journal, and is in the top percentile of citation rates, so it isn’t some minuscule detail buried in an obscure astronomical journal that might have eluded the attention of particle physicists. It is the implication that conclusion [1] could be correct that bounces off a protective shell of cognitive dissonance so hard that the necessary corollary [2] gets overlooked.

OK, that’s just the first sentence. Let’s carry on with “[the modification] would need to apply across an enormous range of scales in distance and time, without contradicting the host of other precise measurements we’ve gathered about how gravity works.” Well, duh. That’s the first thing I checked. Thoroughly and repeatedly. I’ve written many reviews on the subject. They’re either unaware of some well-established results, or choose to ignore them.

The reason MOND doesn’t contradict the host of other constraints about how gravity works is simple. It happens in the low acceleration regime, where the only test of gravity is provided by the data that evince the mass discrepancy. If we had posed galaxy observations as a test of GR, we would have concluded that it fails at low accelerations. Of course we didn’t do that; we observed galaxies because we were interested in how they worked, then inferred the need for dark matter when gravity as we currently know it failed to explain the data. Other tests, regardless of how precise, are irrelevant if they probe accelerations higher than Milgrom’s constant (1.2 × 10⁻¹⁰ m/s²).
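To put numbers on that, here is a rough sketch comparing the accelerations in a few familiar systems with Milgrom’s constant. The speeds and distances are round, illustrative values that I am assuming for the purpose, not measurements taken from this post.

```python
# Rough accelerations in familiar systems, in units of Milgrom's constant.
# All speeds and distances below are rounded, assumed values for illustration.
A0 = 1.2e-10      # Milgrom's constant, m/s^2 (value quoted in the text)
G = 6.674e-11     # Newton's constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30  # solar mass, kg
AU = 1.496e11     # astronomical unit, m
KPC = 3.086e19    # kiloparsec, m


def point_mass_acceleration(mass_kg, r_m):
    """Newtonian acceleration G M / r^2 at distance r from a point mass."""
    return G * mass_kg / r_m**2


def centripetal_acceleration(v_m_s, r_m):
    """Acceleration V^2 / R of a circular orbit with speed V at radius R."""
    return v_m_s**2 / r_m


systems = {
    "Earth orbit, 1 AU from the Sun": point_mass_acceleration(M_SUN, AU),
    "Outer solar system, 100 AU": point_mass_acceleration(M_SUN, 100 * AU),
    "Solar orbit in the Milky Way, ~220 km/s at ~8 kpc": centripetal_acceleration(2.2e5, 8 * KPC),
    "Outer spiral disk, ~200 km/s at ~50 kpc": centripetal_acceleration(2.0e5, 50 * KPC),
}

for name, g in systems.items():
    print(f"{name}: g = {g:.2e} m/s^2 = {g / A0:.2e} a0")
```

With these inputs, solar system accelerations come out thousands to tens of millions of times larger than a0, while the outskirts of a spiral sit below it. Precision tests in the former regime say nothing about the latter.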

Continuing on, there is the complaint that “modifications would also need to explain why… not all galaxies and clusters appear to contain dark matter.” Yep, you gotta explain all the data. That starts with the vast majority of the data that do follow the radial acceleration relation, which is not satisfactorily explained by dark matter. They skip⁺ past that part, preferring to ignore the forest in order to complain about a few outlying trees. There are some interesting cases, to be sure, but this complaint about objects lacking dark matter is misplaced for deeper reasons. It makes no sense in terms of dark matter that there are objects without dark matter. That shouldn’t happen in LCDM any more than in MOND^$. One winds up invoking non-equilibrium effects, which we can do in MOND just as we do in dark matter. It is not satisfactory in either case, but it is weird to complain about it for one theory while not for the other. This line of argument is perilously close to the a priori fallacy.

The last line, “the most sophisticated attempts to formulate self-consistent theories of modified gravity to explain away dark matter end up invoking a type of dark matter anyway, to match the ripples we observe in the cosmic microwave background” actually has some merit. The theory they’re talking about is Aether-Scalar-Tensor (AeST) theory, which I guess earns the badge of “most sophisticated” because it fits the power spectrum of the cosmic microwave background (CMB).

I’ve discussed the CMB in detail before, so won’t belabor it here. I will note that the microwave background is only one piece of many lines of evidence, and the conclusion one reaches depends on how one chooses to weigh the various incommensurate evidence. That they choose to emphasize this one thing while entirely eliding the predictive successes of MOND is typical, but does not encourage me to take this as a serious argument, especially when I had more success predicting important aspects of the microwave background than did the entire community that persistently cites the microwave background to the exclusion of all else.

It is also a bit strange to complain that AeST “explain[s] away dark matter [but] end[s] up invoking a type of dark matter.” I think what they mean here is true at the level of quantum field theory where all particles are fields and all fields are particles, but beyond that, they aren’t the same thing at all. It is common for modified gravity theories to invoke scalar fields^#, and this is an important degree of freedom that enables AeST to fit the CMB. TeVeS also added a scalar and tensor field, but could not fit the CMB, so this approach isn’t guaranteed to work. But are these a type of dark matter? Or are our ideas of dark matter mimicking a scalar field? It seems like this argument could cut either way, and we’re just granting dark matter priority as a concept because we thought of it first. I don’t think nature cares about the order of our thoughts.

None of this addresses the question of the year. Why does MOND get any predictions right? Just saying “dark matter does it” is not sufficient. Until scientists engage seriously with this question, they’re doomed to chasing phantoms that aren’t there to catch.

^%From what I’ve seen, they’re probably right to fear the curses of their colleagues for such blasphemy. Very objective, very scientific.

*Galaxies are nature’s artwork; human imitations never seem adequate. These look more like fried eggs to me. On the whole, this art is exceptionally well informed by science, or at least by particle physics, but not so much by astronomy. And therein lies the greater problem: there is a whole field of physics devoted to dark matter that is entirely motivated by astronomical observations yet its practitioners are, by and large, remarkably ignorant of anything more than the most rudimentary aspects of the data that motivate their field’s existence.

⁺There seems to be a common misconception that anything we observe is automatically explained by dark matter. That’s only true at the level of inference: any excess gravity is attributable to unseen mass. That’s why a hypothesis is only as good as its prior; a mere inference isn’t science, you have to make a prediction. Once you do that, you find dark matter might do lots of things that are not at all like the MONDian phenomenology that we observe. While I would hope the need for predictions is obvious, many scientists seem to conflate observation with prediction – if we observe it, that’s what dark matter must predict!

^$The discrepancy should only appear below the critical acceleration scale in MOND. So strictly speaking, MOND does predict that there should be objects without dark matter: systems that are high acceleration. The central regions of globular clusters and elliptical galaxies are such regions, and MOND fares well there. In contrast, it is rather hard to build a sensible dark matter model that is as baryon dominated as observed. So this is an example of MOND explaining the absence of dark matter better than dark matter theory. This is related to the observation that the apparent need for dark matter only appears at low accelerations, at a scale that dark matter knows nothing about.

^#I, personally, am skeptical of this approach, as it seems too generic (let’s add some new freedom!) when it feels like we’re missing something fundamental, perhaps along the lines of Mach’s Principle. However, I also recognize that this is a feeling on my part; it is outside my training to have a meaningful opinion.

Progressive Approximations in Mass Modeling

I have said I wasn’t going to attempt to teach an entire graduate course on galaxy dynamics in this forum, and I’m not. But I can give some pointers for those who want to try it for themselves. It also provides some useful context for fans of Deur’s approach.

The go-to textbook for this topic is Galactic Dynamics by Binney & Tremaine. The first edition was published in 1987, conveniently when I switched to grad school in astronomy. It was already a deep and well-developed field at that time; this is a compendium of considerable scientific knowledge.

Fun story: a colleague in a joint physics & astronomy department once complained to me that she wanted to develop a course in galaxy dynamics, which is a staple of graduate programs in astronomy & astrophysics. However, there was a certain senior colleague who objected, saying that since it was astronomy, it couldn’t possibly be a rigorous course worthy of a full semester graduate course. This is a casual bias that astronomers often encounter when talking to physicists, many of whom have attitudes about the subject that were trapped in amber sometime in the Jurassic. I suggested that she walk into his office and drop a copy of Galactic Dynamics on his desk from on high, as (1) it would make a hefty impact, and (2) no one who so much as skims this book could persist in this toxic attitude.

She later reported that she had done this, and it had worked.

Galactic Dynamics is not a starter book. It is the textbook we use when teaching the graduate course that this is not. A useful how-to guide for the specific material I’ll discuss here is provided by Federico Lelli. In brief, to model the gravitational potential of an observed distribution of matter, we can make one of the following series of approximations:

*This is a slide I sometimes use to introduce mass modeling in science talks as a reminder for expert audiences.*

All science is an approximation at some level. The most crude approximation we can employ here is to imagine that all of the mass resides at a central point. In this limit, the rotation curve is simply

V² = GM/R

where V is the orbital speed of a test particle on a circular orbit, G is Newton’s constant, M is the mass, and R is the distance from the point mass. Galaxies are not point masses, so this is a terrible approximation, as can be seen by the divergent V ∝ R^(-1/2) behavior as R → 0 (the dotted line above).
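For anyone who wants to try it, the point-mass case takes only a few lines. This is a minimal sketch in convenient galactic units; the 10¹¹ solar mass value is an illustrative assumption, not a fit to any particular galaxy.

```python
import math

# Newton's constant in galactic units: kpc (km/s)^2 / Msun
G = 4.30091e-6


def v_point_mass(mass_msun, r_kpc):
    """Circular speed in km/s around a point mass, from V^2 = G M / R."""
    return math.sqrt(G * mass_msun / r_kpc)


mass = 1e11  # Msun, illustrative value only
for r in (0.1, 1.0, 10.0, 100.0):
    print(f"R = {r:6.1f} kpc  V = {v_point_mass(mass, r):7.1f} km/s")
```

The speed doubles every time the radius shrinks by a factor of four, which is the unphysical divergence toward the center noted above.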

The next bad approximation one can make is a spherical cow: assume the mass is distributed in a sphere that is projected as the image we see on the sky. This at least incorporates the fact that the mass is not all concentrated at a point, so

V² = GM(R)/R

acknowledges that the mass M is spread out as a function of radius. This is a spherical cow. Since we cannot see dark matter, we almost always assume it to be a spherical cow.
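A minimal sketch of the spherical cow: integrate a density profile in shells to get M(R), then apply the formula above. The function names and the uniform-sphere test profile are my own choices for illustration.

```python
import math

# Newton's constant in galactic units: kpc (km/s)^2 / Msun
G = 4.30091e-6


def enclosed_mass(rho, r_max, n=2000):
    """M(R): trapezoid-rule integral of 4 pi r^2 rho(r) dr from 0 to r_max."""
    h = r_max / n
    total = 0.0
    for i in range(n + 1):
        r = i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * 4.0 * math.pi * r * r * rho(r)
    return total * h


def v_spherical(rho, r_kpc):
    """Circular speed in km/s for a spherical mass distribution."""
    return math.sqrt(G * enclosed_mass(rho, r_kpc) / r_kpc)


# Hypothetical test profile: 1e11 Msun spread uniformly within 10 kpc.
R_EDGE = 10.0
RHO0 = 1e11 / (4.0 / 3.0 * math.pi * R_EDGE**3)


def uniform_sphere(r):
    """Density in Msun / kpc^3: constant inside R_EDGE, zero outside."""
    return RHO0 if r <= R_EDGE else 0.0


for r in (1.0, 2.0, 5.0, 10.0, 20.0, 40.0):
    print(f"R = {r:5.1f} kpc  V = {v_spherical(uniform_sphere, r):6.1f} km/s")
```

Inside the sphere V rises linearly with R rather than diverging; outside it, all the mass is enclosed and the point-mass falloff returns.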

For the luminous disk of a spiral galaxy, a common approximation is the so-called exponential disk:

Σ(R) = Σ₀ e^(-R/R_d)

where Σ₀ is the central surface density of stars and R_d is the scale length of the disk – the characteristic size over which the surface brightness declines exponentially. This can be integrated by parts to obtain an expression for the enclosed mass M(R) which I leave as an exercise for the eager reader. This provides a handy analytic formula, the rotation curve of which is illustrated above by the dashed line.
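For the less eager reader, the exercise works out as follows. I am assuming the intended quantity is the mass inside cylindrical radius R, which the spherical approximation then treats as if it were spherically distributed.

```latex
M(R) = \int_0^R 2\pi R' \, \Sigma_0 \, e^{-R'/R_d} \, dR'
     = 2\pi \Sigma_0 R_d^2 \left[ 1 - \left( 1 + \frac{R}{R_d} \right) e^{-R/R_d} \right],
\qquad
M(R \to \infty) = 2\pi \Sigma_0 R_d^2 ,
\qquad
V^2(R) = \frac{G \, M(R)}{R} .
```

The single integration by parts comes from the factor of R' multiplying the exponential, and the total mass of the disk is finite.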

Spiral galaxies are fairly thin when seen edge-on, so the spherical cow is not a great approximation. In a classic paper, Freeman (1970) solved the Poisson equation for the case of a razor-thin exponential disk, where one meets modified Bessel functions of the first and second kind (denoted “ikik” above). These must be evaluated numerically, but one can make a tabulation for use with any choice of disk mass and scale length. Such a thin disk is illustrated by the grey line above for a choice of stellar mass and scale length appropriate to NGC 6946.
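Here is a sketch of the thin-disk case, using the standard form of Freeman’s result, V² = 4πGΣ₀R_d y² [I₀(y)K₀(y) − I₁(y)K₁(y)] with y = R/(2R_d). To keep it self-contained I evaluate the Bessel functions from their integral representations with a simple trapezoid rule; in practice one would reach for a library such as scipy.special. The disk parameters are hypothetical round numbers, not those of NGC 6946.

```python
import math

# Newton's constant in galactic units: kpc (km/s)^2 / Msun
G = 4.30091e-6


def bessel_i(n, x, steps=200):
    """I_n(x) from (1/pi) * integral_0^pi exp(x cos t) cos(n t) dt."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(x * math.cos(t)) * math.cos(n * t)
    return total * h / math.pi


def bessel_k(n, x, t_max=12.0, steps=2400):
    """K_n(x), x > 0, from integral_0^inf exp(-x cosh t) cosh(n t) dt."""
    h = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(n * t)
    return total * h


def v_freeman(sigma0, r_d, r):
    """Circular speed in km/s of a razor-thin exponential disk."""
    y = r / (2.0 * r_d)
    bracket = bessel_i(0, y) * bessel_k(0, y) - bessel_i(1, y) * bessel_k(1, y)
    return math.sqrt(4.0 * math.pi * G * sigma0 * r_d * y * y * bracket)


def v_spherical_equivalent(sigma0, r_d, r):
    """Same enclosed mass, treated as if it were spherically distributed."""
    x = r / r_d
    m_enc = 2.0 * math.pi * sigma0 * r_d**2 * (1.0 - (1.0 + x) * math.exp(-x))
    return math.sqrt(G * m_enc / r)


# Hypothetical disk: central surface density in Msun / kpc^2, scale length in kpc.
SIGMA0 = 1.0e9
R_D = 3.0

for r in (1.0, 3.0, 6.6, 12.0, 24.0):
    print(f"R = {r:5.1f} kpc  thin disk V = {v_freeman(SIGMA0, R_D, r):6.1f} km/s  "
          f"spherical V = {v_spherical_equivalent(SIGMA0, R_D, r):6.1f} km/s")
```

The thin disk peaks near 2.2 scale lengths and rotates faster there than the spherical version with the same enclosed mass. That is why the geometry matters.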

*The spiral galaxy NGC 6946, aka the fireworks galaxy.*

Spiral galaxies are not razor thin of course. We only see a projected image on the sky, so for a galaxy like NGC 6946, we may have a good measurement of its azimuthally averaged light (and presumably stellar mass) distribution Σ(R) but we have no idea how thick it is. Here, we have to make an educated guess based on observations of edge-on galaxies. A ballpark average is R:z = 8:1, but some galaxies are thicker and others thinner, so this becomes an approximation with an associated uncertainty. This uncertainty cannot be unambiguously eliminated; it is one of the known unknowns that comprise the inevitable systematic errors in astronomy. Fortunately, allowing for a finite thickness only takes the harsh edge off of the thin disk case, and the assumption one chooses makes little difference to the result (compare the lines labeled thick and thin above).

The exponential disk formula Σ(R) is an azimuthal average over an image like that of NGC 6946. This approximation captures none of the spiral structure: it only tells us about the average rate at which the surface brightness falls off. It also imposes a smooth shape on that fall off that our eyes can see is not necessarily a great approximation. So the next level of approximation is to solve the Poisson equation numerically for the observed surface brightness profile, Σ(R), not just the exponential approximation thereto. This is the blue line in the bottom right graph above.

There are important differences between using the numerical solution for the observed light distribution and the exponential disk approximation. This has been known since the 1980s, but the analytic expression is so convenient that people need an occasional reminder not to trust it too much. Jerry Sellwood felt the need to provide this reminder in 1999:

Small apparent differences in the shape of the mass profile (left) correspond to pronounced differences in the rotation curve (right). I chose the example of NGC 6946 in part because the exponential approximation for it is pretty good. Nevertheless, the details matter, so the best practice is to build numerical mass models, as we did for SPARC.

Building numerical mass models is tractable for external galaxies, where we can see the entire light distribution. It is not possible for our own Milky Way, since we are located within it and cannot see it as a whole. Consequently, the vast majority of Milky Way models rely on the exponential approximation; so far as I’m aware, I’m the only one who has built a model that attempts to get beyond this.

Numerical mass models are still an approximation. We’re assuming that the gravitational potential is static and azimuthally symmetric. Taking the next step would require abandoning these assumptions to model the spiral arms. The Poisson equation can handle that, but it becomes dicey because the arms rotate with some pattern speed (generally unknown) and may grow or dissolve or reform on some unknown timescale. The potential at any given point is time variable even in equilibrium, so we need not just a numerical solution but a live numerical simulation to keep track of it. That can be done, but it has to be done on a case by case basis, and the answer will depend somewhat on additional assumptions that have to be introduced to run the simulation, like specifying a dark matter halo.

One can generalize further to consider the full 3D potential, e.g., to allow for asymmetry in the z-direction as well as in azimuth. One can further imagine non-equilibrium processes, such as external perturbations. There is good evidence that the Milky Way suffers both of these effects, the passage of the Large Magellanic Cloud being one obvious and apparently large perturbation. So we are in the awkward position that the Gaia data now oblige us to consider the entire run of possible effects through non-equilibrium processes in a mass distribution that is not completely symmetric in any of the three spatial dimensions, but for the main mass component we are stuck with the inadequate approximation of an exponential disk.

Geometry appears to play a crucial role in the approach of Deur to the acceleration discrepancy problem. The essential claim is that the discrepancy correlates with flattening, with highly flattened systems like spirals evincing the classic discrepancy while spherical systems like E0 galaxies show none. Big if true!

A useful plot appears on slide 44:

*Some measure of the discrepancy as a function of apparent ellipticity.*

This is the one example shown that goes into the plot of many determinations of the slope a on the following slide. It being the only one, it is the only thing I have to evaluate without chasing down every other case. Looking at this, I am not inclined to do so.

At first it looks persuasive: the best fit slope is clear. There is no reason why the discrepancy should depend on the projected ellipticity of a triaxial 3D blob of stars, so this must be telling us something important. I’d be on board with that if it were true, but I’ve seen too many non-correlations masquerading as correlations to believe this one. The fitted slope is strongly influenced by the one point at large ellipticity; absent that, a slope of zero works fine. Mostly what I see here is a lot of scatter, which is normal in extragalactic astronomy. Since there are only a few points at high and low ellipticity, we don’t know what would happen if we went out and got more data. But I bet that what would happen is that the high ellipticity points would wind up looking like those in the middle: a big blob of scatter, with no significant correlation.

I’d kinda like to be wrong about this one, so I won’t even get into the theory side, which I find sorta compelling but ultimately unpersuasive. Why are gravitons confined to a disk? What happens way far out? Surely the flatness of the disk at tens of kpc is not dictating the flatness at 1000 kpc.

Surely.

Why’d it have to be MOND?

I want to take another step back in perspective from the last post to say a few words about what the radial acceleration relation (RAR) means and what it doesn’t mean. Here it is again:

The Radial Acceleration Relation over many decades. The grey region is forbidden – there cannot be less acceleration than caused by the observed baryons. The entire region above the diagonal line (yellow) is accessible to dark matter models as the sum of baryons and however much dark matter the model prescribes. MOND is the blue line.

This information was not available when the dark matter paradigm was developed. We observed excess motion, like flat rotation curves, and inferred the existence of extra mass. That was perfectly reasonable given the information available at the time. It is not now: we need to reassess as we learn more.

There is a clear organization to the data at both high and low acceleration. No objective observer with a well-developed physical intuition would look at this and think “dark matter.” The observed behavior does not follow from one force law plus some arbitrary amount of invisible mass. That could do literally anything in the yellow region above, and beyond the bounds of the plot, both upwards and to the left. Indeed, there is no obvious reason why the data don’t fall all over the place. One of the lingering, niggling concerns is the 5:1 ratio of dark matter:baryons – why is it in the same ballpark, when it could be pretty much anything? Why should the data organize in terms of acceleration? There is no reason for dark matter to do this.

Plausible dark matter models have been predicted to do a variety of things – things other than what we observe. The problem for dark matter is that real objects only occupy a tiny line through the vast region available to them in the plot above. This is a fine-tuning problem: why do the data reside only where they do when they could be all over the place? I recognized this as a problem for dark matter before I became aware^$ of MOND. That it turns out that the data follow the line uniquely predicted^* by MOND is just chef’s kiss: there is a fine-tuning problem for dark matter because MOND is the effective force law.

The argument against dark matter is that the data could reside anywhere in the yellow region above, but don’t. The argument against MOND is that a small portion of the data fall a little off the blue line. Arguing that such objects, be they clusters of galaxies or particular individual galaxies, falsify MOND while ignoring the fine-tuning problem faced by dark matter is a case of refusing to see the forest for a few outlying trees.^%

So to return to the question posed in the title of this post, I don’t know why it had to be MOND. That’s just what we observe. Pretending dark matter does the same thing is a false presumption.

^$I’d heard of MOND only vaguely, and, like most other scientists in the field, had paid it no mind until it reared its ugly head in my own data.

^*I talk about MOND here because I believe in giving credit where credit is due. MOND predicted this; no other theory did so. Dark matter theories did not predict this. My dark matter-based galaxy formation theory did not predict this. Other dark matter-based galaxy formation theories (including simulations) continue to fail to explain this. Other hypotheses of modified gravity also did not predict what is observed. Who⁺ ordered this?

*Modified Dynamics. Very dangerous. You go first.*

Many people in the field hate MOND, often with an irrational intensity that has the texture of religion. It’s not as if I woke up one morning and decided to like MOND – sometimes I wish I had never heard of it – but disliking a theory doesn’t make it wrong, and ignoring it doesn’t make it go away. MOND and only MOND predicted the observed RAR a priori. So far, MOND and only MOND provides a satisfactory explanation thereof. We might not like it, but there it is in the data. We’re not going to progress until we get over our fear of MOND and cope with it. Imagining that it will somehow fall out of simulations with just the right baryonic feedback prescription is a form of magical thinking, not science.

⁺Milgrom. Milgrom ordered this.

^%I expect many cosmologists would argue the same in reverse for the cosmic microwave background (CMB) and other cosmological constraints. I have some sympathy for this. The fit to the power spectrum of the CMB seems too good to be an accident, and it points to the same parameters as other constraints. Well, mostly – the Hubble tension might be a clue that things could unravel, as if they haven’t already. The situation is not symmetric – where MOND predicted what we observe a priori with a minimum of assumptions, LCDM is an amalgam of one free parameter after another after another: dark matter and dark energy are, after all, auxiliary hypotheses we invented to save FLRW cosmology. When they don’t suffice, we invent more. Feedback is a single word that represents a whole Pandora’s box of extra degrees of freedom, and we can invent crazier things as needed. The result is a Frankenstein’s monster of a cosmology that we all agree is the same entity, but when we examine it closely the pieces don’t fit, and one cosmologist’s LCDM is not really the same as that of the next. They just seem to agree because they use the same words to mean somewhat different things. Simply agreeing that there has to be non-baryonic dark matter has not helped us conjure up detections of the dark matter particles in the laboratory, or given us the clairvoyance to explain^# what MOND predicted a priori. So rather than agree that dark matter must exist because cosmology works so well, I think the appearance of working well is a chimera of many moving parts. Rather, cosmology, as we currently understand it, works if and only if non-baryonic dark matter exists in the right amount. That requires a laboratory detection to confirm.

^#I have a disturbing lack of faith that a satisfactory explanation can be found.

The Radial Acceleration Relation starting from high accelerations

In the previous post, we discussed how lensing data extend the Radial Acceleration Relation (RAR) seen in galaxy kinematics to very low accelerations. Let’s zoom out now, and look at things at higher accelerations and from a historical perspective.

This all started with Kepler’s Laws of Planetary Motion, which are explained by Newton’s Universal Gravitation – the inverse square law g_bar = GM/r² is exactly what is needed to explain the observed centripetal acceleration, g_obs = V²/r. It also explains the surface gravity of the Earth. Indeed, it was the famous falling apple that is reputed to have given Newton the epiphany that it was the same force that made the apple fall to the ground that made the Moon circle the Earth that made the planets revolve around the sun.

The inverse square law holds over more than six decades of observed acceleration in the solar system, from the one gee we feel here on the surface of the Earth to the outskirts patrolled by Neptune.

Planetary motion in the radial acceleration plane. The dotted line is Newton’s inverse square law of universal gravity.*

The inverse square force law is what it takes to make the planetary data line up. A different force law would give a line with a different slope in this plot. No force law at all would give chaos, with planets all over the place in this plot, if, say, the solar system were run by a series of deferents and epicycles as envisioned for Ptolemaic cosmologies. In such a system, there is no reason to expect the organization seen above. It would require considerable contrivance to make it so.
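As a rough check of the planetary points described above, here is a minimal sketch that evaluates g_bar = GM/r² and g_obs = V²/r for each planet. The orbital values are approximate textbook numbers (semi-major axes and mean orbital speeds) that do not appear in this post, so treat them as assumptions; using mean speeds with semi-major axes glosses over orbital eccentricity.

```python
import math

GM_SUN = 1.32712440018e20  # m^3/s^2, solar gravitational parameter
AU = 1.495978707e11        # m

# (semi-major axis in AU, mean orbital speed in km/s); approximate values
PLANETS = {
    "Mercury": (0.387, 47.36),
    "Venus": (0.723, 35.02),
    "Earth": (1.000, 29.78),
    "Mars": (1.524, 24.07),
    "Jupiter": (5.203, 13.06),
    "Saturn": (9.58, 9.68),
    "Uranus": (19.19, 6.80),
    "Neptune": (30.07, 5.43),
}

def accelerations(a_au, v_kms):
    """Return (g_bar, g_obs) in m/s^2 for one planet."""
    r = a_au * AU
    v = v_kms * 1e3
    g_bar = GM_SUN / r**2  # Newtonian prediction from the Sun's mass
    g_obs = v**2 / r       # observed centripetal acceleration
    return g_bar, g_obs

results = {name: accelerations(*data) for name, data in PLANETS.items()}
for name, (g_bar, g_obs) in results.items():
    print(f"{name:8s} g_bar={g_bar:.3e}  g_obs={g_obs:.3e}  ratio={g_obs / g_bar:.3f}")

# Range from one gee at the Earth's surface down to Neptune's orbit
decades = math.log10(9.81 / results["Neptune"][1])
print(f"decades of acceleration spanned: {decades:.2f}")
```

With these inputs the two accelerations agree to within a few percent for every planet, and the span from one gee to Neptune comes out a little over six decades.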

Newtonian gravity and General Relativity are exquisitely well-tested in the solar system. There are also some very precise tests at higher accelerations that GR passes with flying colors. The story at lower accelerations is another matter. The most remote solar system probes we’ve launched are the Voyager and Pioneer missions. These probe down to ~10^-6 m/s/s; below that is uncharted territory.

The RAR extended from high solar system accelerations to the much lower accelerations typical of galaxies – note the change in scale. Some early rotation curves (of NGC 55, NGC 801, NGC 2403, NGC 2841, & UGC 2885) are shown as lines. These probed an entirely new regime of acceleration. The departure of these lines from the dotted line reflects the flat rotation curves indicating the acceleration discrepancy/need for dark matter. This discrepancy was clear by the end of the 1970s, but the amplitude of the discrepancy then was modest.

Galaxies (and extragalactic data in general) probe an acceleration range that is unprecedented from the perspective of solar system tests. General Relativity has passed so many precise tests that the usual presumption is that it applies at all scales. But it is an assumption that it applies to scales where it hasn’t been tested. Galaxies and cosmology pose such a test. That we need to invoke dark matter to save the phenomenon would be interpreted as a failure if we had set out to test the theory rather than assume it applied.

It was clear from flat rotation curves that something extra was needed. However, when we invented the dark matter paradigm, it was not clear that the data were organized in terms of acceleration. As the data continued to improve, it became clear that the vast majority of galaxies adhered to a single, apparently universal⁺ radial acceleration relation. What had been a hint of systematic behavior in early data became clean and clear. The data did not exhibit the scatter that was expected from a sum of a baryonic disk and a non-baryonic dark matter halo – there is no reason that these two distinct components should sum to the single effective force law that is observed.

The RAR with modern data for both early (red triangles) and late (cyan circles) morphological types. The blue line is the prediction of MOND: there is a transition at an acceleration scale to a force law that is universal but no longer inverse-square.
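The transition described in the caption can be sketched numerically. The snippet below uses one commonly quoted fitting form for the RAR, g_obs = g_bar / (1 − exp(−√(g_bar/g†))), with g† ≈ 1.2 × 10^-10 m/s/s. Neither the functional form nor the value of the acceleration scale is given in this post, so both are assumptions here, and other interpolation functions behave similarly in the two limits.

```python
import math

G_DAGGER = 1.2e-10  # m/s^2; assumed acceleration scale, not quoted in the post

def rar(g_bar, g_dagger=G_DAGGER):
    """Assumed RAR fitting form: g_obs = g_bar / (1 - exp(-sqrt(g_bar/g_dagger)))."""
    x = math.sqrt(g_bar / g_dagger)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is small
    return g_bar / (-math.expm1(-x))

for g_bar in (1e-6, 1e-8, 1e-10, 1e-12, 1e-13):
    g_obs = rar(g_bar)
    newton = g_bar                             # high-acceleration limit
    deep = math.sqrt(g_bar * G_DAGGER)         # low-acceleration limit
    print(f"g_bar={g_bar:.0e}  g_obs={g_obs:.3e}  "
          f"g_obs/Newton={g_obs / newton:.3f}  g_obs/sqrt(g_bar*g_dagger)={g_obs / deep:.3f}")
```

At high accelerations the function returns the Newtonian value, which is the dotted line of the solar system plots. Well below the acceleration scale it approaches √(g_bar g†), so the effective force falls off as 1/r rather than 1/r².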

The observed force-law happened to already have a name: MOND. If it had been something else, then we could have claimed to discover something new. But instead we were obliged to admit that the unexpected thing we had found had in fact been predicted by Milgrom.

This predictive power now extends to much lower accelerations. Again, only MOND got this prediction right in advance.

*The RAR as above, extended by weak gravitational lensing observations. These follow the prediction of MOND as far as they are credible.*

The data could have done many different things here. They could have continued along the dotted line, in which case we’d have no need for dark matter or modified gravity. They could have scattered all over the place – this is the natural expectation of dark matter theories, as there is no reason to expect the gravitational potential of the dominant dark matter halo to be dictated by the distribution of baryons. One expects that not to happen. Yet the data evince the exceptional degree of organization seen above.

It requires considerable contrivance to explain the RAR with dark matter. No viable explanation yet exists, despite many unconvincing claims to this effect. I have worked more on trying to explain this in terms of dark matter than I have on MOND, and all I can tell you is what doesn’t work. Every explanation I’ve seen so far is a special case of a model I had previously considered and rejected as obviously unworkable. At this point, I don’t see how dark matter can ever plausibly do what the data require.

I worry that dark matter has become an epicycle theory. We’re sure it is right, so whatever we observe, no matter how awkward or unexpected, must be what it does. But what if it is wrong, and it does not exist? How do we ever disabuse ourselves of the notion that there is invisible mass once we’ve convinced ourselves that there has to be?

Of course, MOND has its own problems. Clusters of galaxies are systems^$ for which it persistently fails to explain the amplitude of the observed acceleration discrepancy. So let’s add those to the plot as well:

*As above, with clusters of galaxies added (x: Sanders 2003; +: Li et al. 2023).*

So: do clusters violate the RAR, or follow it? I’d say yes and yes – the offset, though modest in amplitude in this depiction, is statistically significant. But there is also a similar scaling with acceleration, only the amplitude is off. The former makes no sense in MOND; the latter makes no sense in terms of dark matter, which did not predict a RAR at all.

Clusters are the strongest evidence against MOND. Just being evidence against MOND doesn’t automatically make it evidence in favor of dark matter. I often pose myself the question: which theory requires me to disbelieve the least amount of data? When I first came to the problem, I was shocked to find that the answer was clearly MOND. Since then, it has gone back and forth, but rather than a clear answer emerging, what has happened is more a divergence of different lines of evidence: that which favors the standard cosmology is incommensurate with that which favors MOND. This leads to considerable cognitive dissonance.

One way to cope with cognitive dissonance is to engage with a problem from different perspectives. If I put on a MOND hat, I worry about the offset seen above for clusters. If I put on a dark matter hat, I worry about the same kind of offset for every system that is not a rich cluster of galaxies. Most critics of MOND seem unconcerned about this problem for dark matter, so how much should a critic of dark matter worry about it in MOND?

*For the hyper-pedantic: the eccentricity of each orbit causes the exact location of each planet in the first plot to oscillate up and down along the dotted line. The extent of this oscillation is smaller than the size of each symbol with the exception of Mercury, which has a relatively high eccentricity (but nowhere near enough to reach Venus).

⁺There are a few exceptions, of course – there are always exceptions in astronomy. The issue is whether these are physically meaningful, or the result of systematic uncertainties or non-equilibrium processes. The claimed discrepancies range from dubious to unconvincing to obviously wrong.

^$I’ve heard some people criticize MOND because the centroid of the lensing signal does not peak around the gas in the Bullet cluster. This assumes that the gas represents the majority of the baryons. We know this is not the case, and that there is some missing mass in clusters. Whatever it is, it is clearly more centrally concentrated than the gas, so we don’t expect the lensing signal to peak where the gas is. All the Bullet cluster teaches us is that whatever this stuff is, it is collisionless. So this particular complaint is a logical fallacy of the red herring and/or straw man variety, born of not understanding MOND well enough to criticize it accurately. Why bother to do that when you come to the problem already sure that MOND is wrong? I understand this line of thought extraordinarily well, because that’s the attitude I started with, and I’ve seen it repeated by many colleagues. The difference is that I bothered to educate myself.

A personal note – I will be on vacation next week, so won’t be quick to respond to comments.

Clusters of galaxies ruin everything

A common refrain I hear is that MOND works well in galaxies, but not in clusters of galaxies. The oft-unspoken but absolutely intended implication is that we can therefore dismiss MOND and never speak of it again. That’s silly.

Even if MOND is wrong, that it works as well as it does is surely telling us something. I would like to know why that is. Perhaps it has something to do with the nature of dark matter, but we need to engage with it to make sense of it. We will never make progress if we ignore it.

Like the seventeenth century cleric Paul Gerhardt, I’m a stickler for intellectual honesty:

“When a man lies, he murders some part of the world.”
Paul Gerhardt

I would extend this to ignoring facts. One should not only be truthful, but also as complete as possible. It does not suffice to be truthful about things that support a particular position while eliding unpleasant or unpopular facts^* that point in another direction. By ignoring the successes of MOND, we murder a part of the world.

Clusters of galaxies are problematic in different ways for different paradigms. Here I’ll recap three ways in which they point in different directions.

1. Cluster baryon fractions

An unpleasant fact for MOND is that it does not suffice to explain the mass discrepancy in clusters of galaxies. When we apply Milgrom’s formula to galaxies, it explains the discrepancy that is conventionally attributed to dark matter. When we apply MOND to clusters, it comes up short. This has been known for a long time; here is a figure from the review of Sanders & McGaugh (2002):

*Figure 10 from Sanders & McGaugh (2002)*: (Left) the Newtonian dynamical mass of clusters of galaxies within an observed cutoff radius (r_out) vs. the total observable mass in 93 X-ray-emitting clusters of galaxies (White et al. 1997). The solid line corresponds to Mdyn = Mobs (no discrepancy). (Right) the MOND dynamical mass within r_out vs. the total observable mass for the same X-ray-emitting clusters. From Sanders (1999).

The Newtonian dynamical mass exceeds what is seen in baryons (left). There is a missing mass problem in clusters. The inference is that the difference is made up by dark matter – presumably the same non-baryonic cold dark matter that we need in cosmology.

When we apply MOND, the data do not fall on the line of equality as they should (right panel). There is still excess mass. MOND suffers a missing baryon problem in clusters.

The common line of reasoning is that MOND still needs dark matter in clusters, so why consider it further? The whole point of MOND is to do away with the need of dark matter, so it is terrible if we need both! Why not just have dark matter?

This attitude was reinforced by the discovery of the Bullet Cluster. You can “see” the dark matter.

An artistic rendition of data for the Bullet Cluster. Pink represents hot *X-ray* emitting gas, blue the mass concentration inferred through gravitational lensing, and the optical image shows many galaxies. There are two clumps of galaxies that collided and passed through one another, getting ahead of the gas which shocked on impact and lags behind as a result. The gas of the smaller “bullet” subcluster shows a distinctive shock wave.

Of course, we can’t really see the dark matter. What we see is that the mass required by gravitational lensing observations exceeds what we see in normal matter: this is the same discrepancy that Zwicky first noticed in the 1930s. The important thing about the Bullet Cluster is that the mass is associated with the location of the galaxies, not with the gas.

The baryons that we know about in clusters are mostly in the gas, which outweighs the stars by roughly an order of magnitude. So we might expect, in a modified gravity theory like MOND, that the lensing signal would peak up on the gas, not the stars. That would be true, if the gas we see were indeed the majority of the baryons. We already knew from the first plot above that this is not the case.

I use the term missing baryons above intentionally. If one already believes in dark matter, then it is perfectly reasonable to infer that the unseen mass in clusters is the non-baryonic cold dark matter. But there is nothing about the data for clusters that requires this. There is also no reason to expect every baryon to be detected. So the unseen mass in clusters could just be ordinary matter that does not happen to be in a form we can readily detect.

I do not like the missing baryon hypothesis for clusters in MOND. I struggle to imagine how we could hide the required amount of baryonic mass, which is comparable to or exceeds the gas mass. But we know from the first figure that such a component is indicated. Indeed, the Bullet Cluster falls at the top end of the plots above, being one of the most massive objects known. From that perspective, it is perfectly ordinary: it shows the same discrepancy every other cluster shows. So the discovery of the Bullet was neither here nor there to me; it was just another example of the same problem. Indeed, it would have been weird if it hadn’t shown the same discrepancy that every other cluster showed. That it does so in a nifty visual is, well, nifty, but so what? I’m more concerned that the entire population of clusters shows a discrepancy than that this one nifty case does so.

The one new thing that the Bullet Cluster did teach us is that whatever the missing mass is, it is collisionless. The gas shocked when it collided, and lags behind the galaxies. Whatever the unseen mass is, it passed through unscathed, just like the galaxies. Anything with mass separated by lots of space will do that: stars, galaxies, cold dark matter particles, hard-to-see baryonic objects like brown dwarfs or black holes, or even massive [potentially sterile] neutrinos. All of those are logical possibilities, though none of them make a heck of a lot of sense.

As much as I dislike the possibility of unseen baryons, it is important to keep the history of the subject in mind. When Zwicky discovered the need for dark matter in clusters, the discrepancy was huge: a factor of a thousand. Some of that was due to having the distance scale wrong, but most of it was due to seeing only stars. It wasn’t until 40 some years later that we started to recognize that there was intracluster gas, and that it outweighed the stars. So for a long time, the mass ratio of dark to luminous mass was around 70:1 (using a modern distance scale), and we didn’t worry much about the absurd size of this number; mostly we just cited it as evidence that there had to be something massive and non-baryonic out there.

Really there were two missing mass problems in clusters: a baryonic missing mass problem, and a dynamical missing mass problem. Most of the baryons turned out to be in the form of intracluster gas, not stars. So the 70:1 ratio changed to 7:1. That’s a big change! It brings the ratio down from a silly number to something that is temptingly close to the universal baryon fraction of cosmology. Consequently, it becomes reasonable to believe that clusters are fair samples of the universe. All the baryons have been detected, and the remaining discrepancy is entirely due to non-baryonic cold dark matter.
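The arithmetic behind the change from 70:1 to 7:1 can be sketched as follows, working in units of the stellar mass. The gas-to-star ratio of about 8 used below is an assumption, back-solved to reproduce 7:1 and consistent with the gas outweighing the stars by roughly an order of magnitude; the function name is hypothetical. The point is that the gas both leaves the unseen budget and joins the baryon budget.

```python
def dark_to_baryon_ratio(dark_to_stars, gas_to_stars):
    """Unseen-to-baryonic mass ratio once gas is counted as baryons.

    All masses are in units of the stellar mass. The gas is moved out of
    the originally unseen mass and into the baryonic total.
    """
    remaining_unseen = dark_to_stars - gas_to_stars
    baryons = 1.0 + gas_to_stars
    return remaining_unseen / baryons

# Start from 70:1 with only stars seen, then try a few assumed gas masses.
for gas_to_stars in (0.0, 5.0, 7.875, 10.0):
    ratio = dark_to_baryon_ratio(70.0, gas_to_stars)
    print(f"gas/stars = {gas_to_stars:6.3f}  ->  unseen:baryons = {ratio:5.2f}:1")
```

Because the gas counts twice in this bookkeeping, a modest change in the assumed gas mass moves the residual ratio quite a bit, which is part of why the result looked so tempting once it landed near the cosmic value.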

That’s a relatively recent realization. For decades, we didn’t recognize that most of the normal matter in clusters was in an as-yet unseen form. There had been two distinct missing mass problems. Could it happen again? Have we really detected all the baryons, or are there still more lurking there to be discovered? I think it unlikely, but fifty years ago I would also have thought it unlikely that there would have been more mass in intracluster gas than in stars in galaxies. I was ten years old then, but it is clear from the literature that no one else was seriously worried about this at the time. Heck, when I first read Milgrom’s original paper on clusters, I thought he was engaging in wishful thinking to invoke the X-ray gas as possibly containing a lot of the mass. Turns out he was right; it just isn’t quite enough.

All that said, I nevertheless think the residual missing baryon problem MOND suffers in clusters is a serious one. I do not see a reasonable solution. Unfortunately, as I’ve discussed before, LCDM suffers an analogous missing baryon problem in galaxies, so pick your poison.

It is reasonable to imagine in LCDM that some of the missing baryons on galaxy scales are present in the form of warm/hot circum-galactic gas. We’ve been looking for that for a while, and have had some success – at least for bright galaxies where the discrepancy is modest. But the problem gets progressively worse for lower mass galaxies, so it is a bold presumption that the check-sum will work out. There is no indication (beyond faith) that it will, and the fact that it gets progressively worse for lower masses is a direct consequence of the data for galaxies looking like MOND rather than LCDM.

Consequently, both paradigms suffer a residual missing baryon problem. One is seen as fatal while the other is barely seen.

2. Cluster collision speeds

A novel thing the Bullet Cluster provides is a way to estimate the speed at which its subclusters collided. You can see the shock front in the X-ray gas in the picture above. The morphology of this feature is sensitive to the speed and other details of the collision. In order to reproduce it, the two subclusters had to collide head-on, in the plane of the sky (practically all the motion is transverse), and fast. I mean, really fast: nominally 4700 km/s. That is more than the virial speed of either cluster, and more than you would expect from dropping one object onto the other. How likely is this to happen?

There is now an enormous literature on this subject, which I won’t attempt to review. It was recognized early on that the high apparent collision speed was unlikely in LCDM. The chances of observing the Bullet Cluster even once in an LCDM universe range from merely unlikely (~10%) to completely absurd (< 3 x 10^-9). Answers this varied follow from what aspects of both observation and theory are considered, and the annoying fact that the distribution of collision speed probabilities plummets like a stone so that slightly different estimates of the “true” collision speed make a big difference to the inferred probability. What the “true” gravitationally induced collision speed is remains somewhat uncertain because the hydrodynamics of the gas plays a role in shaping the shock morphology. There is a long debate about this which bores me; it boils down to it being easy to explain a few hundred extra km/s but hard to get up to the extra 1000 km/s that is needed.

At its simplest, we can imagine the two subclusters forming in the early universe, initially expanding apart along with the Hubble flow like everything else. At some point, their mutual attraction overcomes the expansion, and the two start to fall together. How fast can they get going in the time allotted?

The Bullet Cluster is one of the most massive systems in the universe, so there is lots of dark mass to accelerate the subclusters towards each other. The object is less massive in MOND, even spotting it some unseen baryons, but the long-range force is stronger. Which effect wins?

Gary Angus wrote a code to address this simple question both conventionally and in MOND. Turns out, the longer range force wins this race. MOND is good at making things go fast. While the collision speed of the Bullet Cluster is problematic for LCDM, it is rather natural in MOND. Here is a comparison:

A reasonable answer falls out of MOND with no fuss and no muss. There is room for some hydrodynamical⁺ high jinx, but it isn’t needed, and the amount that is reasonable makes an already reasonable result more reasonable, boosting the collision speed from the edge of the observed band to pretty much smack in the middle. This is the sort of thing that keeps me puzzled: much as I’d like to go with the flow and just accept that it has to be dark matter that’s correct, it seems like every time there is a big surprise in LCDM, MOND just does it. Why? This must be telling us something.

3. Cluster formation times

Structure is predicted to form earlier in MOND than in LCDM. This is true for both galaxies and clusters of galaxies. In his thesis, Jay Franck found lots of candidate clusters at redshifts higher than expected. Even groups of clusters:

**Figure 7** from Franck & McGaugh (2016). A group of four protocluster candidates at z = 3.5 that are proximate in space. The left panel is the sky association of the candidates, while the right panel shows their galaxy distribution along the LOS. The ellipses/boxes show the search volume boundaries (R_search = 20 cMpc, Δz ± 20 cMpc). Three of these (CCPC-z34-005, CCPC-z34-006, CCPC-z35-003) exist in a chain along the LOS stretching ≤120 cMpc. This may become a supercluster-sized structure at z = 0.

The cluster candidates at high redshift that Jay found are more common in the real universe than seen with mock observations made using the same techniques within the Millennium simulation. Their velocity dispersions are also larger than comparable simulated objects. This implies that the amount of mass that has assembled is larger than expected at that time in LCDM, or that speeds are boosted by something like MOND, or nothing has settled into anything like equilibrium yet. The last option seems most likely to me, but that doesn’t reconcile matters with LCDM, as we don’t see the same effect in the simulation.

MOND also predicts the early emergence of the cosmic web, which would explain the early appearance of very extended structures like the “big ring.” While some of these very large scale structures are probably not real, there seem to be too many such things being noted for all of them to be an illusion. The knee-jerk denials of all such structures remind me of the shock cosmologists expressed at seeing quasars at redshifts as high as 4 (even 4.9! how can it be so?) or clusters at redshift 2, or the original CfA stickman, which surprised the bejeepers out of everybody in 1987. So many times I’ve been told that a thing can’t be true because it violates theoreticians’ preconceptions, only for it to prove to be true, and ultimately to be something the theorists expected all along.

Well, which is it?

So, as the title says, clusters ruin everything. The residual missing baryon problem that MOND suffers in clusters is both pernicious and persistent. It isn’t the outright falsification that many people presume it to be, but it sure don’t sit right. On the other hand, both the collision speeds of clusters (there are more examples now than just the Bullet Cluster) and the early appearance of clusters at high redshift are considerably more natural in MOND than in LCDM. So the data for clusters cut both ways. Taking the most obvious interpretation of the Bullet Cluster data, this one object falsifies both LCDM and MOND.

As always, the conclusion one draws depends on how one weighs the different lines of evidence. This is always an invitation to the bane of cognitive dissonance, accepting that which supports our pre-existing world view and rejecting the validity of evidence that calls it into question. That’s why we have the scientific method. It was application of the scientific method that caused me to change my mind: maybe I was wrong to be so sure of the existence of cold dark matter? Maybe I’m wrong now to take MOND seriously? That’s why I’ve set criteria by which I would change my mind. What are yours?

^*In the discussion associated with a debate held at KITP in 2018, one particle physicist said “We should just stop talking about rotation curves.” Straight-up said it out loud! No notes, no irony, no recognition that the dark matter paradigm faces problems beyond rotation curves.

⁺There are now multiple examples of colliding cluster systems known. They’re a mess (Abell 520 is also called “the train wreck cluster”), so I won’t attempt to describe them all. In Angus & McGaugh (2008) we did note that MOND predicted that high collision speeds would be more frequent than in LCDM, and I have seen nothing to make me doubt that. Indeed, Xavier Hernandez pointed out to me that supersonic shocks like that of the Bullet Cluster are often observed, but basically never occur in cosmological simulations.

Discussion of Dark Matter and Modified Gravity

To start the new year, I provide a link to a discussion I had with Simon White on Phil Halper’s YouTube channel:

In this post I’ll say little that we don’t talk about, but will add some background and mildly amusing anecdotes. I’ll also try addressing the one point of factual disagreement. For the most part, Simon & I entirely agree about the relevant facts; what we’re discussing is the interpretation of those facts. It was a perfectly civil conversation, and I hope it can provide an example for how it is possible to have a positive discussion about a controversial topic⁺ without personal animus.

First, I’ll comment on the title, in particular the “vs.” This is not really Simon vs. me. This is a discussion between two scientists who are trying to understand how the universe works (no small ask!). We’ve been asked to advocate for different viewpoints, so one might call it “Dark Matter vs. MOND.” I expect Simon and I could swap sides and have an equally interesting discussion. One needs to be able to do that in order to not simply be a partisan hack. It’s not like MOND is my theory – I falsified my own hypothesis long ago, and got dragged reluctantly into this business for honestly reporting that Milgrom got right what I got wrong.

For those who don’t know, Simon White is one of the preeminent scholars working on cosmological computer simulations, having done important work on galaxy formation and structure formation, the baryon fraction in clusters, and the structure of dark matter halos (Simon is the W in NFW halos). He was a Reader at the Institute of Astronomy at the University of Cambridge where we overlapped (it was my first postdoc) before he moved on to become the director of the Max Planck Institute for Astrophysics where he was mentor to many people now working in the field.

That’s a very short summary of a long and distinguished career; Simon has done lots of other things. I highlight these works because they came up at some point in our discussion. Davis, Efstathiou, Frenk, & White are the “gang of four” that was mentioned; around Cambridge I also occasionally heard them referred to as the Cold Dark Mafia. The baryon fraction of clusters was one of the key observations that led from SCDM to LCDM.

The subject of galaxy formation runs throughout our discussion. It is always a fraught issue how things form in astronomy. It is one thing to understand how stars evolve, once made; making them in the first place is another matter. Hard as that is to do in simulations, galaxy formation involves the extra element of dark matter in an expanding universe. Understanding how galaxies come to be is essential to predicting anything about what they are now, at least in the context of LCDM^*. Both Simon and I have worked on this subject our entire careers, in very much the same framework if from different perspectives – by which I mean he is a theorist who does some observational work while I’m an observer who does some theory, not LCDM vs. MOND.

When Simon moved to Max Planck, the center of galaxy formation work moved as well – it seemed like he took half of Cambridge astronomy with him. This included my then-office mate, Houjun Mo. At one point I refer to the paper Mo & I wrote on the clustering of low surface brightness galaxies and how I expected them to reside in late-forming dark matter halos^**. I often cite Mo, Mao, & White as a touchstone of galaxy formation theory in LCDM; they subsequently wrote an entire textbook about it. (I was already warning them then that I didn’t think their explanations of the Tully-Fisher relation were viable, at least not when combined with the effect we have subsequently named the diversity of rotation curve shapes.)

When I first began to worry that we were barking up the wrong tree with dark matter, I asked myself what could falsify it. It was hard to come up with good answers, and I worried it wasn’t falsifiable. So I started asking other people what would falsify cold dark matter. Most did not answer. They often had a shocked look like they’d never thought about it, and would rather not^***. It’s a bind: no one wants it to be false, but most everyone accepts that for it to qualify as physical science it should be falsifiable. So it was a question that always provoked a record-scratch moment in which most scientists simply freeze up.

Simon was one of the first to give a straight answer to this question without hesitation, circa 1999. At that point it was clear that dark matter halos formed central density cusps in simulations; so those “cusps had to exist” in the centers of galaxies. At that point, we believed that to mean all galaxies. The question was complicated by the large dynamical contribution of stars in high surface brightness galaxies, but low surface brightness galaxies were dark matter dominated down to small radii. So we thought these were the ideal place to test the cusp hypothesis.

We no longer believe that. After many attempts at evasion, cold dark matter failed this test; feedback was invoked, and the goalposts started to move. There is now a consensus among simulators that feedback in intermediate mass galaxies can alter the inner mass distribution of dark matter halos. Exactly how this happens depends on who you ask, but it is at least possible to explain the absence of the predicted cusps. This goes in the right direction to explain some data, but by itself does not suffice to address the thornier question of why the distribution of baryons is predictive of the kinematics even when the mass is dominated by dark matter. This is why the discussion focused on the lowest mass galaxies where there hasn’t been enough star formation to drive the feedback necessary to alter cusps. Some of these galaxies can be described as having cusps, but probably not all. Thinking only in those terms elides the fact that MOND has a better record of predictive success. I want to know why this happens; it must surely be telling us something important about how the universe works.

The one point of factual disagreement we encountered had to do with the mass profile of galaxies at large radii as traced by gravitational lensing. It is always necessary to agree on the facts before debating their interpretation, so we didn’t press this far. Afterwards, Simon sent a citation to what he was talking about: this paper by Wang et al. (2016). In particular, look at their Fig. 4:

Fig. 4 of Wang et al. (2016). The excess surface density inferred from gravitational lensing for galaxies in different mass bins (data points) compared to mock observations of the same quantity made from within a simulation (lines). Looks like excellent agreement.

This plot quantifies the mass distribution around isolated galaxies to very large scales. There is good agreement between the lensing observations and the mock observations made within a simulation. Indeed, one can see an initial downward bend corresponding to the outer part of an NFW halo (the “one-halo term”), then an inflection to different behavior due to the presence of surrounding dark matter halos (the “two-halo term”). This is what Simon was talking about when he said gravitational lensing was in good agreement with LCDM.

I was thinking of a different, closely related result. I had in mind the work of Brouwer et al. (2021), which I discussed previously. Very recently, Dr. Tobias Mistele has made a revised analysis of these data. That’s worthy of its own post, so I’ll leave out the details, which can be found in this preprint. The bottom line is in Fig. 2, which shows the radial acceleration relation derived from gravitational lensing around isolated galaxies:

*The radial acceleration relation from weak gravitational lensing (colored points) extending existing kinematic data (grey points) to lower acceleration corresponding to very large radii (~ 1 Mpc)*. The dashed line is the prediction of MOND. Looks like excellent agreement.

This plot quantifies the radial acceleration due to the gravitational potential of isolated galaxies to very low accelerations. There is good agreement between the lensing observations and the extrapolation of the radial acceleration relation predicted by MOND. There are no features until extremely low acceleration where there may be a hint of the external field effect. This is what I was talking about when I said gravitational lensing was in good agreement with MOND, and that the data indicated a single halo with an r^-2 density profile that extends far out where we ought to see the r^-3 behavior of NFW.
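To make the r^-2 versus r^-3 distinction concrete, here is a minimal sketch comparing the two profiles. It uses the textbook forms of a singular isothermal sphere (density falling as r^-2) and an NFW halo; the function names and the scaled radius x = r/r_s are illustrative choices, not anything taken from the lensing analyses themselves.

```python
import math

def nfw_density_slope(x):
    """Logarithmic slope d ln(rho) / d ln(r) of an NFW halo,
    rho ~ 1 / [x (1 + x)^2], with x = r / r_s (radius in scale radii)."""
    return -1.0 - 2.0 * x / (1.0 + x)

def nfw_v2(x):
    """Circular speed squared (arbitrary units) for NFW: V^2 ~ M(<r) / r,
    with enclosed mass M ~ ln(1 + x) - x / (1 + x)."""
    return (math.log(1.0 + x) - x / (1.0 + x)) / x

def isothermal_v2(x):
    """For rho ~ r^-2 the enclosed mass grows linearly with r, so V^2 is constant."""
    return 1.0

for x in (2.0, 10.0, 100.0):
    print(f"x = {x:6.1f}  NFW slope = {nfw_density_slope(x):5.2f}  "
          f"NFW V^2 = {nfw_v2(x):.4f}  r^-2 V^2 = {isothermal_v2(x):.1f}")
```

The NFW slope steepens toward -3 and its circular speed declines at large radii, while an r^-2 halo keeps a flat rotation curve indefinitely. That continued flatness is the behavior the lensing-derived relation appears to show.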

The two plots above use the same method applied to the same kind of data. They should be consistent, yet they seem to tell a different story. This is the point of factual disagreement Simon and I had, so we let it be. No point in arguing about the interpretation when you can’t agree on the facts.

I do not know why these results differ, and I’m not going to attempt to solve it here. I suspect it has something to do with sample selection. Both studies rely on isolated galaxies, but how do we define that? How well do we achieve the goal of identifying isolated galaxies? No galaxy is an island; at some level, there is always a neighbor. But is it massive enough to perturb the lensing signal, or can we successfully define samples of galaxies that are effectively isolated, so that we’re only looking at the gravitational potential of that galaxy and not that of it plus some neighbors? Looks like there is some work left to do to sort this out.

Stepping back from that, we agreed on pretty much everything else. MOND as a fundamental theory remains incomplete. LCDM requires us to believe that 95% of the mass-energy content of the universe is something unknown and perhaps unknowable. Dark matter has become familiar as a term but remains a mystery so long as it goes undetected in the laboratory. Perhaps it exists and cannot be detected – this is a logical possibility – but that would be the least satisfactory result possible: we might as well resume counting angels on the head of a pin.

The community has been working on these issues for a long time. I have been working on this for a long time. It is a big problem. There is lots left to do.

⁺I get a lot of kill the messenger from people who are not capable of discussing controversial topics without personal animus. A lot – inevitably from people who assume they know more about the subject than I do but actually know much less. It is really amazing how many scientists equate me as a person with MOND as a theory without bothering to do any fact-checking. This is logical fallacy 101.

^*The predictions of MOND are insensitive to the details of galaxy formation. Though of course an interesting question, we don’t need that in order to make predictions. All we need is the mass distribution that the kinematics respond to – we don’t need to know how it got that way. This is like the solar system, where it suffices to know Newton’s laws to compute orbits; we don’t need to know how the sun and planets formed. In contrast, one needs to know how a galaxy was assembled in LCDM to have any hope of predicting what its distribution of dark matter is and then using that to predict kinematics.

^**The ideas Mo & I discussed thirty years ago have reappeared in the literature under the designation “assembly bias.”

^***It was often accompanied by “why would you even ask that?” followed by a pained, constipated expression when they realized that every physical theory has to answer that question.

Holiday Concordance

Screw the Earth and its smoking habit. The end of 2023 approaches, so let’s talk about the whole universe, which is its own special kind of mess.

As I’ve related before, our current cosmology, LCDM, was established over the course of the 1990s through a steady drip, drip, drip of results in observational cosmology – what Peebles calls the classic cosmological tests. There were many contributory results; I’m not going to attempt to go through them all. Important among them were the age problem, the realization that the mass density was lower than expected, and that there was more structure on large scales⁺ than predicted. These established LCDM in the mid-1990s as the “concordance model” – the most probable flavor of FLRW universe. Here is the key figure from Ostriker & Steinhardt depicting the then-allowed region of the density parameter and Hubble constant:

The addition of the cosmological constant to the standard model – replacing SCDM with LCDM – was a brain-wrenching ordeal. Lambda had long been anathema, and there was a region in which an open universe was possible, even reasonable (stripes over shade in the figure above). Moreover, this strange new LCDM made the seemingly inconceivable prediction that not only was the universe expanding [itself the older mind-bender brought to us by Hubble (and Slipher and Lemaître)], the expansion rate should be accelerating. This sounded like crazy talk at the time, so it was greeted with great rejoicing when corroborated by observations of Type Ia supernovae.

A further prediction that could distinguish LCDM from then-viable open models was the geometry of the universe. Open models have a negative curvature (k < 0, i.e. Ω_k > 0, in which initially parallel light beams diverge) while the geometry in LCDM should be uniquely flat (Ω_k = 0, in which initially parallel light beams remain parallel forever). Uniqueness is important, as it makes for a strong prediction, such as the location of the first peak of the acoustic power spectrum of the cosmic microwave background. In LCDM, this location was predicted to be ℓ ≈ 200 with little flexibility. For viable open models, it was more like ℓ ≈ 800 with a great deal of flexibility. The interpretation of the supernova data relied heavily on the assumption of a flat geometry, so I recall breathing a sigh of relief^* when ℓ ≈ 200 was clearly observed.

Where are we now? I decided to reconstruct the Ostriker & Steinhardt plot with modern data. Here it is, with the axes swapped for reasons unrelated to this post. Deal with it.

The concordance region (white space) in the mass density-expansion rate space where the allowed regions (colored bands) of many constraints intersect. Illustrated constraints include a direct measurement of the Hubble constant, the age of the universe, the cluster baryon fraction, and large scale structure. Also shown are the best-fit values from CMB fits labeled by their date of publication (WMAP in orange; Planck in yellow). These follow the green line of constant Ω_mH₀³; combinations of parameters along the line are tolerable but regions away from it are strongly excluded.

There is lots to be said here. First, note the scale. As the accuracy of data have improved, it has become possible to zoom in. My version of the figure is a wee postage stamp on that of Ostriker & Steinhardt. Nevertheless, the concordance region is in pretty much the same spot. Not exactly, of course; the biggest thing that has changed is that the age constraint is now completely incompatible with an open universe, so I haven’t bothered depicting it. Indeed, for the illustrated Hubble constant, the Hubble time (the age of a completely empty, “coasting” universe) is 13.4 Gyr. This is consistent with the illustrated age (13.80 ± 0.75 Gyr) only for Ω_m ≈ 0, which is far off the left edge of the plot.
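The Hubble time quoted above is just 1/H₀ converted to years. A minimal sketch of that arithmetic follows, assuming the illustrated Hubble constant is 73 km/s/Mpc (an assumption: the paragraph does not quote the value, but 13.4 Gyr implies it). The constant and function names are illustrative.

```python
KM_PER_MPC = 3.0857e19       # kilometres in one megaparsec
SECONDS_PER_GYR = 3.156e16   # seconds in a billion years

def hubble_time_gyr(h0_km_s_mpc):
    """Age of an empty, coasting universe: 1/H0, expressed in Gyr."""
    seconds = KM_PER_MPC / h0_km_s_mpc
    return seconds / SECONDS_PER_GYR

print(round(hubble_time_gyr(73.0), 1))  # prints 13.4
```

Any matter content decelerates the early expansion and makes the universe younger than this at fixed H₀, which is why an open model with no cosmological constant can only approach the measured age as Ω_m goes to zero.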

Second, the CMB best-fit values follow a line of constant Ω_mH₀³. This is a deep trench in χ² space. The region outside this trench is strongly excluded – it’s kinda the grand canyon of cosmology. Even a little off, and you’re standing on the rim looking a long way down, knowing that a much better fit is only a short step away. Once you’re in the valley of χ², you must hunt along its bottom to find the true minimum. In the mid-’00s, a decade after Ostriker & Steinhardt, the best fit fell smack in the middle of the concordance region defined by completely independent data. It was this additional concordance that impressed me most, more than the detailed CMB fits themselves. This convinced the vast majority of scientists practicing in the field that it had to be LCDM and could only be LCDM and nothing but LCDM.
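To get a feel for what walking along the trench means, the sketch below holds Ω_mH₀³ fixed and asks what matter density goes with a given Hubble constant. The reference point (Ω_m = 0.315 at H₀ = 67.4 km/s/Mpc, roughly the Planck 2018 best fit) is an assumption brought in for illustration and is not quoted in this post; the function name is hypothetical.

```python
def omega_m_on_trench(h0, omega_m_ref=0.315, h0_ref=67.4):
    """Matter density that keeps Omega_m * H0^3 equal to its reference value.
    The reference values are assumed (approximately Planck 2018)."""
    return omega_m_ref * (h0_ref / h0) ** 3

for h0 in (67.4, 69.8, 73.0, 75.1):
    print(f"H0 = {h0:4.1f} km/s/Mpc  ->  Omega_m = {omega_m_on_trench(h0):.3f}")
```

Because of the cubic dependence, a modest change in H₀ drags the matter density a long way: under these assumed reference values, H₀ = 73 would require Ω_m of roughly 0.25 to stay at the bottom of the trench.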

Since that time, the best-fit CMB value has wandered down the trench, away from the concordance region. These are the results that changed, not everything else. This temporal variation suggests a systematic in the interpretation of the CMB data rather than in the local distance scale.

I recall being at a conference (the Bright & Dark Universe in Naples in 2017) when the latest Planck results were announced. There was a palpable sense in the audience of having been whacked by a blunt object, like walking into a closed door you thought was open. We’d been doing precision cosmology for a long time and had settled on an answer informed by lots of independent lines of evidence, but they were telling us the One True answer was off over there. Not crazy far, but not consistent with the concordance we had come to expect. Worse, they had these crazy tiny error bars – not only were they getting an answer outside the concordance region, it was in tension with pretty much everything else. Not strong tension, but enough to make us all uncomfortable if not outright object. Indeed, there was a definite vibe that people were afraid to object. Not terrified, but nervous. Worried about being on the wrong side of the community. I get it. I know a lot about that.

People are remarkably talented at refashioning the past. Over the past five years, the Planck best-fit parameters have come to be synonymous with LCDM: all else is moot. Young scientists can be forgiven for not realizing it was ever otherwise, just as they might have been taught that cosmic acceleration was discovered by the supernova experiments totally out of the blue. These are convenient oversimplifications that elide so many pertinent events as to be tantamount to gaslighting. We refashion the past until there was never a serious controversy, then it seems strange that some of us think there still is. Sorry, not so fast, there definitely is: if you use the Planck value of the Hubble constant to estimate distances to local galaxies, you will get it wrong^%, along with all distance-dependent quantities.

I’m old enough to remember a time when there was a factor of two uncertainty in the Hubble constant (50 vs. 100) and the age constraint was the most accurate one in this plot. Thanks to genuine progress, the Hubble constant is now the more precise. Consequently, of all the data one could plot above, this is the choice that matters most to where the concordance region falls. If I adopt our own estimate (H₀ = 75.1 ± 2.3 km/s/Mpc), then the concordance band gets wider and slides up a little but is basically the same as above. If instead I adopt the lowest highly accurate value, H₀ = 69.8 ± 0.8 km/s/Mpc, the window slides down, but not enough to be consistent with the Planck results. Indeed, it stays to the left of the CMB constraint, becoming inconsistent with the mass density as well as the expansion rate.

Dang it, now I want to make that plot. Processing… OK, here it is:

As above, but with a lower measurement of H₀. Only the range of statistical uncertainty is illustrated, as a systematic uncertainty corresponds to a calibration error that slides H₀ up and down – i.e., the exact situation being illustrated relative to the figure above. These two plots illustrate the range of outcomes that are possible from slightly discordant direct modern measurements of the Hubble constant; it is hard to go lower. Doing so doesn’t really help as it would just shift the tension from H₀ to Ω_m.

Yes, as I expected: the allowed range slides down but remains to the left of the green line. It is less inconsistent with the Planck H₀, but that isn’t the only thing that matters. It is also inconsistent with the matter density. Indeed, it misses the CMB-allowed trench entirely. There is no allowed FLRW universe here.

These are only two parameters. Though arguably the most important, there are others, all of which matter to CMB fits. These are difficult to visualize simultaneously. We could, for starters, plot the baryon density as a third axis. If we did so, the concordance region would become a 3D object. It would also get squeezed, depending on what we think the baryon density actually is. Even restricting ourselves to the above-plotted constraints, there is some tension between the cluster baryon fraction and large scale structure constraint along the new third axis. I’m sure I could find in the literature more or less consistent values; this way the madness of cherry-picking lies.

There are many other constraints that could be added here. I’ve tried to stay consistent with the spirit of the original plot without making it illegible by overburdening it with lots and lots of data that all say pretty much the same thing. Nor do I wish to engage in cherry-picking. There are so many results out there that I’m sure one could find some combination that slides the allowed box this way or that – but only a little.

Whenever I’ve taught cosmology, I’ve made it a class exercise^$ to investigate diagrams like this, with each student choosing an observational constraint to explore and champion. As a result, I’ve seen many variations on the above plots over the years, but since I first taught it in 1999 they’ve always been consistent with pretty much the same concordance region. It often happens that there is no concordance region; there are so many constraints that when you put them all together, nothing is left. We then debate which results to believe, or not, a process that has always been a part of the practice of cosmology.

We have painted ourselves into a corner. The usual interpretation is that we have painted ourselves into the correct corner: we live in this strange LCDM universe. It is also possible that there really is nothing left, the concordance window is closed, and we’ve falsified FLRW cosmology. That is a fate most fear to contemplate, and it seems less likely than mistakes in some discordant results, so we inevitably go down the path of cognitive dissonance, giving more credence to results that are consistent with our favorite set of LCDM parameters and less to those that are not. This is widely done without contemplating the possibility that the weird FLRW parameters we’ve ended up with are weird because they are just an approximation to some deeper theory.

So, as 2023 winds to an end, we [still] know pretty well what the parameters of cosmology are. While the tension between H₀ = 67 and 73 km/s/Mpc is real, it seems like small beans compared to the successful isolation of a narrow concordance window. Sure beats arguing between 50 and 100! Even deciding which concordance window is right seems like a small matter compared to the deeper issues raised by LCDM: what is the cold dark matter? Does it really exist, or is it just a mythical entity we’ve invented for the convenient calculation of cosmic quantities? What the heck do we even mean by Lambda? Does the whole picture hang together so well that it must be correct? Or can it be falsified? Has it already been? How do we decide?

I’m sure we’ll be arguing over these questions for a long time to come.

⁺Structure formation is often depicted as a great success of cosmology, but it was the failure of the previous standard model, SCDM, to predict enough structure on large scales that led to its demise and its replacement by LCDM, which now faces a similar problem. The observer’s experience has consistently been that there is more structure in place earlier and on larger scales than had been anticipated before its observation.

^*I believe in giving theories credit where credit is due. Putting on a cosmologist’s hat, the location of the first peak was a great success of LCDM. It was the amplitude of the second peak that came as a great surprise – unless you can take off the cosmology hat and don a MOND hat – then it was predicted. What is surprising from that perspective is the amplitude of the third peak, which makes more sense in LCDM. It seems impossible to some people that I can wear both hats without my head exploding, so they seem to simply assume I don’t think about it from their perspective when in reality it is the other way around.

^%As adjudicated by galaxies with distances known from direct measurements provided by Cepheids or the tip of the red giant branch or surface brightness fluctuations or geometric methods, etc., etc., etc.

^$This is a great exercise, but only works if CMB results are excluded. There has to be some narrative suspense: will the various disparate lines of evidence indeed line up? Since CMB fits constrain all parameters simultaneously, and brook no dissent, they suck the joy away from everything else in the sky and drain all interest in the debate.

Full speed in reverse!

People have been asking me about comments in a recent video by Sabine Hossenfelder. I have not watched it, but the quote I’m asked about is “the higher the uncertainty of the data, the better MOND seems to work” with the implication that this might mean that MOND is a systematic artifact of data interpretation. I believe, because they consulted me about it, that this claim originated in recent work by Sabine’s student Maria Khelashvili on fitting the SPARC data.

Let me address the point about data interpretation first. Fitting the SPARC data had exactly nothing to do with attracting my attention to MOND. Detailed MOND fits to these data are not particularly important in the overall scheme of these things as I’ll discuss in excruciating detail below. Indeed, these data didn’t even exist until relatively recently.

It may, at this juncture in time, surprise some readers to learn that I was once a strong advocate for cold dark matter. I was, like many of its current advocates, rather derisive of alternatives, the most prominent at the time being baryonic dark matter. What attracted my attention to MOND was that it made a priori predictions that were corroborated, quite unexpectedly, in my data for low surface brightness galaxies. These results were surprising in terms of dark matter then and to this day remain difficult to understand. After a lot of struggle to save dark matter, I realized that the best we could hope to do with dark matter was to contrive a model that reproduced after the fact what MOND had predicted a priori. That can never be satisfactory.

So – I changed my mind. I admitted that I had been wrong to be so completely sure that the solution to the missing mass problem had to be some new form of non-baryonic dark matter. It was not easy to accept this possibility. It required lengthy and tremendous effort to admit that Milgrom had got right something that the rest of us had got wrong. But he had – his predictions came true, so what was I supposed to say? That he was wrong?

Perhaps I am wrong to take MOND seriously? I would love to be able to honestly say it is wrong so I can stop having this argument over and over. I’ve stipulated the conditions whereby I would change my mind to again believe that dark matter is indeed the better option. These conditions have not been met. Few dark matter advocates have answered the challenge to stipulate what could change their minds.

People seem to have become obsessed with making fits to data. That’s great, but it is not fundamental. Making a priori predictions is fundamental, and has nothing to do with fitting data. By construction, the prediction comes before the data. Perhaps this is one way to distinguish between incremental and revolutionary science. Fitting data is incremental science that seeks the best version of an accepted paradigm. Successful predictions are the hallmark of revolutionary science that make one take notice and say, hey, maybe something entirely different is going on.

One of the predictions of MOND is that the RAR should exist. It was not expected in dark matter. As a quick review of the history, here is the RAR as it was known in 2004 and now (as of 2016):

*The radial acceleration relation constructed from data available in 2004 and that from 2016.*

The big improvement provided by SPARC was a uniform estimate of the stellar mass surface density of galaxies based on Spitzer near-infrared data. These are what are used to construct the x-axis: g_bar is what Newton predicts for the observed mass distribution. SPARC was a vast improvement over the optical data we had previously, to the point that the intrinsic scatter is negligibly small: the observed scatter can be attributed to the various uncertainties and the expected scatter in stellar mass-to-light ratios. The latter never goes away, but did turn out to be at the low end of the range we expected. It could easily have looked worse, as it did in 2004, even if the underlying physical relation was perfect.
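For readers who want to see the shape of the relation, a minimal sketch follows. It uses the fitting function and acceleration scale published with the 2016 RAR, g_obs = g_bar / (1 − exp(−sqrt(g_bar/g†))) with g† ≈ 1.2 × 10⁻¹⁰ m/s². Neither is quoted in this post, so treat both as assumptions imported from that paper; the names in the code are illustrative.

```python
import math

G_DAGGER = 1.2e-10  # m/s^2; acceleration scale assumed from the 2016 RAR fit

def g_obs_from_g_bar(g_bar):
    """Observed acceleration expected from the Newtonian baryonic one (m/s^2),
    using the assumed RAR fitting function."""
    return g_bar / (1.0 - math.exp(-math.sqrt(g_bar / G_DAGGER)))

for g_bar in (1e-8, 1e-10, 1e-12, 1e-13):
    g_obs = g_obs_from_g_bar(g_bar)
    print(f"g_bar = {g_bar:.0e}  g_obs = {g_obs:.3e}  ratio = {g_obs / g_bar:.2f}")
```

At high accelerations the ratio tends to one (Newton suffices), while at low accelerations g_obs approaches sqrt(g_bar × g†), so the discrepancy grows as the baryonic acceleration falls. The only free quantity per galaxy in constructing g_bar is the stellar mass-to-light ratio.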

Negligibly small intrinsic scatter is the best one can hope to find. The issue now is the fit quality to individual galaxies (not just the group plot above). We already know MOND fits rotation curve data. The claim that appears in Dr. Hossenfelder’s video boils down to dark matter providing better fits. This would be important if it told us something about nature. It does not. All it teaches us about is the hazards of fitting data for which the errors are not well behaved.

While SPARC provides a robust estimate of g_bar, g_obs is based on a heterogeneous set of rotation curves drawn from a literature spanning decades. The error bars on these rotation curves have not been estimated in a uniform way, so we cannot blindly fit the data with our favorite software tool and expect that to teach us something about physical reality. I find myself having to say this to physicists over and over and over and over and over again: you cannot trust astronomical error bars to behave as Gaussian random variables the way one would like and expect in a controlled laboratory setting.

Astronomy is not conducted in a controlled laboratory. It is an observational science. We cannot put the entire universe in a box and control all the variables. We can hope to improve the data and approach this ideal, but right now we’re nowhere near it. These fitting analyses assume that we are.

Screw it. I really am sick of explaining this over and over, so I’m just going to cut & paste verbatim what I told Hossenfelder & Khelashvili by email when they asked. This is not the first time I’ve written an email like this, and I’m sure it won’t be the last.

Excruciating details: what I said to Hossenfelder & Khelashvili about the perils of rotation curve fitting on 22 September 2023 in response to their request for comments on the draft of the relevant paper:

First, the work of Desmond is a good place to look for an opinion independent of mine.

Second, in my experience, the fit quality you find is what I’ve found before: DM halos with a constant density core consistently give the best fits in terms of chi^2, then MOND, then NFW. The success of cored DM halos happens because it is an extremely flexible fitting function: the core radius and core density can be traded off to fit any dog’s leg, and is highly degenerate with the stellar M*/L. NFW works less well because it has a less flexible shape. But both work because they have more parameters [than MOND].

Third, statistics will not save us here. I once hoped that the BIC would sort this out, but having gone down that road, I believe the BIC does not penalize models sufficiently for adding free parameters. You allude to this at the end of section 3.2. When you go from MOND (with fixed a₀ it has only one parameter, M*/L, to fit to account for everything) to a dark matter halo (which has at a minimum 3 parameters: M*/L plus two to describe the halo) then you gain an enormous amount of freedom – the volume of possible parameter space grows enormously. But the BIC just says if you had 20 degrees of freedom before, now you have 22. That does not remotely represent the amount of flexibility that represents: some free parameters are more equal than others. MOND fits and DM halo fits are not the same beast; we can’t compare them this way any more than we can compare apples and snails.
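A toy calculation shows how mild the BIC penalty is. It assumes the standard definition for Gaussian errors, BIC = χ² + k ln N up to an additive constant, with k free parameters and N data points. The choice of N = 20 points is made up for illustration.

```python
import math

def bic(chi2, n_params, n_points):
    """Bayesian Information Criterion for Gaussian errors (constant dropped)."""
    return chi2 + n_params * math.log(n_points)

n_points = 20                          # hypothetical number of rotation curve points
penalty_mond = bic(0.0, 1, n_points)   # one parameter: M*/L
penalty_halo = bic(0.0, 3, n_points)   # three parameters: M*/L plus two for the halo
extra = penalty_halo - penalty_mond

print(f"extra BIC penalty for the halo fit: {extra:.2f}")
```

Under these assumptions the halo model only has to lower χ² by about 6 to be preferred. A single discrepant point with an underestimated error bar can supply that, which says nothing about how much larger the accessible parameter volume has become.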

Worse, to do this right requires that the uncertainties be real random errors. They are not. SPARC provides homogeneous mass models based on near-IR observations of the stellar mass distribution. Those should be OK to the extent that near-IR light == stellar mass. That is a decent mapping, but not perfect. Consequently, we expect the occasional galaxy to misbehave. UGC 128 is a case where the MOND fit was great with optical data then became terrible with near-IR data. The absolute difference in the data is not great, but in terms of the formal chi^2 it is. So is that a failure of the model, or of the data to represent what we want it to represent?

This happens all the time in astronomy. Here, we want to know the circular velocity of a test particle in the gravitational potential predicted by the baryonic mass distribution. We never measure either of those quantities. What we measure is the (i) stellar light distribution and the (ii) Doppler velocities of gas. We assume we can map stellar light to stellar mass and Doppler velocity to orbital speed, but no mass model is perfect, nor is any patch of observed gas guaranteed to be on a purely circular orbit. These are known unknowns: uncertainties that we know are real but we cannot easily quantify. These assumptions that we have to make to do the analysis dominate over the random errors in many cases. We also assume that galaxies are in dynamical equilibrium, but 20% of spirals show gross side-to-side asymmetries, and at least 50% mild ones. So what is the circular motion in those cases? (F579-1 is a good example)

While SPARC is homogeneous in its photometry, it is extremely heterogeneous in its rotation curve measurements. We’re working on fixing that, but it’ll take a while. Consequently, as you note, some galaxies have little constraining power while others appear to have lots. That’s because many of the rotation curve velocity uncertainties are either grossly over- or underestimated. To see this, plot the cumulative distribution of reduced chi^2 for any of your models (or see the CDF published by Li et al 2018 for the RAR and Li et al 2020 for dark matter halos of many flavors. So many, I can’t recall how many CDFs we published.) Anyway, for a good model, reduced chi^2 is always close to one, so the CDF should go up sharply and reach one quickly – there shouldn’t be many cases with very low chi^2 or very high chi^2. Unfortunately, rotation curve data do not do this for any type of model. There are always way too many cases with chi^2 << 1 and also too many with chi^2 >> 1. One might conclude that all models are unacceptable – or that the error bars are Messed Up. I think the second option is the case. If so, then this sort of analysis will always have the power to mislead.

I insert Fig. 1 from Li et al. (2020) so you don’t have to go look it up. The CDF of a statistically good model would rise sharply, being an almost vertical line at chi^2 = 1. No model of any flavor does that. That’s in large part because the uncertainties on some rotation curves are too large, while those on others are too small. The greater flexibility of dark matter models makes them incrementally better than MOND for the cases with error bars that are too small – hence the corollary statement that “the higher the uncertainty of the data, the better MOND seems to work.” *This happens because dark matter models are allowed to chase bogus outliers with tiny error bars in a way that MOND cannot. That doesn’t make dark matter better, it just makes it easier to fool.*
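The effect of mis-estimated error bars on the chi^2 distribution can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a reanalysis of SPARC: the model is taken to be exactly right, residuals are Gaussian, and the reported error bar of each mock galaxy is off by a log-normal factor whose width (`error_scatter_dex`) is an arbitrary choice.

```python
import random


def reduced_chi2_sample(n_galaxies, n_points, error_scatter_dex, rng):
    """Reduced chi^2 per mock galaxy when the model is exactly correct.

    Residuals have unit true scatter. The *reported* error bar is the true one
    multiplied by 10**g, with g drawn from Normal(0, error_scatter_dex).
    No parameters are fitted, so chi^2 is divided by the number of points.
    """
    values = []
    for _ in range(n_galaxies):
        factor = 10 ** rng.gauss(0.0, error_scatter_dex)
        chi2 = sum((rng.gauss(0.0, 1.0) / factor) ** 2 for _ in range(n_points))
        values.append(chi2 / n_points)
    return values


def empirical_cdf(values):
    """Sorted values paired with their cumulative fractions, ready to plot."""
    ordered = sorted(values)
    return [(v, (i + 1) / len(ordered)) for i, v in enumerate(ordered)]


def fraction_between(values, lo, hi):
    """Fraction of galaxies with lo <= reduced chi^2 <= hi."""
    return sum(lo <= v <= hi for v in values) / len(values)


rng = random.Random(0)
honest = reduced_chi2_sample(500, 20, 0.0, rng)     # error bars correct
messed_up = reduced_chi2_sample(500, 20, 0.5, rng)  # error bars off by ~0.5 dex

for label, sample in (("honest errors", honest), ("messed-up errors", messed_up)):
    print(label, "fraction with 0.5 <= chi2_nu <= 2:", fraction_between(sample, 0.5, 2.0))
```

With honest errors nearly all mock galaxies land near reduced chi^2 = 1, so the CDF is steep. With scattered error bars the same perfectly correct model produces long tails at both chi^2 << 1 and chi^2 >> 1.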

A key thing to watch out for is the outsized effects of a few points with tiny error bars. Among galaxies with high chi^2, what often happens is that there is one point with a tiny error bar that does not agree with any of the rest of the data for any smoothly continuous rotation curve. Fitting programs penalize a model for missing this point by many sigma, so will do anything they can to make it better. So what happens is that if you let a₀ vary with a flat prior, it will go to some very silly values in order to buy a tiny improvement in chi^2. Formally, that’s a better fit, so you say OK, a₀ has to vary. But if you plot the fitted RCs with fixed and variable a₀, you will be hard pressed to see the difference. Chi^2 is different, sure, but both will have chi^2 >> 1, so a lousy fit either way, and we haven’t really gained anything meaningful from allowing for the greater fitting freedom. Really it is just that one point that is Wrong even though it has a tiny error bar – which you can see relative to the other points, never mind the model. Dark matter halos have more flexibility from the beginning, so this is less obvious for them even though the same thing happens.
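A few lines of arithmetic show how one such point takes over the fit statistic. The numbers below are invented for illustration: six velocities around a flat 100 km/s model, one of which is off by 10 km/s with a claimed 1 km/s uncertainty.

```python
def chi2_terms(observed, model, sigma):
    """Per-point contributions ((obs - model) / sigma)^2 to chi^2."""
    return [((o - m) / s) ** 2 for o, m, s in zip(observed, model, sigma)]


# Hypothetical rotation curve points (km/s), not real data.
model = [100.0] * 6
observed = [98.0, 103.0, 101.0, 99.0, 110.0, 100.0]
sigma = [5.0, 5.0, 5.0, 5.0, 1.0, 5.0]  # the fifth point has a tiny error bar

terms = chi2_terms(observed, model, sigma)
total = sum(terms)
share = max(terms) / total

print(f"total chi^2 = {total:.1f}; largest single-point share = {share:.1%}")
```

Any extra fitting freedom will be spent on that one point, because nothing else in the sum matters by comparison.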

So that’s another big point – what is the prior for a dark matter halo? [Your] Table 1 allows V₂₀₀ and C₂₀₀ to be pretty much anything. So yes, you will find a fit from that range. For Burkert halos, there is no prior, since these do not emerge from any theory – they’re just a flexible French curve. For NFW halos, there is a prior from cosmology – see McGaugh et al (2007) among a zillion other possible references, including Li et al (2020). In any [L]CDM cosmology, the parameters V200 and C200 correlate – they are not independent. So a reasonable prior would be a Gaussian in log(C200) at a given V200 as specified by some simulation (Macciò et al; see Li et al 2020). Another prior is how V200 (or M200) relates to the observed baryonic mass (or stellar mass). This one is pretty dodgy. Originally, we expected a fixed ratio between baryonic and dark mass. So when I did this kind of analysis in the ’90s, I found NFW flunked hard compared to MOND. (I didn’t know about the BIC then.) Galaxy DM halos simply do not look like NFW halos that form in LCDM and host galaxies with a few percent of their mass in the luminous disk, even though this was the standard model for many years (Mo, Mao, & White 1998). If we drop the assumption that luminous galaxies are always a fixed fraction of their dark matter halos, then better fits can be obtained. I suspect your uniform prior fits have halo masses all over the place; they probably don’t correlate well with the baryonic mass, nor are their C200 and V200 parameters likely to correlate as they are predicted to do. If you apply the expected mass-concentration and stellar mass-halo mass relations as priors, then NFW will come off worse in your analysis because you’ve restricted the halos to where they ought to live.
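A Gaussian prior in log(C200) at fixed V200 might be coded along the following lines. This is only a sketch: the normalization, slope, and scatter below are placeholders chosen for illustration and are NOT the values of Macciò et al or Li et al (2020); substitute the published relation before using it for anything.

```python
import math

# Placeholder coefficients for an assumed relation
#   log10(C200) = LOG_C_AT_100 + SLOPE * log10(V200 / 100 km/s)
# These are NOT taken from any published mass-concentration relation.
LOG_C_AT_100 = 1.0
SLOPE = -0.3
SCATTER_DEX = 0.1


def expected_log_c200(v200: float) -> float:
    """Mean log10(C200) at a given V200 (km/s) under the assumed relation."""
    return LOG_C_AT_100 + SLOPE * math.log10(v200 / 100.0)


def log_prior_c200(c200: float, v200: float) -> float:
    """Log of a Gaussian prior in log10(C200), centred on the assumed relation."""
    if c200 <= 0.0 or v200 <= 0.0:
        return -math.inf
    z = (math.log10(c200) - expected_log_c200(v200)) / SCATTER_DEX
    return -0.5 * z * z - math.log(SCATTER_DEX * math.sqrt(2.0 * math.pi))


on_relation = log_prior_c200(10.0, 100.0)
low_concentration = log_prior_c200(2.0, 100.0)
print(f"log-prior penalty for C200 = 2 at V200 = 100: {on_relation - low_concentration:.1f}")
```

Added to the log-likelihood, a term like this makes the absurdly low concentrations that a uniform prior happily accepts very costly, which is the sense in which the prior restricts halos to where they ought to live.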

So, as you say – it all comes down to the prior.

Even applying a stellar mass-halo mass relation from abundance matching isn’t really independent information, though that’s the best you can hope to do. I was saying 20+ years ago that fixed mass ratios wouldn’t work, but nobody then wanted to abandon that obvious assumption. Since then, they’ve been forced to do so. But there is no good physical reason for it (feedback is the deus ex machina of all problems in the field); what happened is that the data forced us to drop the obvious assumption – data including kinematic data (McGaugh et al 2010). So adopting a modern stellar mass-halo mass relation will give you a stronger prior than a uniform prior, but that choice has already been informed by the kinematic data that you’re trying to fit. How do we properly penalize the model for cheating about its “prior” by peeking at past data?

I think it would be important here to better constrain the priors on the DM halo fits. Li et al (2020) discuss this. Even then we’re not done, because galaxy formation modifies the form of the halo function we’re fitting. Halos shouldn’t end up as NFW even if they start out that way – see Li et al 2022a & b. Those papers consider the inevitable effects of adiabatic compression, but not of feedback. If feedback really has the effects on DM halos that are frequently advertised, then neither NFW nor Burkert is an appropriate fitting function – they’re not what LCDM+feedback predicts. Good luck extracting a legitimate prediction from simulations, though. So we’re stuck doing what you’re trying to do: adopt some functional form to represent the DM halo, and see what fits. What you’ve done here agrees with my experience: cored DM halos work best. But they don’t represent an LCDM prediction, or any other broader theory, so – so what?

Another detail to be wary of – the radial range over which the RC data constrain the DM halo fit is often rather limited compared to the size of the halo. To complicate matters further, the inner regions are often star-dominated, so there is not much of a handle on DM from where the data are best, at least beyond many galaxies preferring not to have a cusp since the stars already get the job done at small R. So, one ends up with V_DM(R) constrained from 3% to 10% of the virial radius, or something like that. V200 and C200 are defined at the notional virial radius, so there are many combinations of these parameters that might adequately fit the observed range while being quite different elsewhere. Even worse, NFW halos are pretty self-similar – there are combinations of (C200,V200) that are highly degenerate, so you can’t really tell the difference between them even with excellent data – the confidence contours look like bananas in C200-V200 space, with low C/high V often being as good as high C/low V. Even even even worse is that the observed V_DM(R) is often approximately a straight line. Any function looks like a straight line if you stretch it out enough. Consequently, the fits to LSB galaxies often tend to absurdly low C and high V200: NFW never looks like a straight line, but it does if you blow it up enough. So one ends up inferring that the halo masses of tiny galaxies are nearly as big as those of huge galaxies, or more so! My favorite example was NGC 3109, a tiny dwarf on the edge of the Local Group. A straight NFW fit suggests that the halo of this one little galaxy weighs more than the entire Local Group, M31 + MW + everything else combined. This is the sort of absurd result that comes from fitting the NFW halo form to a limited radial range of data.
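For readers who want to explore this degeneracy themselves, here is a sketch of the standard NFW circular velocity written in terms of (C200, V200), with a crude scan for a lower-concentration halo that mimics a reference halo over 3% to 10% of the virial radius. The value H0 = 73 km/s/Mpc, the reference halo, the trial concentration, and the scan range are all arbitrary assumptions; no particular outcome is claimed.

```python
import math

H0 = 0.073      # km/s/kpc, i.e. assuming H0 = 73 km/s/Mpc
G = 4.30091e-6  # kpc (km/s)^2 / Msun


def r200_kpc(v200: float) -> float:
    """Radius enclosing 200 times the critical density, for V200 in km/s."""
    return v200 / (10.0 * H0)


def m200_msun(v200: float) -> float:
    """Halo mass inside R200; scales as V200 cubed."""
    return v200 ** 2 * r200_kpc(v200) / G


def _mu(y: float) -> float:
    """Dimensionless NFW enclosed-mass function ln(1 + y) - y / (1 + y)."""
    return math.log(1.0 + y) - y / (1.0 + y)


def nfw_velocity(r_kpc: float, c200: float, v200: float) -> float:
    """Circular velocity (km/s) of an NFW halo at radius r_kpc > 0."""
    x = r_kpc / r200_kpc(v200)
    return v200 * math.sqrt(_mu(c200 * x) / (x * _mu(c200)))


def worst_mismatch(radii, halo_a, halo_b) -> float:
    """Largest fractional velocity difference between two (C200, V200) halos."""
    return max(abs(nfw_velocity(r, *halo_b) / nfw_velocity(r, *halo_a) - 1.0)
               for r in radii)


reference = (10.0, 100.0)  # hypothetical (C200, V200) reference halo
r_vir = r200_kpc(reference[1])
radii = [r_vir * pct / 100.0 for pct in range(3, 11)]  # 3% to 10% of R200

trial_c = 7.0  # arbitrary lower concentration
best_v200 = min(range(100, 301),
                key=lambda v: worst_mismatch(radii, reference, (trial_c, float(v))))
mismatch = worst_mismatch(radii, reference, (trial_c, float(best_v200)))
mass_ratio = m200_msun(float(best_v200)) / m200_msun(reference[1])

print(f"C200 = {trial_c}: best V200 = {best_v200} km/s, "
      f"worst mismatch = {mismatch:.1%}, halo mass ratio = {mass_ratio:.2f}")
```

Repeating the scan for several trial concentrations traces out the banana: pairs that the data in this radial range can barely distinguish, yet which imply different total halo masses because M200 goes as the cube of V200.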

I don’t know that this helps you much, but you see a few of the concerns.

Triton Station

A Blog About the Science and Sociology of Cosmology and Dark Matter
