The baryons are mostly in the intergalactic medium. Mostly.

My colleague Jim Schombert pointed out a nifty new result published in Nature Astronomy which you probably can’t access so here is a link to what looks to be the preliminary version. The authors use the Deep Synoptic Array (DSA) to discover some new Fast Radio Bursts (FRBs), many of which are apparently in galaxies at large enough distances to provide an interesting probe of the intervening intergalactic medium (IGM).

There is lots that’s new and cool here. The DSA-110 is able to localize FRBs well enough to figure out where they are, which is an interesting challenge and impressive technological accomplishment. FRBs themselves remain something of a mystery. They are observed as short (typically millisecond), high-intensity pulses of low frequency radio emission, typically 1,400 MHz or less. What causes these pulses isn’t entirely clear, but they might be produced in the absurdly intense magnetic fields around some neutron stars.

FRBs are intrinsically luminous – lots of energy packed into a short burst – so can be detected from cosmological distances. The trick is to find them (blink and miss it!) and also to localize them on the sky. That’s challenging to do at these frequencies well enough to uniquely associate them with optical sources like candidate host galaxies. To quote from their website, “DSA-110 is a radio interferometer purpose-built for fast radio burst (FRB) detection and direct localization.” It was literally made to do this.

Connor et al. analyze dozens of known and report nine new FRBs covering enough of the sky to probe an interesting cosmological volume. Host galaxies with known redshifts define a web of pencil-beam probes – the paths that the radio waves have to traverse to get here. Low frequency radio waves are incredibly useful as a probe of the intervening space because they are sensitive to the density of intervening electrons, providing a measure of how many there are between us and each FRB.

Most of intergalactic space is so empty that the average density of matter is orders of magnitude lower than the best vacuum we can achieve in the laboratory. But there is some matter there, and of course intergalactic space is huge, so even low densities might add up to a lot. This provides a good way to find out how much.

The speed of light is the ultimate speed limit, in a vacuum. When propagating through a medium like glass or water, the effective speed of light is reduced by the index of refraction. For low frequency radio waves, even the exceedingly low density of free electrons in the IGM suffices to slow them down a bit. This effect, quantified by the dispersion measure, is frequency dependent. It usually comes up in the context of pulsars, whose pulses are spread out in time by the effect, but it works for any radio source observed at appropriate frequencies, like FRBs. The dispersion measure tells us the product of the distance and the density traversed along the line of sight to the source, so is usually expressed in typical obscure astronomical fashion as pc cm⁻³. This is really a column density, the number of electrons per square cm, but with host galaxies of known redshift the distance is known independently and we get a measure of the average electron volume density along the line of sight.
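
To make that concrete, here is a minimal sketch (my own illustration, not from the paper) of how a dispersion measure translates into a frequency-dependent arrival delay and, given a path length, an average electron density. The DM and distance below are made-up round numbers, and the redshift weighting used in the real cosmological analysis is ignored.

```python
# Minimal sketch: what a dispersion measure (DM) does to a radio pulse.
# The DM and distance are illustrative round numbers, not values from the paper,
# and the (1+z) weighting of the real cosmological analysis is ignored.

K_DM = 4.149   # dispersion constant in ms GHz^2 pc^-1 cm^3 (standard approximation)

def dispersion_delay_ms(dm_pc_cm3, freq_ghz):
    """Arrival delay (ms) relative to a signal at infinite frequency."""
    return K_DM * dm_pc_cm3 / freq_ghz**2

dm = 500.0   # pc cm^-3, a plausible extragalactic dispersion measure
for nu in (1.4, 0.8, 0.4):   # observing frequencies in GHz
    print(f"{nu:.1f} GHz: delay = {dispersion_delay_ms(dm, nu):7.0f} ms")

# The DM is an electron column density, so dividing by the path length
# gives the average electron density traversed along the line of sight.
distance_mpc = 1000.0                 # illustrative path length
n_e = dm / (distance_mpc * 1.0e6)     # (pc cm^-3) / pc = cm^-3
print(f"average n_e ~ {n_e:.1e} cm^-3")
```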

That’s it. That by itself provides a good measure of the density of intergalactic matter. The IGM is highly ionized, with a neutral fraction < 10⁻⁴, so counting electrons is the same as counting atoms. (Not every nucleus is hydrogen, so they adopt 0.875 electrons per baryon to account for the neutrons in helium and heavier elements. We know the neutral fraction is low in the IGM because hydrogen is incredibly opaque to ultraviolet radiation: absorption would easily be seen, yet there is no Gunn-Peterson trough until z > 6.) This leads to a baryon density of ΩBh2 = 0.025 ± 0.003, which is 5% of the critical density for a reasonable Hubble parameter of h = 0.7.
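
As a back-of-the-envelope check, here is the conversion from a mean electron density to a baryon density, using the 0.875 electrons per baryon mentioned above. This is my own round-number sketch; it ignores the redshift weighting and clumping corrections that the real analysis includes.

```python
# Rough sketch: mean electron density -> baryon density.
# The value of n_e is illustrative; the real analysis works with the full
# redshift-weighted dispersion measure rather than a single mean density.

M_P = 1.6726e-24          # proton mass [g]
RHO_CRIT_H1 = 1.879e-29   # critical density for h = 1 [g cm^-3]
E_PER_BARYON = 0.875      # electrons per baryon (accounts for helium), as adopted in the paper

n_e = 2.5e-7              # illustrative mean electron density [cm^-3]
n_b = n_e / E_PER_BARYON  # baryons per cm^3
rho_b = n_b * M_P         # baryon mass density [g cm^-3]
omega_b_h2 = rho_b / RHO_CRIT_H1
print(f"Omega_b h^2 ~ {omega_b_h2:.3f}")   # ~0.025 for this choice of n_e
```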

This solves the cosmic missing baryon problem. There had been an order of magnitude discrepancy when most of the baryons we knew about were in stars. It gradually became clear that many of the baryons were in various forms of tenuous plasma in the space between galaxies, for example in the Lyman alpha forest, but these didn’t account for everything so a decade ago a third of the baryons expected from BBN were still unaccounted for in the overall baryon budget. Now that checksum is complete. Indeed, if anything, we now have a small (if not statistically significant) baryon surplus+.

Here is a graphic representing the distribution of baryons among the various reservoirs. Connor et al. find that the fraction in the intergalactic medium is fIGM = 0.76 +0.10/-0.11. Three quarters of the baryons are Out There, spread incredibly thin throughout the vastness of cosmic space, with an absolute density of a few × 10⁻³¹ g cm⁻³, which is about one atom per cubic meter. Most of the atoms are hydrogen, so “normal” for most of the universe is one proton and one electron in a box a meter across rather than the 10⁻¹⁰ m occupied by a bound hydrogen atom. That’s a whole lot of empty.

Connor et al. assess that about 3/4 of all baryons are in the intergalactic medium (IGM), give or take 10% – the side bars illustrate the range of uncertainty. Many of the remaining baryons are in other forms of space plasma associated with but not in galaxies: the intracluster medium (ICM) of rich clusters, the intragroup medium (IGroupM) of smaller groups, and the circumgalactic medium (CGM) associated with individual galaxies. All the stars in all the galaxies add up to less than 10%, and the cold (non-ionized) atomic and molecular gas in galaxies comprise about 1% of the baryons.

The other reservoirs of baryons pale in comparison to the IGM. Most are still in some form of diffuse space plasma, like the intracluster media of clusters of galaxies and groups of galaxies, or associated with but not in individual galaxies (the circumgalactic medium). These distinctions are a bit fuzzy, as are the uncertainties on each component, especially the CGM (fCGM = 0.08 +0.07/-0.06). This leaves some room for a lower overall baryon density, but not much.

Connor et al. get some constraint on the CGM by looking at the increase in the dispersion measure for FRBs with sight-lines that pass close to intervening galaxies vs. those that don’t. This shows that there does seem to be some extra gas associated with such galaxies, but not enough to account for all the baryons that should be associated with their dark matter halos. So the object-by-object checksum of how the baryons are partitioned remains problematic, and I hope to have more to say about it in the near future. Connor et al. argue that some of the baryons have to have been blown entirely out of their original dark matter halos by feedback; they can’t all be lurking there or there would be less dispersion measure from the general IGM between us and relatively nearby galaxies where there is no intervening CGM*.

The baryonic content of visible galaxies – the building blocks of the universe that most readily meet the eye – is less than 10% of the total baryon density. Most of that is in stars and their remnants, which contain about 5% of the baryons, give or take a few percent stemming from the uncertainty in the stellar initial mass function. The cold gas – both neutral atomic gas and the denser molecular gas from which stars form – adds up to only about 1% of all baryons. What we see most readily is only a fraction of what’s out there, even when restricting our consideration to normal matter: mostly the baryons are in the IGM. Mostly.

The new baryon inventory is now in good agreement with big bang nucleosynthesis: ΩBh2 = 0.025 ± 0.003 is consistent with Ωbh2 = 0.0224 ± 0.0001 from Planck CMB fits. It is more consistent with this and the higher baryon density favored by deuterium than it is with lithium, but isn’t accurate enough to exclude the latter. Irrespective of this important detail, I feel better that the third of the baryons that used to be missing (or perhaps not there at all) are now accounted for. The agreement of the baryon-inventory checksum with the density of baryons expected from BBN is an encouraging success of this deeply fundamental aspect of the hot big bang cosmology.


+Looking at their equation 2, there is some degeneracy between the baryon density Ωb and the fraction of ionized baryons Out There. Lower Ωb would mean a higher baryon fraction in the diffuse ionized state. This is already large, so there is only a little room to trade off between the two.

*What counts as CGM is a bit dicey. Putting on a cosmology hat, the definition Connor et al. adopt involving a range of masses of dark matter halos appropriate for individual galaxies is a reasonable one, and it makes sense to talk about the baryon fraction of those objects relative to the cosmic value, of which they fall short (fgas = 0.35 +0.30/-0.25 in individual galaxies where f* < 0.35: these don’t add up to unity). Switching to MOND, the notional association of the CGM with the virial radii of host dark matter halos is meaningless, so it doesn’t matter if the gas in the vicinity of galaxies was once part of them and got blown out or simply never accreted in the first place. In LCDM we require at least some blow out to explain the sub-cosmic baryon fractions, while in MOND I’m inclined to suspect that the dominant process is non-accretion due to inefficient galaxy formation. Of course, the universe may indulge in a mix of both physical effects, in either paradigm!

%Unlike FLRW cosmology, there is no special scale defined by the critical density; a universe experiencing the MOND force-law will ultimately recollapse whatever its density, at least in the absence of something that acts like anti-gravity (i.e., dark energy). In retrospect, this is a more satisfactory solution of the flatness problem than Inflation, as there is nothing surprising about the observed density being what it is. There is no worry about it being close to but not quite equal to the critical density since the critical density is no longer a special scale.

The Deuterium-Lithium tension in Big Bang Nucleosynthesis

There are many tensions in the era of precision cosmology. The most prominent, at present, is the Hubble tension – the difference between traditional measurements, which consistently obtain H0 = 73 km/s/Mpc, and the best fit* to the acoustic power spectrum of the cosmic microwave background (CMB) observed by Planck, H0 = 67 km/s/Mpc. There are others of varying severity that are less widely discussed. In this post, I want to talk about a persistent tension in the baryon density implied by the measured primordial abundances of deuterium and lithium+. Unlike the tension in H0, this problem is not nearly as widely discussed as it should be.

Framing

Part of the reason that this problem is not seen as an important tension has to do with the way in which it is commonly framed. In most discussions, it is simply the primordial lithium problem. Deuterium agrees with the CMB, so those must be right and lithium must be wrong. Once framed that way, it becomes a trivial matter specific to one untrustworthy (to cosmologists) observation. It’s a problem for specialists to sort out what went wrong with lithium: the “right” answer is otherwise known, so this tension is not real, making it unworthy of wider discussion. However, as we shall see, this might not be the right way to look at it.

It’s a bit like calling the acceleration discrepancy the dark matter problem. Once we frame it this way, it biases how we see the entire problem. Solving this problem becomes a matter of finding the dark matter. It precludes consideration of the logical possibility that the observed discrepancies occur because the force law changes on the relevant scales. This is the mental block I struggled mightily with when MOND first cropped up in my data; this experience makes it easy to see when other scientists succumb to it sans struggle.

Big Bang Nucleosynthesis (BBN)

I’ve talked about the cosmic baryon density here a lot, but I’ve never given an overview of BBN itself. That’s because it is well-established, and has been for a long time – I assume you, the reader, already know about it or are competent to look it up. There are many good resources for that, so I’ll only give enough of a sketch as is necessary to the subsequent narrative – a sketch that will be too little for the experts and too much for everyone else, in service of a narrative that most experts seem to be unaware of.

Primordial nucleosynthesis occurs in the first few minutes after the Big Bang when the universe is the right temperature and density to be one big fusion reactor. The protons and available neutrons fuse to form helium and other isotopes of the light elements. Neutrons are slightly more massive and less numerous than protons to begin with. In addition, free neutrons decay with a half-life of roughly ten minutes, so are outnumbered by protons when nucleosynthesis happens. The vast majority of the available neutrons pair up with protons and wind up in 4He while most of the protons remain on their own as the most common isotope of hydrogen, 1H. The resulting abundance ratio is one alpha particle for every dozen protons, or in terms of mass fractions&, Xp = 3/4 hydrogen and Yp = 1/4 helium. That is the basic composition with which the universe starts; heavy elements are produced subsequently in stars and supernova explosions.
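
Here is that arithmetic spelled out, assuming the commonly quoted neutron-to-proton ratio of about 1/7 at the time the nuclei form:

```python
# If essentially all available neutrons end up in 4He, the helium mass fraction
# follows directly from the neutron-to-proton ratio (n/p ~ 1/7 is the usual
# textbook value at the time of nucleosynthesis; that ratio is an assumption here).

n_over_p = 1.0 / 7.0
Yp = 2.0 * n_over_p / (1.0 + n_over_p)   # mass fraction locked up in 4He
Xp = 1.0 - Yp                            # mass fraction left over as 1H

# Equivalently: 2 neutrons + 14 protons -> one alpha particle + 12 free protons.
print(f"Yp = {Yp:.2f}, Xp = {Xp:.2f}")   # -> Yp = 0.25, Xp = 0.75
```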

Though 1H and 4He are by far the most common products of BBN, there are traces of other isotopes that emerge from BBN:

The time evolution of the relative numbers of light element isotopes through BBN. As the universe expands, nuclear reactions “freeze-out” and establish primordial abundances for the indicated species. The precise outcome depends on the baryon density, Ωb. This plot illustrates a particular choice of Ωb; different Ωb result in observationally distinguishable abundances. (Figures like this are so ubiquitous in discussions of the early universe that I have not been able to identify the original citation for this particular version.)

After hydrogen and helium, the next most common isotope to emerge from BBN is deuterium, 2H. It is the first thing made (one proton plus one neutron) but most of it gets processed into 4He, so after a brief peak, its abundance declines. How much it declines is very sensitive to Ωb: the higher the baryon density, the more deuterium gets gobbled up by helium before freeze-out. The following figure illustrates how the abundance of each isotope depends on Ωb:

“Schramm diagram” adopted from Cyburt et al (2003) showing the abundance of 4He by mass fraction (top) and the number relative to hydrogen of deuterium (D = 2H), helium-3, and lithium as a function of the baryon-to-photon ratio. We measure the photon density in the CMB, so this translates directly to the baryon density$ Ωbh2 (top axis).

If we can go out and measure the primordial abundances of these various isotopes, we can constrain the baryon density.

The Baryon Density

It works! Each isotope provides an independent estimate of Ωbh2, and they agree pretty well. This was the first and for a long time the only over-constrained quantity in cosmology. So while I am going to quibble about the exact value of Ωbh2, I don’t doubt that the basic picture is correct. There are too many details we have to get right in the complex nuclear reaction chains coupled to the decreasing temperature of a universe expanding at the rate required during radiation domination for this to be an accident. It is an exquisite success of the standard Hot Big Bang cosmology, albeit not one specific to LCDM.

Getting at primordial, rather than current, abundances is an interesting observational challenge too involved to go into much detail here. Suffice it to say that it can be done, albeit to varying degrees of satisfaction. We can then compare the measured abundances to the theoretical BBN abundance predictions to infer the baryon density.

The Schramm diagram with measured abundances (orange boxes) for the isotopes of the light elements. The thickness of the box illustrates the uncertainty: tiny for deuterium and large for 4He because of the large zoom on the axis scale. The lithium abundance could correspond to either low or high baryon density. 3He is omitted because its uncertainty is too large to provide a useful constraint.

Deuterium is considered the best baryometer because its relic abundance is very sensitive to Ωbh2: a small change in baryon density corresponds to a large change in D/H. In contrast, 4He is a great confirmation of the basic picture – the primordial mass fraction has to come in very close to 1/4 – but its precise value is not very sensitive to Ωbh2. Most of the neutrons end up in helium no matter what, so it is hard to distinguish# a few more from a few less. (Note the huge zoom on the linear scale for 4He. If we plotted it logarithmically with decades of range as we do the other isotopes, it would be a nearly flat line.) Lithium is annoying for being double-valued right around the interesting baryon density, so that the observed lithium abundance can correspond to two values of Ωbh2. This behavior stems from the trade-off with 7Be, which is produced at a higher rate at high baryon density but decays to 7Li after a few months. For this discussion the double-valued ambiguity of lithium doesn’t matter, as the problem is that the deuterium abundance indicates Ωbh2 that is even higher than the higher branch of lithium.
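
To give a feel for that sensitivity, here is a sketch using the approximate power-law scaling D/H ∝ (Ωbh2)^-1.6 that is often quoted for illustration. The real prediction comes from the full reaction network, so treat these numbers as rough.

```python
# Rough illustration of deuterium's sensitivity to the baryon density, using the
# approximate scaling D/H ~ (Omega_b h^2)^-1.6 often quoted for illustration
# (an assumption here; the real curve comes from the full BBN reaction network).

def dh_relative(omega_b_h2, reference=0.02):
    """D/H relative to its value at the reference baryon density."""
    return (omega_b_h2 / reference) ** -1.6

for w in (0.0125, 0.019, 0.0224):
    print(f"Omega_b h^2 = {w:.4f}: D/H is {dh_relative(w):.2f} x its value at 0.02")
```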

BBN pre-CMB

The diagrams above and below show the situation in the 1990s before CMB estimates became available. Consideration of all the available data in the review of Walker et al. led to the value Ωbh2 = 0.0125 ± 0.0025. This value** was so famous that it was Known. It formed the basis of my predictions for the CMB for both LCDM and no-CDM. This prediction hinged on BBN being correct and on our understanding of the experimental bounds on the baryon density. A few years after Walker’s work, Copi et al. provided the estimate++ 0.009 < Ωbh2 < 0.02. Those were the extreme limits of the time, as illustrated by the green box below:

The baryon density as it was known before detailed observations of the acoustic power spectrum of the CMB. BBN was a mature subject before 1990; the massive reviews of Walker et al. and Copi et al. creak with the authority of a solved problem. The controversial tension at the time was between the high and low deuterium measurements from Hogan and Tytler, which were at the extreme ends of the ranges indicated by the bulk of the data in the reviews.

Up until this point, the constraints on BBN had come mostly from helium observations in nearby galaxies and lithium measurements in metal poor stars. It was only just then becoming possible to obtain high quality spectra of sufficiently high redshift quasars to see weak deuterium lines associated with strongly damped primary hydrogen absorption in intergalactic gas along the line of sight. This is great: deuterium is the most sensitive baryometer, the redshifts were high enough to be early in the history of the universe close to primordial times, and the gas was in the middle of intergalactic nowhere so shouldn’t be altered by astrophysical processes. These are ideal conditions, at least in principle.

First results were binary. Craig Hogan obtained a high deuterium abundance, corresponding to a low baryon density. Really low. From my Walker et al.-informed confirmation bias, too low. It was a brand new result, so promising but probably wrong. Then Tytler and his collaborators came up with the opposite result: low deuterium abundance corresponding to a high baryon density: Ωbh2 = 0.019 ± 0.001. That seemed pretty high at the time, but at least it was within the bound Ωbh2 < 0.02 set by Copi et al. There was a debate between these high/low deuterium camps that ended in a rare act of intellectual honesty by a cosmologist when Hogan&& conceded. We seemed to have settled on the high end of the allowed range, just under Ωbh2 = 0.02.

Enter the CMB

CMB data started to be useful for constraining the baryon density in 2000 and improved rapidly. By that point, LCDM was already well-established, and I had published predictions for both LCDM and no-CDM. In the absence of cold dark matter, one expects a damping spectrum, with each peak lower than the one before it. For the narrow (factor of two) Known range of possible baryon densities, all the no-CDM models run together to essentially the same first-to-second peak ratio.

Peak locations measured by WMAP in 2003 (points) compared to the a priori (1999) predictions of LCDM (red tone lines) and no-CDM (blue tone lines). Models are normalized in amplitude around the first peak.

Adding CDM into the mix adds a driver to the oscillations. This fights the baryonic damping: the CDM is like a parent pushing a swing while the baryons are the kid dragging his feet. This combination makes just about any pattern of peaks possible. Not all free parameters are made equal: the addition of a single free parameter, ΩCDM, makes it possible to fit any plausible pattern of peaks. Without it (no-CDM means ΩCDM = 0), only the damping spectrum is allowed.

For BBN as it was known at the time, the clear difference was in the relative amplitude$$ of the first and second peaks. As can be seen above, the prediction for no-CDM was correct and that for LCDM was not. So we were done, right?

Of course not. To the CMB community, the only thing that mattered was the fit to the CMB power spectrum, not some obscure prediction based on BBN. Whatever the fit said was True; too bad for BBN if it didn’t agree.

The way to fit the unexpectedly small## second peak was to crank up the baryon density. To do that, Tegmark & Zaldarriaga (2000) needed 0.022 < Ωbh2 < 0.040. That’s the first blue point below. This was the first time that I heard it suggested that the baryon density could be so high.

The baryon density from deuterium (red triangles) before and after (dotted vertical line) estimates from the CMB (blue points). The horizontal dotted line is the pre-CMB upper limit of Copi et al.

The astute reader will note that the CMB-fit 0.022 < Ωbh2 < 0.040 sits entirely outside the BBN bounds 0.009 < Ωbh2 < 0.02. So we’re done, right? Well, no – the community simply ignored the successful a priori prediction of the no-CDM scenario. That was certainly easier than wrestling with its implications, and no one seems to have paused to contemplate why the observed peak ratio came in exactly at the one unique value that it could obtain in the case of no-CDM.

For a few years, the attitude seemed to be that BBN was close but not quite right. As the CMB data improved, the baryon density came down, ultimately settling on Ωbh2 = 0.0224 ± 0.0001. Part of the reason for this decline from the high initial estimate is covariance. In this case, the tilt plays a role: the baryon density declined as ns = 1 → 0.965 ± 0.004. Getting the second peak amplitude right takes a combination of both.

Now we’re back in the ballpark, almost: Ωbh2 = 0.0224 is not ridiculously far above the BBN limit Ωbh2 < 0.02. Close enough for Spergel et al. (2003) to say “The remarkable agreement between the baryon density inferred from D/H values and our [WMAP] measurements is an important triumph for the basic big bang model.” This was certainly true given the size of the error bars on both deuterium and the CMB at the time. It also elides*** any mention of either helium or lithium or the fact that the new Known was not consistent with the previous Known. Ωbh2 = 0.0224 was always the ally; Ωbh2 = 0.0125 was always the enemy.

Note, however, that deuterium made a leap from below Ωbh2 = 0.02 to above 0.02 exactly when the CMB indicated that it should do so. They iterated to better agreement and pretty much stayed there. Hopefully that is the correct answer, but given the history of the field, I can’t help worrying about confirmation bias. I don’t know if that is what’s going on, but if it were, this convergence over time is what it would look like.

Lithium does not concur

Taking the deuterium results at face value, there really is excellent agreement with the LCDM fit to the CMB, so I have some sympathy for the desire to stop there. Deuterium is the best baryometer, after all. Helium is hard to get right at a precise enough level to provide a comparable constraint, and lithium, well, lithium is measured in stars. Stars are tiny, much smaller than galaxies, and we know those are too puny to simulate.

Spite & Spite (1982) [those are names, pronounced “speet”; we’re not talking about spiteful stars] discovered what is now known as the Spite plateau, a level of constant lithium abundance in metal poor stars, apparently indicative of the primordial lithium abundance. Lithium is a fragile nucleus; it can be destroyed in stellar interiors. It can also be formed as the fragmentation product of cosmic ray collisions with heavier nuclei. Both of these things go on in nature, making some people distrustful of any lithium abundance. However, the Spite plateau is a sort of safe zone where neither effect appears to dominate. The abundance of lithium observed there is indeed very much in the right ballpark to be a primordial abundance, so that’s the most obvious interpretation.

Lithium indicates a lowish baryon density. Modern estimates are in the same range as BBN of old; they have not varied systematically with time. There is no tension between lithium and pre-CMB deuterium, but lithium disagrees with LCDM fits to the CMB and with post-CMB deuterium. This tension is both persistent and statistically significant (Fields 2011 describes it as “4–5σ”).

The baryon density from lithium (yellow symbols) over time. Stars are measurements in groups of stars on the Spite plateau; the square represents the approximate value from the ISM of the SMC.

I’ve seen many models that attempt to fix the lithium abundance, e.g., by invoking enhanced convective mixing via <<mumble mumble>> so that lithium on the surface of stars is subject to destruction deep in the stellar interior in a previously unexpected way. This isn’t exactly satisfactory – it should result in a mess, not a well-defined plateau – and other attempts I’ve seen to explain away the problem do so with at least as much contrivance. All of these models appeared after lithium became a problem; they’re clearly motivated by the assumption that the CMB is correct, so the discrepancy must be specific to lithium, and there must therefore be something weird about stars to explain it.

Another way to illustrate the tension is to use Ωbh2 from the Planck fit to predict what the primordial lithium abundance should be. The Planck-predicted band is clearly higher than and offset from the stars of the Spite plateau. There should be a plateau, sure, but it’s in the wrong place.

The lithium abundance in metal poor stars (points), the interstellar medium of the Small Magellanic Cloud (green band), and the primordial lithium abundance expected for the best-fit Planck LCDM. For reference, [Fe/H] = -3 means an iron abundance that is one one-thousandth that of the sun.

An important recent observation is that a similar lithium abundance is obtained in the metal poor interstellar gas of the Small Magellanic Cloud. That would seem to obviate any explanation based on stellar physics.

The Schramm diagram with the Planck CMB-LCDM value added (vertical line). This agrees well with deuterium measurements made after CMB data became available, but not with those before, nor with the measured abundance of lithium.

We can also illustrate the tension on the Schramm diagram. This version adds the best-fit CMB value and the modern deuterium abundance. These are indeed in excellent agreement, but they don’t intersect with lithium. The deuterium-lithium tension appears to be real, and comparable in significance to the H0 tension.

So what’s the answer?

I don’t know. The logical options are

  • A systematic error in the primordial lithium abundance
  • A systematic error in the primordial deuterium abundance
  • Physics beyond standard BBN

I don’t like any of these solutions. The data for both lithium and deuterium are what they are. As astronomical observations, both are subject to the potential for systematic errors and/or physical effects that complicate their interpretation. I am also extremely reluctant to consider modifications to BBN. There are occasional suggestions to this effect, but it is a lot easier to break than it is to fix, especially for what is a fairly small disagreement in the absolute value of Ωbh2.

I have left the CMB off the list because it isn’t part of BBN: its constraint on the baryon density is real, but involves completely different physics. It also involves different assumptions, i.e., the LCDM model and all its invisible baggage, while BBN is just what happens to ordinary nucleons during radiation domination in the early universe. CMB fits are corroborative of deuterium only if we assume LCDM, which I am not inclined to accept: deuterium disagreed with the subsequent CMB data before it agreed. Whether that’s just progress or a sign of confirmation bias, I also don’t know. But I do know confirmation bias has bedeviled the history of cosmology, and as the H0 debate shows, we clearly have not outgrown it.

The appearance of confirmation bias is augmented by the response time of each measured elemental abundance. Deuterium is measured using high redshift quasars; the community that does that work is necessarily tightly coupled to cosmology. Its response was practically instantaneous: as soon as the CMB suggested that the baryon density needed to be higher, conforming D/H measurements appeared. Indeed, I recall when that first high red triangle appeared in the literature, a colleague snarked to me “we can do that too!” In those days, those of us who had been paying attention were all shocked at how quickly Ωbh2 = 0.0125 ± 0.0025 was abandoned for literally double that value, ΩBh2 = 0.025 ± 0.001. That’s 4.6 sigma for those keeping score.
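
For those keeping score at home, that last number is just the quadrature difference of the two quoted values:

```python
# Significance of the shift from the old to the new deuterium-based baryon density,
# treating the quoted uncertainties as independent and Gaussian (my own arithmetic
# on the numbers quoted above).

old, sig_old = 0.0125, 0.0025   # Walker et al.
new, sig_new = 0.025, 0.001     # post-CMB deuterium value quoted above

tension = abs(new - old) / (sig_old**2 + sig_new**2) ** 0.5
print(f"{tension:.1f} sigma")   # -> 4.6 sigma
```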

The primordial helium abundance is measured in nearby dwarf galaxies. That community is aware of cosmology, but not as strongly coupled to it. Estimates of the primordial helium abundance have drifted upwards over time, corresponding to higher implied baryon densities. It’s as if confirmation bias is driving things towards the same result, but on a timescale that depends on the sociological pressure of the CMB imperative.

Fig. 8 from Steigman (2012) showing the history of primordial helium mass fraction (YP) determinations as a function of time.

I am not accusing anyone of trying to obtain a particular result. Confirmation bias can be a lot more subtle than that. There is an entire field of study of it in psychology. We “humans actively sample evidence to support prior beliefs” – none of us are immune to it.

In this case, how we sample evidence depends on the field we’re active in. Lithium is measured in stars. One can have a productive career in stellar physics while entirely ignoring cosmology; it is the least likely to be perturbed by edicts from the CMB community. The inferred primordial lithium abundance has not budged over time.

What’s your confirmation bias?

I try not to succumb to confirmation bias, but I know that’s impossible. The best I can do is change my mind when confronted with new evidence. This is why I went from being sure that non-baryonic dark matter had to exist to taking seriously MOND as the theory that predicted what I observed.

I do try to look at things from all perspectives. Here, the CMB has been a roller coaster. Putting on an LCDM hat, the location of the first peak came in exactly where it was predicted: this was strong corroboration of a flat FLRW geometry. What does it mean in MOND? No idea – MOND doesn’t make a prediction about that. The amplitude of the second peak came in precisely as predicted for the case of no-CDM. This was corroboration of the ansatz inspired by MOND, and the strongest possible CMB-based hint that we might be barking up the wrong tree with LCDM.

As an exercise, I went back and maxed out the baryon density as it was known before the second peak was observed. We already thought we knew the LCDM parameters well enough to do this, and even the most favorable prediction failed to match the observed amplitude of the second peak. That came as a huge surprise to LCDM; everyone acknowledged that at the time (if pressed; many simply ignored it). Nowadays this is forgotten, or people have gaslit themselves into believing this was expected all along. It was not.

Fig. 45 from Famaey & McGaugh (2012): WMAP data are shown with the a priori prediction of no-CDM (blue line) and the most favorable prediction that could have been made ahead of time for LCDM (red line).

From the perspective of no-CDM, we don’t really care whether deuterium or lithium hits closer to the right baryon density. All plausible baryon densities predict essentially the same A1:2 amplitude ratio. Once we admit CDM as a possibility, then the second peak amplitude becomes very sensitive to the mix of CDM and baryons. From this perspective, the lithium-indicated baryon density is unacceptable. That’s why it is important to have a test that is independent of the CMB. Both deuterium and lithium provide that, but they disagree about the answer.

Once we broke BBN to fit the second peak in LCDM, we were admitting (if not to ourselves) that the a priori prediction of LCDM had failed. Everything after that is a fitting exercise. There are enough free parameters in LCDM to fit any plausible power spectrum. Cosmologists are fond of saying there are thousands of independent multipoles, but that overstates the case: it doesn’t matter how finely we sample the wave pattern, it matters what the wave pattern is. That is not as over-constrained as it is made to sound. LCDM is, nevertheless, an excellent fit to the CMB data; the test then is whether the parameters of this fit are consistent with independent measurements. It was until it wasn’t; that’s why we face all these tensions now.

Despite the success of the prediction of the second peak, no-CDM gets the third peak wrong. It does so in a way that is impossible to fix short of invoking new physics. We knew that had to happen at some level; empirically that level occurs at L = 600. After that, it becomes a fitting exercise, just as it is in LCDM – only now, one has to invent a new theory of gravity in which to make the fit. That seems like a lot to ask, so while it remained as a logical possibility, LCDM seemed the more plausible explanation for the CMB if not dynamical data. From this perspective, that A1:2 came out bang on the value predicted by no-CDM must just be one heck of a cosmic fluke. That’s easy to accept if you were unaware of the prediction or scornful of its motivation; less so if you were the one who made it.

Either way, the CMB is now beyond our ability to predict. It has become a fitting exercise, the chief issue being what paradigm in which to fit it. In LCDM, the fit follows easily enough; the question is whether the result agrees with other data: are these tensions mere hiccups in the great tradition of observational cosmology? Or are they real, demanding some new physics?

The widespread attitude among cosmologists is that it will be impossible to fit the CMB in any way other than LCDM. That is a comforting thought (it has to be CDM!) and for a long time seemed reasonable. However, it has been contradicted by the success of Skordis & Zlosnik (2021) using AeST, which can fit the CMB as well as LCDM.

CMB power spectrum observed by Planck fit by AeST (Skordis & Zlosnik 2021).

AeST is a very important demonstration that one does not need dark matter to fit the CMB. One does need other fields+++, so now the reality of those have to be examined. Where this show stops, nobody knows.

I’ll close by noting that the uniqueness claimed by the LCDM fit to the CMB is a property more correctly attributed to MOND in galaxies. It is less obvious that this is true because it is always possible to fit a dark matter model to data once presented with the data. That’s not science, that’s fitting French curves. To succeed, a dark matter model must “look like” MOND. It obviously shouldn’t do that, so modelers refuse to go there, and we continue to spin our wheels and dig the rut of our field deeper.

Note added in proof, as it were: I’ve been meaning to write about this subject for a long time, but hadn’t, in part because I knew it would be long and arduous. Being deeply interested in the subject, I had to slap myself repeatedly to refrain from spending even more time updating the plots with publication date as an axis: nothing has changed, so that would serve only to feed my OCD. Even so, it has taken a long time to write, which I mention because I had completed the vast majority of this post before the IAU announced on May 15 that Cooke & Pettini have been awarded the Gruber prize for their precision deuterium abundance. This is excellent work (it is one of the deuterium points in the relevant plot above), and I’m glad to see this kind of hard, real-astronomy work recognized.

The award of a prize is a recognition of meritorious work but is not a guarantee that it is correct. So this does not alter any of the concerns that I express here, concerns that I’ve expressed for a long time. It does make my OCD feel obliged to comment at least a little on the relevant observations, which is itself considerably involved, but I will tack on some brief discussion below, after the footnotes.

*These methods were in agreement before they were in tension, e.g., Spergel et al. (2003) state: “The agreement between the HST Key Project value and our [WMAP CMB] value, h = 0.72 ±0.05, is striking, given that the two methods rely on different observables, different underlying physics, and different model assumptions.”

+Here I mean the abundance of the primary isotope of lithium, 7Li. There is a different problem involving the apparent overabundance of 6Li. I’m not talking about that here; I’m talking about the different baryon densities inferred separately from the abundances of D/H and 7Li/H.

&By convention, X, Y, and Z are the mass fractions of hydrogen, helium, and everything else. Since the universe starts from a primordial abundance of Xp = 3/4 and Yp = 1/4, and stars are seen to have approximately that composition plus a small sprinkling of everything else (for the sun, Z ≈ 0.02), and since iron lines are commonly measured in stars to trace Z, astronomers fell into the habit of calling Z the metallicity even though oxygen is the third most common element in the universe today (by both number and mass). Since everything in the periodic table that isn’t hydrogen and helium is a small fraction of the mass, all the heavier elements are often referred to collectively as metals despite the unintentional offense to chemistry.

$The factor of h2 appears because of the definition of the critical density, ρc = 3H0²/(8πG), with Ωb = ρb/ρc. The physics cares about the actual density ρb, but Ωbh2 = 0.02 is a lot more convenient to write than ρb,now = 3.75 × 10⁻³¹ g/cm³.
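
Spelling out that arithmetic (standard constants, nothing specific to any one paper):

```python
import math

# Critical density for h = 1 and the corresponding baryon mass density for
# Omega_b h^2 = 0.02. Small differences from the value quoted above are rounding.
G = 6.674e-8                      # cm^3 g^-1 s^-2
H0 = 100.0 * 1.0e5 / 3.086e24     # 100 km/s/Mpc expressed in s^-1
rho_crit = 3.0 * H0**2 / (8.0 * math.pi * G)   # g cm^-3 (for h = 1)
rho_b = 0.02 * rho_crit                        # Omega_b h^2 = 0.02

print(f"rho_crit(h=1) = {rho_crit:.2e} g/cm^3")
print(f"rho_b,now     = {rho_b:.2e} g/cm^3")   # ~3.8e-31, i.e. the value quoted above
```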

#I’ve worked on helium myself, but was never able to do better than Yp = 0.25 ± 0.01. This corroborates the basic BBN picture, but does not suffice as a precise measure of the baryon density. To do that, one must obtain a result accurate to the third place of decimals, as discussed in the exquisite works of Kris Davidson, Bernie Pagel, Evan Skillman, and their collaborators. It’s hard to do for both observational reasons and because a wealth of subtle atomic physics effects come into play at that level of precision – helium has multiple lines; their parent population levels depend on the ionization mechanism, the plasma temperature, its density, and fluorescence effects as well as abundance.

**The value reported by Walker et al. was phrased as Ωbh50² = 0.05 ± 0.01, where h50 = H0/(50 km/s/Mpc); translating this to the more conventional h = H0/(100 km/s/Mpc) decreases these numbers by a factor of four and leads to the impression of more significant digits than were claimed. It is interesting to consider the psychological effect of this numerology. For example, the modern CMB best-fit value in this phrasing is Ωbh50² = 0.09, four sigma higher than the value Known from the combined assessment of the light isotope abundances. That seems like a tension – not just involving lithium, but the CMB vs. all of BBN. Amusingly, the higher baryon density needed to obtain a CMB fit assuming LCDM is close to the threshold where we might have gotten away without the dynamical need (Ωm > Ωb) that motivated non-baryonic dark matter in the first place. (For further perspective at a critical juncture in the development of the field, see Peebles 1999).
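
The unit bookkeeping in this footnote, spelled out as a sketch of my own on the numbers quoted above:

```python
# Translate Omega_b h50^2 (h50 = H0/50) into the conventional Omega_b h^2 (h = H0/100),
# and restate the modern CMB fit in the old phrasing. All inputs are the values
# quoted in the footnote above.

walker_h50, sigma_h50 = 0.05, 0.01
walker_h100 = walker_h50 * (50.0 / 100.0) ** 2    # factor of four smaller -> 0.0125
planck_h100 = 0.0224
planck_h50 = planck_h100 * (100.0 / 50.0) ** 2    # -> ~0.09

print(f"Walker et al. in h = 1 units: {walker_h100:.4f}")
print(f"Planck in h50 units:          {planck_h50:.3f}")
print(f"offset: {(planck_h50 - walker_h50) / sigma_h50:.1f} sigma")   # ~4 sigma
```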

The use of h50 itself is an example of the confirmation bias I’ve mentioned before as prevalent at the time, that Ωm = 1 and H0 = 50 km/s/Mpc. I would love to be able to do the experiment of sending the older cosmologists who are now certain of LCDM back in time to share the news with their younger selves who were then equally certain of SCDM. I suspect their younger selves would ask their older selves at what age they went insane, if they didn’t simply beat themselves up.

++Craig Copi is a colleague here at CWRU, so I’ve asked him about the history of this. He seemed almost apologetic, since the current “right” baryon density from the CMB now is higher than his upper limit, but that’s what the data said at the time. The CMB gives a more accurate value only once you assume LCDM, so perhaps BBN was correct in the first place.

&&Or succumbed to peer pressure, as that does happen. I didn’t witness it myself, so don’t know.

$$The absolute amplitude of the no-CDM model is too high in a transparent universe. Part of the prediction of MOND is that reionization happens early, causing the universe to be a tiny bit opaque. This combination came out just right for τ = 0.17, which was the original WMAP measurement. It also happens to be consistent with the EDGES cosmic dawn signal and the growing body of evidence from JWST.

##The second peak was unexpectedly small from the perspective of CDM; it was both natural and expected in no-CDM. At the time, it was computationally expensive to calculate power spectra, so people had pre-computed coarse grids within which to hunt for best fits. The range covered by the grids was informed by extant knowledge, of which BBN was only one element. From a dynamical perspective, Ωm > 0.2 was adopted as a hard limit that imposed an edge in the grids of the time. There was no possibility of finding no-CDM as the best fit because it had been excluded as a possibility from the start.

***Spergel et al. (2003) also say “the best-fit Ωbh2 value for our fits is relatively insensitive to cosmological model and dataset combination as it depends primarily on the ratio of the first to second peak heights (Page et al. 2003b)” which is of course the basis of the prediction I made using the baryon density as it was Known at the time. They make no attempt to test that prediction, nor do they cite it.

+++I’ve heard some people assert that this is dark matter by a different name, so is a success of the traditional dark matter picture rather than of modified gravity. That’s not at all correct. It’s just stage three in the list of reactions to surprising results identified by Louis Agassiz.

All of the figures below are from Cooke & Pettini (2018), which I employ here to briefly illustrate how D/H is measured. This is the level of detail I didn’t want to get into for either deuterium or helium or lithium, which are comparably involved.

First, here is a spectrum of the quasar they observe, Q1243+307. The quasar itself is not the object of interest here, though quasars are certainly interesting! Instead, we’re looking at the absorption lines along the line of sight; the quasar is being used as a spotlight to illuminate the gas between it and us.

Figure 1. Final combined and flux-calibrated spectrum of Q1243+307 (black histogram) shown with the corresponding error spectrum (blue histogram) and zero level (green dashed line). The red tick marks above the spectrum indicate the locations of the Lyman series absorption lines of the sub-DLA at redshift zabs = 2.52564. Note the exquisite signal-to-noise ratio (S/N) of the combined spectrum, which varies from S/N ≃ 80 near the Lyα absorption line of the sub-DLA (∼4300 Å) to S/N ≃ 25 at the Lyman limit of the sub-DLA, near 3215 Å in the observed frame.

The big hump around 4330 Å is Lyman α emission from the quasar itself. Lyα is the n = 2 to 1 transition of hydrogen, Lyβ is the n = 3 to 1 transition, and so on. The rest frame wavelength of Lyα is far into the ultraviolet at 1216 Å; we see it redshifted to z = 2.558. The rest of the spectrum is continuum and emission lines from the quasar with absorption lines from stuff along the line of sight. Note that the red end of the spectrum at wavelengths longer than 4400 Å is mostly smooth with only the occasional absorption line. Blueward of 4300 Å, there is a huge jumble. This is not noise, this is the Lyα forest. Each of those lines is absorption from hydrogen in clouds at different distances, hence different redshifts, along the line of sight.

Most of the clouds in the Lyα forest are ephemeral. The cross section for Lyα is huge, so it takes very little hydrogen to gobble it up. Most of these lines represent very low column densities of neutral hydrogen gas. Once in a while though, one encounters a higher column density cloud that has enough hydrogen to be completely opaque to Lyα. These are damped Lyα systems. In damped systems, one can often spot the higher order Lyman lines (these are marked in red in the figure). It also means that there is enough hydrogen present to have a shot at detecting the slightly shifted version of Lyα of deuterium. This is where the abundance ratio D/H is measured.
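
To illustrate just how slight that shift is, here is a small sketch that computes the deuterium Lyα wavelength from the reduced-mass scaling of the Rydberg constant and redshifts both lines to the absorber. The redshift is the one quoted in the figure captions; the rest is standard atomic physics, not anything specific to Cooke & Pettini.

```python
# Where H I and D I Lyman-alpha land for the absorber at z_abs = 2.52564.
# The isotope shift follows from the reduced-mass scaling of the Rydberg constant.

LYA_H = 1215.67            # rest wavelength of H I Lyman-alpha [Angstrom]
ME_OVER_MP = 1.0 / 1836.15 # electron-to-proton mass ratio
ME_OVER_MD = 1.0 / 3670.48 # electron-to-deuteron mass ratio
Z_ABS = 2.52564

# wavelength ~ 1/R_M with R_M = R_inf / (1 + m_e/M)
lya_d = LYA_H * (1.0 + ME_OVER_MD) / (1.0 + ME_OVER_MP)

obs_h = LYA_H * (1.0 + Z_ABS)
obs_d = lya_d * (1.0 + Z_ABS)
dv = 2.998e5 * (lya_d - LYA_H) / LYA_H   # velocity offset in km/s

print(f"D I Lyman-alpha rest wavelength: {lya_d:.2f} A")   # ~1215.34 A
print(f"observed H: {obs_h:.1f} A, observed D: {obs_d:.1f} A, offset {dv:.0f} km/s")
```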

To measure D/H, one has not only to detect the lines, but also to model and subtract the continuum. This is a tricky business in the best of times, but here its importance is magnified by the huge difference between the primary Lyα line which is so strong that it is completely black and the deuterium Lyα line which is incredibly weak. A small error in the continuum placement will not matter to the measurement of the absorption by the primary line, but it could make a huge difference to that of the weak line. I won’t even venture to discuss the nonlinear difference between these limits due to the curve of growth.

Figure 2. Lyα profile of the absorption system at zabs = 2.52564 toward the quasar Q1243+307 (black histogram) overlaid with the best-fitting model profile (red line), continuum (long dashed blue line), and zero-level (short dashed green line). The top panels show the raw, extracted counts scaled to the maximum value of the best-fitting continuum model. The bottom panels show the continuum normalized flux spectrum. The label provided in the top left corner of every panel indicates the source of the data. The blue points below each spectrum show the normalized fit residuals, (data–model)/error, of all pixels used in the analysis, and the gray band represents a confidence interval of ±2σ. The S/N is comparable between the two data sets at this wavelength range, but it is markedly different near the high order Lyman series lines (see Figures 4 and 5). The red tick marks above the spectra in the bottom panels show the absorption components associated with the main gas cloud (Components 2, 3, 4, 5, 6, 8, and 10 in Table 2), while the blue tick marks indicate the fitted blends. Note that some blends are also detected in Lyβ–Lyε.

The above examples look pretty good. The authors make the necessary correction for the varying spectral sensitivity of the instrument, and take great care to simultaneously fit the emission of the quasar and the absorption. I don’t think they’ve done anything wrong; indeed, it looks like they did everything right – just as the people measuring lithium in stars have.

Still, as an experienced spectroscopist, there are some subtle details that make me queasy. There are two independent observations, which is awesome, and the data look almost exactly the same, a triumph of repeatability. The fitted models are nearly identical, but if you look closely, you can see the model cuts slightly differently along the left edge of the damped absorption around 4278 Å in the two versions of the spectrum, and again along the continuum towards the right edge.

These differences are small, so hopefully don’t matter. But what is the continuum, really? The model line goes through the data, because what else could one possibly do? But there is so much Lyα absorption, is that really continuum? Should the continuum perhaps trace the upper envelope of the data? A physical effect that I worry about is that weak Lyα is so ubiquitous, we never see the true continuum but rather continuum minus a tiny bit of extraordinarily weak (Gunn-Peterson) absorption. If the true continuum from the quasar is just a little higher, then the primary hydrogen absorption is unaffected but the weak deuterium absorption would go up a little. That means slightly higher D/H, which means lower Ωbh2, which is the direction in which the measurement would need to move to come into closer agreement with lithium.

Is the D/H measurement in error? I don’t know. I certainly hope not, and I see no reason to think it is. I do worry that it could be. The continuum level is one thing that could go wrong; there are others. My point is merely that we shouldn’t assume it has to be lithium that is in error.

An important check is whether the measured D/H ratio depends on metallicity or column density. It does not. There is no variation with metallicity as measured by the logarithmic oxygen abundance relative to solar (left panel below). Nor does it appear to depend on the amount of hydrogen in the absorbing cloud (right panel). In the early days of this kind of work there appeared to be a correlation, raising the specter of a systematic. That is not indicated here.

Figure 6. Our sample of seven high precision D/H measures (symbols with error bars); the green symbol represents the new measure that we report here. The weighted mean value of these seven measures is shown by the red dashed and dotted lines, which represent the 68% and 95% confidence levels, respectively. The left and right panels show the dependence of D/H on the oxygen abundance and neutral hydrogen column density, respectively. Assuming the Standard Model of cosmology and particle physics, the right vertical axis of each panel shows the conversion from D/H to the universal baryon density. This conversion uses the Marcucci et al. (2016) theoretical determination of the d(p,γ)3He cross-section. The dark and light shaded bands correspond to the 68% and 95% confidence bounds on the baryon density derived from the CMB (Planck Collaboration et al. 2016).

I’ll close by noting that Ωbh2 from this D/H measurement is indeed in very good agreement with the best-fit Planck CMB value. The question remains whether the physics assumed by that fit, baryons+non-baryonic cold dark matter+dark energy in a strictly FLRW cosmology, is the correct assumption to make.

Some more persistent cosmic tensions

I set out last time to discuss some of the tensions that persist in afflicting cosmic concordance, but didn’t get past the Hubble tension. Since then, I’ve come across more of that, e.g., Boubel et al (2024a), who use a variant of Tully-Fisher to obtain H0 = 73.3 ± 2.1(stat) ± 3.5(sys) km/s/Mpc. Having done that sort of work myself, I thought their systematic uncertainty term seemed large. I then came across Scolnic et al. (2024) who trace this issue back to one apparently erroneous calibration amongst many, and correct the results to H0 = 76.3 ± 2.1(stat) ± 1.5(sys) km/s/Mpc. Boubel is an author of the latter paper, so apparently agrees with this revision. Fortunately they didn’t go all Sandage-de Vaucouleurs on us, but even so, this provides a good example of how fraught this field can get. It also demonstrates the opportunity for confirmation bias, as the revised numbers are almost exactly what we find ourselves. (New results coming soon!)

It’s a dang mess.

The Hubble tension is only the most prominent of many persistent tensions, so let’s wade into some of the rest.

The persistent tension in the amplitude of the power spectrum

The tension that cosmologists seem to stress about most after the Hubble tension is that in σ8. σ8 quantifies the amplitude of the power spectrum; it is a measure of the rms fluctuation in mass in spheres of 8h⁻¹ Mpc. Historically, this scale was chosen because early work by Peebles & Yu (1970) indicated that this was the scale on which the rms contrast in galaxy numbers* is unity. This is also a handy dividing line between linear and nonlinear regimes. On much larger scales, the fluctuations are smaller (a giant sphere is closer to the average for the whole universe) so can be treated in the limit of linear perturbation theory. Individual galaxies are “small” by this standard, so can’t be treated+ so simply, which is the excuse many cosmologists use to run shrieking from discussing them.
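
For reference, here is a minimal numerical sketch of that definition, using a toy power spectrum with an arbitrary amplitude and crude shape. The number it prints is meaningless; the point is the top-hat window and the integral that define σ8.

```python
import numpy as np

# sigma(R): rms mass fluctuation in top-hat spheres of radius R, here R = 8 Mpc/h.
# The power spectrum is a toy (arbitrary amplitude, crude shape), so the result
# is not meant to be 0.8; only the structure of the calculation matters.

def tophat_window(x):
    """Fourier transform of a spherical top-hat."""
    return 3.0 * (np.sin(x) - x * np.cos(x)) / x**3

def sigma_R(Pk, R, kmin=1e-4, kmax=1e2, n=4000):
    lnk = np.linspace(np.log(kmin), np.log(kmax), n)
    k = np.exp(lnk)
    delta2 = k**3 * Pk(k) / (2.0 * np.pi**2)       # dimensionless power per ln k
    integrand = delta2 * tophat_window(k * R)**2
    # trapezoidal integration over ln k
    return np.sqrt(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(lnk)))

# Toy spectrum: P(k) ~ k on large scales, turning over toward k^-3 on small scales.
keq = 0.02   # h/Mpc, rough turnover scale (made up for illustration)
toy_Pk = lambda k: 1.0e4 * k / (1.0 + (k / keq)**2)**2

print(f"sigma_8 for the toy spectrum: {sigma_R(toy_Pk, 8.0):.2f}")
```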

As we progressed from wrapping our heads around an expanding universe to quantifying the large scale structure (LSS) therein, the power spectrum statistically describing LSS became part of the canonical set of cosmological parameters. I don’t myself consider it to be on par with the Big Two, the Hubble constant H0 and the density parameter Ωm, but many cosmologists do seem partial to it despite the lack of phase information. Consequently, any tension in the amplitude σ8 garners attention.

The tension in σ8 has been persistent insofar as I recall debates in the previous century where some kinds of data indicated σ8 ~ 0.5 while other data preferred σ8 ~ 1. Some of that tension was in underlying assumptions (SCDM before LCDM). Today, the difference is [mostly] between the Planck best-fit amplitude σ8 = 0.811 ± 0.006 and various local measurements that typically yield 0.7something. For example, Karim et al. (2024) find low σ8 for emission line galaxies, even after specifically pursuing corrections in a necessary dust model that pushed things in the right direction:

Fig. 16 from Karim et al. (2024): Estimates of σ8 from emission line galaxies (red and blue), luminous red galaxies (grey), and Planck (green).

As with so many cosmic parameters, there is degeneracy, in this case between σ8 and Ωm. Physically this happens because you get more power when you have more stuff (Ωm), but the different tracers are sensitive to it in different ways. Indeed, if I put on a cosmology hat, I personally am not too worried about this tension – emission line galaxies are typically lower mass than luminous red galaxies, so one expects that there may be a difference in these populations. The Planck value is clearly offset from both, but doesn’t seem too far afield. We wouldn’t fret at all if it weren’t for Planck’s damnably small error bars.

This tension is also evident as a function of redshift. Here are values of the parameter combination fσ8 = Ωm(z)^γ σ8 measured and compiled by Boubel et al (2024b):

Fig. 16 from Boubel et al (2024b). LCDM matches the data for σ8 = 0.74 (green line); the purple line is the expectation from Planck (σ8 = 0.81). The inset shows the error ellipse, which is clearly offset from the Planck value (crossed lines), particularly for the GR& value of γ = 0.55.

The line representing the Planck value σ8 = 0.81 overshoots most of the low redshift data, particularly those with the smallest uncertainties. The green line has σ8 = 0.74, so is a tad lower than Planck in the same sense as other low redshift measures. Again, the offset is modest, but it does look significant. The tension is persistent but not a show-stopper, so we generally shrug our shoulders and proceed as if it will inevitably work out.
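
As an illustration of what those curves encode, here is a toy comparison of my own: f(z) = Ωm(z)^γ with the GR value γ = 0.55, combined with the Carroll, Press & Turner approximation for how σ8 grows with time, evaluated for σ8 = 0.81 and 0.74. This is not the fit in the paper, just the standard parameterization with assumed round numbers.

```python
# fsigma8(z) = Omega_m(z)^gamma * sigma8 * D(z)/D(0), using the Carroll, Press &
# Turner (1992) approximation for the linear growth factor D(z) in flat LCDM.
# Toy comparison of sigma8 = 0.81 (Planck-like) vs 0.74 (low-z-like); gamma = 0.55 (GR).
# Omega_m0 = 0.315 is assumed for both curves.

def growth_D(z, om0=0.315):
    a = 1.0 / (1.0 + z)
    om = om0 / (om0 + (1.0 - om0) * a**3)          # Omega_m(a) in flat LCDM
    ol = 1.0 - om
    g = 2.5 * om / (om**(4.0 / 7.0) - ol + (1.0 + om / 2.0) * (1.0 + ol / 70.0))
    return g * a

def f_sigma8(z, sigma8, om0=0.315, gamma=0.55):
    om_z = om0 * (1.0 + z)**3 / (om0 * (1.0 + z)**3 + 1.0 - om0)
    return om_z**gamma * sigma8 * growth_D(z, om0) / growth_D(0.0, om0)

for z in (0.0, 0.3, 0.6, 1.0):
    print(f"z = {z:.1f}: fsigma8 = {f_sigma8(z, 0.81):.3f} (sigma8 = 0.81) "
          f"vs {f_sigma8(z, 0.74):.3f} (sigma8 = 0.74)")
```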

The persistent tension in the cosmic mass density

A persistent tension that nobody seems to worry about is that in the density parameter Ωm. Fits to the Planck CMB acoustic power spectrum currently peg Ωm = 0.315±0.007, but as we’ve seen before, this covaries with the Hubble constant. Twenty years ago, WMAP indicated Ωm = 0.24 and H0 = 73, in good agreement with the concordance region of other measurements, both then and now. As with H0, the tension is posed by the itty bitty uncertainties on the Planck fit.

Experienced cosmologists may be inclined to scoff at such tiny error bars. I was, so I’ve confirmed them myself. There is very little wiggle room to match the Planck data within the framework of the LCDM model. I emphasize that last bit because it is an assumption now so deeply ingrained that it is usually left unspoken. If we leave that part out, then the obvious interpretation is that Planck is correct and all measurements that disagree with it must suffer from some systematic error. This seems to be what most cosmologists believe at present. If we don’t leave that part out, perhaps because we’re aware of other possibilities so are not willing to grant this assumption, then the various tensions look like failures of a model that’s already broken. But let’s not go there today, and stay within the conventional framework.

There are lots of ways to estimate the gravitating mass density of the universe. Indeed, it was the persistent, early observation that the mass density Ωm exceeded that in baryons, Ωb, from big bang nucleosynthesis that got the non-baryonic dark matter show on the road: there appears to be something out there gravitating that’s not normal matter. This was the key observation that launched non-baryonic cold dark matter: if Ωm > Ωb, there has% to be some kind of particle that is non-baryonic.

So what is Ωm? Most estimates have spanned the range 0.2 < Ωm < 0.4. In the 1980s and into the 1990s, this seemed close enough to Ωm = 1, by the standards of cosmology, that most Inflationary cosmologists presumed it would work out to what Inflation predicted, Ωm = 1 exactly. Indeed, I remember that community directing some rather vicious tongue-lashings at observers, castigating them to look harder: you will surely get Ωm = 1 if you do it right, you fools. But despite the occasional claim to get this “right” answer, the vast majority of the evidence never pointed that way. As I’ve related before, an important step on the path to LCDM – probably the most important step – was convincing everyone that really Ωm < 1.

Discerning between Ωm = 0.2 and 0.3 is a lot more challenging than determining that Ωm < 1, so we tend to treat either as acceptable. That’s not really fair in this age of precision cosmology. There are far too many estimates of the mass density to review here, so I’ll just note a couple of discrepant examples while also acknowledging that it is easy to find dynamical estimates that agree with Planck.

To give a specific example, Mohayaee & Tully (2005) obtained Ωm = 0.22 ± 0.02 by looking at peculiar velocities in the local universe. This was consistent with other constraints at the time, including WMAP, but is 4.5σ from the current Planck value. That’s not quite the 5σ we arbitrarily define to be an undeniable difference, but it’s plenty significant.
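For the record, the 4.5σ figure is just the difference divided by the quadrature sum of the two quoted uncertainties:

```python
# Quick check of the quoted tension between Planck and Mohayaee & Tully (2005).
import numpy as np
planck, sig_planck = 0.315, 0.007
local,  sig_local  = 0.22, 0.02
print((planck - local) / np.hypot(sig_planck, sig_local))  # ~4.5 sigma
```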

There have of course been other efforts to do this, and many of them lead to the same result, or sometimes even lower Ωm. For example, Shaya et al. (2022) use the Numerical Action Method developed by Peebles to attempt to work out the motions of nearly 10,000 galaxies – not just their Hubble expansion, but their individual trajectories under the mutual influence of each other’s gravity and whatever else may be out there. The resulting deviations from a pure Hubble flow depend on how much mass is associated with each galaxy and whatever other density there is to perturb things.

Fig. 4 from Shaya et al (2022): The gravitating mass density as a function of scale. After some local variations (hello Virgo cluster!), the data converge to Ωm = 0.12. Reaching Ωm = 0.24 requires an equal, additional amount of mass in “interhalo matter.” Even more mass would be required to reach the Planck value (red line added to original figure).

This result is in even greater tension with Planck than the earlier work by Mohayaee & Tully (2005). I find the need to invoke interhalo matter disturbing, since it acts as a pedestal in their analysis: extra mass density that is uniform everywhere. This is necessary so that it contributes to the global mass density Ωm but does not contribute to perturbing the Hubble flow.

One can imagine mass that is uniformly distributed easily enough, but what bugs me is that dark matter should not do this. There is no magic segregation between dark matter that forms into halos that contain galaxies and dark matter that just hangs out in the intergalactic medium and declines to participate in any gravitational dynamics. That’s not an option available to it: if it gravitates, it should clump. To pull this off, we’d need to live in a universe made of two distinct kinds of dark matter: cold dark matter that clumps and a fluid that gravitates globally but does not clump, sort of an anti-dark energy.

Alternatively, we might live in an underdense region such that the local Ωm is less than the global Ωm. This is an idea that comes and goes for one reason or another, but it has always been hard to sustain. The convergence to low Ωm looks pretty steady out to ~100 Mpc in the plot above; that’s a pretty big hole. Recall the non-linearity scale discussed above; this scale is a factor of ten larger, so over/under-densities should typically be ±10%. This one is -60%, so I guess we’d have to accept that we’re not Copernican observers after all.

The persistent tension in bulk flows

Once we get past the basic Hubble expansion, individual galaxies each have their own peculiar motion, and beyond that we have bulk flows. These have been around a long time. We obsessed a lot about them for a while with discoveries like the Great Attractor. It was weird; I remember some pundits talking about “plate tectonics” in the universe, like there were giant continents of galaxy superclusters wandering around in random directions relative to the frame of the microwave background. Many of us, including me, couldn’t grok this, so we chose not to sweat it.

There is no single problem posed by bulk flows^, and of course you can find those that argue they pose no problem at all. We are in motion relative to the cosmic (CMB) frame$, but that’s just our Milky Way’s peculiar motion. The strange fact is that it’s not just us; the entirety of the local universe seems to have an unexpected peculiar motion. There are lots of ways to quantify this; here’s a summary table from Courtois et al (2025):

Table 1 from Courtois et al (2025): various attempts to measure the scale of dynamical homogeneity.

As we look to large scales, we expect the universe to converge to homogeneity – that’s the Cosmological Principle, which is one of those assumptions that is so fundamental that we forget we made it. The same holds for dynamics – as we look to large scales, we expect the peculiar motions to average out, and converge to a pure Hubble flow. The table above summarizes our efforts to measure the scale on which this happens – or doesn’t. It also shows what we expect on the second line, “predicted LCDM,” where you can see the expected convergence in the declining bulk velocities as the scale probed increases. The third line is for “cosmic variance;” when you see these words it usually means something is amiss so in addition to the usual uncertainties we’re going to entertain the possibility that we live in an abnormal universe.

Like most people, I was comfortably ignoring this issue until recently, when we had a visit and a talk from one of the protagonists listed above, Richard Watkins (W23). One of the problems that challenge this sort of work is the need for a large sample of galaxies with complete sky coverage. That’s observationally challenging to obtain. Real data are heterogeneous; treating this properly demands a more sophisticated treatment than the usual top-hat or Gaussian approaches. Watkins described in detail what a better way could be, and patiently endured the many questions my colleagues and I peppered him with. This is hard to do right, which gives aid and comfort to the inclination to ignore it. After hearing his talk, I don’t think we should do that.

Panel from Fig. 7 of Watkins et al. (2023): The magnitude of the bulk flow as a function of scale. The green points are the data and the red dashed line is the expectation of LCDM. The blue dotted line is an estimate of known systematic effects.

The data do not converge with increasing scale as expected. It isn’t just the local space density Ωm that’s weird, it’s also the way in which things move. And “local” isn’t at all small here, with the effect persisting out beyond 300 Mpc for any plausible h = H0/100.

This is formally a highly significant result, with the authors noting that “the probability of observing a bulk flow [this] large … is small, only about 0.015 per cent.” Looking at the figure above, I’d say that’s a fairly conservative statement. A more colloquial way of putting it would be “no way we gonna reconcile this!” That said, one always has to worry about systematics. They’ve made every effort to account for these, but there can always be unknown unknowns.

Mapping the Universe

It is only possible to talk about these things thanks to decades of effort to map the universe. One has to survey a large area of sky to identify galaxies in the first place, then do follow-up work to obtain redshifts from spectra. This has become big business, but to do what we’ve just been talking about, it is further necessary to separate peculiar velocities from the Hubble flow. To do that, we need to estimate distances by some redshift-independent method, like Tully-Fisher. Tully has been doing this his entire career, with the largest and most recent data product being Cosmicflows-4. Such data reveal not only large bulk flows, but extensive structure in velocity space:

The Laniakea supercluster of galaxies (Tully et al. 2014).

We have a long way to go to wrap our heads around all of this.
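As an aside on the mechanics: at low redshift, separating a peculiar velocity from the Hubble flow is simple arithmetic once you have a redshift-independent distance. A minimal sketch, with made-up numbers for illustration:

```python
# Sketch: peculiar velocity from an observed redshift plus a
# redshift-independent distance (e.g. Tully-Fisher). Low-z approximation only;
# the galaxy below is hypothetical.
C_KMS = 299792.458  # speed of light in km/s

def peculiar_velocity(z_obs, distance_mpc, H0=73.0):
    """v_pec ~ c*z_obs - H0*D, valid for small redshifts."""
    return C_KMS * z_obs - H0 * distance_mpc

# Hypothetical galaxy: z = 0.0100, Tully-Fisher distance 38 Mpc.
print(peculiar_velocity(0.0100, 38.0))  # ~ +220 km/s relative to the Hubble flow
```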

Persistent tensions persist

I’ve discussed a few of the tensions that persist in cosmic data. Whether these are mere puzzles or a mounting pile of anomalies is a matter of judgement. They’ve been around for a while, so it isn’t fair to suggest that all of the data are consistent with LCDM. Nevertheless, I hear exactly this asserted with considerable frequency. It’s as if the definition of all is perpetually shrinking to include only the data that meet the consistency criterion. Yet it’s the discrepant bits that are interesting for containing new information; we need to grapple with them if the field is to progress.

*This was well before my time, so I am probably getting some aspect of the history wrong or oversimplifying it in some gross way. Crudely speaking, if you randomly plop down spheres of this size, some will be found to contain the cosmic average number of galaxies, some twice that, some half that. That the modern value of σ8 is close to unity means that Peebles got it basically right with the data that were available back then and that galaxy light very nearly traces mass, which is not guaranteed in a universe dominated by dark matter.


+It amazes me how pervasively “galaxies are complicated” is used as an excuse++ to ignore all small scale evidence.

Not all of us are limited to working on the simplest systems. In this case, it doesn’t matter. The LCDM prediction here is that galaxies should be complicated because they are nonlinear. But the observation is that they are simple – so simple that they obey a single effective force law. That’s the contradiction right there, regardless of what flavor of complicated might come out of some high resolution simulation.

++At one KITP conference I attended, a particle-cosmologist said during a discussion session, in all seriousness and with a straight face, “We should stop talking about rotation curves.” Because scientific truth is best revealed by ignoring the inconvenient bits. David Merritt remarked on this in his book A Philosophical Approach to MOND. He surveyed the available cosmology textbooks, and found that not a single one of them mentioned the acceleration scale in the data. I guess that would go some way to explaining why statements of basic observational facts are often met with stunned silence. What’s obvious and well-established to me is a wellspring of fresh if incredible news to them. I’d probably give them the stink-eye about the cosmological constant if I hadn’t been paying the slightest attention to cosmology for the past thirty years.


&There is an elegant approach to parameterizing the growth of structure in theories that deviate modestly from GR. In this context, such theories are usually invoked as an alternative to dark energy, because it is socially acceptable to modify GR to explain dark energy but not dark matter. The curious hysteresis of that strange and seemingly self-contradictory attitude aside, this approach cannot be adapted to MOND because it assumes linearity while MOND is inherently nonlinear. My very crude, back-of-the-envelope expectation for MOND is very nearly constant γ ~ 0.4 (depending on the scale probed) out to high redshift. The bend we see in the conventional models around z ~ 0.6 will occur at z > 2 (and probably much higher) because structure forms fast in MOND. It is annoyingly difficult to put a more precise redshift on this prediction because it also depends on the unknown metric. So this is a more of a hunch than a quantitative prediction. Still, it will be interesting to see if roughly constant fσ8 persists to higher redshift.


%The inference that non-baryonic dark matter has to exist assumes that gravity is normal in the sense taught to us by Newton and Einstein. If some other theory of gravity applies, then one has to reassess the data in that context. This is one of the first considerations I made of MOND in the cosmological context, finding Ωm ≈ Ωb.


^MOND is effective at generating large bulk flows.


$Fun fact: you can type the name of a galaxy into NED (the NASA Extragalactic Database) and it will give you lots of information, including its recession velocity referenced to a variety of frames of reference and the corresponding distance from the Hubble law V = H0D. Naively, you might think that the obvious choice of reference frame is the CMB. You’d be wrong. If you use this, you will get the wrong distance to the galaxy. Of all the choices available there, it consistently performs the worst as adjudicated by direct distance measurements (e.g., Cepheids).

NED used to provide a menu of choices for the value of H0 to use. It says something about the social-tyranny of precision cosmology that it now defaults to the Planck value. If you use this, you will get the wrong distance to the galaxy. Even if the Planck H0 turns out to be correct in some global sense, it does not work for real galaxies that are relatively near to us. That’s what it means to have all the “local” measurements based on direct distance measurements (e.g., Cepheids) consistently give a larger H0.
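To put a number on “wrong distance”: for a galaxy with a recession velocity of, say, 3000 km/s (a made-up example), the Hubble-law distance differs by roughly 8% between the Planck and locally calibrated values of H0, and the inferred luminosity by nearly 17%.

```python
# Hubble-law distance D = V/H0 for a hypothetical recession velocity,
# comparing the Planck and local values of H0.
v = 3000.0  # km/s, illustrative
for H0 in (67.4, 73.0):
    print(f"H0 = {H0}:  D = {v / H0:.1f} Mpc")
```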

Galaxies in the local universe are closer than they appear. Photo by P.S. Pratheep, www.pratheep.com

Some persistent cosmic tensions

Some persistent cosmic tensions

I took the occasion of the NEIU debate to refresh my knowledge of the status of some of the persistent tensions in cosmology. There wasn’t enough time to discuss those, so I thought I’d go through a few of them here. These issues tend to get downplayed or outright ignored when we hype LCDM’s successes.

When I teach cosmology, I like to have the students do a project in which they each track down a measurement of some cosmic parameter, and then report back on it. The idea, when I started doing this back in 1999, was to combine the different lines of evidence to see if we reach a consistent concordance cosmology. Below is an example from the 2002 graduate course at the University of Maryland. Does it all hang together? I ask the students to debate the pros and cons of the various lines of evidence.

The mass density parameter Ωm = ρm/ρcrit and the Hubble parameter h = H0/(100 km/s/Mpc) from various constraints (colored lines) available in 2002. I later added the first (2003) WMAP result (box). The combination of results excludes the grey region; only the white portion is viable: this is the concordance region.

The concordance cosmology is the small portion of this diagram that was not ruled out. This is the way in which LCDM was established. Before we had either the CMB acoustic power spectrum or Type Ia supernovae, LCDM was pretty much a done deal based on a wide array of other astronomical evidence. It was the subsequentα agreement of the Type Ia SN and the CMB that cemented the picture in place.

The implicit assumption in this approach is that we have identified the correct cosmology by process of elimination: whatever is left over must be the right answer. But what if nothing is left over?

I have long worried that we’ve painted ourselves into a corner: maybe the concordance window is merely the least unlikely spot before everything is excluded. Excluding everything would effectively falsify LCDM cosmology, if not the more basic picture of an expanding universe% emerging from a hot big bang. Once one permits oneself to think this way, then it occurs to one that perhaps the reason we have to invoke the twin tooth fairies of dark matter and dark energy is to get FLRW to approximate some deeper, underlying theory.

Most cosmologists do not appear to contemplate this frightening scenario. And indeed, before we believe something so drastic, we have to have thoroughly debunked the standard picture – something rather difficult to do when 95% of it is invisible. It also means believing all the constraints that call the standard picture into question (hence why contradictory results experience considerably more scrutiny* than conforming results). The fact is that some results are more robust than others. The trick is deciding which to trust.^

In the diagram above, the range of Ωm from cluster mass-to-light ratios comes from some particular paper. There are hundreds of papers on this topic, if not thousands. I do not recall which one this particular illustration came from, but most of the estimates I’ve seen from the same method come in somewhat higher. So if we slide those green lines up, the allowed concordance window gets larger.

The practice of modern cosmology has necessarily been an exercise in judgement: which lines of evidence should we most trust? For example, there is a line up there for rotation curves. That was my effort to ask what combination of cosmological parameters led to dark matter halo densities that were tolerable to the rotation curve data of the time. Dense cosmologies give birth to dense dark matter halos, so everything above that line was excluded because those parameters cram too much dark matter into too little space. This was a pretty conservative limit at the time, but it is predicated on the insistence of theorists that dark matter halos had to have the NFW form predicted by dark matter-only simulations. Since that time, simulations including baryons have found any number of ways to alter the initial cusp. This in turn means that the constraint no longer applies as the halo might have been altered from its original, cosmology-predicted initial form. Whether the mechanisms that might cause such alterations are themselves viable becomes a separate question.

If we believed all of the available constraints, then there is no window left and FLRW is already ruled out. But not all of those data are correct, and some contradict each other, even absent the assumption of FLRW. So which do we believe? Finding one’s path in this field is like traipsing through an intellectual mine field full of hardened positions occupied by troops dedicated to this or that combination of parameters.

H0 = 100! No, repent you fools, H0 = 50! (Comic by Paul North)

It is in every way an invitation to confirmation bias. The answer we get depends on how we weigh disparate lines of evidence. We are prone to give greater weight to lines of evidence that conform to our pre-established+ beliefs.

So, with that warning, let’s plunge ahead.

The modern Hubble tension

Gone but not yet forgotten are the Hubble wars between camps Sandage (H0 = 50!) and de Vaucouleurs (H0 = 100!). These were largely resolved early this century thanks to the Hubble Space Telescope Key Project on the distance scale. Obtaining this measurement was the major motivation to launch HST in the first place. Finally, this long standing argument was resolved: nearly everyone agreed that H0 = 72 km/s/Mpc.

That agreement was long-lived by the standards of cosmology, but did not last forever. Here is an illustration of the time dependence of H0 measurements this century, from Freedman (2021):

There are many illustrations like this; I choose this one because it looks great and seems to have become the go-to for illustrating the situation. Indeed, it seems to inform the attitude of many scientists close to but not directly involved in the H0 debate. They seem to perceive this as a debate between Adam Riess and Wendy Freedman, who have become associated with the Cepheid and TRGB$ calibrations, respectively. This is a gross oversimplification, as they are not the only actors on a very big stage&. Even in this plot, the first Cepheid point is from Freedman’s HST Key Project. But this apparent dichotomy between calibrators and people seems to be how the subject is perceived by scientists who have neither time nor reason for closer scrutiny. Let’s scrutinize.

Fits to the acoustic power spectrum of the CMB agreed with astronomical measurements of H0 for the first decade of the century. Concordance was confirmed. The current tension appeared with the first CMB data from Planck. Suddenly the grey band of the CMB best-fit no longer overlapped with the blue band of astronomical measurements. This came as a shock. Then a new (red) band appears, distinguishing between the “local” H0 calibrated by the TRGB from that calibrated by Cepheids.

I think I mentioned that cosmology was an invitation to confirmation bias. If you put a lot of weight on CMB fits, as many cosmologists do, then it makes sense from that perspective that the TRGB measurement is the correct one and the Cepheid H0 must be wrong. This is easy to imagine given the history of systematic errors that plagued the subject throughout the twentieth century. This confirmation bias makes one inclined to give more credence to the new# TRGB calibration, which is only in modest tension with the CMB value. The narrative is then simplified to two astronomical methods that are subject to systematic uncertainty: one that agrees with the right answer and one that does not. Ergo, the Cepheid H0 is in systematic error.

This narrative oversimplifies the matter to the point of being actively misleading, and the plot above abets this by focusing on only two of the many local measurements. There is no perfect way to do this, but I had a go at it last year. In the plot below, I cobbled together all the data I could without going ridiculously far back, but chose to show only one point per independent group, the most recent one available from each, the idea being that the same people don’t get new votes every time they tweak their result – that’s basically what is illustrated above. The most recent points from above are labeled Cepheids & TRGB (the date of the TRGB goes to the full Chicago-Carnegie paper, not Freedman’s summary paper where the above plot can be found). See McGaugh (2024) for the references.

When I first made this plot, I discovered that many measurements of the Hubble constant are not all that precise: the plot was an indecipherable forest of error bars. So I chose to make a cut at a statistical uncertainty of 3 km/s/Mpc: worse than that, the data are shown as open symbols sans error bars; better than that, the datum gets explicit illustration of both its statistical and systematic uncertainty. One could make other choices, but the point is that this choice paints a different picture from the choice made above. One of these local measurements is not like the others, inviting a different version of confirmation bias: the TRGB point is the outlier, so perhaps it is the one that is wrong.

Recent measurements of the Hubble constant (left) and the calibration of the baryonic Tully-Fisher relation (right) underpinning one of those measurements.

I highlight the measurement our group made not to note that we’ve done this too so much as to highlight an underappreciated aspect of the apparent tension between Cepheid and TRGB calibrations. There are 50 galaxies that calibrate the baryonic Tully-Fisher relation, split nearly evenly between galaxies whose distance is known through Cepheids (blue points) and TRGB (red points). They give the same answer. There is no tension between Cepheids and the TRGB here.

Chasing this up, it appears to me that what happened was that Freedman’s group reanalyzed the data that calibrate the TRGB, and wound up with a slightly different answer. This difference does not appear to be in the calibration equation (the absolute magnitude of the tip of the red giant branch didn’t change that much), but in something to do with how the tip magnitude is extracted. Maybe, I guess? I couldn’t follow it all the way, and I got bad vibes reminding me of when I tried to sort through Sandage’s many corrections in the early ’90s. That doesn’t make it wrong, but the point is that the discrepancy is not between Cepheids and TRGB calibrations so much as it is between the TRGB as implemented by Freedman’s group and the TRGB as implemented by others. The depiction of the local Hubble constant debate as being between Cepheid and TRGB calibrations is not just misleading, it is wrong.

Can we get away from Cepheids and the TRGB entirely? Yes. The black points above are for megamasers and gravitational lensing. These are geometric methods that do not require intermediate calibrators like Cepheids at all. It’s straight trigonometry. Both indicate H0 > 70. Which way is our confirmation bias leaning now?

The way these things are presented has an impact on scientific consensus. A fascinating experiment on this has been done in a recent conference report. Sometimes people poll conference attendees in an attempt to gauge consensus; this report surveys conference attendees “to take a snapshot of the attitudes of physicists working on some of the most pressing questions in modern physics.” One of the topics queried is the Hubble tension. Survey says:

Table XII from arXiv:2503.15776 in which scientists at the 2024 conference Black Holes Inside and Out vote on their opinion about the most likely solution of the Hubble tension.

First, a shout out to the 1/4 of scientists who expressed no opinion. That’s the proper thing to do when you’re not close enough to a subject to make a well-informed judgement. Whether one knows enough to do this is itself a judgement call, and we often let our arrogance override our reluctance to over-share ill-informed opinions.

Second, a shout out to the folks who did the poll for including a line for systematics in the CMB. That is a logical possibility, even if only 3 of the 72 participants took it seriously. This corroborates the impression I have that most physicists seem to think the CMB is perfect like some kind of holy scripture written in fire on the primordial sky, so must be correct and cannot be questioned, amen. That’s silly; systematics are always a possibility in any observation of the sky. In the case of the CMB, I suspect it is not some instrumental systematic but the underlying assumption of LCDM FLRW that is the issue; once one assumes that, then indeed, the best fit to the Planck data as published is H0 = 67.4, with H0 > 68 being right out. (I’ve checked.)

A red flag that the CMB is where the problem lies is the systematic variation of the best-fit parameters along the trench of minimum χ²:

The time evolution of best-fit CMB cosmology parameters. These have steadily drifted away from the LCDM concordance window while the astronomical measurements that established it have not.

I’ve shown this plot and variations for other choices of H0 before, yet it never fails to come as a surprise when I show it to people who work closely on the subject. I’m gonna guess that extends to most of the people who participated in the survey above. Some red flags prove to be false alarms, some don’t, but one should at least be aware of them and take them into consideration when making a judgement like this.

The plurality (35%) of those polled selected “systematic error in supernova data” as the most likely cause of the Hubble tension. It is indeed a common attitude, as I mentioned above, that the Hubble tension is somehow a problem of systematic errors in astronomical data like back in the bad old days** of Sandage & de Vaucouleurs.

Let’s unpack this a bit. First, the framing: systematic error in supernova data is not the issue. There may, of course, be systematic uncertainties in supernova data, but that’s not a contender for what is causing the apparent Hubble tension. The debate over the local value of H0 is in the calibrators of supernovae. This is often expressed as a tension between Cepheid and TRGB calibrators, but as we’ve seen, even that is misleading. So posing the question this way is all kinds of revealing, including of some implicit confirmation bias. It’s like putting the right answer of a multiple choice question first and then making up some random alternatives.

So what do we learn from this poll for consensus? There is no overwhelming consensus, and the most popular choice appears to be ill-informed. This could be a meme. Tell me you’re not an expert on a subject by expressing an opinion as if you were.

The kicker here is that this was a conference on black hole physics. There seems to have been some fundamental gravitational and quantum physics discussed, which is all very interesting, but this is a community that is pretty far removed from the nitty-gritty of astronomical observations. There are many other polls reported in this conference report, many of them about esoteric aspects of black holes that I find interesting but would not myself venture an opinion on: it’s not my field. It appears that a plurality of participants at this particular conference might want to consider adopting that policy for fields beyond their own expertise.

I don’t want to be too harsh, but it seems like we are repeating the same mistakes we made in the 1980s. As I’ve related before, I came to astronomy from physics with the utter assurance that H0 had to be 50. It was Known. Then I met astronomers who were actually involved in measuring H0 and they were like, “Maybe it is ~80?” This hurt my brain. It could not be so! and yet they turned out to be correct within the uncertainties of the time. Today, similar strong opinions are being expressed by the same community (and sometimes by the same people) who were wrong then, so it wouldn’t surprise me if they are wrong now. Putting how they think things should be ahead of how they are is how they roll.

There are other tensions besides the Hubble tension, but I’ll get to them in future posts. This is enough for now.


αAs I’ve related before, I date the genesis of concordance LCDM to the work of Ostriker & Steinhardt (1995), though there were many other contributions leading to it (e.g., Efstathiou et al. 1990). Certainly many of us anticipated that the Type Ia SN experiments would confirm or deny this picture. Since the issue of confirmation bias is ever-present in cosmic considerations, it is important to understand this context: the acceleration of the expansion rate that is often depicted as a novel discovery in 1998 was an expected result. So much so that at a conference in 1997 in Aspen I recall watching Michael Turner badger the SN presenters to Proclaim Lambda already. One of the representatives from the SN teams was Richard Ellis, who wasn’t having it: the SN data weren’t there yet even if the attitude was. Amusingly, I later heard Turner claim to have been completely surprised by the 1998 discovery, as if he hadn’t been pushing for it just the year before. Aspen is a good venue for discussion; I commented at the time that the need to rehabilitate the cosmological constant was a big stop sign in the sky. He glared at me, and I’ve been on his shit list ever since.

%I will not be entertaining assertions that the universe is not expanding in the comments: that’s beyond the scope of this post.

*Every time a paper corroborating a prediction of MOND is published, the usual suspects get on social media to complain that the referee(s) who reviewed the paper must be incompetent. This is a classic case of admitting you don’t understand how the process works by disparaging what happened in a process to which you weren’t privy. Anyone familiar with the practice of refereeing will appreciate that the opposite is true: claims that seem extraordinary are consistently held to a higher standard.

^Note that it is impossible to exclude the act of judgement. There are approaches to minimizing this in particular experiments, e.g., by doing a blind analysis of large scale structure data. But you’ve still assumed a paradigm in which to analyze those data; that’s a judgement call. It is also a judgement call to decide to believe only large scale data and ignore evidence below some scale.

+I felt this hard when MOND first cropped up in my data for low surface brightness galaxies. I remember thinking How can this stupid theory get any predictions right when there is so much evidence for dark matter? It took a while for me to realize that dark matter really meant mass discrepancies. The evidence merely indicates a problem, the misnomer presupposes the solution. I had been working so hard to interpret things in terms of dark matter that it came as a surprise that once I allowed myself to try interpreting things in terms of MOND I no longer had to work so hard: lots of observations suddenly made sense.

$TRGB = Tip of the Red Giant Branch. Low metallicity stars reach a consistent maximum luminosity as they evolve up the red giant branch, providing a convenient standard candle.

&Where the heck is Tully? He seldom seems to get acknowledged despite having played a crucial role in breaking the tyranny of H0 = 50 in the 1970s, having published steadily on the topic, and his group continues to provide accurate measurements to this day. Do physics-trained cosmologists even know who he is?

#The TRGB was a well-established method before it suddenly appears on this graph. That it appears this way shortly after the CMB told us what answer we should get is a more worrisome potential example of confirmation bias, reminiscent of the situation with the primordial deuterium abundance.

**Aside from the tension between the TRGB as implemented by Freedman’s group and the TRGB as implemented by others, I’m not aware of any serious hint of systematics in the calibration of the distance scale. Can it still happen? Sure! But people are well aware of the dangers and watch closely for them. At this juncture, there is ample evidence that we may indeed have gotten past this.

Ha! I knew the Riess reference off the top of my head, but lots of people have worked on this so I typed “hubble calibration not a systematic error” into Google to search for other papers only to have its AI overview confidently assert

The statement that Hubble calibration is not a systematic error is incorrect

Google AI

That gave me a good laugh. It’s bad enough when overconfident underachievers shout about this from the wrong peak of the Dunning-Kruger curve without AI adding its recycled opinion to the noise, especially since its “opinion” is constructed from the noise.

The best search engine for relevant academic papers is NASA ADS; putting the same text in the abstract box returns many hits that I’m not gonna wade through. (A well-structured ADS search doesn’t read so casually; apparently the same still applies to Google.)

Dark Matter or Modified Gravity? A virtual panel discussion

Dark Matter or Modified Gravity? A virtual panel discussion

This is a quick post to announce that on Monday, April 7 there will be a virtual panel discussion about dark matter and MOND involving Scott Dodelson and myself. It will be moderated by Orin Harris at Northeastern Illinois University starting at 3pm US Central time*. I asked Orin if I should advertise it more widely, and he said yes – apparently their Zoom set up has a capacity for a thousand attendees.

See their website for further details. If you wish to attend, you need to register in advance.


*That’s 4PM EDT to me, which is when I’m usually ready for a nap.

Things I don’t understand in modified dynamics (it’s cosmology)

Things I don’t understand in modified dynamics (it’s cosmology)

I’ve been busy, and a bit exhausted, since the long series of posts on structure formation in the early universe. The thing I like about MOND is that it helps me understand – and successfully predict – the dynamics of galaxies. Specific galaxies that are real objects: one can observe this particular galaxy and predict that it should have this rotation speed or velocity dispersion. In contrast, LCDM simulations can only make statistical statements about populations of galaxy-like numerical abstractions, they can never be equated to real-universe objects. Worse, they obfuscate rather than illuminate. In MOND, the observed centripetal acceleration follows directly from that predicted by the observed distribution of stars and gas. In simulations, this fundamental observation is left unaddressed, and we are left grasping at straws trying to comprehend how the observed kinematics follow from an invisible, massive dark matter halo that starts with the NFW form but somehow gets redistributed just so by inadequately modeled feedback processes.

Simply put, I do not understand galaxy dynamics in terms of dark matter, and not for want of trying. There are plenty of people who claim to do so, but they appear to be fooling themselves. Nevertheless, what I don’t like about MOND is the same thing that they don’t like about MOND which is that I don’t understand the basics of cosmology with it.

Specifically, what I don’t understand about cosmology in modified dynamics is the expansion history and the geometry. That’s a lot, but not everything. The early universe is fine: the expanding universe went through an early hot phase that bequeathed us with the relic radiation field and the abundances of the light elements through big bang nucleosynthesis. There’s nothing about MOND that contradicts that, and arguably MOND is in better agreement with BBN than LCDM, there being no tension with the lithium abundance – this tension was not present in the 1990s, and was only imposed by the need to fit the amplitude of the second peak in the CMB.

But we’re still missing some basics that are well understood in the standard cosmology, and which are in good agreement with many (if not all) of the observations that lead us to LCDM. So I understand the reluctance to admit that maybe we don’t know as much about the universe as we think we do. Indeed, it provokes strong emotional reactions.

Screenshot from Dr. Strangelove paraphrasing Major Kong (original quote at top).

So, what might the expansion history be in MOND? I don’t know. There are some obvious things to consider, but I don’t find them satisfactory.

The Age of the Universe

Before I address the expansion history, I want to highlight some observations that pertain to the age of the universe. These provide some context that informs my thinking on the subject, and why I think LCDM hits pretty close to the mark in some important respects, like the time-redshift relation. That’s not to say I think we need to slavishly obey every detail of the LCDM expansion history when constructing other theories, but it does get some things right that need to be respected in any such effort.

One big thing I think we should respect are constraints on the age of the universe. The universe can’t be younger than the objects in it. It could of course be older, but it doesn’t appear to be much older, as there are multiple, independent lines of evidence that all point to pretty much the same age.

Expansion Age: The first basic is that if the universe is expanding, it has a finite age. You can imagine running the expansion in reverse, looking back in time to when the universe was progressively smaller, until you reach an incomprehensibly dense initial phase. A very long time, to be sure, but not infinite.

To put an exact number on the age of the universe, we need to know its detailed expansion history. That is something LCDM provides that MOND does not pretend to do. Setting aside theory, a good ball park age is the Hubble time, which is the inverse of the Hubble constant. This is how long it takes for a linearly expanding, “coasting” universe to get where it is today. For the measured H0 = 73 km/s/Mpc, the Hubble time is 13.4 Gyr. Keep that number in mind for later. This expansion age is the metric against which to compare the ages of measured objects, as discussed below.
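The unit bookkeeping behind that number is trivial but worth seeing once: the Hubble time is just one megaparsec in kilometers divided by H0, converted to gigayears.

```python
# Hubble time t_H = 1/H0, converting km/s/Mpc to Gyr.
MPC_KM = 3.0857e19   # kilometers per megaparsec
GYR_S  = 3.156e16    # seconds per gigayear

def hubble_time_gyr(H0):
    return MPC_KM / H0 / GYR_S

print(hubble_time_gyr(73.0))   # ~13.4 Gyr
print(hubble_time_gyr(67.4))   # ~14.5 Gyr
```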

Globular Clusters: The most famous of age constraints is provided by the ancient stars in globular clusters. One of the great accomplishments of 20th century astrophysics is a masterful understanding of the physics of stars as giant nuclear fusion reactors. This allows us to understand how stars of different mass and composition evolve. That, in turn, allows us to put an age on the stars in clusters. Globulars are the oldest of clusters, with a mean age of 13.5 Gyr (Valcin et al. 2021). Other estimates are similar, though I note that the age determinations depend on the distance scale, so keeping them rigorously separate from Hubble constant determinations has historically been a challenge. The covariance of age and distance renders the meaning of error bars rather suspect, but to give a flavor, the globular cluster M92 is estimated to have an age of 13.80±0.75 Gyr (Jiaqi et al. 2023).

Though globular clusters are the most famous in this regard, there are other constraints on the age of the contents of the universe.

White dwarfs: White dwarfs are the remnants of dead stars that were never massive enough to have exploded as supernovae. The over/under line for that is about 8 solar masses; the oldest white dwarfs will be the remnants of the first stars that formed just below this threshold. Such stars don’t take long to evolve, around 100 Myr. That’s small compared to the age of the universe, so the first white dwarfs have just been cooling off ever since their progenitors burned out.

As the remnants of the incredibly hot cores of former stars, white dwarfs start off hot but cool quickly by radiating into space. The timescale to cool off can be crudely estimated from first principles just from the Stefan-Boltzmann law. As with so many situations in astrophysics, some detailed radiative transfer calculations are necessary to get the answer right in detail. But the ballpark of the back-of-the-envelope answer is not much different from the detailed calculation, giving some confidence in the procedure: we have a good idea of how long it takes white dwarfs to cool.
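To show the flavor of that back-of-the-envelope estimate: treat the white dwarf as a reservoir of ion thermal energy leaking away at the Stefan-Boltzmann luminosity. All the inputs below (mass, radius, assumed interior and surface temperatures) are illustrative placeholders, and the estimate ignores degeneracy physics, the core-envelope temperature relation, and crystallization, so it is good to order of magnitude at best.

```python
# Toy white dwarf cooling estimate: ion thermal energy / blackbody luminosity.
# Order of magnitude only; every input is an illustrative assumption.
import numpy as np

K_B   = 1.381e-16   # erg/K
M_P   = 1.673e-24   # g
SIGMA = 5.670e-5    # erg cm^-2 s^-1 K^-4 (Stefan-Boltzmann)
M_SUN = 1.989e33    # g
GYR_S = 3.156e16    # s

M_wd   = 0.6 * M_SUN   # assumed mass
A      = 12            # assumed pure-carbon composition
R_wd   = 8.0e8         # cm, roughly Earth-sized
T_core = 1.0e7         # K, assumed interior temperature

E_thermal = 1.5 * (M_wd / (A * M_P)) * K_B * T_core  # ideal-gas ion reservoir

for T_eff in (10000.0, 6000.0, 4000.0):
    L = 4.0 * np.pi * R_wd**2 * SIGMA * T_eff**4
    print(f"T_eff = {T_eff:5.0f} K:  E/L ~ {E_thermal / L / GYR_S:5.1f} Gyr")
# The crude timescale climbs past ~10 Gyr for the coolest, faintest dwarfs,
# which is why the faint edge of the luminosity function acts as a clock.
```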

Since white dwarfs are not generating new energy but simply radiating into space, their luminosity fades over time as their surface temperature declines. This predicts that there will be a sharp drop in the numbers of white dwarfs corresponding to the oldest such objects: there simply hasn’t been enough time to cool further. The observational challenge then becomes finding the faint edge of the luminosity function for these intrinsically faint sources.

Despite the obvious challenges, people have done it, and after great effort, have found the expected edge. Translating that into an age, we get 12.5+1.4/-3.5 Gyr (Munn et al. 2017). This seems to hold up well now that we have Gaia data, which finds J1312-4728 to be the oldest known white dwarf at 12.41±0.22 Gyr (Torres et al. 2021). To get to the age of the universe, one does have to account for the time it takes to make a white dwarf in the first place, which is of order a Gyr or less, depending on the progenitor and when it formed in the early universe. This is pretty consistent with the ages of globular clusters, but comes from different physics: radiative cooling is the dominant effect rather than the hydrogen fusion budget of main sequence stars.

Radiochronometers: Some elements decay radioactively, so measuring their isotopic abundances provides a clock. Carbon-14 is a famous example: with a half-life of 5,730 years, its decay provides a great way to date the remains of prehistoric camp sites and bones. That’s great over some tens of thousands of years, but we need something with a half-life of order the age of the universe to constrain that. One such isotope is thorium-232, with a half-life of 14.05 Gyr.

Making this measurement requires that we first find stars that are both ancient and metal poor but with detectable Thorium and Europium (the latter providing a stable reference). Then one has to obtain a high quality spectrum with which to do an abundance analysis. This is all hard work, but there are some examples known.

Sneden’s star, CS 22892-052, fits the bill. Long story short, the measured Th/Eu ratio gives an age of 12.8±3 Gyr (Sneden et al. 2003). A similar result of ~13 Gyr (Frebel & Kratz 2009) is obtained from ²³⁸U (this “stable” isotope of uranium has a half-life of 4.5 Gyr, as opposed to the kind that can be provoked into exploding, ²³⁵U, which has a half-life of 700 Myr). While the search for the first stars and the secrets they may reveal is ongoing, the ages for individual stars estimated from radioactive decay are consistent with the ages of the oldest globular clusters indicated by stellar evolution.
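The arithmetic behind a radiochronometer age is just the decay law: thorium decays while europium doesn’t, so the age follows from how far the observed Th/Eu ratio has fallen below its initial production ratio. The production ratio in the sketch is a hypothetical placeholder (real analyses take it from r-process nucleosynthesis models), so the output is illustrative rather than a reproduction of the Sneden et al. result.

```python
# Radiochronometer sketch: age from the decay of thorium-232 (half-life 14.05 Gyr)
# relative to stable europium. The initial production ratio below is a
# hypothetical placeholder, not a value from the literature.
import numpy as np

T_HALF = 14.05                 # Gyr, thorium-232
LAM = np.log(2.0) / T_HALF     # decay constant in 1/Gyr

def radio_age(ratio_initial, ratio_observed):
    """Age in Gyr given (Th/Eu) at production and as observed today."""
    return np.log(ratio_initial / ratio_observed) / LAM

print(radio_age(0.48, 0.25))   # ~13 Gyr for these illustrative numbers
```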

Interstellar dust grains: The age of the solar system (4.56 Gyr) is well known from the analysis of isotopic abundances in meteorites. In addition to tracing the oldest material in the solar system, sometimes it is possible to identify dust grains of interstellar origin. One can do the same sort of analysis, and do the sum: how long did it take the star that made those elements to evolve, return them to the interstellar medium, get mixed in with the solar nebula, and lurk about in space until plunging to the ground as a meteorite that gets picked up by some scientifically-inclined human. This exercise has been done by Nittler et al. (2008), who estimate a total age of 13.7±1.3 Gyr.

Taken in sum, all these different age indicators point to a similar, consistent age between 13 and 14 billion years. It might be 12, but not lower, nor is there reason to think it would be much higher: 15 is right out. I say that flippantly because I couldn’t resist the Monty Python reference, but the point is serious: you could in principle have a much older universe, but then why are all the oldest things pretty much the same age? Why would the universe sit around doing nothing for billions of years then suddenly decide to make lots of stars all at once? The more obvious interpretation is that the age of the universe is indeed in the ballpark of 13.something Gyr.

Expansion history

The expansion history in the standard FLRW universe is governed by the Friedmann equation, which we can write* as

H²(z) = H0²[Ωm(1+z)³ + Ωk(1+z)² + ΩΛ]

where z is the redshift, H(z) is the Hubble parameter, H0 is its current value, and the various Ω are the mass-energy density of stuff relative to the critical density: the mass density Ωm, the geometry Ωk, and the cosmological constant ΩΛ. I’ve neglected radiation for clarity. One can make up other stuff X and add a term for it as ΩX which will have an associated (1+z) term that depends on the equation of state of X. For our purposes, both normal matter and non-baryonic cold dark matter (CDM) share the same equation of state (cold meaning non-relativistic motions meaning rest-mass density but negligible pressure), so both contribute to the mass density Ωm = Ωb + ΩCDM.

Note that since H(z=0)=H0, the various Ω’s have to sum to unity. Thus a cosmology is geometrically flat with the curvature term Ωk = 0 if Ωm + ΩΛ = 1. Vanilla LCDM has Ωm = 0.3 and ΩΛ = 0.7. As a community, we’ve become very sure of this, but that the Friedmann equation is sufficient to describe the expansion history of the universe is an assumption based on (1) General Relativity providing a complete description, and (2) the cosmological principle (homogeneity and isotropy) holding. These seem like incredibly reasonable assumptions, but let’s bear in mind that we only know directly about 5% of the sum of Ω’s, the baryons. ΩCDM = 0.25 and ΩΛ = 0.7 are effectively fudge factors we need to make things work out given the stated assumptions. LCDM is viable if and only if cold dark matter actually exists.
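To connect the Friedmann equation to the ages quoted in this post, here is a minimal numerical sketch: the age of an FLRW universe is t0 = ∫ dz / [(1+z)H(z)], with H(z) taken from the equation above (radiation neglected, so don’t trust the last decimal).

```python
# Age of an FLRW universe from the Friedmann equation written above:
# t0 = integral_0^inf dz / [(1+z) H(z)], radiation neglected.
import numpy as np
from scipy.integrate import quad

MPC_KM, GYR_S = 3.0857e19, 3.156e16   # km per Mpc, seconds per Gyr

def age_gyr(H0, Om, OL):
    Ok = 1.0 - Om - OL                # curvature from the closure condition
    E = lambda z: np.sqrt(Om * (1+z)**3 + Ok * (1+z)**2 + OL)
    integral, _ = quad(lambda z: 1.0 / ((1+z) * E(z)), 0.0, np.inf)
    return (MPC_KM / H0 / GYR_S) * integral

print("coasting    :", age_gyr(73.0, 0.0, 0.0))      # ~13.4 Gyr
print("vanilla LCDM:", age_gyr(70.0, 0.3, 0.7))      # ~13.5 Gyr
print("Planck LCDM :", age_gyr(67.4, 0.315, 0.685))  # ~13.8 Gyr
```

The same little function, handed the low-density parameters discussed below, returns the correspondingly different ages.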

Gravity is an attractive force, so the mass term Ωm acts to retard the expansion. Early on, we expected this to be the dominant term due to the (1+z)³ dependence. In the long-presumed+ absence of a cosmological constant, cosmology was the search for two numbers: once H0 and Ωm are specified, the entire expansion history is known. Such a universe can only decelerate, so only the region below the straight line in the graph below is accessible; an expansion history like the red one representing LCDM should be impossible. That lots of different data seemed to want this is what led us kicking and screaming to rehabilitate the cosmological constant, which acts as a form of anti-gravity to accelerate an expansion that ought to be decelerating.

The expansion factor maps how the universe has grown over time; it corresponds to 1/(1+z) in redshift so that z → ∞ as t → 0. The “coasting” limit of an empty universe (H0 = 73, Ωm = ΩΛ = 0) that expands linearly is shown as the straight line. The red line is the expansion history of vanilla LCDM (H0 = 70, Ωm = 0.3, ΩΛ = 0.7).

The over/under between acceleration/deceleration of the cosmic expansion rate is the coasting universe. This is the conceptually useful limit of a completely empty universe with Ωm = ΩΛ = 0. It expands at a steady rate that neither accelerates nor decelerates. The Hubble time is exactly equal to the age of such a universe, i.e., 13.4 Gyr for H0 = 73.

LCDM has a more complicated expansion history. The mass density dominates early on, so there is an early phase of deceleration – the red curve bends to the right. At late times, the cosmological constant begins to dominate, reversing the deceleration and transforming it into an acceleration. The inflection point when it switches from decelerating to accelerating is not too far in the past, which is a curious coincidence given that the entire future of such a universe will be spent accelerating towards the exponential expansion of the de Sitter limit. Why do we live anywhen close to this special time?

Lots of ink has been spilled on this subject, and the answer seems to boil down to the anthropic principle. I find this lame and won’t entertain it further. I do, however, want to point out a related strange coincidence: the current age of vanilla LCDM (13.5 Gyr) is the same as that of a coasting universe with the locally measured Hubble constant (13.4 Gyr). Why should these very different models be so close in age? LCDM decelerates, then accelerates; there’s only one moment in the expansion history of LCDM when the age is equal to the Hubble time, and we happen to be living just then.

This coincidence problem holds for any viable set of LCDM parameters, as they all have nearly the same age. Planck LCDM has an age of 13.7 Gyr, still basically the same as the Hubble time for the locally measured Hubble constant. The lower Planck Hubble value is balanced by a larger amount of early-time deceleration. The universe reaches its current point after 13.something Gyr in all of these models. That’s in good agreement with the ages of the oldest observed stars, which is encouraging, but it does nothing to help us resolve the Hubble tension, much less constrain alternative cosmologies.

Cosmic expansion in MOND

There is no equivalent to the Friedmann equation in MOND. This is not satisfactory. As an extension of Newtonian theory, MOND doesn’t claim to encompass cosmic phenomena$ – hence the search for a deeper underlying theory. Lacking this, what can we try?

Felten (1984) tried to derive an equivalent to the Friedmann equation using the same trick that can be used with Newtonian theory to recover the expansion dynamics in the absence of a cosmological constant. This did not work. The result was unsatisfactory& for application to the whole universe because the presence of a0 in the equations makes the result scale-dependent. So how big the universe is matters in a way that the standard cosmology does not; there’s no way to generalize it to describe the whole enchilada.

In retrospect, what Felten had really obtained was a solution for the evolution of a top-hat over-density: the dynamics of a spherical region embedded in an expanding universe. This result is the basis for the successful prediction of early structure formation in MOND. But once again it only tells us about the dynamics of an object within the universe, not the universe itself.

In the absence of a complete theory, one makes an ansatz to proceed. If there is a grander theory that encompasses both General Relativity and MOND, then it must approach both in the appropriate limit, so an obvious ansatz to make is that the entire universe obeys the conventional Friedmann equation while the dynamics of smaller regions in the low acceleration regime obey MOND. Both Bob Sanders and I independently adopted this approach, and explicitly showed that it was consistent with the constraints that were known at the time. The first obvious guess for the mass density of such a cosmology is Ωm = Ωb = 0.04. (This was the high end of BBN estimates at the time, so back then we also considered lower values.) The expansion history of this low density, baryon-only universe is shown as the blue line below:

As above, but with the addition of a low density, baryon-dominated, no-CDM universe (H0 = 73, Ωm = Ωb = 0.04, ΩΛ = 0; blue line).

As before, there is not much to choose between these models in terms of age. The small but non-zero mass density does cause some early deceleration before the model approaches the coasting limit, so the current age is a bit lower: 12.6 Gyr. This is on the small side, but not problematically so, or even particularly concerning given the history of the subject. (I’m old enough to remember when we were pretty sure that globular clusters were 18 Gyr old.)

The time-redshift relation for the no-CDM, baryon-only universe is somewhat different from that of LCDM. If we adopt it, then we find that MOND-driven structure forms at somewhat higher redshift than with the LCDM time-redshift relation. The benchmark time of 500 Myr for L* galaxy formation is reached at z = 15 rather than z = 9.5 as in LCDM. This isn’t a huge difference, but it does mean that an L* galaxy could in principle appear even earlier than so far seen. I’ve stuck with LCDM as the more conservative estimate of the time-redshift relation, but the plain fact is we don’t really know what the universe is doing at those early times, or if the ansatz we’ve made holds well enough to do this. Surely it must fail at some point, and it seems likely that we’re past that point.
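The time-redshift comparison is the same sort of integral, just taken from z instead of from zero. Here is a sketch of the kind of calculation behind the z = 15 versus z = 9.5 statement; the exact redshifts it returns depend on the parameters one adopts, but they land in the neighborhood quoted above.

```python
# Age at redshift z, t(z) = integral_z^inf dz' / [(1+z') H(z')], used to find
# the redshift at which a model reaches a given age (here 500 Myr).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

MPC_KM, GYR_S = 3.0857e19, 3.156e16

def age_at_z_gyr(z, H0, Om, OL):
    Ok = 1.0 - Om - OL
    E = lambda zz: np.sqrt(Om * (1+zz)**3 + Ok * (1+zz)**2 + OL)
    integral, _ = quad(lambda zz: 1.0 / ((1+zz) * E(zz)), z, np.inf)
    return (MPC_KM / H0 / GYR_S) * integral

def z_at_age(t_gyr, H0, Om, OL):
    return brentq(lambda z: age_at_z_gyr(z, H0, Om, OL) - t_gyr, 0.1, 100.0)

print("vanilla LCDM reaches 0.5 Gyr at z ~", round(z_at_age(0.5, 70.0, 0.3, 0.7), 1))
print("no-CDM       reaches 0.5 Gyr at z ~", round(z_at_age(0.5, 73.0, 0.04, 0.0), 1))
```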

There is a bigger problem with the no-CDM model above. Even if it is close to the right expansion history, it has a very large negative curvature. The geometry is nowhere close to the flat Robertson-Walker metric indicated by the angular diameter distance to the surface of last scattering (the CMB).

Geometry

Much of cosmology is obsessed with geometry, so I will not attempt to do the subject justice. Each set of FLRW parameters has a specific geometry that comes hand in hand with its expansion history. The most sensitive probe we have of the geometry is the CMB. The a priori prediction of LCDM was that its flat geometry required the first acoustic peak to have a maximum near one degree on the sky. That’s exactly what we observe.

Fig. 45 from Famaey & McGaugh (2012): The acoustic power spectrum of the cosmic microwave background as observed by WMAP [229] together with the a priori predictions of ΛCDM (red line) and no-CDM (blue line) as they existed in 1999 [265] prior to observation of the acoustic peaks. ΛCDM correctly predicted the position of the first peak (the geometry is very nearly flat) but over-predicted the amplitude of both the second and third peak. The most favorable a priori case is shown; other plausible ΛCDM parameters [468] predicted an even larger second peak. The most important parameter adjustment necessary to obtain an a posteriori fit is an increase in the baryon density Ωb, above what had previously been expected from BBN. In contrast, the no-CDM model ansatz made as a proxy for MOND successfully predicted the correct amplitude ratio of the first to second peak with no parameter adjustment [268, 269]. The no-CDM model was subsequently shown to under-predict the amplitude of the third peak [442], so no model can explain these data without post-hoc adjustment.

In contrast, no-CDM made the correct prediction for the first-to-second peak amplitude ratio, but it is entirely agnostic about the geometry. FLRW cosmology and MOND dynamics care about incommensurate things in the CMB data. That said, the naive prediction of the baryon-only model outlined above is that the first peak should occur around where the third peak is observed. That is obviously wrong.

Since the geometry is not a fundamental prediction of MOND, the position of the first peak is easily fit by invoking the same fudge factor used to fit it conventionally: the cosmological constant. We need a larger ΩΛ = 0.96, but so what? This parameter merely encodes our ignorance: we make no pretense to understand it, let alone vesting deep meaning in it. It is one of the things that a deeper theory must explain, and can be considered as a clue in its development.

So instead of a baryon-only universe, our FLRW proxy becomes a Lambda-baryon universe. That fits the geometry, and for an optical depth to the surface of last scattering of τ = 0.17, matches the amplitude of the CMB power spectrum and correctly predicts the cosmic dawn signal that EDGES claimed to detect. Sounds good, right? Well, not entirely. It doesn’t fit the CMB data at L > 600, but I expected to only get so far with the no-CDM ansatz, so it doesn’t bother me that you need a better underlying theory to fit the entire CMB. Worse, to my mind, is that the Lambda-baryon proxy universe is much, much older than everything in it: 22 Gyr instead of 13.something.

As above, but now with the addition of a low density, Lambda-dominated universe (H0 = 73, Ωm = Ωb = 0.04, ΩΛ = 0.96; dashed line).

This just don’t seem right. Or even close to right. Like, not even pointing in a direction that might lead to something that had a hope of being right.

Moreover, we have a weird tension between the baryon-only proxy and the Lambda-baryon proxy cosmology. The baryon-only proxy has a plausible expansion history but an unacceptable geometry. The Lambda-baryon proxy has a plausible geometry but an implausible expansion history. Technically, yes, it is OK for the universe to be much older than all of its contents, but it doesn’t make much sense. Why would the universe do nothing for 8 or 9 Gyr, then burst into a sudden frenzy of activity? It’s as if Genesis read “for the first 6 Gyr, God was a complete slacker and did nothing. In the seventh Gyr, he tried to pull an all-nighter only to discover it took a long time to build cosmic structure. Then He said ‘Screw it’ and fudged Creation with MOND.”

In the beginning the Universe was created.
This has made a lot of people very angry and been widely regarded as a bad move.

Douglas Adams, The Restaurant at the End of the Universe

So we can have a plausible geometry or we can have a plausible expansion history with a proxy FLRW model, but not both. That’s unpleasant, but not tragic: we know this approach has to fail somehow. But I had hoped for FLRW to be a more coherent first approximation to the underlying theory, whatever it may be. If there is such a theory, then both General Relativity and MOND are its limits in their respective regimes. As such, FLRW ought to be a good approximation to the underlying entity up to some point. That we have to invoke both non-baryonic dark matter and a cosmological constant is a hint that we’ve crossed that point. But I would have hoped that we crossed it in a more coherent fashion. Instead, we seem to get a little of this for the expansion history and a little of that for the geometry.

I really don’t know what the solution is here, or even if there is one. At least I’m not fooling myself into presuming it must work out.


*There are other ways to write the Friedmann equation, but this is a useful form here. For the mathematically keen, the Hubble parameter is the time derivative of the expansion factor normalized by the expansion factor, which in terms of redshift is

H(z) = -(dz/dt)/(1+z).
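Spelling out the chain-rule step behind that expression (with the expansion factor normalized to unity today):

```latex
a = \frac{1}{1+z}
\quad\Rightarrow\quad
\dot a = -\frac{1}{(1+z)^{2}}\frac{dz}{dt}
\quad\Rightarrow\quad
H \equiv \frac{\dot a}{a} = -\frac{1}{1+z}\frac{dz}{dt} .
```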

This quantity evolves, leading us to expect evolution in Milgrom’s constant if we associate it with the numerical coincidence

2π a0 = cH0

If the Hubble parameter evolves, as it appears to do, it would seem to follow that so should a0, i.e., a0(z) ~ H(z) – otherwise the coincidence is just that: a coincidence that applies only now. There is, at present, no persuasive evidence that a0 evolves with redshift.

A similar order-of-magnitude association can be made with the cosmological constant,

2π a0 = c2Λ1/2

so conceivably the MOND acceleration scale appears as the result of vacuum effects. It is a matter of judgement whether these numerical coincidences are mere coincidences or profound clues towards a deeper theory. That the proportionality constant is very nearly 2π is certainly intriguing, but the constancy of any of these parameters (including Newton’s G) depends on how they emerge from the deeper theory.
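As a quick numerical sanity check on both associations, assuming H0 = 73 km/s/Mpc and ΩΛ ≈ 0.7 so that Λ = 3ΩΛH02/c2:

```python
# Back-of-envelope check of the coincidences 2*pi*a0 ~ c*H0 and 2*pi*a0 ~ c^2*sqrt(Lambda).
# Assumes H0 = 73 km/s/Mpc and Omega_Lambda = 0.7 for Lambda = 3*OL*H0^2/c^2.
import numpy as np

c  = 2.998e8                     # m/s
H0 = 73.0e3 / 3.0857e22          # km/s/Mpc -> 1/s
a0 = 1.2e-10                     # m/s^2, Milgrom's acceleration constant

Lam = 3 * 0.7 * H0**2 / c**2     # cosmological constant in 1/m^2

print("c*H0/(2*pi)          =", c * H0 / (2 * np.pi), "m/s^2")                # ~1.1e-10
print("c^2*sqrt(Lam)/(2*pi) =", c**2 * np.sqrt(Lam) / (2 * np.pi), "m/s^2")   # ~1.6e-10
print("measured a0          =", a0, "m/s^2")
```

Both land within a factor of order unity of the measured a0, which is what makes the coincidences tempting without making them decisive.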


+In January 2019, I was attending a workshop at Princeton when I had a chance encounter with Jim Peebles. He was not attending the workshop, but happened to be walking across campus at the same time I was. We got to talking, and he affirmed my recollection of just how incredibly unpopular the cosmological constant used to be. Unprompted, he went on to make the analogy of how similar that seemed to how unpopular MOND is now.

Peebles was awarded a long-overdue Nobel Prize later that year.


$This is one of the things that makes it tricky to compare LCDM and MOND. MOND is a theory of dynamics in the limit of low acceleration. It makes no pretense to be a cosmological theory. LCDM starts as a cosmological theory, but it also makes predictions about the dynamics of systems within it (or at least the dark matter halos in which visible galaxies are presumed to form). So if one starts by putting on a cosmology hat, there is nothing to talk about: LCDM is the only game in town. But from the perspective of dynamics, it’s the other way around, with LCDM repeatedly failing to satisfactorily explain, much less anticipate, phenomena that MOND predicted correctly in advance.


&An intriguing thing about Felten’s MOND universe is that it eventually recollapses irrespective of the mass density. There is no critical value of Ωm, hence no coincidence problem. MOND is strong enough to eventually reverse the expansion of the universe, it just takes a very long time to do so, depending on the density.

I’m surprised this aspect of the issue was overlooked. The coincidence problem (then mostly called the flatness problem) obsessed people at the time, so much so that its solution by Cosmic Inflation led to its widespread acceptance. That only works if Ωm = 1; LCDM makes the coincidence worse. I guess the timing was off, as Inflation had already captured the community’s imagination by that time, likely making it hard to recognize that MOND was a more natural solution. We’d already accepted the craziness that was Inflation and dark matter; MOND craziness was a bridge too far.

I guess. I’m not quite that old; I was still an undergraduate at the time. I did hear about Inflation then, in glowing terms, but not a thing about MOND.

Kinematics suggest large masses for high redshift galaxies

Kinematics suggest large masses for high redshift galaxies

This is what I hope will be the final installment in a series of posts describing the results published in McGaugh et al. (2024). I started by discussing the timescale for galaxy formation in LCDM and MOND which leads to different and distinct predictions. I then discussed the observations that constrain the growth of stellar mass over cosmic time and the related observation of stellar populations that are mature for the age of the universe. I then put on an LCDM hat to try to figure out ways to wriggle out of the obvious conclusion that galaxies grew too massive too fast. Exploring all the arguments that will be made is the hardest part, not because they are difficult to anticipate, but because there are so many* options to consider. This leads to many pages of minutiae that no one ever seems to read+, so one of the options I’ve discussed (e.g., super-efficient star formation) will likely emerge as the standard picture even if it comes pre-debunked.

The emphasis so far has been on the evolution of the stellar masses of galaxies because that is observationally most accessible. That gives us the opportunity to wriggle, because what we really want to measure to test LCDM is the growth of [dark] mass. This is well-predicted but invisible, so we can always play games to relate light to mass.

Mass assembly in LCDM from the IllustrisTNG50 simulation. The dark matter mass assembles hierarchically in the merger tree depicted at left; the size of the circles illustrates the dark matter halo mass. The corresponding stellar mass of the largest progenitor is shown at right as the red band. This does not keep pace with the apparent assembly of stellar mass (data points), but what is the underlying mass really doing?

Galaxy Kinematics

What we really want to know is the underlying mass. It is reasonable to expect that the light traces this mass, but is there another way to assess it? Yes: kinematics. The orbital speeds of objects in galaxies trace the total potential, including the dark matter. So, how massive were early galaxies? How does that evolve with redshift?

The rotation curve of NGC 6946 traced by stars at small radii and gas farther out. This is a typical flat rotation curve (data points) that exceeds what can be explained by the observed baryonic mass (red line deduced from the stars and gas pictured at right), leading to the inference of dark matter.

The rotation curve for NGC 6946 shows a number of well-established characteristics for nearby galaxies, including the dominance of baryons at small radii in high surface brightness galaxies and the famous flat outer portion of the rotation curve. Even when stars contribute as much mass as allowed by the inner rotation curve (“maximum disk“), there is a need for something extra further out (i.e., dark matter or MOND). In the case of dark matter, the amplitude of flat rotation is typically interpreted as being indicative& of halo mass.

So far, the rotation curves of high redshift galaxies look very much like those of low redshift galaxies. There are some fast rotators at high redshift as well. Here is an example observed by Neeleman et al. (2020), who measure a flat rotation speed of 272 km/s for DLA0817g at z = 4.26. That’s more massive than either the Milky Way (~200 km/s) or Andromeda (~230 km/s), if not quite as big as local heavyweight champion UGC 2885 (300 km/s). DLA0817g looks to be a disk galaxy that formed early and is sedately rotating only 1.4 Gyr after the Big Bang. It is already massive at this time: not at all the little nuggets we expect from the CDM merger tree above.

Fig. 1 from Neeleman et al. (2020): the velocity field (left) and position-velocity diagram (right) of DLA0817g. The velocity field looks like that of a rotating disk, while the raw position-velocity diagram shows motions of ~200 km/s on either side of the center. When corrected for inclination, the flat rotation speed is 272 km/s, corresponding to a massive galaxy near the top of the Tully-Fisher relation.

This is anecdotal, of course, but there are a good number of similar cases that are already known. For example, the kinematics of ALESS 073.1 at z ≈ 5 indicate the presence of a massive stellar bulge as well as a rapidly rotating disk (Lelli et al. 2021). A similar case has been observed at z ≈ 6 (Tripodi et al. 2023). These kinematic observations indicate the presence of mature, massive disk galaxies well before they were expected to be in place (Pillepich et al. 2019; Wardlow 2021). The high rotation speeds observed in early disk galaxies sometimes exceed 250 (Neeleman et al. 2020) or even 300 km s−1 (Nestor Shachar et al. 2023; Wang et al. 2024), comparable to the most massive local spirals (Noordermeer et al. 2007; Di Teodoro et al. 2021, 2023). That such rapidly rotating galaxies exist at high redshift indicates that there is a lot of mass present, not just light. We can’t just tweak the mass-to-light ratio of the stars to explain the photometry and also explain the kinematics.
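To put rough masses on those rotation speeds, one can invert the baryonic Tully-Fisher relation; in MOND its normalization is Mb = Vf^4/(G a0). This is only a back-of-envelope sketch – the empirical normalization and the measured flat velocities both carry uncertainties – but it conveys the scale:

```python
# Rough baryonic masses implied by observed flat rotation speeds,
# using the MOND baryonic Tully-Fisher normalization Mb = Vf^4 / (G*a0).
G    = 6.674e-11      # m^3 kg^-1 s^-2
a0   = 1.2e-10        # m/s^2
MSUN = 1.989e30       # kg

def btfr_mass(vf_kms):
    vf = vf_kms * 1e3                 # km/s -> m/s
    return vf**4 / (G * a0) / MSUN    # baryonic mass in solar masses

for v in (200, 230, 272, 300):        # Milky Way, Andromeda, DLA0817g, UGC 2885
    print(f"Vf = {v} km/s  ->  Mb ~ {btfr_mass(v):.1e} Msun")
```

A flat rotation speed of 272 km/s corresponds to a few times 1011 M in baryons, which is why it cannot be waved away as a mass-to-light ratio issue.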

In a seminal galaxy formation paper, Mo, Mao, & White (1998) predicted that “present-day disks were assembled recently (at z ≤ 1).” Today, we see that spiral galaxies are ubiquitous in JWST images up to z ∼ 6 (Ferreira et al. 2022, 2023; Kuhn et al. 2024). The early appearance of massive, dynamically cold (Di Teodoro et al. 2016; Lelli et al. 2018, 2023; Rizzo et al. 2023) disks in the first few billion years after the Big Bang contradicts the natural prediction of ΛCDM. Early disks are expected to be small and dynamically hot (Dekel & Burkert 2014; Zolotov et al. 2015; Krumholz et al. 2018; Pillepich et al. 2019), but they are observed to be massive and dynamically cold. (Hot or cold in this context means a high or low amplitude of the velocity dispersion relative to the rotation speed; the modern Milky Way is cold with σ ~ 20 km/s and Vc ~ 200 km/s.) Understanding the stability and longevity of dynamically cold spiral disks is foundational to the problem.

Kinematic Scaling Relations

Beyond anecdotal cases, we can check on kinematic scaling relations like Tully–Fisher. These are expected to emerge late and evolve significantly with redshift in LCDM (e.g., Glowacki et al. 2021). In MOND, the normalization of the baryonic Tully–Fisher relation is set by a0, so is immutable for all time if a0 is constant. Let’s see what the data say:

Figure 9 from McGaugh et al. (2024): The baryonic Tully–Fisher (left) and dark matter fraction–surface brightness (right) relations. Local galaxy data (circles) are from Lelli et al. (2019; left) and Lelli et al. (2016; right). Higher-redshift data (squares) are from Nestor Shachar et al. (2023) in bins with equal numbers of galaxies color coded by redshift: 0.6 < z < 1.22 (blue), 1.22 < z < 2.14 (green), and 2.14 < z < 2.53 (red). Open squares with error bars illustrate the typical uncertainties. The relations known at low redshift also appear at higher redshift with no clear indication of evolution over a lookback time up to 11 Gyr.

Not much to see: the data from Nestor Shachar et al. (2023) show no clear indication of evolution. The same can be said for the dark matter fraction–surface brightness relation. (Glad to see that being plotted after I pointed it out.) The local relations are coincident with those at higher redshift within any sober assessment of the uncertainties – exactly what is measured and how matters at this level, and I’m not going to attempt to disentangle all that here. Neither am I about to attempt to assess the consistency (or lack thereof) with either LCDM or MOND; the data simply aren’t good enough for that yet. It is also not clear to me that everyone agrees on what LCDM predicts.

What I can do is check empirically how much evolution there is within the 100-galaxy data set of Nestor Shachar et al. (2023). To do that, I fit a line to their data (the left panel above) and measure the residuals: for a given rotation speed, how far is each galaxy from the expected mass? To compare this with the stellar masses discussed previously, I normalize those residuals to the same M* = 9 x 1010 M. If there is no evolution, the data will scatter around a constant value as a function of redshift:

This figure reproduces the stellar mass-redshift data for L* galaxies (black points) and the monolithic (purple line) and LCDM (red and green lines) models discussed previously. The blue squares illustrate deviations of the data of Nestor Shachar et al. (2023) from the baryonic Tully-Fisher relation (dashed line, normalized to the same mass as the monolithic model). There is no indication of evolution in the baryonic Tully-Fisher relation, which was apparently established within the first few billion years after the Big Bang (z = 2.5 corresponds to a cosmic age of about 2.6 Gyr). The data are consistent with a monolithic galaxy formation model in which all the mass had been assembled into a single object early on.

The data scatter around a constant value as a function of redshift: there is no perceptible evolution.
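For concreteness, here is a minimal sketch of that residual exercise; the arrays are placeholders standing in for the (Vf, Mb, z) of the Nestor Shachar et al. (2023) sample, not their actual values:

```python
# Sketch of the empirical test for evolution in the baryonic Tully-Fisher relation:
# fit a line in log space, take residuals at fixed rotation speed, look for a redshift trend.
import numpy as np

logV = np.log10(np.array([150., 200., 250., 180., 220.]))      # placeholder flat speeds (km/s)
logM = np.log10(np.array([3e10, 1e11, 2.4e11, 6e10, 1.4e11]))  # placeholder baryonic masses (Msun)
z    = np.array([0.8, 1.1, 1.6, 2.0, 2.4])                     # placeholder redshifts

slope, intercept = np.polyfit(logV, logM, 1)     # fit log Mb = slope*log Vf + intercept

# Residuals: offset of each galaxy from the mass expected at its rotation speed,
# renormalized to the fiducial 9e10 Msun for comparison with the L* models.
resid  = logM - (slope * logV + intercept)
M_norm = 9e10 * 10**resid

# If there is no evolution, the residuals scatter about zero with no redshift trend.
print("residual trend with redshift:", np.polyfit(z, resid, 1)[0], "dex per unit z")
```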

The kinematic data for rotating galaxies tell much the same story as the photometric data for galaxies in clusters. They are both consistent with a monolithic model that gathered together the bulk of the baryonic mass early on, and evolved as an island universe for most of the history of the cosmos. There is no hint of the decline in mass with redshift predicted by the LCDM simulations. Moreover, the kinematics trace mass, not just light. So while I am careful to consider the options for LCDM, I don’t know how we’re gonna get out of this one.

Empirically, it is an important observation that there is no apparent evolution in the baryonic Tully-Fisher relation out to z ~ 2.5. That’s a lookback time of ~11 Gyr, so most of cosmic history. That means that whatever physics sets the relation did so early. If the physics is MOND, this absence of evolution implies that a0 is constant. There is some wiggle room in that given all the uncertainties, but this already excludes the picture in which a0 evolves with the expansion rate through the coincidence a0 ~ cH0. That much evolution would be readily perceptible if H(z) evolves as it appears to do. In contrast, the coincidence a0 ~ c2Λ1/2 remains interesting since the cosmological constant is constant. Perhaps this is just a coincidence, or perhaps it is a hint that the anomalous acceleration of the expansion of the universe is somehow connected with the anomalous acceleration in galaxy dynamics.

Though I see no clear evidence for evolution in Tully-Fisher to date, it remains early days. For example, a very recent paper by Amvrosiadis et al. (2025) does show a hint of evolution in the sense of an offset in the normalization of the baryonic Tully-Fisher relation. This isn’t very significant, being different by less than 2σ; and again we find ourselves in a situation where we need to take a hard look at all the assumptions and population modeling and velocity measurements just to see if we’re talking about the same quantities before we even begin to assess consistency or the lack thereof. Nevertheless, it is an intriguing result. There is also another interesting anecdotal case: one of their highest redshift objects, ALESS 071.1 at z = 3.7, is also the most massive in the sample, with an estimated stellar mass of 2 x 1012 M. That is a crazy large number, comparable to or maybe larger than the entire dark matter halo of the Milky Way. It falls off the top of any of the graphs of stellar mass we discussed before. If correct, this one galaxy is an enormous problem for LCDM regardless of any other consideration. It is of course possible that this case will turn out to be wrong for some reason, so it remains early days for kinematics at high redshift.

Cluster Kinematics

It is even earlier days for cluster kinematics. First we have to find them, which was the focus of Jay Franck’s thesis. Once identified, we have to estimate their masses with the available data, which may or may not be up to the task. And of course we have to figure out what theory predicts.

LCDM makes a clear prediction for the growth of cluster mass. This works out OK at low redshift, in the sense that the cluster X-ray mass function is in good agreement with LCDM. Where the theory struggles is in the proclivity for the most massive clusters to appear sooner in cosmic history than anticipated. Like individual galaxies, they appear too big too soon. This trend persisted in Jay’s analysis, which identified candidate protoclusters at higher redshifts than expected. It also measured velocity dispersions that were consistently higher than found in simulations. That is, when Jay applied the search algorithm he used on the data to mock data from the Millennium simulation, the structures identified there had velocity dispersions on average a factor of two lower than seen in the data. That’s a big difference in terms of mass.

Figure 11 from McGaugh et al. (2024): Measured velocity dispersions of protocluster candidates (Franck & McGaugh 2016a, 2016b) as a function of redshift. Point size grows with the assessed probability that the identified overdensities correspond to a real structure: all objects are shown as small points, candidates with P > 50% are shown as light blue midsize points, and the large dark blue points meet this criterion and additionally have at least 10 spectroscopically confirmed members. The MOND mass for an equilibrium system in the low-acceleration regime is noted at right; these are comparable to cluster masses at low redshift.

At this juncture, there is no way to know if the protocluster candidates Jay identified are or will become bound structures. We made some probability estimates that can be summed up as “some are probably real, but some probably are not.” The relative probability is illustrated by the size of the points in the plot above; the big blue points are the most likely to be real clusters, having at least ten galaxies at the same place on the sky at the same redshift, all with spectroscopically measured redshifts. Here the spectra are critical; photometric redshifts typically are not accurate enough to indicate that galaxies that happen to be nearby to each other on the sky are also that close in redshift space.

The net upshot is that there are at least some good candidate clusters at high redshift, and these have higher velocity dispersions than expected in LCDM. I did the exercise of working out what the equivalent mass in MOND would be, and it is about the same as what we find for clusters at low redshift. This estimate assumes dynamical equilibrium, which is very far from guaranteed. But the time at which these structures appear is consistent with the timescale for cluster formation in MOND (a couple Gyr; z ~ 3), so maybe? Certainly there shouldn’t be lots of massive clusters in LCDM at z ~ 3.
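The MOND mass scale noted in the figure can be roughed out from the velocity dispersion. A minimal sketch, assuming an isotropic equilibrium system deep in the MOND regime with Vc2 ≈ 3σ2 so that M ≈ 9σ4/(G a0); the exact numerical coefficient depends on the structure and anisotropy of the system, and the dispersions below are representative values rather than the measured ones:

```python
# Crude MOND mass estimate for a pressure-supported system in the low-acceleration regime,
# assuming isotropy (Vc^2 ~ 3*sigma^2) so that M ~ 9*sigma^4 / (G*a0).
G, a0, MSUN = 6.674e-11, 1.2e-10, 1.989e30

def mond_mass(sigma_kms):
    s = sigma_kms * 1e3               # km/s -> m/s
    return 9 * s**4 / (G * a0) / MSUN

for sigma in (500, 700, 1000):        # representative protocluster velocity dispersions (km/s)
    print(f"sigma = {sigma} km/s  ->  M ~ {mond_mass(sigma):.1e} Msun")
```

Dispersions approaching 1000 km/s map onto several times 1014 M, comparable to present-day rich clusters.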

Kinematic Takeaways

While it remains early days for kinematic observations at high redshift, so far these data do nothing to contradict the obvious interpretation of the photometric data. There are mature, dynamically cold, fast rotating spiral galaxies in the early universe that were predicted not to be there by LCDM. Moreover, kinematics traces mass, not just light, so all the wriggling we might try to explain the latter doesn’t help with the former. The most obvious interpretation of the kinematic data to date is the same as that for the photometric data: galaxies formed early and grew massive quickly, as predicted a priori by MOND.


*The papers I write that cover both theories always seem to wind up lopsided in favor of LCDM in terms of the bulk of their content. That happens because it takes many pages to discuss all the ins and outs. In contrast, MOND just gets it right the first time, so that section is short: there’s not much more to say than “Yep, that’s what it predicted.”

+I’ve not yet heard any criticisms of our paper directly. The criticisms that I’ve heard second or third hand so far almost all fall in the category of things we explicitly discussed. That’s a pretty clear tell that the person leveling the critique hasn’t bothered to read it. I don’t expect everyone to agree with our take on this or that, but a competent critic would at least evince awareness that we had addressed their concern, even if not to their satisfaction. We rarely seem to reach that level: it is much easier to libel and slander than engage with the issues.

The one complaint I’ve heard so far that doesn’t fall in the category of things-we-already-discussed is that we didn’t do hydrodynamic simulations of star formation in molecular gas. That is a red herring. To predict the growth of stellar mass, all we need is a prescription for assembling mass and converting baryons into stars; this is essentially a bookkeeping exercise that can be done analytically. If this were a serious concern, it should be noted that most cosmological hydro-simulations also fail to meet this standard: they don’t resolve star formation, so they typically adopt some semi-empirical (i.e., data-informed) bookkeeping prescription for this “subgrid physics.”

Though I have not myself attempted to numerically simulate galaxy formation in MOND, Sanders (2008) did. More recently, Eappen et al. (2022) have done so, including molecular gas and feedback$ and everything. They find a star formation history compatible with the analytic models we discuss in our paper.

$Related detail: Eappen et al find that different feedback schemes make little difference to the end result. The deus ex machina invoked to solve all problems in LCDM is largely irrelevant in MOND. There’s a good physical reason for this: gravity in MOND is sourced by what you see; how it came to have its observed distribution is irrelevant. If 90% of the baryons are swept entirely out of the galaxy by some intense galactic wind, then they’re gone BYE BYE and don’t matter any more. In contrast, that is one of the scenarios sometimes invoked to form cores in dark matter halos that are initially cuspy: the departure of all those baryons perturbs the orbits of the dark matter particles and rearranges the structure of the halo. While that might work to alter halo structure, how it results in MOND-like phenomenology has never been satisfactorily explained. Mostly that is not seen as even necessary; converting cusp to core is close enough!


&Though we typically associate the observed outer velocity with halo mass, an important caveat is that the radius also matters: M ~ RV2, and most data for high redshift galaxies do not extend very far out in radius. Nevertheless, it takes a lot of mass to make rotation speeds of order 200 km/s within a few kpc, so it hardly matters if this is or is not representative of the dark matter halo: if it is all stars, then the kinematics directly corroborate the interpretation of the photometric data that the stellar mass is large. If it is representative of the dark matter halo, then we expect the halo radius to scale with the halo velocity (R200 ~ V200) so M200 ~ V2003 and again it appears that there is too much mass in place too early.
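To put an illustrative number on that (not any specific galaxy):

```python
# How much mass does a rotation speed of ~200 km/s within a few kpc imply?  M ~ R*V^2/G.
G, MSUN, KPC = 6.674e-11, 1.989e30, 3.0857e19
V, R = 200e3, 3 * KPC                            # illustrative: 200 km/s at 3 kpc
print("M(<R) ~", R * V**2 / G / MSUN, "Msun")    # ~3e10 Msun already in place
```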

A few videos for the new year

A few videos for the new year

Happy new year to those who observe the Gregorian calendar. I will write a post on the observations that test the predictions discussed last time. It has been over a quarter century since Bob Sanders correctly predicted that massive galaxies would form by z = 10, and three years since I reiterated that for what JWST would see on this blog. This is a testament to both the scientific method and the inefficiency of communication.

Here I provide links to some recent interviews on the subject. These are listed in chronological order, which happen to flow in order of increasing technical detail.

The first entry is from my colleague Federico Lelli. It is in Italian rather than English, but short and easy on the ears. If nothing else, appreciate that Dr. Lelli did this despite the absence of sleep afforded a new father.

Next is an interview I did with EarthSky. I thought this went well, and should be reasonably accessible.

Next is Scientific Sense:

Most recently, there is the entry from the AAS Journal Author Series. These are based on papers published in the journals of the American Astronomical Society in which authors basically narrate their papers, so this goes through it at an appropriately high (ApJ) level.

We discuss the “little red dots” some, which touches on the issues of size evolution that were discussed in the comments previously. I won’t add to that here beyond noting again that the apparent size evolution is proportional to (1+z), in the sense that high redshift galaxies are apparently smaller than those of similar stellar mass locally. This (1+z) is the factor that relates the angular diameter distance of the Robertson-Walker metric to that of Euclidean geometry. Consequently, we would not infer any size evolution if the geometry were Euclidean. It’s as if cosmology flunks the Tolman test. Weird.

There is a further element of mystery towards the end where the notion that “we don’t know why” comes up repeatedly. This is always true at some deep philosophical level, but it is also why we construct and test hypotheses. Why does MOND persistently make successful predictions that LCDM did not? Usually we say the reason why has to do with the successful hypothesis coming closer to the truth.

That’s it for now. There will be more to come as time permits.

On the timescale for galaxy formation

On the timescale for galaxy formation

I’ve been wanting to expand on the previous post ever since I wrote it, which is over a month ago now. It has been a busy end to the semester. Plus, there’s a lot to say – nothing that hasn’t been said before, somewhere, somehow, yet still a lot to cobble together into a coherent story – if that’s even possible. This will be a long post, and there will be more after to narrate the story of our big paper in the ApJ. My sole ambition here is to express the predictions of galaxy formation theory in LCDM and MOND in the broadest strokes.

A theory is only as good as its prior. We can always fudge things after the fact, so what matters most is what we predict in advance. What do we expect for the timescale of galaxy formation? To tell you what I’m going to tell you, it takes a long time to build a massive galaxy in LCDM, but it happens much faster in MOND.

Basic Considerations

What does it take to make a galaxy? A typical giant elliptical galaxy has a stellar mass of 9 x 1010 M. That’s a bit more than our own Milky Way, which has a stellar mass of 5 or 6 x 1010 M (depending on who you ask) with another 1010 M or so in gas. So, in classic astronomy/cosmology style, let’s round off and say a big galaxy is about 1011 M. That’s a hundred billion stars, give or take.

An elliptical galaxy (NGC 3379, left) and two spiral galaxies (NGC 628 and NGC 891, right).

How much of the universe does it take to make one big galaxy? The critical density of the universe is the over/under point for whether an expanding universe expands forever, or has enough self-gravity to halt the expansion and ultimately recollapse. Numerically, this quantity is ρcrit = 3H02/(8πG), which for H0 = 73 km/s/Mpc works out to 10-29 g/cm3 or 1.5 x 10-7 M/pc3. This is a very small number, but provides the benchmark against which we measure densities in cosmology. The density of any substance X is ΩX = ρXcrit. The stars and gas in galaxies are made of baryons, and we know the baryon density pretty well from Big Bang Nucleosynthesis: Ωb = 0.04. That means the average density of normal matter is very low, only about 4 x 10-31 g/cm3. That’s less than one hydrogen atom per cubic meter – most of space is an excellent vacuum!

This being the case, we need to scoop up a large volume to make a big galaxy. Going through the math, to gather up enough mass to make a 1011 M galaxy, we need a sphere with a radius of 1.6 Mpc. That’s in today’s universe; in the past the universe was denser by (1+z)3, so at z = 10 that’s “only” 140 kpc. Still, modern galaxies are much smaller than that; the effective edge of the disk of the Milky Way is at a radius of about 20 kpc, and most of the baryonic mass is concentrated well inside that: the typical half-light radius of a 1011 M galaxy is around 6 kpc. That’s a long way to collapse.
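The arithmetic is easy to reproduce (using the numbers above: H0 = 73 km/s/Mpc and Ωb = 0.04):

```python
# How big a sphere does it take to hold the baryons of a 1e11 Msun galaxy at the mean density?
import numpy as np

G    = 6.674e-8           # cgs
H0   = 73e5 / 3.0857e24   # km/s/Mpc -> 1/s
MSUN = 1.989e33           # g
PC   = 3.0857e18          # cm

rho_crit = 3 * H0**2 / (8 * np.pi * G)     # ~1e-29 g/cm^3
rho_b    = 0.04 * rho_crit                 # ~4e-31 g/cm^3

M = 1e11 * MSUN
R = (3 * M / (4 * np.pi * rho_b))**(1/3)   # radius enclosing M at the mean baryon density

print("R(z=0)  ~", R / PC / 1e6, "Mpc")        # ~1.6 Mpc
print("R(z=10) ~", R / PC / 1e3 / 11, "kpc")   # denser by (1+z)^3, so R smaller by (1+z): ~140 kpc
```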

Monolithic Galaxy Formation

Given this much information, an early concept was monolithic galaxy formation. We have a big ball of gas in the early universe that collapses to form a galaxy. Why and how this got started was fuzzy. But we knew how much mass we needed and the volume it had to come from, so we can consider what happens as the gas collapses to create a galaxy.

Here we hit a big astrophysical reality check. Just how does the gas collapse? It has to dissipate energy to do so, and cool to form stars. Once stars form, they may feed energy back into the surrounding gas, reheating it and potentially preventing the formation of more stars. These processes are nontrivial to compute ab initio, and attempting to do so obsesses much of the community. We don’t agree on how these things work, so they are the knobs theorists can turn to change an answer they don’t like.

Even if we don’t understand star formation in detail, we do observe that stars have formed, and can estimate how many. Moreover, we do understand pretty well how stars evolve once formed. Hence a common approach is to build stellar population models with some prescribed star formation history and see what works. Spiral galaxies like the Milky Way formed a lot of stars in the past, and continue to do so today. To make 5 x 1010 M of stars in 13 Gyr requires an average star formation rate of 4 M/yr. The current measured star formation rate of the Milky Way is estimated to be 2 ± 0.7 M/yr, so the star formation rate has been nearly constant (averaging over stochastic variations) over time, perhaps with a gradual decline. Giant elliptical galaxies, in contrast, are “red and dead”: they have no current star formation and appear to have made most of their stars long ago. Rather than a roughly constant rate of star formation, they peaked early and declined rapidly. The cessation of star formation is also called quenching.

A common way to formulate the star formation rate in galaxies as a whole is the exponential star formation rate, SFR(t) = SFR0 e-t/τ. A spiral galaxy has a low baseline star formation rate SFR0 and a long burn time τ ~ 10 Gyr while an elliptical galaxy has a high initial star formation rate and a short e-folding time like τ ~ 1 Gyr. Many variations on this theme are possible, and are of great interest astronomically, but this basic distinction suffices for our discussion here. From the perspective of the observed mass and stellar populations of local galaxies, the standard picture for a giant elliptical was a large, monolithic island universe that formed the vast majority of its stars early on then quenched with a short e-folding timescale.
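A quick sketch of the stellar mass built up by these two toy star formation histories, with illustrative numbers for SFR0 and ignoring the return of mass to the gas phase by stellar evolution:

```python
# Toy exponential star formation histories: M*(t) = SFR0 * tau * (1 - exp(-t/tau)).
# Mass returned to the gas phase by dying stars is ignored for simplicity.
import numpy as np

def mstar(t_gyr, sfr0, tau_gyr):
    """Stellar mass (Msun) formed by time t for SFR(t) = sfr0 * exp(-t/tau)."""
    return sfr0 * 1e9 * tau_gyr * (1 - np.exp(-t_gyr / tau_gyr))   # Msun/yr * Gyr -> Msun

# Spiral-like: low, slowly declining star formation over a Hubble time.
print("spiral     (SFR0 = 5 Msun/yr,  tau = 10 Gyr) at 13 Gyr:", mstar(13, 5, 10))   # ~4e10 Msun
# Elliptical-like: intense early burst that quenches quickly.
print("elliptical (SFR0 = 90 Msun/yr, tau = 1 Gyr)  at 13 Gyr:", mstar(13, 90, 1))   # ~9e10 Msun
print("elliptical at t = 1 Gyr:", mstar(1, 90, 1))   # most of the final mass is already in place
```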

Galaxies as Island Universes

The density parameter Ω provides another useful way to think about galaxy formation. As cosmologists, we obsess about the global value of Ω because it determines the expansion history and ultimate fate of the universe. Here it has a more modest application. We can think of the region in the early universe that will ultimately become a galaxy as its own little closed universe. With a density parameter Ω > 1, it is destined to recollapse.

A fun and funny fact of the Friedmann equation is that the matter density parameter Ωm → 1 at early times, so the early universe when galaxies form is matter dominated. It is also very uniform (more on that below). So any subset that is a bit more dense than average will have Ω > 1 just because the average is very close to Ω = 1. We can then treat this region as its own little universe (a “top-hat overdensity”) and use the Friedmann equation to solve for its evolution, as in this sketch:

The expansion of the early universe a(t) (blue line). A locally overdense region may behave as a closed universe, recollapsing in a finite time (red line) to potentially form a galaxy.

That’s great, right? We have a simple, analytic solution derived from first principles that explains how a galaxy forms. We can plug in the numbers to find how long it takes to form our basic, big 1011 M galaxy and… immediately encounter a problem. We need to know how overdense our protogalaxy starts out. Is its effective initial Ωm = 2? 10? What value, at what time? The higher it is, the faster the evolution from initially expanding along with the rest of the universe to decoupling from the Hubble flow to collapsing. We know the math but we still need to know the initial condition.
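One can at least quantify how sensitive the outcome is to that initial condition by treating the top-hat as its own closed, matter-only Friedmann model with initial density parameter Ωi and initial Hubble rate Hi. A minimal sketch using the standard parametric (cycloid) solution, with times in units of the initial Hubble time 1/Hi:

```python
# Top-hat collapse: treat the overdense patch as a closed, matter-only Friedmann model
# with initial density parameter Omega_i > 1. Times are in units of the initial Hubble time 1/H_i.
import numpy as np

def tophat_times(Om_i):
    """Turnaround and collapse times (in 1/H_i) from the cycloid solution, starting at a_i = 1."""
    eta1 = 2 * np.arcsin(np.sqrt((Om_i - 1.0) / Om_i))   # development angle at the initial epoch
    B = Om_i / (2 * (Om_i - 1.0)**1.5)                   # cycloid time scale
    t_ta   = B * (np.pi   - eta1 + np.sin(eta1))         # time to maximum expansion
    t_coll = B * (2*np.pi - eta1 + np.sin(eta1))         # time to complete recollapse
    return t_ta, t_coll

for Om_i in (2.0, 1.1, 1.01, 1.00001):
    t_ta, t_coll = tophat_times(Om_i)
    print(f"Omega_i = {Om_i}: turnaround ~ {t_ta:.3g}/H_i, collapse ~ {t_coll:.3g}/H_i")
```

An initial Ωi = 2 recollapses within a handful of initial Hubble times, while Ωi = 1.00001 – the level of the seeds actually observed – takes of order 108, which is the problem described in the next section.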

Annoying Initial Conditions

The initial condition for galaxy formation is observed in the cosmic microwave background (CMB) at z = 1090. Where today’s universe is remarkably lumpy, the early universe is incredibly uniform. It is so smooth that it is homogeneous and isotropic to one part in a hundred thousand. This is annoyingly smooth, in fact. It would help to have some lumps – primordial seeds with Ω > 1 – from which structure can grow. The observed seeds are too tiny; the typical initial amplitude is 10-5 so Ωm = 1.00001. That takes forever to decouple and recollapse; it hasn’t yet had time to happen.

The cosmic microwave background as observed by ESA’s Planck satellite. This is an all-sky picture of the relic radiation field – essentially a snapshot of the universe when it was just a few hundred thousand years old. The variations in color are variations in temperature which correspond to variations in density. These variations are tiny, only about one part in 100,000. The early universe was very uniform; the real picture is a boring blank grayscale. We have to crank the contrast way up to see these minute variations.

We would like to know how the big galaxies of today – enormous agglomerations of stars and gas and dust separated by inconceivably vast distances – came to be. How can this happen starting from such homogeneous initial conditions, where all the mass is equally distributed? Gravity is an attractive force that makes the rich get richer, so it will grow the slight initial differences in density, but it is also weak and slow to act. A basic result in gravitational perturbation theory is that overdensities grow at the same rate the universe expands, which is inversely related to redshift. So if we see tiny fluctuations in density with amplitude 10-5 at z = 1000, they should have only grown by a factor of 1000 and still be small today (10-2 at z = 0). But we see structures of much higher contrast than that. You can’t get here from there.

The rich large scale structure we see today is impossible starting from the smooth observed initial conditions. Yet here we are, so we have to do something to goose the process. This is one of the original motivations for invoking cold dark matter (CDM). If there is a substance that does not interact with photons, it can start to clump up early without leaving too large a mark on the relic radiation field. In effect, the initial fluctuations in mass are larger, just in the invisible substance. (That’s not to say the CDM doesn’t leave a mark on the CMB; it does, but it is subtle and entirely another story.) So the idea is that dark matter forms gravitational structures first, and the baryons fall in later to make galaxies.

An illustration of the linear growth of overdensities. Structure can grow in the dark matter (long dashed lines) with the baryons catching up only after decoupling (short dashed line). In effect, the dark matter gives structure formation a head start, nicely explaining the apparently impossible growth factor. This has been the standard picture for what seems like forever (illustration from Schramm 1992).

With the right amount of CDM – and it has to be just the right amount of a dynamically cold form of non-baryonic dark matter (stuff we still don’t know actually exists) – we can explain how the growth factor is 105 since recombination instead of a mere 103. The dark matter got a head start over the stuff we can see; it looks like 105 because the normal matter lagged behind, being entangled with the radiation field in a way the dark matter was not.

This has been the imperative need in structure formation theory for so long that it has become undisputed lore; an element of the belief system so deeply embedded that it is practically impossible to question. I risk getting ahead of the story, but it is important to point out that, like the interpretation of so much of the relevant astrophysical data, this belief assumes that gravity is normal. This assumption dictates the growth rate of structure, which in turn dictates the need to invoke CDM to allow structure to form in the available time. If we drop this assumption, then we have to work out what happens in each and every alternative that we might consider. That definitely gets ahead of the story, so first let’s understand what we should expect in LCDM.

Hierarchical Galaxy Formation in LCDM

LCDM predicts some things remarkably well but others not so much. The dark matter is well-behaved, responding only to gravity. Baryons, on the other hand, are messy – one has to worry about hydrodynamics in the gas, star formation, feedback, dust, and probably even magnetic fields. In a nutshell, LCDM simulations are very good at predicting the assembly of dark mass, but converting that into observational predictions relies on our incomplete knowledge of messy astrophysics. We know what the mass should be doing, but we don’t know so well how that translates to what we see. Mass good, light bad.

Starting with the assembly of mass, the first thing we learn is that the story of monolithic galaxy formation outlined above has to be wrong. Early density fluctuations start out tiny, even in dark matter. God didn’t plunk down island universes of galaxy mass then say “let there be galaxies!” The annoying initial conditions mean that little dark matter halos form first. These subsequently merge hierarchically to make ever bigger halos. Rather than top-down monolithic galaxy formation, we have the bottom-up hierarchical formation of dark matter halos.

The hierarchical agglomeration of dark matter halos into ever larger objects is often depicted as a merger tree. Here are four examples from the high resolution Illustris TNG50 simulation (Pillepich et al. 2019; Nelson et al. 2019).

Examples of merger trees from the TNG50-1 simulation (Pillepich et al. 2019; Nelson et al. 2019). Objects have been selected to have very nearly the same stellar mass at z=0. Mass is built up through a series of mergers. One large dark matter halo today (at top) has many antecedents (small halos at bottom). These merge hierarchically as illustrated by the connecting lines. The size of the symbol is proportional to the halo mass. I have added redshift and the corresponding age of the universe for vanilla LCDM in a more legible font. The color bar illustrates the specific star formation rate: the top row has objects that are still actively star forming like spirals; those in the bottom row are “red and dead” – things that have stopped forming stars, like giant elliptical galaxies. In all cases, there is a lot of merging and a modest rate of growth, with the typical object taking about half a Hubble time (~7 Gyr) to assemble half of its final stellar mass.

The hierarchical assembly of mass is generic in CDM. Indeed, it is one of its most robust predictions. Dark matter halos start small, and grow larger by a succession of many mergers. This gradual agglomeration is slow: note how tiny the dark matter halos at z = 10 are.

Strictly speaking, it isn’t even meaningful to talk about a single galaxy over the span of a Hubble time. It is hard to avoid this mental trap: surely the Milky Way has always been the Milky Way? so one imagines its evolution over time. This is monolithic thinking. Hierarchically, “the galaxy” refers at best to the largest progenitor, the object that traces the left edge of the merger trees above. But the other protogalactic chunks that eventually merge together are as much part of the final galaxy as the progenitor that happens to be largest.

This complicated picture is complicated further by what we can see being stars, not mass. The luminosity we observe forms through a combination of in situ growth (star formation in the largest progenitor) and ex situ growth through merging. There is no reason for some preferred set of protogalaxies to form stars faster than the others (though of course there is some scatter about the mean), so presumably the light traces the mass of stars formed, which in turn traces the underlying dark mass. Presumably.

That we should see lots of little protogalaxies at high redshift is nicely illustrated by this lookback cone from Yung et al (2022). Here the color and size of each point corresponds to the stellar mass. Massive objects are common at low redshift but become progressively rare at high redshift, petering out at z > 4 and basically absent at z = 10. This realization of the observable stellar mass tracks the assembly of dark mass seen in merger trees.

Fig. 2 from Yung et al. (2022) illustrating what an observer would see looking back through their simulation to high redshift.

This is what we expect to see in LCDM: lots of small protogalaxies at high redshift; the building blocks of later galaxies that had not yet merged. The observation of galaxies much brighter than this at high redshift by JWST poses a fundamental challenge to the paradigm: mass appears not to be subdivided as expected. So it is entirely justifiable that people have been freaking out that what we see are bright galaxies that are apparently already massive. That shouldn’t happen; it wasn’t predicted to happen; how can this be happening?

That’s all background that is assumed knowledge for our ApJ paper, so we’re only now getting to its Figure 1. This combines one of the merger trees above with its stellar mass evolution. The left panel shows the assembly of dark mass; the right panel shows the growth of stellar mass in the largest progenitor. This is what we expect to see in observations.


Fig. 1 from McGaugh et al (2024): A merger tree for a model galaxy from the TNG50-1 simulation (Pillepich et al. 2019; Nelson et al. 2019, left panel) selected to have M ≈ 9 × 1010 M at z = 0; i.e., the stellar mass of a local L* giant elliptical galaxy (Driver et al. 2022). Mass assembles hierarchically, starting from small halos at high redshift (bottom edge) with the largest progenitor traced along the left edge of the merger tree. The growth of stellar mass of the largest progenitor is shown in the right panel. This example (jagged line) is close to the median (dashed line) of comparable mass objects (Rodriguez-Gomez et al. 2016), and within the range of the scatter (the shaded band shows the 16th – 84th percentiles). A monolithic model that forms at zf = 10 and evolves with an exponentially declining star formation rate with τ = 1 Gyr (purple line) is shown for comparison. The latter model forms most of its stars earlier than occurs in the simulation.

For comparison, we also show the stellar mass growth of a monolithic model for a giant elliptical galaxy. This is the classic picture we had for such galaxies before we realized that galaxy formation had to be hierarchical. This particular monolithic model forms at zf = 10 and follows an exponential star formation rate with τ = 1 Gyr. It is one of the models published by Franck & McGaugh (2017). It is, in fact, the first model I asked Jay to construct when he started the project. Not because we expected it to best describe the data, as it turns out to do, but because the simple exponential model is a touchstone of stellar population modeling. It was a starter model: do this basic thing first to make sure you’re doing it right. We chose τ = 1 Gyr because that was the typical number bandied about for elliptical galaxies, and zf = 10 because, from an LCDM perspective at the time, that seemed ridiculously early for a massive galaxy to form. A formation redshift zf = 10 was, less than a decade ago, practically indistinguishable from the beginning of time, so we expected it to provide a limit that the data would not possibly approach.

In a remarkably short period, JWST has transformed z = 10 from inconceivable to run of the mill. I’m not going to go into the data yet – this all-theory post is already a lot – but to offer one spoiler: the data are consistent with this monolithic model. If we want to “fix” LCDM, we have to make the red line into the purple line for enough objects to explain the data. That proves to be challenging. But that’s moving the goalposts; the prediction was that we should see little protogalaxies at high redshift, not massive, monolith-style objects. Just look at the merger trees at z = 10!

Accelerated Structure Formation in MOND

In order to address these issues in MOND, we have to go back to the beginning. What is the evolution of a spherical region (a top-hat overdensity) that might collapse to form a galaxy? How does a spherical region under the influence of MOND evolve within an expanding universe?

The solution to this problem was first found by Felten (1984), who was trying to play the Newtonian cosmology trick in MOND. In conventional dynamics, one can solve the equation of motion for a point on the surface of a uniform sphere that is initially expanding and recover the essence of the Friedmann equation. It was reasonable to check if cosmology might be that simple in MOND. It was not. The appearance of a0 as a physical scale makes the solution scale-dependent: there is no general solution that one can imagine applies to the universe as a whole.

Felten reasonably saw this as a failure. There were, however, some appealing aspects of his solution. For one, there was no such thing as a critical density. All MOND universes would eventually recollapse irrespective of their density (in the absence of the repulsion provided by a cosmological constant). It could take a very long time, which depended on the density, but the ultimate fate was always the same. There was no special value of Ω, and hence no flatness problem. The latter obsessed people at the time, so I’m somewhat surprised that no one seems to have made this connection. Too soon*, I guess.

There it sat for many years, an obscure solution for an obscure theory to which no one gave credence. When I became interested in the problem a decade later, I started methodically checking all the classic results. I was surprised to find how many things we needed dark matter to explain were just as well (or better) explained by MOND. My exact quote was “surprised the bejeepers out of us.” So, what about galaxy formation?

I started with the top-hat overdensity, and had the epiphany that Felten had already obtained the solution. He had been trying to solve all of cosmology, which didn’t work. But he had solved the evolution of a spherical region that starts out expanding with the rest of the universe but subsequently collapses under the influence of MOND. The overdensity didn’t need to be large, it just needed to be in the low acceleration regime. Something like the red cycloidal line in the second plot above could happen in a finite time. But how much?

The solution depends on scale and needs to be solved numerically. I am not the greatest programmer, and I had a lot else on my plate at the time. I was in no rush, as I figured I was the only one working on it. This is usually a good assumption with MOND, but not in this case. Bob Sanders had had the same epiphany around the same time, which I discovered when I received his manuscript to referee. So all credit is due to Bob: he said these things first.

First, he noted that galaxy formation in MOND is still hierarchical. Small things form first. Crudely speaking, structure formation is very similar to the conventional case, but now the goose comes from the change in the force law rather than extra dark mass. MOND is nonlinear, so the whole process gets accelerated. To compare with the linear growth of CDM:

A sketch of how structures grow over time under the influence of cold dark matter (left, from Schramm 1992, same as above) and MOND (right, from Sanders & McGaugh 2002; see also this further discussion and previous post). The slow linear growth of CDM (long-dashed line, left panel) is replaced by a rapid, nonlinear growth in MOND (solid lines at right; numbers correspond to different scales). Nonlinear growth moderates after cosmic expansion begins to accelerate (dashed vertical line in right panel).

The net effect is the same. A cosmic web of large scale structure emerges. They look qualitatively similar, but everything happens faster in MOND. This is why observations have persistently revealed structures that are more massive and were in place earlier than expected in contemporaneous LCDM models.

Simulated structure formation in ΛCDM (top) and MOND (bottom) showing the more rapid emergence of similar structures in MOND (note the redshift of each panel). From McGaugh (2015).

In MOND, small objects like globular clusters form first, but galaxies of a range of masses all collapse on a relatively short cosmic timescale. How short? Let’s consider our typical 1011 M galaxy. Solving Felten’s equation for the evolution of a sphere numerically, peak expansion is reached after 300 Myr and collapse happens in a similar time. The whole galaxy is in place speedy quick, and the initial conditions don’t really matter: a uniform, initially expanding sphere in the low acceleration regime will behave this way. From our distant vantage point thirteen billion years later, the whole process looks almost monolithic (the purple line above) even though it is a chaotic hierarchical mess for the first few hundred million years (z > 14). In particular, it is easy to form half of the stellar mass early on: the mass is already assembled.

The evolution of a 1011 M sphere that starts out expanding with the universe but decouples and collapses under the influence of MOND (dotted line). It reaches maximum expansion after 300 Myr and recollapses in a similar time, so the entire object is in place after 600 Myr. (A version of this plot with a logarithmic time axis appears as Fig. 2 in our paper.) The inset shows the evolution of smaller shells within such an object (Fig. 2 from Sanders 2008). The inner regions collapse first followed by outer shells. These oscillate and cross, mixing and ultimately forming a reasonable size galaxy – see Sanders’s Table 1 and also his Fig. 4 for the collapse times for objects of other masses. These early results are corroborated by Eappen et al. (2022), who further demonstrate that the details of feedback are not important in MOND, unlike LCDM.
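For the curious, here is a toy version of that numerical problem. In the deep-MOND limit, the edge of a uniform sphere of mass M feels an acceleration g = sqrt(G M a0)/r, so its radius obeys a simple equation of motion once we start it expanding with the Hubble flow. The starting redshift below is an illustrative choice (tied to the 1.6 Mpc comoving sphere from earlier in this post), and the toy ignores the Newtonian-to-MOND transition, so it reproduces the qualitative expand-turnaround-recollapse behavior rather than the precise 300 Myr timescale of the full calculation:

```python
# Toy integration of the deep-MOND equation of motion for the edge of a uniform sphere:
#     d^2 r / dt^2 = -sqrt(G*M*a0) / r
# starting from a sphere that initially expands with the (baryon-only) Hubble flow.
# Qualitative only: the Newtonian-to-MOND transition and starting epoch are not treated carefully.
import numpy as np
from scipy.integrate import solve_ivp

G, a0, MSUN = 6.674e-11, 1.2e-10, 1.989e30
KPC, MYR = 3.0857e19, 3.156e13
H0 = 73e3 / 3.0857e22                                  # 1/s

M = 1e11 * MSUN                                        # baryonic mass of the protogalaxy
A = np.sqrt(G * M * a0)                                # deep-MOND limit: g = sqrt(G*M*a0)/r

zi = 199.0                                             # illustrative starting redshift
r0 = 1.6e3 * KPC / (1 + zi)                            # the 1.6 Mpc comoving sphere at z = zi
Hi = H0 * np.sqrt(0.04*(1+zi)**3 + 0.96*(1+zi)**2)     # baryon-only background expansion rate
v0 = Hi * r0                                           # initially comoving with the Hubble flow

def rhs(t, y):
    r, v = y
    return [v, -A / r]

def collapsed(t, y):                                   # stop once the sphere has largely recollapsed
    return y[0] - 0.1 * r0
collapsed.terminal = True
collapsed.direction = -1

sol = solve_ivp(rhs, [0, 2000 * MYR], [r0, v0], max_step=MYR, events=collapsed)
i = np.argmax(sol.y[0])
print(f"turnaround: t ~ {sol.t[i]/MYR:.0f} Myr at r ~ {sol.y[0][i]/KPC:.0f} kpc; "
      f"recollapsed by t ~ {sol.t[-1]/MYR:.0f} Myr")
```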

This is what JWST sees: galaxies that are already massive when the universe is just half a billion years old. I’m sure I should say more but I’m exhausted now and you may be too, so I’m gonna stop here by noting that in 1998, when Bob Sanders predicted that “Objects of galaxy mass are the first virialized objects to form (by z=10),” the contemporaneous prediction of LCDM was that “present-day disc [galaxies] were assembled recently (at z<=1)” and “there is nothing above redshift 7.” One of these predictions has been realized. It is rare in science that such a clear a priori prediction comes true, let alone one that seemed so unreasonable at the time, and which took a quarter century to corroborate.


*I am not quite this old: I was still an undergraduate in 1984. I hadn’t even decided to be an astronomer at that point; I certainly hadn’t started following the literature. The first time I heard of MOND was in a graduate course taught by Doug Richstone in 1988. He only mentioned it in passing while talking about dark matter, writing the equation on the board and saying maybe it could be this. I recall staring at it for a long few seconds, then shaking my head and muttering “no way.” I then completely forgot about it, not thinking about it again until it came up in our data for low surface brightness galaxies. I expect most other professionals have the same initial reaction, which is fair. The test of character comes when it crops up in their data, as it is doing now for the high redshift galaxy community.