The local missing baryon problem

Last time, we started talking about the data in the recent paper The Baryonic Mass-Halo Mass Relation of Extragalactic Systems. Here, we’ll put on our dark matter hat, and use the data to make an accounting of the mass – both the dark matter and the baryons in all their various forms. From this conventional perspective we will obtain a method for relating what we see to what we don’t. In the context of LCDM cosmology, this provides an alternative approach to abundance matching. It also provides a test: are the two consistent?

The conventional picture we have in mind is a baryonic galaxy residing in a dark matter halo bathed in a background of intergalactic matter.

**Fig. 1** of McGaugh et al. (2026): Conceptual elements of a galaxy: the stars (yellow/blue) and atomic gas (green) of NGC 6946 (Spitzer 3.6µ and 21 cm data: F. Walter et al. 2008) are shown embedded in an extended dark matter halo (black). The dark matter density decreases continuously with radius so the halo has no hard edge, but for convenience we adopt the common convention that the radius r₂₀₀ marks the boundary of the dark matter halo and the dividing line between the circumgalactic medium (CGM) and the intergalactic medium (IGM; orange). The stars and atomic gas illustrated here appear within r < 20 kpc while r₂₀₀ ≈ 220 kpc (not shown to scale).

I’ve talked here about the stars and gas a lot because that’s what we see. These are the essential components that define a galaxy and comprise the mass that correlates with rotation velocity to make the baryonic Tully-Fisher relation (BTFR). I’ve talked a bit about the stuff between the galaxies, the intergalactic medium (IGM), but I don’t think I’ve previously had cause to talk much about the circumgalactic medium (CGM). As the name implies, this is gas in the vicinity of a galaxy, but not in the galaxy itself – at least not the part we can readily see. In the notional picture above, the distinction between the CGM and the IGM is the boundary of the dark matter halo that nominally demarcates gravitationally bound from unbound material.

Notional is doing a lot of work here. There’s a lot of gas in the IGM, and some of it is certainly in the vicinity of galaxies, so in that regard counts as circum-galactic. But there’s no hard and fast distinction between these components just as there’s no hard edge to a dark matter halo. Our brains don’t like that, so we impose notional boundaries and proceed as if these are meaningful.

Proceeding thus, we expect our dark matter halo* to contain its fair share of the cosmic baryon fraction, f_b = M_b/M₂₀₀ = 0.157 according to the Planck flavor of LCDM cosmology. We can test this by adding up all the baryons and comparing that to the total mass enclosed by r₂₀₀. This is straightforward for the stars and gas we see, but not for the stuff we don’t see – both dark matter and the gas in the CGM.

There are some measurements of the CGM, but these tend to be statistical in nature (if we stack data for a bunch of galaxies, we sorta see something), not the precise, individual, galaxy-by-galaxy measurements that we have for the stars and atomic gas. The stars and atomic gas are the mass in the extended Tully-Fisher relations we discussed previously, and are the bulk of the normal material in the galaxies we see. The bulk of the CGM lies at much larger radii, beyond the stars and atomic gas, but within the notional edge of the dark matter halo, as depicted above. Since we don’t measure it directly in individual galaxies, we’re gonna leave the mass of the CGM as an open question rather than something to be included in the sum of known baryonic mass.

The situation is even murkier for the dark matter, which we don’t see at all, so we don’t have a good way to measure the “total” mass of dark matter halos. This isn’t even a well-defined quantity in principle since halos are not expected to have a hard edge. Conventionally, we adopt as the notional edge the radius r₂₀₀ within which the mean enclosed density is two hundred times the cosmic critical density. There are obscure historical reasons for this choice that I do not have the patience to describe. One could make other choices, arguably better choices, but r₂₀₀ is the most common choice used in the literature so we’ll stick with it here. The halo mass is the mass enclosed by this radius, M₂₀₀. If one goes through the math, it turns out that the circular speed of a test particle, V₂₀₀, orbiting at r₂₀₀ scales with the Hubble parameter [h = H₀/(100 km/s/Mpc)] such that V₂₀₀ = h r₂₀₀ when V₂₀₀ is in km/s and r₂₀₀ is in kpc. The dynamical mass (rV²/G) can then be written

M_{200} = (3.3 \times 10^5\;\mathrm{M}_{\odot}\,\mathrm{km}^{-3}\,\mathrm{s}^3)\,V_{200}^3.
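For anyone who wants the intermediate steps, here is a sketch of the math. It assumes the critical density is ρ_crit = 3H₀²/(8πG) and uses G ≈ 4.30 × 10⁻⁶ kpc km² s⁻² M_☉⁻¹. The value h = 0.7 in the last step is my assumption: it reproduces the quoted coefficient, but the value actually adopted is not stated here.

```latex
\begin{align}
M_{200} &= \frac{4\pi}{3}\, r_{200}^3 \,\bigl(200\,\rho_{\mathrm{crit}}\bigr)
         = \frac{4\pi}{3}\, r_{200}^3 \cdot 200 \cdot \frac{3H_0^2}{8\pi G}
         = \frac{100\,H_0^2\, r_{200}^3}{G} \\
V_{200}^2 &= \frac{G M_{200}}{r_{200}} = 100\,H_0^2\, r_{200}^2
  \quad\Rightarrow\quad V_{200} = 10\,H_0\, r_{200} \\
H_0 &= 100\,h\ \mathrm{km\,s^{-1}\,Mpc^{-1}} = 0.1\,h\ \mathrm{km\,s^{-1}\,kpc^{-1}}
  \quad\Rightarrow\quad V_{200} = h\, r_{200}
  \quad (\mathrm{km\,s^{-1}},\ \mathrm{kpc}) \\
M_{200} &= \frac{r_{200}\, V_{200}^2}{G} = \frac{V_{200}^3}{h\,G}
  \approx \frac{2.3\times10^{5}}{h}\ \mathrm{M}_{\odot}\,\mathrm{km}^{-3}\,\mathrm{s}^{3}\; V_{200}^3
  \approx 3.3\times10^{5}\ \mathrm{M}_{\odot}\,\mathrm{km}^{-3}\,\mathrm{s}^{3}\; V_{200}^3
  \quad (h = 0.7)
\end{align}
```

The cubic dependence on velocity comes straight from the definition of r₂₀₀: once the enclosed density is fixed, mass, radius, and circular speed are all locked together.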

That is a lot of huffing and puffing to get a way to relate the halo mass to something we can (kinda sorta) measure. The flat rotation velocity V_f has always been taken as the signature of the dark matter halo. One therefore expects V₂₀₀ ~ V_f. Indeed, these quantities cannot differ by much if dark matter is what explains flat rotation curves. However, the notional radius of the dark matter halo where V₂₀₀ occurs is much larger, by roughly an order of magnitude or more, than the radius where V_f is measured. So they need not be identical, depending on the halo model. To relate what we measure to what we’d like to know, we define a little ol’ fudge factor, f_v, such that:

V_f = f_v V_{200}

If a rotation curve stays flat indefinitely (as our empirical experience suggests), f_v = 1. If instead dark matter halos behave as they should in LCDM, then the rotation speed should gradually decline as we approach the halo’s edge so that f_v > 1. How much greater?

One way to estimate the fudge factor f_v is to fit dark matter halo models to data. This process does not directly measure V₂₀₀, but it does provide an estimate of that quantity based on the data available at smaller radii. One can do this for as many halo models as one has the patience to consider. For example, here are the results for two common halo models, the traditional pseudo-isothermal halo first adopted to explain flat rotation curves and the CDM-expected NFW halo:

**Figure 2** from McGaugh et al. (2026): The observed flat velocity V_f as it relates to the fitted V₂₀₀ for pseudo-isothermal (left panel) and NFW (right panel) halos (Li et al. 2020). Filled points have formal uncertainties <20% in V₂₀₀; open points are less accurate fits. The solid line shows V_f = V₂₀₀. The gray line in the right panel shows Equation (2a) of Katz et al. (2019), which corresponds roughly to f_v ≈ 1.4.

The result for pseudo-isothermal halos is consistent with f_v = 1, as expected – this model was adopted to make flat rotation curves. There is nevertheless some scatter. This typically happens because the observed rotation is not observed to be flat over a large enough range of radii to enforce flatness further out (as often happens in dwarf galaxies) or because the stars account for so much of the mass over the observed range that the inferred dark matter component is still rising (as often happens in bright, high surface brightness galaxies). This sort of haziness is inevitable when one only measures the inner few percent of the notional virial radius.

The result for NFW halos is approximately f_v = 1.4, albeit with a lot more scatter. This happens for the same reasons as above, with the additional problem that the dark matter profile in real galaxies rarely looks like NFW. Of all the many halo models considered by Li et al. (2020), NFW consistently performs the worst. One is forcing a fit of a function that would rather not. One signature of this misfit is the occurrence of very large V₂₀₀ for dwarf galaxies with small V_f. Taken literally, this would mean that some of the smallest dwarf galaxies reside in dark matter halos that outweigh those of giants like the Milky Way. This seems absurd, and it is. For example, by this approach, the dwarf galaxy NGC 3109 residing just outside the Local Group outweighs the Local Group and both its giants, Andromeda and the Milky Way, put together. But it is pretty clear from the local velocity field that the entire Local Group is not orbiting this little dwarf.

The estimation of huge V₂₀₀ for galaxies with small V_f happens because of the cusp-core problem. The density cusp predicted by NFW expects a curved shape for the inner rotation curve while the data show a more gradual, quasi-linear rise. Any decent fitting program will realize that it can make a curve look like a straight line if it stretches it out enough, so it does exactly this by making the halo very large. That sorta fits the data, but it makes no physical sense. Between this systematic effect and the large scatter induced by the other effects discussed above, one is better off inferring V₂₀₀ from V_f with a fixed fudge factor. So we’ll do that, leaving the exact value of f_v as an open question, but noting that for most objects it almost certainly resides in the narrow range

1 \le f_v \le 1.4.
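To see what this range of f_v means for the halo mass, here is a minimal sketch that chains V_f = f_v V₂₀₀ with the M₂₀₀–V₂₀₀ relation quoted earlier. The function name and the 200 km/s example speed are hypothetical, chosen only for illustration.

```python
# Coefficient quoted in the post, in Msun (km/s)^-3.
M200_COEFF = 3.3e5


def halo_mass_from_vflat(v_flat_kms: float, f_v: float = 1.0) -> float:
    """Return M200 in Msun for an observed flat rotation speed V_f (km/s).

    Uses V_200 = V_f / f_v followed by M200 = M200_COEFF * V_200**3.
    """
    if v_flat_kms <= 0 or f_v <= 0:
        raise ValueError("v_flat_kms and f_v must be positive")
    v200 = v_flat_kms / f_v
    return M200_COEFF * v200**3


if __name__ == "__main__":
    v_flat = 200.0  # km/s; hypothetical example galaxy, not taken from the data
    for f_v in (1.0, 1.4):
        m200 = halo_mass_from_vflat(v_flat, f_v)
        print(f"f_v = {f_v:.1f}: M200 = {m200:.2e} Msun")
```

Because the mass goes as the cube of the velocity, the two ends of the allowed range differ by a factor of 1.4³ ≈ 2.7 in halo mass for the same observed V_f.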

That’s a lot of words to say the observed flat rotation speed gives us our best kinematic estimator of the dark matter halo mass. In this context, bear in mind the small scatter in the extended Tully-Fisher relations. This contrasts with the large scatter seen in the fits above. This strongly implies that V_f is more closely tied to the underlying mass^{^} than are the model-specific halo fits to the entire rotation curve. That might seem counterintuitive given that V_f is only a portion of the rotation curve (albeit a well-defined portion). However, it makes more sense when one considers that rotation curve fits must consider the contribution of stars as well as dark matter. Since the stellar mass-to-light ratio is never perfectly known, there is a degeneracy between the two that contributes to the scatter seen above. That variation is not real, it’s just an artifact of the fitting procedure. But when we get to large radii, beyond the confounding effects of the stellar population, the signature of the dominant mass becomes apparent in the flat rotation speed.

We saw above that we expect the halo mass M₂₀₀ to correlate with V₂₀₀. We observe that baryonic mass M_b correlates with the flat rotation velocity V_f. The natural assumption is that the stuff we see is proportional to the total (mostly dark) mass while the observed flat velocity is a property of the halo. Hence M_b ~ M₂₀₀ and V_f ~ V₂₀₀. This simple argument has been the basis for many papers claiming to explain the Tully-Fisher relation over the course of many years. This would be entirely satisfactory if it weren’t so completely wrong.

Here we need to introduce another fudge factor, m_b, that relates the mass we see to the halo that spawned each galaxy:

M_b = m_b\,M_{200}

The obvious assumption is that m_b is a constant for all galaxies, in which case Tully-Fisher follows because M_b ~ M₂₀₀ ~ V₂₀₀³ and V₂₀₀ ~ V_f. The wee problem is that this predicts a Tully-Fisher relation with slope 3: M_b ~ V_f³ when we observe one with slope 4: M_b ~ V_f⁴. In order to reconcile these two, our new fudge factor cannot be a constant. Worse, we need to fine tune it to transform the predicted power law into the observed one: m_b ~ V_f. That… doesn’t make any sense.

We can refrain from thinking and plunge ahead to simply plot the baryon fraction. While we’re at it, let’s also plot the stellar mass fraction m_* = M_*/M₂₀₀ because that is more commonly discussed in the literature. (Often stellar masses are available for galaxies without the corresponding gas mass measurements.) These fractions have to be increasing functions of circular velocity, or equivalently, mass (m_b ~ V_f ~ M_b^{1/4}):

**Figure 4** from McGaugh et al. (2026): The stellar mass fraction as a function of stellar mass (top) and the baryonic mass fraction as a function of baryonic mass (bottom). Data and symbols as in Figure 3 with the additional distinction that large squares in the top panel represent the sum of the stellar mass of all galaxies in a group or cluster while small squares are the stellar mass of the brightest galaxy only. The horizontal line is the cosmic baryon fraction f_b = 0.157 (Planck Collaboration et al. 2020). The colored lines in the top panels show the stellar mass–halo mass relations from abundance matching given by B. P. Moster et al. (2013; dashed–dotted green line), P. S. Behroozi et al. (2013; dashed–triple dotted pink line), and A. V. Kravtsov et al. (2018; red dashed line). The black line in the lower panel is m_b = f_b tanh(M_b/M₀)^1/4 where f_b is the cosmic baryon fraction (0.157) and M₀ = 5 x 10¹³ M_☉.

To be specific, I’ve computed the halo mass assuming f_v = 1. Different assumptions just slide the data up and down; the trend persists. This is discussed more in the paper if you’re interested in such details.

This gives a nifty way to relate what we can see to what we can’t. There’s a simple formula:

m_b = f_b \tanh\left(\frac{M_b}{M_0}\right)^{1/4}

where f_b = 0.157 is the cosmic baryon fraction and M₀ = 5 x 10¹³ M_☉ is the scale where the function bends, transitioning from the M_b ~ V_f⁴ of the BTFR that holds over most of the mass range to the m_b = f_b of rich galaxy clusters. The precise value of the turnover mass is not well constrained, as it happens in the one place that is not well sampled by the available data. Indeed, there is nothing special about the functional form; it is simply a choice that transitions nicely from one regime to the other. There’s no physics in it^&. Still, this is a useful way to estimate the halo mass of pretty much any extragalactic object just by summing up its observed baryonic mass.
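As a rough illustration of how one might use this in practice, here is a minimal sketch that turns an observed baryonic mass into a halo mass via M₂₀₀ = M_b/m_b. The function names are hypothetical. I read the exponent as applying to the tanh as a whole, which is an assumption about the notation; the alternative reading, tanh[(M_b/M₀)^{1/4}], has the same low- and high-mass limits.

```python
import math

F_B = 0.157  # cosmic baryon fraction (value from the text)
M_0 = 5e13   # Msun; turnover mass of the fit (value from the text)


def baryon_fraction(m_b_msun: float) -> float:
    """m_b = f_b * tanh(M_b / M_0)**(1/4), with the exponent applied to tanh.

    That reading of the notation is an assumption (see the lead-in).
    """
    if m_b_msun <= 0:
        raise ValueError("m_b_msun must be positive")
    return F_B * math.tanh(m_b_msun / M_0) ** 0.25


def halo_mass_from_baryons(m_b_msun: float) -> float:
    """Estimate M200 (Msun) from observed baryonic mass via M200 = M_b / m_b."""
    return m_b_msun / baryon_fraction(m_b_msun)


if __name__ == "__main__":
    # Illustrative masses spanning dwarfs to rich clusters; not real objects.
    for m_b in (1e6, 1e9, 1e10, 1e12, 1e15):
        print(f"M_b = {m_b:.0e} Msun -> m_b = {baryon_fraction(m_b):.4f}, "
              f"M200 = {halo_mass_from_baryons(m_b):.2e} Msun")
```

Well below M₀ the tanh is effectively linear, so the function reduces to m_b ≈ f_b (M_b/M₀)^{1/4}; well above it, m_b saturates at f_b.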

Indeed, this kinematic mass-matching relation is better than the widely used abundance matching relations in that it has less scatter. Abundance matching generally relies on stellar mass; that results in more scatter for the same reasons discussed for Tully-Fisher. This is particularly apparent at the low mass end of the top panel above, where galaxies of the same circular velocity (halo mass) have very different stellar masses. This goes away when baryonic mass is used instead.

There is reasonable agreement between abundance matching and kinematics at intermediate masses. The lines representing various abundance matching relations parallel the kinematic data. The offsets that are apparent can be cured by an appropriate choice of f_v. Always a free parameter to the rescue there is.

At the high mass end, things go amiss again. Partly this is because abundance matching relations reference the stellar mass of the “central” galaxy. The picture is that each halo contains one central galaxy with many satellite galaxies in subhalos, so what matters is the stellar mass of the central. This is overly simplistic: galaxy clusters are messy, the brightest galaxy isn’t necessarily at the center, and most have substructure with multiple groups rather than a single hierarchy. Besides that, the stellar mass tells you little about the halo mass without further environmental context: a galaxy with M_* ~ 4 x 10¹¹ M_☉ could reside in halo masses spanning a couple of orders of magnitude.

Setting aside the issue of centrals, there is a serious tension for individual high mass galaxies. The stellar mass fraction suggested by kinematics keeps going up where that of abundance matching turns over. This is due to the linearity of the Tully-Fisher relation compared to the knee in the Schechter function shape of the stellar mass function. The two don’t match up, as discussed previously. This same tension has long been with us; in the ’90s we were concerned with the difference between “the luminosity function normalization” and “the Tully-Fisher normalization.” This tension never went away. Still, the tension between abundance matching and kinematics doesn’t seem tragic, and might be remedied with some appropriate finagling of both the baryon fraction and the velocity fudge factor.

But where are all the baryons? They’re all accounted for in clusters, which reach the cosmic baryon fraction. But in no other system is the checksum complete. There is a missing baryon problem locally in each and every dark matter halo below the cluster scale. To confound matters further, there is a fine-tuning problem: the amount of missing baryons scales precisely with the amount of observed baryons.

The logarithmic plot above may understate the magnitude of the problem. To clarify this, we can plot the ratio of missing-to-observed baryons on a linear scale, at least in part:

**Figure 7** from McGaugh et al. (2026): The ratio of missing-to-observed baryonic mass as a function of baryonic mass. Data and symbols are the same as above. The ratio is linear in the bottom half of the diagram, then switches to logarithmic in the top half. Spiral galaxies are shown twice: once with f_v = 1.0 (solid blue circles) and again with f_v = 1.4 (small open circles). The Milky Way is the yellow point at the top of the gray band, which shows the range from zero CGM to that required to explain all of the locally missing baryons when f_v = 1. Stars represent the CGM measurements of Milky Way–mass galaxies by Miller & Bregman (2015), Bregman et al. (2022), and Zhang et al. (2026) from bottom to top. These suffice to explain the missing baryons provided that f_v ≈ 1.4. This explanation becomes progressively less plausible for lower mass galaxies.

The scatter blows up when we plot linear ratios; this is an artifact of error propagation. Nevertheless, it is helpful to see that the local missing baryon problem is not subtle. It is already a factor of ~2 for groups and ~3 for bright galaxies. It’s not as if we’ve misplaced a few percent of the baryons. Most of the baryons that should be associated with galaxy dark matter halos are not in evidence.

This problem has been known for a while, but doesn’t seem to be acknowledged to be a problem. Not all baryons need condense down into the central galaxy; some might be left behind, still mixed in with the dark matter halo. The widespread assumption seems to be that the missing baryons are probably in the CGM.

Accounting for the missing baryons with gas in the CGM almost works in bright galaxies like the Milky Way where we need “only” a factor of a few. Recent estimates suggest that the CGM is comparable in mass to the stars, or even somewhat more. These are very uncertain, as this mass is dispersed in diffuse gas over an enormous volume, and the total mass estimates often involve large extrapolations: the CGM is detected most readily nearby the central galaxy, but most of its implied mass is way far out near r₂₀₀. Accepting these estimates at face value leads to the star symbols in the plot above. This makes the checksum complete provided the halo is not too massive, as happens if f_v ≈ 1.4. This is what we expect for NFW halos, so it might work out if those were viable. However, there is a bigger issue.

The local missing baryon problem gets progressively worse for lower mass galaxies. For 10¹⁰ M_☉ galaxies – not all that much smaller than the Milky Way (M_b = 7 x 10¹⁰ M_☉) – the problem isn’t a factor of two or three: there are ~6 baryons missing for every one that is observed. For 10⁹ M_☉ galaxies, the deficit is an order of magnitude. For even lower mass galaxies, the difference is so large we have to abandon the linear plot lest the interesting parts for bright galaxies get scrunched into invisibility. By the time we get to small dwarf galaxies of 10⁶ M_☉, the ratio of missing-to-observed baryons approaches 100:1. It is not plausible to imagine that the CGM of dwarf galaxies explains this deficit. (And yes, we’ve looked.)
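The size of the deficit follows from simple bookkeeping: a halo should hold f_b M₂₀₀ in baryons, so the missing-to-observed ratio is f_b/m_b − 1. Here is a minimal sketch of that arithmetic using the smooth fit for m_b rather than the data themselves, so the numbers only roughly track the ones quoted above. Two assumptions of mine: the exponent in the fit applies to the tanh as a whole, and a different f_v simply rescales the f_v = 1 fit by f_v³ (since M₂₀₀ goes as (V_f/f_v)³).

```python
import math

F_B = 0.157  # cosmic baryon fraction (value from the text)
M_0 = 5e13   # Msun; turnover mass of the fit (value from the text)


def missing_to_observed(m_b_msun: float, f_v: float = 1.0) -> float:
    """Ratio of missing to observed baryons, (f_b * M200 - M_b) / M_b.

    The baryon fraction comes from the tanh fit, taken to hold for f_v = 1.
    For other f_v the halo mass shrinks by f_v**3, so the fraction grows by
    the same factor (a simplifying assumption, see the lead-in).
    """
    if m_b_msun <= 0 or f_v <= 0:
        raise ValueError("m_b_msun and f_v must be positive")
    m_frac = F_B * math.tanh(m_b_msun / M_0) ** 0.25
    m_frac *= f_v**3
    return F_B / m_frac - 1.0


if __name__ == "__main__":
    for m_b in (1e6, 1e9, 1e10, 7e10):
        print(f"M_b = {m_b:.0e} Msun: "
              f"f_v = 1.0 -> {missing_to_observed(m_b, 1.0):6.1f}, "
              f"f_v = 1.4 -> {missing_to_observed(m_b, 1.4):6.1f}")
```

With f_v = 1 the fit gives roughly 7, 14, and 80 missing baryons per observed one at 10¹⁰, 10⁹, and 10⁶ M_☉. At the Milky Way’s mass, going from f_v = 1 to f_v = 1.4 drops the ratio from about 4 to about 1, which is why the CGM estimates can plausibly close the budget only in the NFW-like case.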

A common explanation for this variation is that low mass dark matter halos have shallower potential wells, so have a harder time holding onto their baryons. Supernovae can drive material out of galaxies; these go off with the same energy regardless of the galaxy they’re in so they may be more effective at blowing baryons out of lower mass systems. There is sufficient energy (IF properly^% distributed) to completely unbind the baryons, so they might wind up in the IGM, defeating any hope of completing the checksum. This is the sort of argument that sounds clever but fails to address the real problem. The difficulty isn’t just ridding ourselves of these meddlesome baryons, it is getting rid of exactly the right amount each and every time.

As awkward as it is to realize that most of the baryons that should be in low mass halos are not in evidence, it is not difficult to imagine ways in which this might happen, like the aforementioned supernova-driven galactic winds. The more dire aspect of the problem is the fine-tuning. Galaxies of the same observed baryonic mass are always missing the same amount of baryons, whether that’s a factor of 2 or 10 or 100. If the visible parts of a dwarf galaxy are only 1% of the available baryons, you’d expect a lot of scatter. Sometimes a halo of that mass might have 2% or even 3% of its baryons condense to the parts we see. That would show up in the scatter in a way it does not: galaxies of the same circular velocity (halo mass) have the same baryonic mass every time. They don’t vary by factors of two (or more). So while we can build models that make the baryon fraction just so, the fact that we can write a simple equation for it with practically zero scatter is profoundly uncomfortable.

An extra bit of weirdness is that in LCDM, galaxies are built hierarchically by merging small objects into large ones. This poses a teleological problem. Consider a small halo at high redshift. If it remains alone, then it will contain a dwarf galaxy at low redshift that has a low baryon fraction. But if it merges into a larger system, then by the current time that larger system has to have a larger baryon fraction. In effect, a low mass halo has to know where it will end up some billions of years in the future. Will it remain alone and unmerged? Better blow out all those baryons! Will it merge into a larger system? Better hang on to the right amount of baryons. Does that system merge into a still larger object? Hope it held onto even more baryons, in exactly the right amount at every step along dozens of mergers.

I can imagine all this happening in a stochastic fashion with the net result being that more massive systems wind up with a higher baryon fraction, at least on average. I cannot give credence to this process resulting in the small observed scatter. As people are always telling me, “galaxies are complicated.” Indeed, they should be – in LCDM. But in reality they’re not! They obey simple scaling laws, laws that do not follow naturally from LCDM.

The local missing baryon problem encapsulates one of the fine-tuning problems that has never been satisfactorily explained. This alone would be considered fatal for most theories. For LCDM, it is just another problem to be addressed through the eternal tweaking of models and simulations.

*Strictly speaking, M₂₀₀ refers to all mass within r₂₀₀, baryons as well as dark matter. I’m going to call it halo mass anyway, because that’s what we mean, the baryons are a small fraction of the total, and because that’s what everybody does in the literature. If we make some other choice for the definition of the mass of the halo, M_Δ, then the inferred baryon fraction of an object scales by M₂₀₀/M_Δ. The cosmic baryon fraction does not care what choice we make, so the implicit assumption is that one asymptotes to the cosmic fraction if one gets far enough out, irrespective of what r_Δ we adopt. While this is a sensible assumption – individual objects must merge into the larger cosmos at some point – there is no guarantee that the universe cooperates. For example, the baryon fraction in galaxies declines with increasing radius, but that in galaxy clusters increases with radius. I’ve seen hints that it doesn’t really settle down to the cosmic (or any particular) value. These are only hints – considerable extrapolation is involved – so we’ll ignore this inconvenience and assume that the baryon fractions of individual objects do in fact converge to the cosmic value far enough out.

^{^}It makes the most sense if the underlying total mass is the observed baryonic mass.

^&I made a very similar fit in McGaugh et al. (2010) but didn’t publish it because there was no physics in it. Since then the field has been awash in abundance matching relations that were similarly fit sans physics. There has been much ink spilled justifying it post-facto with feedback, but I have refrained from this exercise in intellectual onanism.

^%It is common to assume in simulations that a large fraction (50 – 100%) of the energy from supernovae is returned to the surrounding gas. This process is not resolved in cosmological simulations, all the energy return happens as part of the “subgrid” physics, so the feedback efficiency is set, in practice, to make things work out as well as possible.

Observationally, most of the SN energy finds its way out along the path of least resistance where the density of the surrounding gas is smallest (“chimneys”). This process couples to the surrounding gas with only a few percent efficiency.

Yep, it’s a religion

I have been concerned for years that dark matter was morphing from legitimate science into a cold, dark religion. I have been reluctant to put it that way, because there are lots of scientists who work on dark matter that have not fallen entirely down that rabbit hole and who continue to make valuable contributions working in that context. But a recent experience reminded me that my concerns were not misplaced, and there are plenty of scientists who have fallen irredeemably down this rabbit hole. No matter what answer the future holds to be correct, many current scientists will have gone to their graves in denial of it.

Where is the boundary between science and religion? It is hard to assess where the borderline is. But it is easy to see when people are far over the line – so far over that it doesn’t really matter where exactly the line is. One can attend any conference on the subject to find people who unabashedly assert that dark matter exists without question. Not just that acceleration discrepancies have been amply demonstrated empirically, but that the only possible interpretation is dark matter. If asked whether this invisible mass is in the room with us now, they will enthusiastically^# answer yes! Since dark matter has not been detected in the laboratory, this assertion is an expression of faith – the hallmark of religion – not of an established scientific fact. What we have established is that there are discrepancies between what we see and what we get when we assume Newtonian gravity (or GR, if needed). What we don’t know is whether the cause of these discrepancies is some form of invisible mass (dark matter) or if the equations we employ are inadequate (modified gravity [or more generally, dynamics]).

Indeed, these days many people will assert that dark matter has already been detected, usually citing astronomical evidence that used to be considered too feeble to merit a Nobel prize. Funny how repeating a mantra long enough morphs an aspiration into accepted reality. Modern physics is not providing a strong falsification of the supposition that science is a social construct.

A prominent example of an observation of the sky that is frequently cited as absolutely requiring cold dark matter is the acoustic power spectrum of the cosmic microwave background. Quoting clayton from a few years ago:

the primary reason to believe in the phenomenon of cold dark matter is the very high precision with which we measure the CMB power spectrum, especially modes beyond the second acoustic peak. There is a stone-cold, qualitative, crystal clear prediction of CDM about the relative sizes of the second and third peaks that modified gravity profoundly and irredeemably gets wrong: it thinks the third peak should be relatively larger* than the second… whereas CDM thinks they should be about the same

I would accept that this were conclusive proof of dark matter if this were the unique prediction of dark matter: that there was no other way to do it, so all other approaches were indeed irredeemable. (Quite the strong language, eh?) The problem is that CDM is not the one unique way to fit these data. Skordis & Zlosnik showed that it is possible to write a modified gravity theory that also fits the CMB data:

*CMB power spectrum observed by Planck fit by AeST (Skordis & Zlosnik 2021).*

This does not prove the AeST theory of Skordis & Zlosnik is correct, but it does demonstrate that it is possible to write a modified gravity theory that does indeed do what it is frequently asserted to be impossible for a modified gravity theory to do. I’ve heard of a couple of other theories that can also do this (the relativistic Khronon theory of Blanchet and nonlocal MOND as discussed by Deffayet & Woodard), so clearly this success is not uniquely limited to cold dark matter, or even a particular modified gravity theory. The work of Skordis & Zlosnik (2021) was known and in the literature before clayton made the assertion above in late 2022, so either he wasn’t paying attention (likely) or is convinced that it is impossible so doesn’t even consider the possibility (also likely). The former just says we’re all too busy, but the latter is a mark of religious thinking: my god is the only god, thou shalt have no other hypotheses before^& me.

Many people are very impressed with the quality of the LCDM fit to the CMB. That is indeed very good, but there are enough free parameters that we were going to get a fit to any physically plausible power spectrum. If not, we’ve never been shy about making up new parameters. (Evolving dark energy, anyone? How about a running power spectrum? There’s a whole bag of possibilities!) What I’ve been more impressed with is the consistency of the fit to the CMB data with the many independent constraints on conventional cosmology. Or at least it was, until it wasn’t.

The Hubble tension has gotten steadily worse (in terms of statistical significance), and it really does not look like local measurements are to blame, nor is it the only tension. People seem to miss that it is the CMB-fitted value of the Hubble constant that has evolved over time to spoil the concordance that got us to believe in LCDM in the first place. But if the CMB is the cornerstone of your religion, all other data must inevitably be at fault and can be ignored: there is an entire community of cosmologists who choose to believe the best-fit Planck cosmology to the exclusion of all other data. It’s like the bad old days of the Hubble tension all over again, with the physics community choosing to believe the lower value of H₀ because it makes more sense for the aspects of cosmology that they care about while those in the astronomical community who actually measure H₀ find a persistently higher value.

A real tension in LCDM implies the need for new physics of the unknown variety. One doesn’t want to go there if it can be helped. I didn’t consider MOND until I was already concerned for the viability of dark matter. There are real problems for the paradigm that its more intense advocates simply deny, brush aside without real thought, or choose to remain ignorant of. When they are confronted with a problem, they are pretty creative about making stuff up on the spot. Anything to avoid having to confront the unspeakable – another hallmark of religion.

For example, cold dark matter is scale free. That’s foundational to the hypothesis. So the existence of an acceleration scale in the kinematic data is anathema to CDM. When I first pointed this contradiction out, there were a variety of assertions to the effect of “does too!” One example is provided by Kaplinghat & Turner, who claim to show “how Milgrom’s law comes about in the cold dark matter theory of structure formation.” That would, indeed, be ideal, and is a requirement for any theory to be successful.

Wee problem: they demonstrate no such thing. CDM is scale free, yet K&T claim that it explains Milgrom’s Law, which is predicated on the existence of an acceleration scale. Well, which is it? Is CDM scale free? Or does it explain the acceleration scale? We can’t have it both ways: their very premise is self-contradictory. It is absurd on its face.

The acceleration scale is defined by baryons, for which K&T have no model. To connect baryons with dark matter, they make a hand-waving argument about galaxies reaching a₀ at the edge of their disks. This is not even a concept of a model and does not begin to suffice as an explanation for many reasons, a prominent one being that low surface brightness galaxies have accelerations less than a₀ everywhere:

Centripetal acceleration curves color coded by galaxy surface brightness. Low surface brightness galaxies (blue colors) have low (sub-a₀) accelerations everywhere: there is no edge at which they reach a₀. (Adapted from McGaugh 2020.)

Milgrom pointed out this and many other shortcomings of their scenario, so I feel no need to elaborate further. Milgrom eviscerated their paper so thoroughly that the proper course of action would have been to retract it. Instead, they simply never acknowledge the criticism, and persist to this day in pushing it as some sort of valid scientific explanation. It is not; it does not withstand even mild critical scrutiny. But it doesn’t need to: it reassures the faithful that all is well. They hear what they want to hear without questioning its veracity. That’s another hallmark of religion.

I have refrained from saying these things in the past because I’m too nice. For example, a few years ago I started then abandoned the draft text below, which I simply cut & paste:

One of the things that attracted me to a career in science is the notion of objectivity. I grew up for a time in the bible belt, where people earnestly believed things that were obviously untrue, even to the eyes of a small child. On the occasions that I had the temerity to point out the obvious, the contradictions posed by facts never had an impact on their belief system. Rather, it inevitably earned me a warning that I was going to hell. No few of these people seemed to think it was their religious duty to send me there prematurely, or at least to make life on Earth a living hell.

Scientists eschew such behavior, but are also human, so often engage in it anyway. I’ve encountered it a lot. I get it; I went through the same denial, grief, and anger over the prospect of losing my good friend cold dark matter. The stages of grief never brought something back from the dead, but they have engendered a lot of blame-the-messenger.

Here’s an example, from a review by Mike Turner:

There is a lot of misinformation packed into this short paragraph.

The first clue is right there at the beginning, in red: the heading “False starts.” This is false framing, a classic tool of propagandists. It starts from the outset by asserting that the topic to be discussed is wrong at a level of knowledge so common it requires no justification. This is not the way one starts an objective discussion, much less a scientific one.

Turner then misconstrues what Milgrom did. He didn’t notice the scale a₀ in the data, for which there was scant evidence at the time. Rather, Milgrom made the obvious statement that the inference of dark matter relied on the assumption that dynamics, as encapsulated by the laws of inertia and gravity, is the same on the very different scales of galaxies as in the solar system where they were established, so we ought to consider if dynamics might change in some way. He quickly excluded a size dependence as a possibility. How he settled on acceleration is beyond the scope of this post, and not for me to say. Neither is it for Turner to say.

After a brief and incomplete description of what MOND is, Turner allows that “this one-parameter model fits all the rotation-curve data”. Even in making this admission, he chooses to call it a model rather than a theory. A model is something specific you build in the context of a theory, like a halo model in CDM. MOND is more than that.

Turner quickly moves on without contemplating any meaning that rotation curves might hold. Let’s pause to consider that.

First, I would not say that MOND fits all the rotation curve data. It fits most galaxies, but there are a minority of weird cases that are not well fit. The weird cases inevitably don’t make sense in terms of dark matter either, so on the whole I interpret this to be the usual price of dealing with astronomical data – some of it is just goofy. Setting such cases aside, I can and have fit the same data with all sorts of dark matter halo models. MOND requires fewer parameters, which is important, but the difference isn’t in the fitting. The difference is in predictive ability. I can use MOND to predict the dynamics of galaxies a priori, and have done so many times. I cannot use any flavor of dark matter theory to do the same, and it’s not for lack of trying.

The predictive power of MOND must be telling us something, even if it is something about the nature of dark matter or the process of galaxy formation. There are many papers written on this, some deep and profound, others absurd and banal. Turner cites none of them, nor displays any awareness that such work exists. I would venture to guess that is because acknowledging such work would imply that there is something to debate here, something he would apparently rather not admit.

That’s where I left off. It’s exhausting deciphering other people’s false assertions. Moreover, I just don’t like criticizing other people, no matter how richly they deserve it. (Turner has never refrained from criticizing me in ad hominem terms: on one occasion^$ he showed my picture to an audience and called me “the enemy.”) A large segment of the particle physics and cosmology community appears to think this way, and has succumbed to a scientific version of bible thumping in which you can assert any absurd thing so long as it falls within the framework of the holy LCDM. They really need to find something better to do.

I had hoped we were past this, but I heard a talk last week that was exactly in this mode. To paraphrase, the talk went

We’re sure dark matter exists. We have been sure about it for decades. In that time, we have been repeatedly proven wrong about what it is. Rather than re-think our paradigm in the face of these repeated failures, we double down yet again on the existence of this invisible, undetected mass, asserting aggressively^% that it must be true while eliding or misrepresenting the evidence that it is not. This enables us to make up a whole lot of exciting new possibilities for what the dark matter might be and conceive of ever more grandiose experiments to continue not to detect it. You must believe in dark matter!

This was not a science talk so much as an indoctrination session. It was as if I had stumbled into a revivalist tent where some hothead was preaching to the choir. This is the kind of talk that misled an entire generation into wasting their careers at the bottom of a mine shaft searching for WIMPs. At least WIMPs were a well-motivated hypothesis; this kind of talk could lead a new generation down an even greater variety of garden paths.

I am well aware that I might fall prey to this attitude myself. That’s why I set criteria by which I would change my mind: detect dark matter already, or at least provide a satisfactory explanation as to how MOND comes about. Neither of those criteria has been met. There are claims to do the latter, but so far these are just variations on models I tried and found to fail long ago. If I thought these could work, I would have said so. At the same time, I don’t see any dark matter advocates taking up the challenge to specify what would change their minds. When I ask them what could falsify dark matter, I get dumbfounded looks – the deer-in-the-headlights face one gets when the immediate response “why would you even ask that?” is checked by a distant memory that scientific theories are supposed to be falsifiable.

Personally, I found it humbling to encounter MOND in my own data. I too thought we understood the universe with dark matter. But who ordered this? Certainly not me: my own conventional, dark-matter based predictions were falsified. No one else working in the context of dark matter had got it right at the time either. Only Milgrom ordered this.

And what is this? There is a direct connection between what we see and what we get. Even in ignorance of MOND, the radial acceleration relation encodes a one-to-one relation between the distribution of baryons and the effective force. This is so direct that one can write down a single equation connecting the two:

g_{obs} = F(g_N/a_0)\,g_N.

The observed acceleration is a simple function of that predicted by Newton for the stars and gas that we see. There is no mention of unseen mass; everything is specified by what we can see is there.
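To make that concrete, here is a minimal sketch of such a function. The text above does not specify F; the form used here, F(x) = 1/(1 − exp(−√x)), is one commonly used to fit the radial acceleration relation, and the value of a₀ is likewise an assumed, illustrative number. The function name is hypothetical.

```python
import math

A0 = 1.2e-10  # m/s^2; assumed illustrative value of the acceleration scale a_0


def g_obs(g_newton, a0=A0):
    """Observed acceleration predicted from the Newtonian acceleration of the baryons.

    Assumes the interpolation F(x) = 1 / (1 - exp(-sqrt(x))) with x = g_N / a0.
    This particular F is an assumption of the sketch, not something fixed by the text.
    """
    x = g_newton / a0
    # -expm1(-y) equals 1 - exp(-y) but stays accurate when y is small.
    return g_newton / (-math.expm1(-math.sqrt(x)))


# Two limits worth checking:
#   g_N >> a0 : g_obs approaches g_N (the Newtonian regime)
#   g_N << a0 : g_obs approaches sqrt(g_N * a0) (the low-acceleration regime)
for g_n in (1e-8, 1e-10, 1e-12, 1e-14):
    low_limit = math.sqrt(g_n * A0)
    print(f"g_N = {g_n:.1e}   g_obs = {g_obs(g_n):.3e}   sqrt(g_N*a0) = {low_limit:.3e}")
```

The only inputs are the Newtonian acceleration of the visible stars and gas and the single scale a₀; nothing about a halo appears anywhere in the calculation.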

I’ve sometimes heard astronomers complain about the reductionist ethos of physics, trying to cram all the complexity of the entire universe into a theory of everything. But here it is appropriate: there is a single, apparently universal force-law at work in galaxies. That’s telling us something profound. And yet if questioned about this, the physicists are the ones who will complain that galaxies are complicated, so they should be exempted from having to explain them. Galaxies should be complicated – in LCDM. But they’re observed not to be, in the sense that a single equation suffices to describe their kinematics. The problem isn’t that galaxies are inexplicably complicated, it’s that they should be but aren’t.

I am deeply disappointed that many scientists apparently lack the physical intuition to immediately recognize the import of the simple relation between what we see and what we get. It is the same sort of thing Newton noticed in the solar system: everything happens as if the gravitational force is proportional to the product of the masses and the inverse square of their separation. He didn’t understand why at the time, and was criticized for indulging in magical thinking: how can there be action at a distance? But that’s what the data were saying, and the same applies now. We might not yet understand the why, but the data look as if MOND is what’s happening in this universe.

^#The framing has morphed over the years. A recent advent is that some people have started proactively asserting that invisible mass is in the room with us now in order to avoid having to answer it as a question that makes them sound like loonies.

*He means the third peak should be smaller than the second, not larger, if by “it” he means modified gravity with the baryon density expected from big bang nucleosynthesis, which was the hypothesis that correctly predicted the first-to-second peak ratio but does indeed get the second-to-third peak ratio wrong. Funny how the CMB community was able to completely ignore the successful prediction for several years, but were then suddenly all over the latter failure. The third peak falsifies the ansatz on which that particular prediction was built, not the entire concept of modified gravity. This would be like asserting that all possible forms of dark matter are excluded because we haven’t yet detected WIMPs. It is a classic failure of objectivity, which is another hallmark of faith-based argumentation: we know His name is [insert favorite deity], not [insert any other deity].

^&Or after me. Dark matter was my first hypothesis, and I’m here to tell you that True Believers do not suffer second hypotheses or those who stray from the fold. I guess that’s why so many scientists who are MOND-curious keep it on the down low. Wise, perhaps (that’s why tenure needs to be a thing), but hardly the ideal of the open and free exchange of scientific ideas.

^$I wasn’t there, but one audience member (not someone I knew) thought it was so over the top that he told me about it, sharing a link with a video. (I did not retain that link, and doubt the hosting conference website is still active.)

^%Argument weak here. RAISE VOICE!

Paradigm Shifts in Modern Astrophysics

I see that I’ve been posting once a month so far in 2026. I’ve lots to say but no time to say it. Some of it good, some of it bad, maybe sometime I’ll get around to it. No guarantees. On the good side, I’ve been working on a big project or two; may have something to say about those soon. I’ve also been meaning to write about the Planet 9 anomaly for months stretching into years now. Fascinating stuff related to MOND but not something I’ve worked on myself. On the bad side, I’ve been obliged to waste yet more time on my university administration’s insistence on merging our department into physics based on a snap decision made by a disinterested leader who employed all the forethought typically reserved for bombing a random country in the Middle East.

So I have had no time for novel posts lately, and today is no different. However, I thought readers of this blog would appreciate the post Paradigm Shifts in Modern Astrophysics: Applying Thomas Kuhn’s The Structure of Scientific Revolutions to Dark Matter at Heritage Diner that was pointed out to me by Moti Milgrom. Since I wouldn’t have seen it had he not mentioned it, perhaps that’s the case for you as well. I’m not gonna re-post it verbatim – you can read it there yourself – but I am going to offer a running commentary with a few observations, both personal and historical. So bring it up in a separate browser window and let’s read along…

This post riffs off of Kuhn’s The Structure of Scientific Revolutions as it pertains to dark matter and MOND. If you’re not familiar with it, Kuhn’s work on the philosophy of science is foundational to the way in which a lot of physical scientists approach their field (whether they realize it or not). Philosophers of science have done a lot more since then, but I’m not going to attempt to go there. I will look back to Popper* to note that I’ve heard Kuhn depicted as being some sort of antithesis to Popper. I don’t see it that way. To be pithy, Popper tells us how science should be done while Kuhn tells us how it is done. Who could have imagined that a human endeavor would be messy in practice and not always live up to its ideal?

I’m not sure how to do this; I guess I’ll excerpt relevant quotes and riff off those. The basic thesis is that dark matter is on the brink of a Kuhnian paradigm shift.

We are living through exactly that moment in modern astrophysics.

I certainly hope so! This moment in the history of science is taking a long damn time. A century ago, we went from “classical physics explains everything” to “quantum mechanics, WTF?” in the space of about a decade. I’ve been working on matters related to MOND for over thirty years now, dark matter longer than that, and of course Milgrom started more than a decade before I did.

The essay discusses the “cartography of collapse,” which includes crisis and revolution:

The third stage is crisis — triggered when anomalies accumulate beyond the paradigm’s absorptive capacity. And the fourth is revolution, in which a new framework displaces the old not through incremental persuasion but through a gestalt shift, what Kuhn famously described as seeing the same duck-rabbit drawing and suddenly recognizing a rabbit where you had always seen a duck.

This resonated with me because I had exactly this experience. I started my career as much a believer in dark matter as anyone. I was barely aware that MOND existed (this seems to remain a common condition). But it reared its ugly head in my own data for low surface brightness galaxies. Try as I might – and I tried mighty hard, for a long time – I could not reconcile how the shapes of rotation curves depended on surface brightness as they should according to Newton while simultaneously lying exactly on the Tully-Fisher relation without any hint of dependence on surface brightness⁺. I could explain one or another, but not both simultaneously – at least, not without engaging in some form of tautology that made it so. I came up with a lot of those, and that has been a full-time occupation for many theorists ever since.

For me, this gradually became a genuine crisis. I pounded my head against the wall for months. Then, as I was wrestling with this problem, I happened to attend a talk by Milgrom. I almost didn’t go. I remember thinking “modified gravity? Who wants to hear about that?” But I did, and in a few short lines on the board, Milgrom derived from MOND exactly the result I found so confusing in terms of dark matter. This chance meeting in Middle Earth (Cambridge, UK) changed how I saw the universe. The change wasn’t immediate – it had to ferment a while – but ultimately I found myself asking myself over and over how this stupid theory could have its predictions come true when there was so much evidence for dark matter. Finally I realized that the evidence for dark matter assumes that gravity is normal; really it was just evidence of a discrepancy, and it could be that the assumption was at fault. That realization was sudden: where I’d always seen a duck, suddenly I could also see a rabbit.

Most scientists have not had this experience. What constitutes a crisis serious enough to contemplate a paradigm change is a highly personal matter of judgement. It happened in my data, so I took it seriously, but others didn’t care. So I made predictions for their data. Some of those came true, but they rejected the evidence of their own data. It just could not be so! At what point does a mere problem amount to a true anomaly?

Part of the sociological issue is that the dark matter paradigm has been in a constant state of crisis since its inception. The reasons vary over time. Sometimes valid solutions have been found to the crisis du jour, other times we’ve chosen to just live with it. It is much easier to live with a bad solution than to rethink one’s entire world view.

The problem with being in a constant state of crisis is that it seems like nothing can ever be a genuine crisis. Every foundational change is just another new normal. We complain, say it can’t be so, argue, offer bad ideas, reject them, get used to them, then eventually accept that one of them maybe isn’t so bad, so that must be what is going on. After a few years It is Known and people convince themselves that we expected just that all along.

It takes a lot of evidentiary weight for a paradigm to change, and it takes a lot of time for that to accumulate. But, as Kuhn recognized, mere facts are not enough. Humans and their attitudes matter. As Feyerabend noted,

The normal component [i.e. the accepted paradigm and its adherents] is large and well entrenched. Hence, a change of the normal component is very noticeable. So is the resistance of the normal component to change. This resistance becomes especially strong and noticeable in periods where a change seems to be imminent.

P. Feyerabend in Criticism and Growth of Knowledge

The post correctly points out that dark matter itself was an anomaly going back to Zwicky in 1933. This is often depicted^ as the first detection of dark matter, but it was also noted by Oort in 1932. Zwicky was aware of Oort’s work and cited him, but they’re very different results. Oort was worried about a factor of ~2 discrepancy in stellar dynamics in our local chunk of the Milky Way; Zwicky discovered a discrepancy of a factor of ~1000 in the Coma cluster of galaxies. These both imply the need for unseen mass, but the results are not at all the same. In retrospect, Oort’s discrepancy is a subtle detection of a flat rotation curve while Zwicky’s discrepancy was (at least) two distinct discrepancies: what we now consider the usual cosmic dark matter, but also missing baryons: most of the normal matter in clusters is in the hot, diffuse intracluster medium, not in the stars in the galaxies that Zwicky could see and account for. The modern discrepancy is only a factor of ~6, which is rather less than 1,000. (The distance scale also played a role in exaggerating Zwicky’s result.)

This all seemed crazy in the 1930s, even in the immediate aftermath of the quantum revolution. Consequently, Zwicky’s work was mostly ignored^$. The subject of dark matter didn’t really take off until the 1970s. Considerable credit goes (rightly) to Vera Rubin, though many others made essential contributions – just on the subject of rotation curves, Albert Bosma, Mort Roberts, and Seth Shostak all made important contributions, the relative importance of which depends on who you ask.

An important aspect of scientific revolutions is persistence. Vera was persistent. She was fond of relating the story of showing her first (1970) flat rotation curve of Andromeda to Alan Sandage, only to have him dismiss it as “the effect of looking at a bright galaxy.” What the heck did that mean? Nothing, of course – it is the sort of stupid thing that smart people say when confronted with the inconceivable. So Vera persisted, and by the end of the decade had shown that flat rotation curves were the rule, not some strange exception. They became accepted as a de facto law of nature, and the dark matter interpretation was solidly in place by 1980.

The scientific community absorbed this anomaly not by questioning Newtonian gravity or Einstein’s general relativity, but by proposing an invisible scaffolding — a halo of non-luminous, non-interacting matter surrounding every galaxy. Dark matter became not a crisis but a patch.

Indeed, this seemed the most appropriate (scientifically conservative) course of action at the time, as summarized in this exchange (also from the early 1980s):

To emphasize the essence of what is said here:

Tohline: I might be so bold as to suggest that the validity of Newton’s law should now be seriously questioned.

Rubin: The point you raise is worth keeping in mind although I believe most of us would rather alter Newtonian gravitational theory only as a last resort.

This was a very reasonable attitude, at the time. But I’ve heard the phrase “only as a last resort” many times now over the course of many years from many different scientists. At what point have we reached the last resort? In the case of dark matter, once we’ve convinced ourselves that invisible mass has to exist, how can we possibly disabuse ourselves of that notion, should it happen to be wrong?

In Kuhnian terms the last resort is reached when the weight of anomalies in the standard paradigm becomes too great to sustain. But that point is never reached for many die-hard adherents. Whatever the right answer about dark matter turns out to be, I’m sure many brilliant people will go to their graves in denial. Hence the more cynical phrase

Science progresses one funeral at a time.^%

But does it? What if the adherents of an ingrained but incorrect paradigm breed faster than they go away? I’ve seen True Believers train graduate students who’ve gone on to train students of their own. Each generation seems to accept without serious examination the inadequate explanations for the anomalies made by their antecedents, so the weight of the anomalies doesn’t accumulate; instead, each one gets swept separately under the proverbial rug and forgotten. Forgetting is important: when new anomalies come to light, hands are waved and new explanations are promulgated; no one checks if the new explanations contradict the previous generation of explanations. What passed before is a solved problem, and we need never speak of it again.

This is not a recipe for a scientific revolution, but for a thousand years of dark epicycles.

Returning to the post,

By the late 1980s and early 1990s, dark matter had been formally incorporated into the reigning cosmological framework. Lambda-CDM — where Lambda refers to the cosmological constant (a proxy for dark energy) and CDM stands for Cold Dark Matter — became the standard model of cosmology.

The essence of this statement is correct but some of the details are not. Dark matter was widely accepted by 1980. That’s still a little before my time, but my impression is that the magnitude of the discrepancy was at first a factor of two, so it could simply have been normal baryons that were hard to see. However, the discrepancy rapidly snowballed to an order of magnitude, so we needed something non-baryonic. This was happening simultaneously with talk of supersymmetry and grand unified theories in particle physics that could readily provide new particles to be candidates for the dark matter, leading to the shotgun marriage of particle physics and cosmology, two communities that had had little to do with each other before then, and which still make an odd couple. Cosmology as traditionally practiced by astronomers needed dark matter but didn’t much care what it was; particle physics was all about the possibility of new particles but didn’t care about the details of the astronomical evidence.

To rephrase the above quote, I think it is fair to say that “by the late 1980s and early 1990s, cold dark matter had been formally incorporated into the reigning cosmological framework.” But that framework was not yet LCDM, it was Ω_m = 1 SCDM. The Lambda only came to prominence by the end of the 1990s, as I’ve related elsewhere. This process is depicted by many scientists as a revolution in itself, and in many regards it was. The cosmological constant had been very far out of favor; rehabilitating it was a grueling experience and no trivial matter. But it wasn’t really a scientific revolution in the sense that Kuhn meant: our picture didn’t fundamentally change, we just learned to accept a parameter^& that was already there but that we didn’t like.

The post goes on to note the absence of dark matter detections:

This silence is itself an anomaly… as the silence deepens, the null result itself becomes harder to dismiss.

This is correct, and yet… Physicists have built many experiments that have achieved extraordinary sensitivities. If cold dark matter was composed of WIMPs as originally hypothesized, we would have detected them long ago. Initially, the reaction was to modify WIMPs. Did we say the cross-section would be 10⁻³⁹ cm²? We meant 10⁻⁴⁴ cm². When that was excluded, we slid the cross section still lower, but people also started giving themselves permission to think the unthinkable. By unthinkable I mean a particle that can’t be detected, not modified gravity. That’s more unthinkable. So the anomaly isn’t dismissed, but it is treated with less gravity than it should be, and certainly with less import than a positive detection would have been granted. Did we say WIMPs? We didn’t mean just WIMPs. It could be anything. (They damn well meant WIMPs and only WIMPs^#. Anyone who tells you otherwise is gaslighting^*% you, and probably themselves.)

The post goes on to talk about MOND. It gives me too much credit for the gravitational lensing work. This was done by Tobias Mistele, and our work is based on that of Brouwer et al. But it is correct to note that these data are a problem for the dark matter paradigm. Rotation curves remain flat beyond where dark matter halos should end. If correct, this is a genuine anomaly. Perhaps in some distant future it will be recognized** as such in retrospect; at present it seems mostly to be ignored.

It goes on to talk about the JWST observations. Yeah, that part is correct. The community seems to be in the usual process of gaslighting itself into denial of the anomaly. For the first two years after JWST started returning images of the deep universe, people were aghast. How can this be so? It was all anyone could talk about. But then the unexpected became the new normal. Hands were waved, star formation was accepted to be absurdly efficient, and people accepted the impossible. I no longer hear the talk of how problematic the JWST observations are; this chatter simply stopped.

Anomalies don’t weigh a paradigm down if we don’t accept that they’re anomalies. But I’ve lived through the revolution; it’s hard to see a positive outcome while it is still ongoing. For it is certainly true that

What waits on the other side of the dark matter revolution — if that is what is coming — we cannot yet know.

The future is the unknown territory. We don’t know, and can never know, if dark matter doesn’t exist – it is impossible to prove the negative. But we do know MOND works much better than it should in a universe made of dark matter. That demands a scientific explanation that is still wanting. But MOND by itself is not a complete answer, so we are like the parable of the blind men and the elephant, each sensing a different part of reality but as yet unable to see the whole.

Still, there is reason for optimism. The article closes by noting that

Kuhn’s deepest insight was not that science changes. It is that the change, when it comes, is never merely technical. It is a reorganization of the world itself — the universe seen suddenly whole in a configuration it has always had, but that we had simply lacked the paradigm to perceive.

Not knowing how things ultimately work out is good, actually. One way or the other, there is still fundamental science to be done. We have not reached the stage of looking for our discoveries in the sixth place of decimals.

*Trivia I just learned looking at Popper’s wikipedia page: he was spending his last days in London around the same time I was a postdoc in Cambridge just starting to struggle with the scientific and philosophical implications of the dark matter-MOND miasma.

Unrelated trivia: I was at a workshop in Jerusalem early in the century but missed the opportunity to meet Jacob Bekenstein because I was too shy to bother the great man.

⁺If you do not find this confusing, you are not thinking clearly.

^A nice, brief summary of this early history is related by Einasto. This is the first place I’ve seen the citation to Opik (1915) written out. I’ve only heard it mentioned verbally before, so I’ll have to try looking that up later.

The full story is way more complicated than this sounds, and still gets debated off and on. The amplitude of the Oort discrepancy is much smaller today. Locally, the 3D density of mass seems to be accounted for by known stars, gas, and stellar remnants (which were still a new thing in the 1930s). So this Oort limit shows no discrepancy. There remains a modest discrepancy in the 2D dynamical surface density. It appears to me to boil down to the vertical restoring force having a (sometimes ignored) term that depends on the gradient of the rotation curve. Were that falling in a normal Newtonian way, there would be no discrepancy. But it isn’t; this deviation from Newton in the radial direction leads to the Oort discrepancy in the vertical direction. Instead of being as negative as Newton predicts, dV/dR is close to zero, hence my description of this as an indirect detection of a[n almost] flat rotation curve. (dV/dR = -1.7 km/s/kpc, so not exactly zero, but a lot closer to zero than Newton without dark matter would have it be.) The vertical discrepancy is nevertheless much reduced, now being well below a factor of two.
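The size of that rotation-curve term can be sketched numerically. In an axisymmetric disk, Poisson’s equation reads 4πGρ = ∂²Φ/∂z² + (1/R) d(V²)/dR, and the second term equals 2(V/R)(dV/dR). The sketch below compares the quoted slope with a point-mass (Keplerian) falloff, which is only a crude stand-in for “Newton without dark matter.” The solar radius and circular speed are assumed round numbers, not values from this post; only dV/dR is taken from the text.

```python
# Assumed illustrative values; only DVDR_OBS comes from the text.
R0 = 8.0          # kpc; assumed galactocentric radius of the Sun
V0 = 230.0        # km/s; assumed circular speed at R0
DVDR_OBS = -1.7   # km/s/kpc; rotation-curve slope quoted in the text


def keplerian_slope(v, r):
    """dV/dR for a point-mass falloff V ~ R^(-1/2): a crude Newtonian stand-in."""
    return -0.5 * v / r


def radial_poisson_term(v, r, dvdr):
    """(1/R) d(V^2)/dR = 2 (V/R) dV/dR, in (km/s/kpc)^2.

    This is the rotation-curve term that enters the vertical force budget;
    it vanishes for an exactly flat rotation curve.
    """
    return 2.0 * (v / r) * dvdr


slope_kep = keplerian_slope(V0, R0)
term_obs = radial_poisson_term(V0, R0, DVDR_OBS)
term_kep = radial_poisson_term(V0, R0, slope_kep)

print(f"Keplerian slope: {slope_kep:.2f} km/s/kpc vs quoted {DVDR_OBS} km/s/kpc")
print(f"radial term, quoted slope:    {term_obs:.1f} (km/s/kpc)^2")
print(f"radial term, Keplerian slope: {term_kep:.1f} (km/s/kpc)^2")
```

Under these assumptions the radial term from the quoted slope is far smaller in magnitude than the Keplerian one, which is the sense in which the rotation curve is nearly flat.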

^$To his apparently great embitterment. He had some choice things to say about astronomers of his time. I am inclined to suspect that those who praise Zwicky the loudest today would have been among those he had reason to complain about had they been contemporaries.

^%This is attributed to Planck, but he had a lot more nuanced things to say about it in his Nobel Prize lecture.

^&Einstein disavowed the cosmological constant as his “greatest blunder,” so one argument against it was (for a long time) that it should never have been a part of the theory of General Relativity in the first place. I wonder how things might have gone had that been the case – that he had never introduced Lambda. Perhaps then the data that led to us accepting Lambda would have required a genuine revolution, but it isn’t obvious that we would have accepted it (we might still be debating it), nor is it apparent that LCDM is what comes out of such a revolution. But we don’t get to do that experiment: the Great Man had suggested Lambda, so it was OK to bring it back: we weren’t wrecking his theory by introducing a crazy new entity, we were just admitting an unlikely (antigravity-like) component thereof.

^#Or axions! Or warm or self-interacting dark matter. Or macros nee strange nuggets! Or or or… Sure, there have been lots of ideas for what the dark matter could be. But when we say that “by the late 1980s and early 1990s, cold dark matter had been formally incorporated into the reigning cosmological framework” what the vast majority of scientists working on the topic (including myself) meant was that CDM == WIMPs. We were aggressively derisive of other ideas, and these are only dredged up again now because of the experimental non-detection of WIMPs. WIMPs are still a better dark matter candidate than the others for the same reasons that we were derisive of the others back in the day. We haven’t been looking as hard for the others, so comparable experimental limits do not yet exist. To quote myself,

The concept of dark matter is not falsifiable. If we exclude one candidate, we are free to make up another one. After WIMPs, the next obvious candidate is axions. Should those be falsified, we invent something else. (Particle physicists love to do this. The literature is littered with half-baked dark matter candidates invented for dubious reasons, often to explain phenomena with obvious astrophysical causes. The ludicrous uproar over the ATIC and PAMELA cosmic ray experiments is a good example.)

McGaugh (2008)

^*%An easy way to deflate such gaslighting is to ask why so many experiments have been built to search for WIMPs but not all these other allegedly great dark matter candidates. After a pause and dismayed stare, you’ll probably get an answer about “looking under the lamp post” because that’s where it is possible to make detections. That’s sorta true, but it isn’t the real reason. The real reason is that we all drank the Kool-Aid of the WIMP miracle, so genuinely believed that the dark matter had to be WIMPs, not merely that they were a convenient experimental target. (I did not chug the Kool-Aid as hard as the people who based entire careers on building WIMP detection experiments, but I did buy into the idea to the exclusion of other possibilities for dark matter – as did most everyone else.)

**In retrospect, Galileo’s observations of the angular size and phases of Venus were utterly fatal to the geocentric paradigm. That’s easy to say now; at the time it was just another piece of evidence.

Very thin galaxies

The stability of spiral galaxies was a foundational motivation to invoke dark matter: a thin disk of self-gravitating stars is unstable unless embedded in a dark matter halo. Modified dynamics can also stabilize galactic disks. A related test is provided by how thin such galaxies can be.

Thin galaxies exist

Spiral galaxies seen edge-on are thin. They have a typical thickness – their short-to-long axis ratio – of q ≈ 0.2. Sometimes they’re thicker, sometimes they’re thinner, but this is often what we assume when building mass models of the stellar disk of galaxies that are not seen exactly* edge-on. One can employ more elaborate estimators, but the results are not particularly sensitive to the exact thickness so long as it isn’t the limit of either razor thin (q = 0) or a spherical cow (q = 1).

Sometimes galaxies are very thin. Behold the “superthin” galaxy UGC 7321:

*UGC 7321 as seen in optical colors by the Sloan Digital Sky Survey.*

It also looks very thin in the infrared, which is the better tracer of stellar mass:

**Fig. 1** from Matthews et al (1999): *H-band (1.6 micron) image of UGC 7321. Matthews (2000) finds a near-IR axis ratio of 14:1. That’s super thin (q = 0.07)!*

UGC 7321 is very thin, would be low surface brightness if seen face-on (Matthews estimates a central B-band surface brightness of 23.4 mag arcsec^-2), has no bulge component thickening the central region, and contains roughly as much mass in gas as stars. All of these properties dispose a disk to be fragile (to perturbations like mergers and subhalo crossings) and unstable, yet there it is. There are enough similar examples to build a flat galaxy catalog, so somehow the universe has figured out a way for galaxy disks to remain thin and dynamically cold^# for the better part of a Hubble time.

We see spiral galaxies at various inclinations to our line of sight. Some will appear face on, others edge-on, and everything in between. If we observe enough of them, we can work out what the intrinsic distribution is based on the projected version we see.

First, some definitions. A 3D object has three principal axes of lengths a, b, and c. By convention, a is the longest and c the shortest. An oblate model imagines a galaxy like a frisbee: it is perfectly round seen face-on (a = b); seen edge-on q = c/a. More generally, an object can be triaxial, with a ≠ b ≠ c. In this case, a galaxy would not appear perfectly round even when seen perfectly face-on^{^} because it is intrinsically oval (with similar axis lengths a ≈ b but not exactly equal). I expect this is fairly common among dwarf Irregular galaxies.

The observed and intrinsic distribution of disk thicknesses

Benevides et al. (2025) find that the distribution of observed axis ratios q is pretty flat. This is a consequence of most galaxies being seen at some intermediate viewing angle. One can posit an intrinsic distribution, model what one would see at a bunch of random viewing angles, and iterate to extract the true distribution in nature, which they do:

**Figure 6** from Benevides et al. (2025): Comparison between the observed (projected) $q$ distribution and the inferred intrinsic 3D axis ratios for a subsample of dwarfs in the GAMA survey with $M_{⋆} = 10^{9}$ – $10^{9.5} M_{⊙}$ . The observed shapes are shown with the solid black line and are used to derive an intrinsic $c / a$ (long-dashed) and $b / a$ (dotted) distribution when projected. Solid color lines in each panel corresponds to the $q$ values obtained from the 3D model after random projections. Note that a wide distribution of $q$ values is generated by a much narrower intrinsic $c / a$ distribution. For example, the blue shaded region in the left panel shows that an observed $5 %$ of galaxies with $q < 0.2$ requires $41 %$ of galaxies to have an intrinsic $c / a < 0.2$ for an oblate model. Similarly, for a triaxal model (right panel, red curve) $43 %$ of galaxies are required to be thinner than $c / a = 0.2$ . The additional freedom of $b \neq a$ in the triaxial model helps to obtain a better fit to the projected $q$ distribution, but the changes mostly affect large $q$ values and changes little the $c / a$ frequency derived from highly elongated objects.

That we see some thin galaxies implies that they have to be common, as most of them are not seen edge-on. For dwarf^$ galaxies of a specific mass range, which happens to include UGC 7321, Benevides et al. (2025) infer a lot^% of thin galaxies, at least 40% with q < 0.2. They also infer a little bit of triaxiality, a ≈ b.
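
The projection argument can be sketched with a toy Monte Carlo. This is only an illustration, not the fitting procedure of Benevides et al. (2025): it assumes every galaxy is an oblate spheroid with one and the same intrinsic c/a, viewed from a random direction, and the function names are made up for this example.

```python
import math
import random

def projected_q(intrinsic_q, cos_i):
    """Apparent axis ratio of an oblate spheroid with intrinsic c/a = intrinsic_q.

    cos_i = 1 is face-on (q = 1); cos_i = 0 is edge-on (q = intrinsic_q).
    """
    return math.sqrt(cos_i**2 + intrinsic_q**2 * (1.0 - cos_i**2))

def fraction_seen_thin(intrinsic_q, q_cut=0.2, n=200_000, seed=1):
    """Monte Carlo fraction of randomly oriented disks observed with q < q_cut."""
    rng = random.Random(seed)
    # Random orientations correspond to cos(i) uniform between 0 and 1.
    hits = sum(projected_q(intrinsic_q, rng.random()) < q_cut for _ in range(n))
    return hits / n

def fraction_seen_thin_exact(intrinsic_q, q_cut=0.2):
    """The same fraction computed analytically, as a cross-check."""
    if intrinsic_q >= q_cut:
        return 0.0
    return math.sqrt((q_cut**2 - intrinsic_q**2) / (1.0 - intrinsic_q**2))

print(fraction_seen_thin(0.1), fraction_seen_thin_exact(0.1))
```

Even if every disk in this toy population had c/a = 0.1, only about 17% of them would be observed with q < 0.2, which is why a modest observed fraction implies a large intrinsic one.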

The existence and numbers of thin dwarfs seem to come as a surprise to many astronomers. This is perhaps driven in part by theoretical expectations for dwarf galaxies to be thick: a low surface brightness disk has little self-gravity to hold stars in a narrow plane. This expectation is so strong that Benevides et al. (2025) feel compelled to provide some observed examples, as if to say look, really:

**Figure 8** – images of real galaxies from Benevides et al. (2025): Examples of $10$ highly elongated dwarf galaxies with $q \leq 0.2$ and $M_{⋆} = 10^{7}$ – $10^{8.5} M_{⊙}$ . They resemble thin edge-on disks and can be found even among the faintest dwarfs in our sample. Legends in each panel quote the stellar mass, the shape parameter $q$ , as well as the GAMA identifier. Objects are sorted by increasing $M_{⋆}$ , left to right.

As an empiricist who has spent a career looking at low mass and low surface brightness galaxies, this does not come as a surprise to me. These galaxies look normal. That’s what the universe of late type dwarf^$ galaxies looks like.

Edge-on galaxies in LCDM simulations

Thin galaxies do not occur naturally in the hierarchical mergers of LCDM (e.g., Haslbauer et al. 2022), where one would expect a steady bombardment by merging masses to mess things up. The picture above is not what galaxy-like objects in LCDM simulations look like. Scraping through a few simulations to find the flattest galaxies, Benevides et al. (2025) find only a handful of examples:

**Figure 11** – images of simulated galaxies from Benevides et al. (2025): *Edge-on projection of examples of the flattest galaxies in the TNG50 simulation, in different bins of stellar mass.*

Note that only the four images on the left here occupy the same stellar mass range as the images of reality above. These are as close as it gets. Not terrible, but also not representative^&. The fraction of galaxies this thin is a tiny fraction of the simulated population whereas they are quite common in reality. Here the two are compared: three different surveys (solid lines) vs. three different simulations (dashed lines).

**Figure 9** from Benevides et al. (2025): Fraction of galaxies that are derived to be intrinsically thinner than $c / a \leq 0.2$ as a function of stellar mass. Thick solid lines correspond to our observational samples while dashed lines are used to display the results of cosmological simulations. Different colors highlight the specific survey or simulation name, as quoted in the legend. In all observational surveys, the frequency of thin galaxies peaks for dwarfs with $M_{⋆} \sim 10^{9} M_{⊙}$ , almost doubling the frequency observed on the scale of MW-mass galaxies. Thin galaxies do not disappear at lower masses: we infer a significant fraction of dwarf galaxies with $M_{⋆} < 10^{9} M_{⊙}$ to have $c / a < 0.2$ . This is in stark contrast with the negligible production of thin dwarf galaxies in all numerical simulations analyzed here.

Note that the thinnest galaxies in nature are dwarfs of mass comparable to UGC 7321. Thin disks aren’t just for bright spirals like the Milky Way with log(M_*) > 10.5. They are also common^*$ for dwarfs with log(M_*) = 9 and even log(M_*) = 8, which are often gas dominated. In contrast, the simulations produce almost no galaxies that are thin at these lower masses.

The simulations simply do not look like reality. Again. And again, etc., etc., ad nauseam. It’s almost as if the old adage applies: garbage in, garbage out. Maybe it’s not the resolution or the implementation of the simulations that’s the problem. One could get all that right, but it wouldn’t matter if the starting assumption of a universe dominated by cold dark matter was the input garbage.

Galaxy thickness in Newton and MOND

Thick disks are not merely a product of simulations, they are endemic to Newtonian dynamics. As stars orbit around and around a galaxy’s center, they also oscillate up and down, bobbing in and out of the plane. How far up they get depends on how fast they’re going (the dynamical temperature of the stellar population) and how strong the restoring force to the plane of the disk is.

In the traditional picture of a thin spiral galaxy embedded in a quasi-spherical dark matter halo, the restoring force is provided by the stars in the disk. The dark matter halo is there to boost the radial force to make the rotation curve flat, and to stabilize the disk, for which it needs to be approximately spherical. The dark matter halo does not contribute much to the vertical restoring force because it adds little mass near the disk plane. In order to do that, the halo would have to be very squashed (small q) like the disk, in which case we revive the stability problem the halo was put there to solve.

This is why we expect low surface brightness disks to be thick. Their stars are spread thin, the surface mass density is low, so the restoring force to the disk should be small. Disks as thin as UGC 7321 shouldn’t be possible unless they are extremely cold^*# dynamically – a situation that is unlikely to persist in a cosmogony built by hierarchical merging. The simulations discussed above corroborate this expectation.

In MOND, there is no dark matter halo, but the modified force should boost the vertical restoring force as well as the radial force. One thus expects thinner disks in MOND than in Newton.

I pointed this out in McGaugh & de Blok (1998) along with pretty much everything else in the universe that people tell me I should consider without bothering to check if I’ve already considered. Here is the plot I published at the time:

**Figure 9** of McGaugh & de Blok (1998): Thickness q = z₀/h expected for disks of various central surface densities Σ₀. Shown along the top axis is the equivalent B-band central surface brightness μ₀ for Υ_* = 2. Parameters chosen for illustration are noted in the figure (a typical scale length h and two choices of central vertical velocity dispersion σ_z). Other plausible values give similar results. The solid lines are the Newtonian expectation and the dashed lines that of MOND. The Newtonian and MOND cases are similar at high surface densities but differ enormously at low surface densities. Newtonian disks become very thick at low surface brightness. In contrast, MOND disks can remain reasonably thin to low surface density.

There are many approximations that have to be made in constructing the figure above. I assumed disks were plane-parallel slabs of constant velocity dispersion, which they are not. But this suffices to illustrate the basic point, that disks should remain thinner^&% in MOND than in Newton as surface density decreases: as one sinks further into the MOND regime, there is relatively more restoring force to keep disks thin. To duplicate this effect in Newton, one must invent two kinds of dark matter: a dissipational kind of dark matter that forms a dark matter disk in addition to the usual dissipationless cold dark matter that makes a quasi-spherical dark matter halo.
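
A crude version of the slab scaling can be sketched as follows. It is not the calculation from the 1998 paper. It assumes an isothermal self-gravitating slab with z₀ = σ_z²/(πGΣ), and it mimics MOND by boosting the slab field 2πGΣ with the so-called simple interpolation function. The values of h and σ_z are placeholders chosen only for illustration.

```python
import math

G = 4.30091e-3   # gravitational constant in pc (km/s)^2 / Msun
A0 = 3.7         # MOND acceleration scale in (km/s)^2 / pc (about 1.2e-10 m/s^2)

def nu_simple(y):
    """'Simple' MOND interpolation function, g = nu(g_N / a0) * g_N."""
    return 0.5 + math.sqrt(0.25 + 1.0 / y)

def slab_thickness(sigma_z, surface_density, mond=False):
    """Scale height z0 (pc) of an isothermal slab, z0 = sigma_z^2 / (pi G Sigma)."""
    z0 = sigma_z**2 / (math.pi * G * surface_density)
    if mond:
        g_newton = 2.0 * math.pi * G * surface_density  # Newtonian field of the slab
        z0 /= nu_simple(g_newton / A0)                  # assumed 1D MOND boost
    return z0

H = 3000.0       # disk scale length in pc (assumed)
SIGMA_Z = 10.0   # vertical velocity dispersion in km/s (assumed)
for sd in (1000.0, 100.0, 10.0):  # surface density in Msun / pc^2
    q_newton = slab_thickness(SIGMA_Z, sd) / H
    q_mond = slab_thickness(SIGMA_Z, sd, mond=True) / H
    print(f"Sigma = {sd:6.0f}: q_Newton = {q_newton:.4f}, q_MOND = {q_mond:.4f}")
```

The two cases agree at high surface density, where the boost factor approaches one, and separate progressively as the surface density drops.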

The idea of the plot above was to illustrate the trend of expected thickness for galaxies of different central surface brightness. One can also build a model to illustrate the expected thickness as a function of radius for a pair of galaxies, one high surface brightness (so it starts in the Newtonian regime at small radii) and one of low surface brightness (in the MOND regime everywhere). I have chosen numbers^** resembling the Milky Way for the high surface brightness galaxy model, and scaled the velocity dispersion of the low surface brightness model so it has very nearly the same thickness in the Newtonian regime. In MOND, both disks remain thin as a function of radius (they flare a lot in Newton) and the lower surface brightness disk model is thinner thanks to the relatively stronger restoring force that follows from being deeper in the MOND regime.

The thickness of two model disks, one high surface brightness (solid lines) and the other low surface brightness (dashed lines), as a function of radius. The two are similar in Newton (black), but differ in MOND *(blue)*. The restoring force to the disk is stronger in MOND, so there is less flaring with increasing radius. The low surface brightness galaxy is further in the MOND regime, leading naturally to a thinner disk.

These are not realistic disk models, but they again suffice to illustrate the point: thin disks occur naturally in MOND. Low surface brightness disks should be thick in LCDM (and in Newtonian dynamics in general), but can be as thin as UGC 7321 in MOND. I didn’t aim to make q ≈ 0.1 in the model low surface brightness disk; it just came out that way for numbers chosen to be reasonable representations of the genre.

What the distribution of thicknesses is depends on the accretion and heating history of each individual disk. I don’t claim to understand that. But the mere existence of dwarf galaxies with thin disks is a natural outcome in MOND that we once again struggle to comprehend in terms of dark matter.

*Seeing a galaxy highly inclined minimizes the inclination correction to the kinematic observations [V_rot = V_obs/sin(i)] but to build a mass model we also need to know the face-on surface density profile of the stars, the correction for which depends on 1/cos(i). So as a practical matter, the competition between sin(i) and cos(i) makes it difficult to analyze galaxies at either extreme.
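
The competition between sin(i) and cos(i) can be made concrete with first-order error propagation. This is a minimal sketch: the 3 degree inclination uncertainty is an arbitrary assumption, and the function name is made up for illustration. Whether the surface density correction is written as cos(i) or 1/cos(i), its fractional sensitivity to inclination is tan(i) di.

```python
import math

def fractional_errors(incl_deg, incl_err_deg):
    """First-order fractional errors caused by an inclination uncertainty.

    V_rot = V_obs / sin(i)           ->  dV/V = di / tan(i)
    Sigma_face = Sigma_obs * cos(i)  ->  dSigma/Sigma = tan(i) * di
    """
    i = math.radians(incl_deg)
    di = math.radians(incl_err_deg)
    return di / math.tan(i), math.tan(i) * di

for incl in (20, 45, 60, 80, 87):
    dv, dsig = fractional_errors(incl, 3.0)
    print(f"i = {incl:2d} deg: dV/V = {dv:.3f}, dSigma/Sigma = {dsig:.3f}")
```

The velocity error blows up toward face-on and the surface density error blows up toward edge-on, so intermediate inclinations are the easiest to work with.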

^#Dynamically cold means the random motions (quantified by the velocity dispersion of stars σ) are small compared to ordered rotation (V) in the disk, something like V/σ ≈ 10. As a disk heats (higher σ) it thickens, as some of that random motion goes in the vertical direction perpendicular to the disk. Mergers heat disks because they bring kinetic energy in from random directions. Even after an object is absorbed, the splash it made is preserved in the vertical distribution of the stars which, once displaced, never settle back into a thin disk. (Gas can settle through dissipation, but point masses like stars cannot.)

^Oval distortions are a major source of systematic error in galaxy inclination estimates, especially for dwarf Irregulars. It is an asymmetric error: a galaxy with a mild oval distortion can be inferred to have an inclination (i > 0) even when seen face-on (i = 0), but it can never have an inclination more face-on (i < 0) than exactly face-on. This is one of the common drivers of claims that low mass galaxies fall off the Tully-Fisher relation. (Other common problems include a failure to account for gas mass, bad distance estimates, or not measuring V_flat.)
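
The asymmetry can be illustrated with a toy calculation. It assumes an infinitely thin oval disk tilted about its long axis (the worst-case alignment), so the apparent axis ratio is (b/a) cos(i), while the observer assumes a perfectly round disk. Both function names are hypothetical.

```python
import math

def inferred_inclination(true_incl_deg, oval_b_over_a):
    """Inclination (deg) inferred from the apparent axis ratio of a thin oval disk.

    Assumes the disk is tilted about its long axis, so q_apparent = (b/a) cos(i_true),
    and that the observer converts q to inclination as if the disk were round.
    """
    q_apparent = oval_b_over_a * math.cos(math.radians(true_incl_deg))
    return math.degrees(math.acos(q_apparent))

def velocity_bias(true_incl_deg, oval_b_over_a):
    """Ratio of inferred to true rotation velocity, sin(i_true) / sin(i_inferred)."""
    i_inf = inferred_inclination(true_incl_deg, oval_b_over_a)
    return math.sin(math.radians(true_incl_deg)) / math.sin(math.radians(i_inf))

print(inferred_inclination(0.0, 0.9))  # face-on oval mistaken for an inclined disk
print(velocity_bias(30.0, 0.9))        # fraction of the true velocity recovered
```

With b/a = 0.9, a face-on disk would be assigned an inclination of about 26 degrees, and at a true inclination of 30 degrees the rotation velocity would come out about 20% too low, which moves a galaxy toward the low-velocity side of Tully-Fisher.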

^$In a field with abominable terminology, what is meant by a “dwarf” galaxy is one of the worst offenders. One of my first conference contributions thirty years ago griped about the [mis]use of this term, and matters have not improved. For this particular figure, Benevides et al. (2025) define it to mean galaxies with stellar masses in the range 9 < log(M_*) < 9.5, which seems big to me, but at least it is below the mass of a typical L* spiral, which has log(M_*) ~ 10.5. For comparison, see Fig. 6 of the review of Bullock & Boylan-Kolchin (2017), who define “bright dwarfs” to have 7 < log(M_*) < 9, and go lower from there, but not higher into the regime that we’re calling dwarf right now. So what a dwarf galaxy is depends on context.

^%Note that the intrinsic distribution peaks below q = 0.2, so arguably one should perhaps adopt as typical the mode of the distribution (q ≈ 0.17).

^&Another way in which even the thin simulated objects are not representative of reality is that they are dynamically hot, as indicated by the κ_rot parameter printed with the image. This is the fraction of kinetic energy in rotation. One of the more favorable cases with κ_rot = 0.67 corresponds to V/σ = 2.5. That happens in reality, but higher values are common. Of course, thin disks and dynamical coldness go hand in hand. Since the simulations involve a lot of mergers, the fraction of kinetic energy in rotation is naturally small. So I’m not saying the simulations are wrong in what they predict given the input physics that they assume, but I am saying that this prediction does not match reality.
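
The correspondence between κ_rot and V/σ can be approximated in a couple of lines. This sketch assumes an isotropic velocity dispersion, so that the kinetic energy scales as V² + 3σ². Simulation papers compute κ_rot from particle orbits, so treat this only as a rough mapping.

```python
import math

def kappa_rot(v_over_sigma):
    """Fraction of kinetic energy in rotation, V^2 / (V^2 + 3 sigma^2)."""
    x2 = v_over_sigma**2
    return x2 / (x2 + 3.0)

def v_over_sigma(kappa):
    """Inverse of kappa_rot under the same isotropic assumption."""
    return math.sqrt(3.0 * kappa / (1.0 - kappa))

print(v_over_sigma(0.67))  # close to the V/sigma = 2.5 quoted above
print(kappa_rot(10.0))     # a dynamically cold disk with V/sigma = 10
```

In this approximation a cold disk with V/σ = 10 would have κ_rot of about 0.97, well above the values of the thin simulated objects.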

^*$The fraction of thin galaxies observed by DESI is slightly higher than found in the other surveys. Having looked at all these data, I am inclined to suspect the culprit is image quality: that of DESI is better. Regardless of the culprit for this small discrepancy between surveys, thin disks are much more common in reality than in the current generation of simulations.

^*#There seems to be a limit to how cold disks get, with a minimum velocity dispersion around ~7 km/s observed in face-on dwarfs when the appropriate number, according to Newton, would be more like 2 km/s, tops. I remember this number from observations in the ’80s and ’90s, along with lots of discussion then to the effect of how can it be so? but it is the new year and I’m feeling too lazy to hunt down all the citations so you get a meme instead.

^&%In an absolute sense, all other things being equal, which they’re not, disks do become thicker to lower surface brightness in both Newton and MOND. There is less restoring force for less surface mass density. It is the relative decline in restoring force and consequent thickening of the disk that is much more precipitous in Newton.

^**For the numerically curious, these models are exponential disks with surface density profiles Σ(R) = Σ₀ e^(-R/R_d). Both models have a scale length R_d = 3 kpc. The HSB has Σ₀ = 866 M_☉ pc^-2; this is a good match to the Eilers et al. (2019) Milky Way disk; see McGaugh (2019). The LSB has Σ₀ = 100 M_☉ pc^-2, which corresponds roughly to what I consider the boundary of low surface brightness, a central B-band surface brightness of ~23 mag. arcsec^-2. For the velocity dispersion profile I also assume an exponential with scale length 2R_d (that’s what’s supposed to happen). The central velocity dispersion of the HSB is 100 km/s (an educated guess that gets us in the right ballpark) and that of the LSB is 33 km/s – the mass is down by a factor of ~9 so the velocity dispersion should be lower by a factor of $\sqrt{9}$ . (I let it be inexact so the solid and dashed Newtonian lines wouldn’t exactly overlap.)

These models are crude, being single-population (there can be multiple stellar populations each with their own velocity dispersion and vertical scale height) and lacking both a bulge and gas. The velocity dispersion profile sometimes falls with a scale length twice the disk scale length as expected, sometimes not. In the Milky Way, R_d ≈ 2.5 or 3 kpc, but the velocity dispersion falls off with a scale length that is not 5 or 6 kpc but rather 21 or 25 kpc. I have also seen the velocity dispersion profile flatten out rather than continue to fall with radius. That might itself be a hint of MOND, but there are lots of different aspects of the problem to consider.
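
For anyone who wants to check the arithmetic, the model numbers above can be reproduced in a few lines. The function names are made up for this sketch; the only physics is the exponential profile and the standard total mass of an exponential disk, M = 2πΣ₀R_d².

```python
import math

R_D = 3000.0  # disk scale length in pc, for both models

def disk_mass(sigma0, r_d=R_D):
    """Total mass (Msun) of an exponential disk, M = 2 pi Sigma0 R_d^2."""
    return 2.0 * math.pi * sigma0 * r_d**2

def surface_density(r, sigma0, r_d=R_D):
    """Sigma(R) = Sigma0 exp(-R / R_d), in Msun / pc^2."""
    return sigma0 * math.exp(-r / r_d)

def velocity_dispersion(r, sigma_v0, r_d=R_D):
    """Exponential dispersion profile with scale length 2 R_d, in km/s."""
    return sigma_v0 * math.exp(-r / (2.0 * r_d))

mass_ratio = disk_mass(866.0) / disk_mass(100.0)   # HSB over LSB
scaled_dispersion = 100.0 / math.sqrt(mass_ratio)  # LSB dispersion for an exact match
print(f"HSB mass = {disk_mass(866.0):.3e} Msun, LSB mass = {disk_mass(100.0):.3e} Msun")
print(f"mass ratio = {mass_ratio:.2f}, scaled dispersion = {scaled_dispersion:.1f} km/s")
```

The exact scaling gives a central dispersion of about 34 km/s for the LSB model, so the adopted 33 km/s is deliberately a little off.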

The odd primordial halo of the Milky Way

The mass distribution of dark matter halos that we infer from observations tells us where the dark matter needs to be now. This differs from the mass distribution it had to start, as it gets altered by the process of galaxy formation. It is the primordial distribution that dark matter-only simulations predict most robustly. We* reverse-engineer the collapse of the baryons that make up the visible Galaxy to infer the primordial distribution, which turns out to be… odd.

The Gaia rotation curve and the mass of the Milky Way

As we discussed a couple of years ago, Gaia DR3 data indicate a declining rotation curve for the Milky Way. This decline becomes more steep, nearly Keplerian, in the outskirts of the Milky Way (17 < R < 30 kpc). This may or may not be consistent with data further out, which gets hard to interpret as the LMC (at 50 kpc) perturbs orbits and the observed motions may not correspond to orbits in dynamical equilibrium. So how much do the data inform us about the gravitational potential?

Milky Way rotation curve (various data) including Gaia DR3 (multiple analyses). Also shown is the RAR model (blue line) that was fit to the terminal velocities from 3 < R < 8.2 kpc (gray points) and predates other data illustrated here.

I am skeptical of the Keplerian portion of this result (as discussed at length at the time) because other galaxies don’t do that. However, I am a big fan of listening to the data, and the people actually doing the work. Taken at face value, the Gaia data show a Keplerian decline with a total mass around 2 x 10¹¹ M_☉. If correct, this falsifies MOND.
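
As a point of reference, here is what a Keplerian decline looks like for the mass quoted above. This is a minimal sketch that treats the whole 2 x 10¹¹ M_☉ as a point mass, which is only appropriate beyond the radius that encloses essentially all of it.

```python
import math

G = 4.30091e-6  # gravitational constant in kpc (km/s)^2 / Msun

def keplerian_velocity(mass, r):
    """Circular velocity (km/s) at radius r (kpc) around a point mass (Msun)."""
    return math.sqrt(G * mass / r)

def enclosed_mass(v, r):
    """Mass (Msun) implied by circular velocity v (km/s) at radius r (kpc)."""
    return v**2 * r / G

for r in (17.0, 30.0):
    print(f"R = {r:4.1f} kpc: V = {keplerian_velocity(2.0e11, r):.0f} km/s")
```

In this idealization the circular velocity at 30 kpc would be about 169 km/s, and it would keep falling as R^(-1/2) beyond that.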

How does dark matter fare? There is an implicit assumption made by many in the community that any failing of MOND is an automatic win for dark matter. However, it has been my experience that observations that are problematic for MOND are also problematic for dark matter. So let’s check.

Short answer: this is really weird in terms of dark matter. How weird? For starters, most recent non-Gaia dynamical analyses suggest a total mass closer to 10¹² M_☉, a factor of five higher than the Gaia value. I’m old enough to remember when the accepted mass was 2 x 10¹² M_☉, an order of magnitude higher. Yet even this larger mass is smaller than suggested by abundance matching recipes, which give more like 4 x 10¹² M_☉. So somewhere in the range 2 – 40 x 10¹¹ M_☉.

The Milky Way’s mass has been adjusted so often, have we finally hit it?

The guy was all over the road. I had to swerve a number of times before I hit him.
Boston Driver’s Handbook (1982 edition)^&

If it sounds like we’re all over the map, that’s because we are. It is very hard to constrain the total mass of a dark matter halo. We can’t see it, nor tell where it ends. We infer, indirectly, that the edge is way out beyond the tracers we can see. Heck, even speaking of an “edge” is ill-defined. Theoretically, we expect it to taper off with the density of dark matter falling as ρ ~ r^-3, so there is no definitive edge. Somewhat arbitrarily,** we adopt the radius that encloses a density 200 times the average density of the universe as the “virial” radius. This is all completely notional, and it gets worse, as the process of forming a galaxy changes the initial mass distribution. What we observe today is the changed form, not the primordial initial condition for which the notional mass is defined.
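
To see how the notional radius scales with the notional mass, here is a short sketch. It assumes the common convention of 200 times the critical density with H₀ = 70 km/s/Mpc. Other conventions, such as using the mean matter density, would shift the numbers, so the reference density is kept in its own function.

```python
import math

G = 4.30091e-6  # gravitational constant in kpc (km/s)^2 / Msun
H0 = 0.07       # Hubble constant in km/s/kpc, i.e. 70 km/s/Mpc (assumed)

def reference_density(h0=H0):
    """Critical density in Msun / kpc^3, rho_crit = 3 H0^2 / (8 pi G)."""
    return 3.0 * h0**2 / (8.0 * math.pi * G)

def r200(m200, overdensity=200.0):
    """Radius (kpc) within which the mean density is overdensity * reference density."""
    rho = overdensity * reference_density()
    return (3.0 * m200 / (4.0 * math.pi * rho)) ** (1.0 / 3.0)

for mass in (2.0e11, 1.0e12, 2.0e12, 4.0e12):
    print(f"M200 = {mass:.0e} Msun -> r200 = {r200(mass):.0f} kpc")
```

Because r₂₀₀ scales as the cube root of M₂₀₀, the factor of 20 spread in mass quoted above corresponds to less than a factor of three in radius.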

Adiabatic compression during galaxy formation

To form a visible galaxy, baryons must dissipate and sink to the center of their parent dark matter halo. This process changes the mass distribution and alters the halo from its primordial state. In effect, the gravity of the sinking baryons drags some dark matter along^# with them.

The change to the dark matter halo is often called adiabatic compression. The actual process need not be adiabatic, but that’s how we approximate it. We’ve tested this approximation with detailed numerical simulations, and it works pretty well, at least if you do it right (there are boring debates about technique). What happens makes sense intuitively: the response of the primordial halo to the infall of baryons is to become more dense at the center. While this makes sense physically, it is problematic for LCDM as it takes an NFW halo that is already too dense at the center to be consistent with data and makes it more dense. This has been known forever, so opposing this is one thing feedback is invoked to do, which it may or may not do, depending on how it really works. Even if feedback can really turn a compressed cusp into a core, it is widely expected to be important only in low mass galaxies where the gravitational potential well isn’t too deep. It isn’t supposed to be all that important in galaxies as massive as the Milky Way, though I’m sure that can change as needed.
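
The flavor of the calculation can be conveyed with the classic circular-orbit prescription of Blumenthal et al. (1986), in which r·M(r) is conserved for each dark matter shell. This is a simpler stand-in, not the procedure used for the fits discussed in this post. The halo uses c = 7 and M₂₀₀ = 6 x 10¹¹ M_☉ as in the McGaugh (2016) caption; r₂₀₀, the disk mass and scale length, and the spherical treatment of the disk are all assumptions made for illustration.

```python
import math

def nfw_mass(r, m200, c, r200):
    """Mass (Msun) enclosed within r (kpc) for an NFW halo."""
    def m(x):
        return math.log(1.0 + x) - x / (1.0 + x)
    return m200 * m(r * c / r200) / m(c)

def disk_mass_within(r, m_disk, r_d):
    """Exponential disk mass within r, treated as spherical for simplicity."""
    x = r / r_d
    return m_disk * (1.0 - (1.0 + x) * math.exp(-x))

def contracted_radius(r_i, m200, c, r200, m_disk, r_d):
    """Solve r_f [M_b(r_f) + (1 - f_b) M_i(r_i)] = r_i M_i(r_i) for r_f by bisection."""
    f_b = m_disk / m200
    m_i = nfw_mass(r_i, m200, c, r200)

    def residual(r_f):
        return r_f * (disk_mass_within(r_f, m_disk, r_d) + (1.0 - f_b) * m_i) - r_i * m_i

    lo, hi = 0.0, 2.0 * r_i  # residual is negative at 0 and positive at 2 r_i
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

M200, C, R200 = 6.0e11, 7.0, 174.0  # R200 in kpc is an assumed value
M_DISK, R_D = 5.0e10, 3.0           # assumed disk mass (Msun) and scale length (kpc)
for r_i in (5.0, 10.0, 20.0, 50.0):
    r_f = contracted_radius(r_i, M200, C, R200, M_DISK, R_D)
    print(f"shell at {r_i:5.1f} kpc ends up at {r_f:5.1f} kpc")
```

Inner shells move inward substantially while outer shells barely move, which is how the compressed halo ends up denser at the center than the primordial one.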

There are a variety of challenges to implementing an accurate compression computation, so we usually don’t bother: the standard practice is to assume a halo model and fit it to the data. That will, at best, give a description of the current dark matter halo, not what it started as, which is our closest point of comparison with theory. To give an example of the effect, here is a Milky Way model I built a decade ago:

**Figure 13** from McGaugh (2016): Milky Way rotation curve from the data of Luna et al. (2006, red points) and McClure-Griffiths & Dickey (2007, gray points) together with a bulgeless baryonic mass model (black line). The total rotation is approximately fit (blue line) with an adiabatically compressed NFW halo (solid green line) using the procedure implemented by Sellwood & McGaugh (2005). The primordial halo before compression is shown as the dashed line. The parameters of the primordial halo are a concentration c = 7 and a mass *M₂₀₀ = 6 x 10¹¹ M_☉*. *Fitting NFW to the present halo instead gives c = 14, M₂₀₀ = 4 x 10¹¹ M_☉, so the difference is appreciable and depends on the quality and radial extent of the available data.*

The change from the green dashed line to the solid green line is the difference compression makes. That’s what happens if a baryon distribution like that of the Milky Way settles in an NFW halo. The inferred mass M₂₀₀ is lower and the concentration c higher than it originally was – and it is the original version that we should compare to the expectations of LCDM.

When I built this model, I considered several choices for the bulge/bar fraction: something reasonable, something probably too large, and something definitely too small (zero). The model above is the last case of zero bulge/bar. I show it because it is the only case for which the compression procedure worked. If there is a larger central concentration of baryons – i.e., a bulge and/or a bar – then the compression is greater. Too great, in fact: I could not obtain a fit (see also Binney & Piffl and this related discussion).

The calculation of the compression requires knowledge of the primordial halo parameters, which is what one is trying to obtain. So one has to guess an initial state, run the code, check how close it came, then iterate the initial guess. This is computationally expensive, so I was just eyeballing the fit above. Pengfei has done a lot of work to implement a method that iteratively computes the compression and rigorously fits it to data. So we decided to apply it to the newer Gaia DR3 data.

Fitting the Gaia rotation curve with adiabatically compressed halos

We need two inputs here: one, the rotation curve to fit, and two, the baryonic distribution of the Milky Way. The latter is hard to specify given our location within the Milky Way, so there are many different estimates. We tried a dozen.

Another challenge of doing this is deciding which rotation curve data to fit. We chose to focus on the rotation curve of Jiao et al. (2023) because they made estimates of the systematic as well as random errors. The statistics of Gaia are so good it is practically impossible to fit any equilibrium model to them. There are aspects of the data for which we have to consider non-equilibrium effects (spiral arms, the bar, “snails” from external perturbations) so the usual assumptions are at best an approximation, plus there can always be systematic errors. So the approach is to believe the data, but with the uncertainty estimate of Jiao et al. (2023) that includes systematics.

For a halo model, we started with the boilerplate LCDM NFW halo^$. This doesn’t fit the data. Indeed, all attempts to fit NFW halos fail in similar ways for all of the different baryonic mass models we tried. The quasi-Keplerian part of the Gaia rotation curve simply cannot be fit: the NFW halo inevitably requires more mass further out.

Here are a few examples of the NFW fits:

**Fig. A.3** from Li et al. (2025). Fits of Galactic circular velocities using the NFW model implementing adiabatic halo contraction using 3 baryonic models. [Another 9 appear in the paper.] Data points with errors are the rotation velocities from Jiao et al. (2023), while open triangles show the data from Eilers et al. (2019), which are not fitted. [The radius ranges from 5 to 30 kpc.] Blue, purple, green and black solid lines correspond to the contributions by the stellar disk, central bar, gas (and dust if any), and compressed dark matter halo, respectively. The total contributions are shown using red solid lines. Black dashed lines are the inferred primordial halos.

LCDM as represented by NFW suffers the same failure mode as seen in MOND (plot at top): both theories overshoot the Gaia rotation curve at R > 17 kpc. This is an example of how data that are problematic for MOND are also problematic for dark matter.

We do have more freedom in the case of dark matter. So we tried a different halo model, Einasto. (For this and many other halo models, see Pengfei’s epic compendium of dark matter halo fits.) Where NFW has two parameters, a concentration c and mass M₂₀₀, Einasto has a third parameter that modulates the shape of the density profile^%. For a very specific choice of this third parameter (α = 0.17), it looks basically the same as NFW. But if we let α be free, then we can obtain a fit. Of all the baryonic models, the RAR model+compressed Einasto fits best:

**Fig. 1** from Li et al. (2025). Example of a circular velocity fit using the McGaugh19^$$ model for baryonic mass distributions. The purple, blue, and green lines represent the contributions of the bar, disk, and gas components, respectively. The solid and dashed black lines show the current and primordial dark matter halos, respectively. The solid red line indicates the total velocity profile. The black points show the latest Gaia measurements (Jiao et al. 2023), and the gray upward triangles and squares show the terminal velocities from McClure-Griffiths & Dickey (2007, 2016) and Portail et al. (2017), respectively. The data marked with open symbols were not fit because they do not consider the systematic uncertainties.

So it is possible to obtain a fit considering adiabatic compression. But at what price? The parameters of the best-fit primordial Einasto halo shown above are c = 5.1, M₂₀₀ = 1.2 x 10¹¹ M_☉, and α = 2.75. That’s pretty far from the α = 0.17 expected in LCDM. The mass is lower than low. The concentration is also low. There are expectation values for all these quantities in LCDM, and all of them miss the mark.
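To get a feel for what the shape parameter does, it helps to compare logarithmic density slopes. This is a minimal sketch using the standard textbook forms of the two profiles, not code from Li et al. (2025); radii are in units of each profile's scale radius, and the function names are purely illustrative.

```python
import numpy as np

def nfw_log_slope(x):
    """d ln(rho) / d ln(r) for NFW, rho ∝ 1 / [x (1 + x)^2], with x = r / r_s."""
    x = np.asarray(x, dtype=float)
    return -(1.0 + 3.0 * x) / (1.0 + x)

def einasto_log_slope(x, alpha):
    """d ln(rho) / d ln(r) for Einasto, rho ∝ exp[-(2/alpha)(x^alpha - 1)], with x = r / r_-2."""
    x = np.asarray(x, dtype=float)
    return -2.0 * x**alpha

# Both profiles have slope -2 at x = 1 by construction; they differ away from it.
radii = np.array([0.01, 0.1, 1.0, 10.0])
print("NFW:                 ", np.round(nfw_log_slope(radii), 3))
for alpha in (0.17, 2.75):
    print(f"Einasto alpha = {alpha}:", np.round(einasto_log_slope(radii, alpha), 3))
```

With α = 0.17 the Einasto slope stays close to NFW over this range. With α = 2.75 it is nearly flat inside the scale radius and plunges outside it, which is the core-plus-steep-outer-profile behavior described here.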

**Fig. 2** from Li et al. (2025). Halo masses and concentrations of the primordial Galactic halos derived from the Gaia circular velocity fits using 12 baryonic models. The red and blue stars with errors represent the halos with and without adiabatic contraction, respectively. The predicted halo mass-concentration relation within 1 σ from simulations (Dutton & Macciò 2014) is shown as the declining band. The vertical band shows the expected range of the MW halo mass according to the abundance-matching relation (Moster et al. 2013). The upper and lower limits are set by the highest stellar mass model plus 1 σ and the lowest stellar mass model minus 1 σ, respectively.

The expectation for mass and concentration is shown as the bands above. If the primordial halo were anything like what it should be in LCDM, the halo parameters represented by the red stars should be where the bands intersect. They’re nowhere close. The same goes for the shape parameter. The halo should have a density profile like the blue band in the plot below; instead it is more like the red band.

**Fig. 3** from Li et al. (2025). Structure of the inferred primordial and current Galactic halos, along with predictions for the cold and warm dark matter. The density profiles are scaled so that there is no need to assume or consider the masses or concentrations for these halos. The gray band indicates the range of the current halos derived from the Gaia velocity fits using the 12 baryonic models, and the red band shows their corresponding primordial halos within 1σ. The blue band presents the simulated halos with cold dark matter only (Dutton & Macciò 2014). The purple band shows the warm dark matter halos (normalized to match the primordial Galactic halo) with a core size spanning from 4.56 kpc (WDM5 in Macciò et al. 2012) to 7.0 kpc, corresponding to a particle mass of 0.05 keV and lower.

So the primordial halo of the Milky Way is pretty odd. From the perspective of LCDM, the mass is too low and the concentration is too low. The inner profile is too flat (a core rather than a cusp) and the outer profile is too steep. This outer steepness is a large part of why the mass comes out so low; there just isn’t a lot of halo out there. The characteristic density ρ_s is at least in the right ballpark, so aside from the inner slope, the outer slope, the mass, and the concentration, LCDM is doing great.

What if we ignore the naughty bits?

It is really hard for any halo model to fit the steep decline of the Gaia rotation curve at R > 17 kpc. Doing so is what makes the halo mass so small. I’m skeptical about this part of the data, so do things improve if we don’t sweat that part?

Ignoring the data at R > 17 kpc allows the mass to be larger, consistent with other dynamical determinations if not quite with abundance matching. However, the inner parts of the rotation curve still prefer a low density core. That is, something like the warm dark matter halo depicted as the purple band above rather than NFW with its dense central cusp. Or self-interacting dark matter. Or cold dark matter with just-so feedback. Or really anything that obfuscates the need to confront the dangerous question: why does MOND perform better?

*This post is based on the recently published paper by my former student Pengfei Li, who is now faculty at Nanjing University. They have a press release about it.

^&A few months after reading this in the Boston Driver’s Handbook, this exact thing happened to me.

**This goes back to BBKS in 1986 when the bedrock assumption was that the universe had Ω_m = 1, for which the virial radius encloses a mean density of about 178 (18π²) times the critical density. 200 was close enough, and stuck, even though for LCDM the virial radius corresponds to an overdensity close to 100, which is even further out.

^#This is one of many processes that occur in simulations, which are great for examining the statistics of simulated galaxy-like objects but completely useless for modeling individual galaxies in the real universe. There may be similar objects, but one can never say “this galaxy is represented by that simulated thing.” To model a real galaxy requires a customized approach.

^$NFW halos consistently perform worse in fitting data than any other halo model, of which there are many. The NFW model has been falsified as a viable representation of reality so many times that I can’t recall them all, and yet it remains the go-to model. I think that’s partly thanks to its simplicity – it is mathematically straightforward to implement – and to the fact that it is what simulations predict: LCDM halos should look like NFW. People, including scientists, often struggle to differentiate simulation from reality, so we keep flogging the dead horse.

^%The density profile of the NFW halo model asymptotes to power laws at both small and large radii: ρ ∝ r^-1 as r → 0 and ρ ∝ r^-3 as r → ∞. The third parameter of Einasto allows a much wider range of shapes.

^$$The McGaugh19 model used here is the one with a reasonable bulge/bar. This dense component can be fit in this case because we start with a halo model with a core rather than a cusp (closer to α = 1 than to the α = 0.17 of NFW/LCDM).

Non-equilibrium dynamics in galaxies that appear to have lots of dark matter: ultrafaint dwarfs

This is a long post. It started focused on ultrafaint dwarfs, but can’t avoid more general issues. In order to diagnose non-equilibrium effects, we have to have some expectation for what equilibrium would be. The Tully-Fisher relation is a useful empirical touchstone for that. How the Tully-Fisher relation comes about is itself theory-dependent. These issues are intertwined, so in addition to discussing the ultrafaints, I also review some of the many predictions for Tully-Fisher, and how our theoretical expectation for it has evolved (or not) over time.

In the last post, we discussed how non-equilibrium dynamics might make a galaxy look like it had less dark matter than similar galaxies. That pendulum swings both ways: sometimes non-equilibrium effects might stir up the velocity dispersion above what it would nominally be. Some galaxies where this might be relevant are the so-called ultrafaint dwarfs (not to be confused with ultradiffuse galaxies, which are themselves often dwarfs). I’ve talked about these before, but more keep being discovered, so an update seems timely.

Galaxies and ultrafaint dwarfs

It’s a big universe, so there’s a lot of awkward terminology, and the definition of an ultrafaint dwarf is somewhat debatable. Most often I see them defined as having an absolute magnitude limit M_V > -8, which corresponds to a luminosity less than 100,000 suns. I’ve also seen attempts at something more physical, like being a “fossil” whose star formation was entirely before cosmic reionization, which ended way back at z ~ 6 so all the stars would be at least^{*&^#} 12.5 Gyr old. While such physics-based definitions are appealing, these are often tied up with theoretical projection: the UV photons that reionized the universe should have evaporated the gas in small dark matter halos, so these tiny galaxies can only be fossils from before that time. This thinking pervades much of the literature despite it being obviously wrong, as counterexamples^! exist. For example, Leo P is practically an ultrafaint dwarf by luminosity, but has ample gas (so a larger baryonic mass) and is currently forming stars.

A luminosity-based definition is good enough for us here; I don’t really care exactly where we make the cut. Note that ultrafaint is an appropriate moniker: a luminosity of 10⁵ L_☉ is tiny by galaxy standards. This is a low-grade globular cluster, and some ultrafaints are only a few hundred solar luminosities, which is barely even^# a star cluster. At this level, one has to worry about stochastic effects in stellar evolution. If there are only a handful of stars, the luminosity of the entire system changes markedly as a single star evolves up the red giant branch. Consequently, our mapping from observed quantities to stellar mass is extremely dodgy. For consistency, to compare with brighter dwarfs, I’ve adopted the same boilerplate M_*/L_V = 2 M_☉/L_☉. That makes for a fair comparison luminosity-to-luminosity, but the uncertainty in the actual stellar mass is ginormous.
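The conversion from absolute magnitude to luminosity to stellar mass is simple enough to sketch. The solar absolute magnitude M_V,☉ = 4.83 is an assumed (commonly quoted) value that is not given in this post, and the function names are illustrative; the mass-to-light ratio is the boilerplate value above.

```python
M_V_SUN = 4.83          # assumed absolute V-band magnitude of the Sun
MASS_TO_LIGHT_V = 2.0   # Msun/Lsun, the boilerplate value adopted in the text

def luminosity_from_abs_mag(abs_mag_v):
    """V-band luminosity in solar units from an absolute V magnitude."""
    return 10.0 ** (-0.4 * (abs_mag_v - M_V_SUN))

def stellar_mass(abs_mag_v, mass_to_light=MASS_TO_LIGHT_V):
    """Stellar mass in solar masses for a fixed mass-to-light ratio."""
    return mass_to_light * luminosity_from_abs_mag(abs_mag_v)

for mag in (-8.0, -5.0, -2.0):
    lum = luminosity_from_abs_mag(mag)
    print(f"M_V = {mag:5.1f}  L_V = {lum:9.3g} Lsun  M_* = {stellar_mass(mag):9.3g} Msun")
```

The M_V = -8 cut comes out at roughly 10⁵ L_☉, as stated above. None of this captures the stochastic uncertainty in the true mass-to-light ratio, which is the real problem at this level.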

It gets worse, as the ultrafaints that we know about so far are all very nearby satellites of the Milky Way. They are not discovered in the same way as other galaxies, where one plainly sees a galaxy on survey plates. For example, NGC 7757:

A faint galaxy in the night sky, surrounded by numerous distant star-like points. — *The spiral galaxy NGC 7757 as seen on plates of the Palomar Sky Survey.*

While bright, high surface brightness galaxies like NGC 7757 are easy to see, lower surface brightness galaxies are not. However, they can usually still be seen, if you know where to look:

A faint galaxy amidst numerous distant stars in a dark sky, illustrating the challenges of observing low surface brightness galaxies. — *UGC 1230 as seen on the Palomar Sky Survey. It’s in the middle.*

I like to use this pair as an illustration, as they’re about the same distance from us and about the same angular size on the sky – at least, once you crank up the gain for the low surface brightness UGC 1230:

Comparison of two astronomical images: the left side shows a spiral galaxy with visible structure and brightness, while the right side features a lower surface brightness galaxy, appearing more diffuse and less distinct. — Zoom in on deep CCD images of NGC 7757 (left) and UGC 1230 (right) with the contrast of the latter enhanced. The chief difference between the two is surface brightness – how spread out their stars are. They have a comparable physical diameter, they both have star forming regions that appear as knots in their spiral arms, etc. These galaxies are clearly distinct from the emptiness of the cosmic void around them, being examples of giant stellar systems that gave rise to the term “island universe.”

In contrast to objects that are obvious on the sky as independent island universes, ultrafaint dwarfs are often invisible to the eye. They are recognized as a subset of stars near each other on the sky that also share the same distance and direction of motion in a field that might otherwise be crowded with miscellaneous, unrelated stars. For example, here is Leo IV:

Wide field image of the Ultra-Faint Dwarf Galaxy Leo IV, featuring a zoomed-in view of its faint structure surrounded by numerous background stars and galaxies. — *The ultrafaint dwarf Leo IV as identified by the Sloan Digital Sky Survey and the Hubble Space Telescope.*

See it?

I don’t. I do see a number of background galaxies, including an edge-on spiral near the center of the square. Those are not the ultrafaint dwarf, which is some subset of the stars in this image. To decide which ones are potentially a part of such a dwarf, one examines the color magnitude diagram of all the stars to identify those that are consistent with being at the same distance, and assigns membership in a probabilistic way. It helps if one can also obtain radial velocities and/or proper motions for the stars to see which hang together – more or less – in phase space.

Part of the trick here is deciding what counts as hanging together. A strong argument in favor of these things residing in dark matter halos is that the velocity differences between the apparently-associated stars are too great for them to remain together for any length of time otherwise. This is essentially the same situation that confronted Zwicky in his observations of galaxies in clusters in the 1930s. Here are these objects that appear together in the sky, but they should fly apart unless bound together by some additional, unseen force. But perhaps some of these ultrafaints are not hanging together; they may be in the process of coming apart. Indeed, they may have so few stars because they are well down the path of dissolution.

Since one cannot see an ultrafaint dwarf in the same way as an island universe, I’ve heard people suggest that being bound by a dark matter halo be included in the definition of a galaxy. I see where they’re coming from, but find it unworkable. I know a galaxy when I see one. As did Hubble, as did thousands of other observers since, as can you when you look at the pictures above. It is absurd to make the definition of an object that is readily identifiable by visual inspection be contingent on the inferred presence of invisible stuff.

So are ultrafaints even galaxies? Yes and no. Some of the probabilistic identifications may be mere coincidences, not real objects. However, they can’t all be fakes, and I think that if you put them in the middle of intergalactic space, we would recognize them as galaxies – provided we could detect them at all. At present we can’t, but hopefully that situation will improve with the Rubin Observatory. In the meantime, what we have to work with are these fragmentary systems deep in the potential well of the seventy billion solar mass cosmic gorilla that is the Milky Way. We have to be cognizant that they might have gotten knocked around, as we can see in more massive systems like the Sagittarius dwarf. Of course, if they’ve gotten knocked around too much, then they shouldn’t be there at all. So how do these systems evolve under the influence of a cosmic gorilla?

Let’s start by looking at the size-mass diagram, as we did before. Ultrafaint dwarfs extend this relation to much lower mass, and also to rather small sizes – some approaching those of star clusters. They approximately follow a line of constant surface density, ~0.1 M_☉ pc^-2 (dotted line).

A graph illustrating the size-mass relationship of galaxies, plotting effective radius (Re) against stellar mass (M*). Black squares represent data points of larger galaxies, while green squares indicate ultrafaint dwarfs. The dotted line suggests a correlation between size and mass. — *The size and stellar mass of Local Group dwarfs* as discussed previously, with the addition of ultrafaint dwarfs^$ (small gray squares).

This looks weird to me. All other types of galaxies scatter all over the place in this diagram. The ultrafaints are unique in following a tight line in the size-mass plane, and one that follows a line of constant surface brightness. Every element of my observational experience screams that this is likely to be an artifact. Given how these “galaxies” are identified as the loose association of a handful of stars, it is easy to imagine that this trend might be an artifact of how we define the characteristic size of a system that is essentially invisible. It might also arise for physical reasons to do with the cosmic gorilla; i.e., it is a consequence of dynamical evolution. So maybe this correlation is real, but the warning lights that it is not are flashing red.
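For reference, a line of constant surface density in the size-mass plane is just R ∝ M^(1/2). The sketch below assumes the simplest normalization, Σ = M / (π R²); the exact convention behind the dotted line (e.g., whether half the mass within R_e is used) is not stated here, so treat the absolute radii as illustrative.

```python
import math

def radius_at_constant_surface_density(mass_msun, sigma_msun_pc2=0.1):
    """Radius in pc such that M / (pi R^2) equals the given surface density.

    The pi R^2 normalization is an assumption made for illustration.
    """
    return math.sqrt(mass_msun / (math.pi * sigma_msun_pc2))

for mass in (1e3, 1e4, 1e5):
    radius = radius_at_constant_surface_density(mass)
    print(f"M = {mass:.0e} Msun  ->  R = {radius:.0f} pc")
```

Two decades in mass buy only one decade in size along such a line, whatever the normalization.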

The Baryonic Tully-Fisher relation as a baseline

Ideally, we would measure accelerations to test theories, particularly MOND. Here, we would need to use the size to estimate the acceleration, but I straight up don’t believe these sizes are physically meaningful. The stellar mass, dodgy as it is, seems robust by comparison. So we’ll proceed as if we know that much – which we don’t, really – but let’s at least try.

With the stellar mass (there is no gas in these things), we are halfway to constructing the baryonic Tully-Fisher relation (BTFR), which is the simplest test of the dynamics that we can make with the available data. The other quantity we need is the characteristic circular speed of the gravitational potential. For rotating galaxies, that is the flat rotation speed, V_f. For pressure supported dwarfs, what is usually measured is the velocity dispersion σ. We’ve previously established that for brighter dwarfs in the Local Group, a decent approximation is V_f = 2σ, so we’ll start by assuming that this should apply to the ultrafaints as well. This allows us to plot the BTFR:

A scatter plot showing the relationship between velocity (Vf in km/s) and baryonic mass (Mb in solar masses), with data points represented by different shapes and colors for various galaxy types. — The baryonic mass and characteristic circular speeds of both rotationally supported galaxies (circles) and pressure supported dwarfs (squares). The colored points follow the same baryonic Tully-Fisher relation (BTFR), but the data for low mass ultrafaint dwarfs (gray squares) flatten out, having nearly the same characteristic speed over several decades in mass.

The BTFR is an empirical relation of the form V_f ~ M_b^(1/4) over about six decades in mass. Somewhere around the ultrafaint scale, this no longer appears to hold, with the observed velocity flattening out to become approximately constant for these lowest mass galaxies. I’m not sure this is real, as there are many practical caveats to interpreting the observations. Measuring stellar velocities is straightforward but demanding at this level of accuracy. There are many potential systematics, pretty much all of which cause the intrinsic velocity dispersion to be overestimated. For example, observations made with multislit masks tend to return larger dispersions than observations of the same object with fibers. That’s likely because it is hard to build a mask so well that all of the stars perfectly hit the centers of the slitlets assigned to them; offsets within the slit shift the spectrum in a way that artificially adds to the apparent velocity dispersion. Fibers are less efficient in their throughput, but have the virtue of blending the input light in a way that precludes this particular systematic. Another concern is physical – some of the stars that are observed are presumably binaries, and some of the velocity will be due to motion within the binary pair and nothing to do with the gravitational potential of the larger system. This can be addressed with repeated observations to see if some velocities change, but it is hard to do that for each and every system, especially when it is way more fun to discover and explore new systems than follow up on the same one over and over and over again.

There are lots of other things that can go wrong. At some level, some of them probably do – that’s the nature of observational astronomy^&. While it seems likely that some of the velocity dispersions are systematically overestimated, it seems unlikely that all of them are. Let’s proceed as if the bulk of the data is telling us something, even if we treat individual objects with suspicion.

MOND

MOND makes a clear prediction for the BTFR of isolated galaxies: the baryonic mass goes as the fourth power of the flat rotation speed. Contrary to Newtonian expectation, this holds irrespective of surface brightness, which is what attracted my attention to the theory in the first place. So how does it do here?

A graph depicting the relationship between the flat rotation speed (Vf in km/s) and the baryonic mass (Mb in solar masses), showing data points for various galaxies, including ultrafaint dwarfs highlighted with unique markers. — *The same data as above with the addition of the line predicted by MOND (Milgrom 1983).*
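The MOND line for isolated systems is V_f⁴ = G a₀ M_b. Here is a minimal sketch of that relation, together with the V_f = 2σ conversion used for the pressure supported dwarfs. The value a₀ = 1.2 × 10⁻¹⁰ m s⁻² is the commonly quoted one and is an assumption here, as are the function names.

```python
G_SI = 6.674e-11     # m^3 kg^-1 s^-2
A0_SI = 1.2e-10      # m s^-2, assumed value of Milgrom's acceleration constant
MSUN_KG = 1.989e30   # kg

def mond_baryonic_mass(v_flat_kms):
    """Baryonic mass (Msun) of an isolated galaxy with flat speed V_f in MOND."""
    v_si = v_flat_kms * 1.0e3
    return v_si**4 / (G_SI * A0_SI) / MSUN_KG

def mond_flat_velocity(m_b_msun):
    """Flat rotation speed (km/s) predicted by MOND for an isolated baryonic mass."""
    return (G_SI * A0_SI * m_b_msun * MSUN_KG) ** 0.25 / 1.0e3

def v_flat_from_dispersion(sigma_kms):
    """Approximate V_f for a pressure supported dwarf, using V_f = 2 sigma."""
    return 2.0 * sigma_kms

for m_b in (1e4, 1e6, 1e8, 1e10):
    print(f"M_b = {m_b:.0e} Msun  ->  V_f = {mond_flat_velocity(m_b):6.1f} km/s")
```

For V_f = 100 km/s this gives about 6 × 10⁹ M_☉. It applies only to isolated systems in equilibrium, so it does not include any external field effect.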

Low surface density means low acceleration, so low surface brightness galaxies would make great tests of MOND if they were isolated. Oh, right – they already did. Repeatedly. MOND also correctly predicted the velocities of low mass, gas-rich dwarfs that were unknown when the prediction was made. These are highly nontrivial successes of the theory.

The ultrafaints we’re discussing here are not isolated, so they do not provide the clean tests that isolated galaxies provide. However, galaxies subject to external fields should have low velocities relative to the BTFR, while the ultrafaints have higher velocities. They’re on the wrong side of the relation! Taking this at face value (i.e., assuming equilibrium), MOND fails here.

Whenever MOND has a problem, it is widely seen as a success of dark matter. In my experience, this is rarely true: observations that are problematic for MOND usually don’t make sense in terms of dark matter either. For each observational test we also have to check how LCDM fares.

LCDM

How LCDM fares is often hard to judge because its predictions for the same phenomena are not always clear. Different people predict different things for the same theory. There have been lots of LCDM-based predictions made for both dwarf satellite galaxies and the Tully-Fisher relation. Too many, in fact – it is a practical impossibility to examine them all. Nevertheless, some common themes emerge if we look at enough examples.

The halo mass-velocity relation

The most basic prediction of LCDM is that the mass of a dark matter halo scales with the cube of the circular velocity of a test particle at the virial radius (conventionally taken to be the radius R₂₀₀ that encompasses an average density 200 times the critical density of the universe. If that sounds like gobbledygook to you, just read “halo” for “200”): M₂₀₀ ~ V₂₀₀³. This is a very basic prediction that everyone seems to agree to.
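For completeness, here is where the cubic scaling comes from. It is just the definition of R₂₀₀ combined with the circular speed of a test particle at that radius; H₀ is the Hubble constant, which sets the critical density.

```latex
\begin{aligned}
M_{200} &= \frac{4\pi}{3}\,R_{200}^{3}\,\bigl(200\,\rho_{\rm crit}\bigr),
\qquad \rho_{\rm crit} = \frac{3H_0^{2}}{8\pi G}
\;\;\Rightarrow\;\; M_{200} = \frac{100\,H_0^{2}\,R_{200}^{3}}{G}, \\
V_{200}^{2} &= \frac{G\,M_{200}}{R_{200}} = 100\,H_0^{2}\,R_{200}^{2}
\;\;\Rightarrow\;\; R_{200} = \frac{V_{200}}{10\,H_0}, \\
M_{200} &= \frac{V_{200}^{2}\,R_{200}}{G} = \frac{V_{200}^{3}}{10\,G\,H_0}.
\end{aligned}
```

The normalization depends only on G and H₀, which is why there is so little room for disagreement about this step.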

There is a tiny problem with testing this prediction: it refers to the dark matter halo that we cannot see. In order to test it, we have to introduce some scaling factors to relate the dark to the light. Specifically, M_b = f_d M₂₀₀ and V_f = f_v V₂₀₀, where f_d is the observed fraction of mass in baryons and f_v relates the observed flat velocity to the circular speed of our notional test particle at the virial radius. The obvious assumptions to make are that f_d is a constant (perhaps as much as but not more than the cosmic baryon fraction of 16%) and f_v is close to unity. The latter requirement stems from the need for dark matter to explain the amplitude of the flat rotation speed, but f_v could be slightly different; plausible values fall in the range 0.9 < f_v < 1.4. Values larger than one indicate a rotation curve that declines before the virial radius is reached, which is the natural expectation for NFW halos.

Here is a worked example with f_d = 0.025 and f_v = 1:

A graph depicting the relationship between the flat rotation speed (Vf) in kilometers per second and the baryonic mass (Mb) in solar masses. The data points are shown with various markers, including gray squares, green squares, and blue circles, each representing different galaxy types, along with error bars. A solid gray line indicates a trend, while a dotted line marks a theoretical lower bound. — The same data as above with the addition of the nominal prediction of LCDM. The dotted line is the halo mass-circular velocity relation; the gray band is a simple model with f_d = 0.025 and f_v = 1 (e.g., Mo, Mao, & White 1998).
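The simple model in the figure can be sketched in a few lines using M₂₀₀ = V₂₀₀³ / (10 G H₀) together with the two scaling factors. The Hubble constant H₀ = 70 km/s/Mpc is an assumed value (it is not specified here), and the function names are illustrative.

```python
G_KPC = 4.30091e-6   # gravitational constant in kpc (km/s)^2 / Msun
H0_KPC = 0.070       # assumed Hubble constant, 70 km/s/Mpc, expressed in km/s/kpc

def halo_mass(v200_kms):
    """M_200 in Msun from the circular speed at R_200: M = V^3 / (10 G H0)."""
    return v200_kms**3 / (10.0 * G_KPC * H0_KPC)

def baryonic_mass(v_flat_kms, f_d=0.025, f_v=1.0):
    """Observed baryonic mass for M_b = f_d * M_200 and V_f = f_v * V_200."""
    v200 = v_flat_kms / f_v
    return f_d * halo_mass(v200)

for v_flat in (20.0, 50.0, 100.0, 200.0):
    print(f"V_f = {v_flat:5.0f} km/s  M_200 = {halo_mass(v_flat):.2e}  M_b = {baryonic_mass(v_flat):.2e}")
```

With these assumptions, V_f = 100 km/s corresponds to M₂₀₀ ≈ 3 × 10¹¹ M_☉ and M_b ≈ 8 × 10⁹ M_☉. Changing f_d slides the line up and down; changing f_v slides it left and right. Neither changes the slope.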

I have illustrated the model with a fat grey line because f_d = 0.025 is an arbitrary choice^* I made to match the data. It could be more, it could be less. The detected baryon fraction can be anything up to the cosmic value, f_d < f_b = 0.16, as not all of the baryons available in a halo cool and condense into cold gas that forms visible stars. That’s fine; there’s no requirement that all of the baryons have to become readily observable, but there is also no reason to expect all halos to cool exactly the same fraction of baryons. Naively one would expect at least some variation in f_d from halo to halo, so there could and probably should be a lot of scatter: the gray line could easily be a much wider band than depicted.

In addition to the rather arbitrary value of f_d, this reasoning also predicts a Tully-Fisher relation with the wrong slope. Picking a favorable value of f_d only matches the data over a narrow range of mass. It was nevertheless embraced for many years by many people. Selection effects bias samples to bright galaxies. Consequently, the literature is rife with TF samples dominated by galaxies with M_b > 10¹⁰ M_☉ (the top right corner of the plot above); with so little dynamic range, a slope of 3 looks fine. Once you look outside that tiny box, it does not look fine.

Personally, I think a slope of 3 is an oversimplification. That is the prediction for dark matter halos; there can be effects that vary systematically with mass. An obvious one is adiabatic compression, the effect by which baryons drag some dark matter along with them as they settle to the center of their halos. This increases f_v by an amount that depends on the baryonic surface density. Surface density correlates with mass, so I would nominally expect higher velocities in brighter galaxies; this drives up the slope. There are various estimates of this effect; typically one gets a slope like 3.3, not the observed 4. Worse, it predicts an additional effect: at a given mass, galaxies of higher surface brightness should also have higher velocity. Surface brightness should be a second parameter in the Tully-Fisher relation, but this is not observed.

The easiest way to reconcile the predicted and observed slopes is to make f_d a function of mass. Since M_b = f_d M₂₀₀ and M₂₀₀ ~ V₂₀₀³, M_b ~ f_d V₂₀₀³. Adopting f_v = 1 for simplicity, M_b ~ V_f⁴ follows if f_d ~ V_f. Problem solved, QED.
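A quick numerical check of this bookkeeping, as a sketch: fit the log-log slope of M_b against V_f for a constant f_d and for f_d ∝ V_f. The normalization (f_d = 0.025 at 100 km/s) is a hypothetical choice for illustration, and the overall mass scale is arbitrary since only the slope matters.

```python
import numpy as np

def btfr_slope(v_flat, f_d):
    """Fitted slope of log M_b vs log V_f for M_b ∝ f_d * V_f^3 (f_v = 1)."""
    m_b = f_d * v_flat**3   # arbitrary normalization; only the slope is of interest
    slope, _intercept = np.polyfit(np.log10(v_flat), np.log10(m_b), 1)
    return slope

v_flat = np.logspace(1.0, 2.5, 50)           # 10 to ~316 km/s
f_d_constant = np.full_like(v_flat, 0.025)   # same baryon fraction in every halo
f_d_rolling = 0.025 * (v_flat / 100.0)       # hypothetical f_d ∝ V_f

print("constant f_d:", round(btfr_slope(v_flat, f_d_constant), 3))
print("f_d ∝ V_f:   ", round(btfr_slope(v_flat, f_d_rolling), 3))
```

The slope goes from 3 to 4 only because f_d was told to track V_f exactly. Any scatter in f_d at fixed V_f goes straight into scatter in M_b.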

There are [at least] two problems with this argument. One is that the scaling f_d ~ V_f must hold perfectly without introducing any scatter. This is a fine-tuning problem: we need one parameter to vary precisely with another, unrelated parameter. There is no good reason to expect this; we just have to insert the required dependence by hand. This is much worse than choosing an arbitrary value for f_d: now we’re making it a rolling fudge factor to match whatever we need it to. We can make it even more complicated by invoking some additional variation in f_v, but this just makes the fine-tuning worse as the product f_d f_v^-3 has to vary just so. Another problem is that we’re doing all this to adjust the prediction of one theory (LCDM) to match that of a different theory (MOND). It is never a good sign when we have to do that, whether we admit it or not.

Abundance matching

The reasoning leading to a slope 3 Tully-Fisher relation assumes a one-to-one relation between baryonic and halo mass (f_d = constant). This is an eminently reasonable assumption. We spent a couple of decades trying to avoid having to break this assumption. Once we do so and make f_d a freely variable parameter, then it can become a rolling fudge factor that can be adjusted to fit anything. Everyone agrees that is Bad. However, it might be tolerable if there is an independent way of estimating this variation. Rather than make f_d just be what we need it to be as described above, we can instead estimate it with abundance matching.

Abundance matching comes from equating the observed number density of galaxies as a function of mass with the number density of dark matter halos. This process gives f_d, or at least the stellar fraction, f_*, which is close to f_d for bright galaxies. Critically, it provides a way to assign dark matter halo masses to galaxies independently of their kinematics. This replaces an arbitrary, rolling fudge factor with a predictive theory.

Abundance matching models generically introduce curvature into the prediction for the BTFR. This stems from the mismatch in the shape of the galaxy stellar mass function (a Schechter function) and the dark halo mass function (a power law on galaxy scales). This leads to a bend in relations that map between visible and dark mass.
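To see how the bend arises, here is a toy version of abundance matching: each stellar mass is assigned the halo mass that has the same cumulative number density. Every number in it is hypothetical, chosen only to give galaxy-like scales, and the pure power-law halo mass function is a simplification that only makes sense on galaxy scales. It is a sketch of the idea, not any published model.

```python
import numpy as np

# All parameters below are hypothetical illustration values.
PHI_STAR = 5e-3     # Mpc^-3, Schechter normalization
M_KNEE = 5e10       # Msun, Schechter characteristic stellar mass
ALPHA = -1.3        # Schechter low-mass slope
HALO_N12 = 3e-3     # Mpc^-3, number density of halos more massive than 1e12 Msun
HALO_INDEX = 0.9    # cumulative halo mass function n(>M_h) ∝ M_h^-0.9

def n_galaxies_above(m_star, n_grid=4000):
    """Cumulative Schechter number density n(>M_*), by trapezoid rule in ln M."""
    ln_m = np.linspace(np.log(m_star), np.log(1e3 * M_KNEE), n_grid)
    x = np.exp(ln_m) / M_KNEE
    dn_dlnm = PHI_STAR * x ** (ALPHA + 1.0) * np.exp(-x)
    return float(np.sum(0.5 * (dn_dlnm[1:] + dn_dlnm[:-1]) * np.diff(ln_m)))

def halo_mass_at_density(n):
    """Invert the power-law cumulative halo mass function for M_h."""
    return 1e12 * (n / HALO_N12) ** (-1.0 / HALO_INDEX)

def matched_halo_mass(m_star):
    """Halo mass with the same cumulative abundance as galaxies above m_star."""
    return halo_mass_at_density(n_galaxies_above(m_star))

for log_m in (8.0, 9.0, 10.0, 10.5, 11.0, 11.5):
    m_star = 10.0 ** log_m
    m_halo = matched_halo_mass(m_star)
    print(f"log M_* = {log_m:4.1f}  log M_h = {np.log10(m_halo):5.2f}  M_*/M_h = {m_star / m_halo:.1e}")
```

The stellar fraction M_*/M_h peaks at intermediate mass and falls off on either side. It is this nonlinearity that puts curvature into any Tully-Fisher relation built on top of it.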

The transition from the M ~ V³ reasoning to abundance matching occurred gradually, but became pronounced circa 2010. There are many abundance matching models; I already faced the problem of the multiplicity of LCDM predictions when I wrote a lengthy article on the BTFR in 2012. To get specific, let’s start with an example from then, the model of Trujillo-Gomez et al. (2011):

Scatter plot showing the relationship between gravitational potential flat rotation speed (Vf in km/s) and baryonic mass (Mb in solar masses). The plot features varying data points marked with blue circles, green squares, and gray squares, indicating different galaxy types or observational methods. A red curve is drawn, illustrating an empirical relationship fitting the data. — *The same data as above with the addition of the line predicted by LCDM in the model of Trujillo-Gomez et al. (2011).*

One thing Trujillo-Gomez et al. (2011) say in their abstract is “The data present a clear monotonic LV relation from ∼50 km s⁻¹ to ∼500 km s⁻¹, with a bend below ∼80 km s⁻¹”. By LV they mean luminosity-velocity, i.e., the regular Tully-Fisher relation. The bend they note is real; that’s what happens when you consider only the starlight and ignore the gas. The bend goes away if you include that gas. This was already known at the time – our original BTFR paper from 2000 has nearly a thousand citations, so it isn’t exactly obscure. Ignoring the gas is a choice that makes no sense empirically but makes a lot of sense from the perspective of LCDM simulations. By 2010, these had become reasonably good at matching the numbers of stars observed in galaxies, but the gas properties of simulated galaxies remained, hmmmmmmm, wanting. It makes sense to utilize the part that works. It makes less sense to pretend that this bend is something physically meaningful rather than an artifact of ignoring the gas. The pressure-supported dwarfs are all star dominated, so this distinction doesn’t matter here, and they follow the BTFR, not the stars-only version.

An old problem in galaxy formation theory is how to calibrate the number density of dark matter halos to that of observed galaxies. For a long time, a choice that people made was to match either the luminosity function or the kinematics. These didn’t really match up, so there was occasional discussion of the virtues and vices of the “luminosity function calibration” vs. the “Tully-Fisher calibration.” These differed by a factor of ~2. This tension remains with us. Mostly simulations have opted to adopt the luminosity function calibration, updated and rebranded as abundance matching. Again, this makes sense from the perspective of LCDM simulations, because the number density of dark matter halos is something that simulations can readily quantify while the kinematics of individual galaxies are much harder to resolve^**.

The nonlinear relation between stellar mass and halo mass obtained from abundance matching inevitably introduces curvature into the corresponding Tully-Fisher relation predicted by such models. That’s what you see in the curved line of Trujillo-Gomez et al. (2011) above. They weren’t the first to obtain such a result, and they certainly weren’t the last: this is a feature of LCDM with abundance matching, not a bug.

The line of Trujillo-Gomez et al. (2011) matches the data pretty well at intermediate masses. It diverges to higher velocities at both small and large galaxy masses. I’ve written about this tension at high masses before; it appears to be real, but let’s concentrate on low masses here. At low masses, the velocity of galaxies with M_b < 10⁸ M_☉ appears to be overestimated. But the divergence between model and reality has just begun, and it is hard to resolve small things in simulations, so this doesn’t seem too bad. Yet.

Moving ahead, there are the “Latte” simulations of Wetzel et al. (2016) that use the well-regarded FIRE code to look specifically at simulated dwarfs, both isolated and satellites – specifically satellites of Milky Way-like systems. (Milky Way. Latte. Get it? Nerd humor.) So what does that find?

A graph displaying the relationship between circular velocity (Vf in km/s) and baryonic mass (Mb in solar masses), featuring various data points distinguished by shape and color, including gray squares, green squares, orange triangles, and blue circles to represent different types of galaxies. — *The same data as above with the addition of* simulated dwarfs (orange triangles) from the Latte LCDM simulation of Wetzel et al. (2016), specifically the simulated satellites in the top panel of their Fig. 3. Note that we plot V_f = 2σ for pressure supported systems, both real and simulated.

The individual simulated dwarf satellites of Wetzel et al. (2016) follow the extrapolation of the line predicted by Trujillo-Gomez et al. (2011). To first order, it is the same result to higher resolution (i.e., smaller galaxy mass). Most of the simulated objects have velocity dispersions that are higher than observed in real galaxies. Intriguingly, there are a couple of simulated objects with M_* ~ 5 × 10⁶ M_☉ that fall nicely among the data where there are both star-dominated and gas-rich galaxies. However, these two are exceptions; the rule appears to be characteristic speeds that are higher than observed.

The lowest mass simulated satellite objects begin to approach the ultrafaint regime, but resolution continues to be an issue: they’re not really there yet. This hasn’t precluded many people from assuming that dark matter will work where MOND fails, which seems like a heck of a presumption given that MOND has been consistently more successful up until that point. Where MOND underpredicts the characteristic velocity of ultrafaints, LCDM hasn’t yet made a clear prediction, and it overpredicts velocities for objects of slightly larger mass. Ain’t no theory covering itself in glory here, but this is a good example where objects that are a problem for MOND are also a problem for dark matter, and it seems likely that non-equilibrium dynamics play a role in either case.

Comparing apples with apples

A persistent issue with comparing simulations to reality is extracting comparable measures. Where circular velocities are measured from velocity fields in rotating galaxies and estimated from measured velocity dispersions in pressure supported galaxies, the most common approach to deriving rotation curves from simulated objects is to sum up particles in spherical shells and assume V² = GM/R. These are not the same quantities. They should be proxies for one another, but equality holds only in the limit of isotropic orbits in spherical symmetry. Reality is messier than that, and simulations aren’t that simple either^%.
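The shell-summing recipe is simple enough to sketch. The snippet below is only an illustration of the V² = GM/R estimate: the function name and the three toy "particles" are made up, and real analysis pipelines handle centering, units, and softening with far more care.

```python
import math

G = 4.30091e-6  # Newton's constant in kpc (km/s)^2 / Msun


def circular_velocity(particle_radii_kpc, particle_masses_msun, r_kpc):
    """Return sqrt(G M(<r) / r) in km/s, summing every particle inside r.

    This assumes spherical symmetry; it is the simulator's proxy,
    not something an observer measures directly.
    """
    enclosed = sum(
        m for r, m in zip(particle_radii_kpc, particle_masses_msun) if r <= r_kpc
    )
    return math.sqrt(G * enclosed / r_kpc)


# Toy particles: two inside 10 kpc (1e10 Msun in total) and one outside.
radii = [1.0, 5.0, 20.0]
masses = [4e9, 6e9, 5e10]

v = circular_velocity(radii, masses, 10.0)
print(f"V_c(10 kpc) = {v:.1f} km/s")
```

Note what goes in: only the enclosed mass and the radius. Nothing here knows about orbital anisotropy, non-circular motions, or how a gas disk would actually be observed, which is why this number is a proxy rather than the same quantity.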

Sales et al. (2017) make the effort to make a better comparison between what is observed given how it is observed, and what the simulations would show for that quantity. Others have made a similar effort; a common finding is that the apparent rotation speeds of simulated gas disks do not trace the gravitational potential as simply as GM/R. That’s no surprise, but most simulated rotation curves do not look like those of real galaxies^{^}, so the comparison is not straightforward. Those caveats aside, Sales et al. (2017) are doing the right thing in trying to make an apples-to-apples comparison between simulated and observed quantities. They extract from simulations a quantity V_out that is appropriate for comparison with what we observe in the outer parts of rotation curves. So here is the resulting prediction for the BTFR:

A graph plotting the baryonic mass (Mb in solar masses) against the characteristic flat rotation speed (Vf in km/s) for various galaxies, showing a curve that describes the baryonic Tully-Fisher relation. The scatter points include different types of galaxies, with green squares indicating specific categories. — *The same data as above with the addition of the line predicted by LCDM in the model of* Sales et al. (2017), specifically the formula for V_out in their Table 2 which is *their proxy for the observable rotation speed.*

That’s pretty good. It still misses at high masses (those two big blue points at the top are Andromeda and the Milky Way) and it still bends away from the data at low masses where there are both star-dominated and gas-rich galaxies. (There are a lot more examples of the latter that I haven’t used here because the plot gets overcrowded.) Despite the overshoot, the use of an observable aspect of the simulations gets closer to the data, and the prediction flattens out in the same qualitative sense. That’s good, so one might see cause for hope that this problem is simply a matter of making a fair comparison between simulations and data. We should also be careful not to over-interpret it: I’ve simply plotted the formula they give; the simulations to which they fit it surely do not resolve ultrafaint dwarfs, so really the line should stop at some appropriate mass scale.

Nevertheless, it makes sense to look more closely at what is observed vs. what is simulated. This has recently been done in greater detail by Ruan et al. (2025). They consider two simulations that implement rather different feedback; both wind up producing rotating, gas rich dwarfs that actually fall on the BTFR.

Scatter plot illustrating the baryonic Tully-Fisher relation, showing the relationship between characteristic circular velocity (Vf) and baryonic mass (Mb) for various galaxy types, including data points for ultrafaint dwarfs. — *The same data as above with the addition of* simulated dwarfs of Ruan et al. (2025), specifically from the top right panel of their Fig. 6. The orange circles are their “massives” and the red triangles the “marvels” (the distinction refers to different feedback models).

Finally some success after all these years! Looking at this, it is tempting to declare victory: problem solved. It was just a matter of doing the right simulation all along, and making an apples-to-apples comparison with the data.

That sounds too good to be true. Is it repeatable in other simulations? What works now that didn’t before?

These are high resolution simulations, but they still don’t resolve ultrafaints. We’re talking here about gas-rich dwarfs. That’s also an important topic, so let’s look more closely. What works now is in the apples-to-apples assessment: what we would measure for V_out is less than V_max (related to V₂₀₀) of the halo:

A graph displaying two panels: the top panel shows the relation between the ratio of mid-outward velocity to maximum velocity (Vout, mid / Vmax, mid) and the logarithm of baryonic mass (Mbar), with data points represented as circles and triangles. The bottom panel illustrates the relationship between the ratio of outer radius to maximum radius (Rout, mid / Rmax, mid) and the logarithm of baryonic mass, also featuring similar data points. — Two panels from Fig. 7 of *Ruan et al. (2025)* showing the ratio of the velocity we might observe relative to the characteristic circular velocity of the halo (top) and the ratio of the radii where these occur (bottom).

The treatment of cold gas in simulations has improved. In these simulations, V_out(R_out) is measured where the gas surface density falls to 1 M_☉ pc⁻², which is typical of many observations. But the true rotation curve is still rising for objects with M_b < a few × 10⁸ M_☉; it has not yet reached a value that is characteristic of the halo. So the apparent velocity is low, even if the dark matter halos are doing basically the same thing as before:
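To get a feel for how much velocity can go missing this way, here is a rough sketch that assumes a pure NFW halo. That is my assumption for illustration, not something taken from Ruan et al. (2025), whose halos include baryons and feedback. The function names are hypothetical. The sketch asks what fraction of V_max the halo's own circular velocity has reached when the last measured point sits at some fraction of R_max.

```python
import math


def nfw_shape(x):
    """Dimensionless V^2 of an NFW halo (arbitrary normalization), x = r / r_s."""
    return (math.log(1.0 + x) - x / (1.0 + x)) / x


# Locate the peak of the circular velocity curve numerically on a fine grid.
grid = [0.01 * i for i in range(1, 2001)]
x_max = max(grid, key=nfw_shape)


def v_ratio(r_out_over_r_max):
    """V(R_out) / V_max for an assumed NFW halo, given R_out / R_max."""
    x_out = r_out_over_r_max * x_max
    return math.sqrt(nfw_shape(x_out) / nfw_shape(x_max))


for frac in (0.1, 0.25, 0.5, 1.0):
    print(f"R_out/R_max = {frac:4.2f}  ->  V_out/V_max = {v_ratio(frac):.2f}")
```

Under this assumption, a rotation curve that ends at a quarter of R_max has reached a bit over 80% of V_max. Since the BTFR goes roughly as the fourth power of velocity, even a deficit of that size matters.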

Graph showing the baryonic Tully-Fisher relation, with velocity Vf (km/s) plotted against baryonic mass Mb (solar masses). Data points include various galaxies and dwarf galaxies, with error bars indicating measurement uncertainties. A red line represents the best-fit relation. — As above, but with the addition of the true V_max *(small black dots*) of the simulated halos discussed by *Ruan et al. (2025)*, which follow the relation of *Sales et al. (2017)* (line for V_max in their Table 2).

I have mixed feelings about this. On the one hand, there are many dwarf galaxies with rising rotation curves that we don’t see flatten out, so it is easy to imagine they might keep going up, and I find it plausible that this is what we would find if we looked harder. So plausible that I’ve spent a fair amount of time doing exactly this. Not all observations terminate at 1 M_☉ pc⁻², and whenever we push further out, we see the same damn thing over and over: the rotation curve flattens out and stays flat^!!. That’s been my anecdotal experience; getting beyond that systematically is the point of the MHONGOOSE survey. This was constructed to detect much lower atomic gas surface densities, and routinely detects gas at the 0.1 M_☉ pc⁻² level where Ruan et al. suggest we should see something closer to V_max. So far, we don’t.

I don’t want to sound too negative, because how we map what we predict in simulations to what we measure in observations is a serious issue. But it seems a bit of a stretch for a low-scatter power law BTFR to be the happenstance of observational sensitivity that cuts in at a convenient mass scale. So far, we see no indication of that in more sensitive observations. I’ll certainly let you know if that changes.

Survey says…

At this juncture, we’ve examined enough examples that the reader can appreciate my concern that LCDM models can predict rather different things. What does the theory really predict? We can’t really test it until we agree what it should do^!!!.

I thought it might be instructive to combine some of the models discussed above. It is.

Graph illustrating the correlation between the characteristic flat rotation speed (Vf) and baryonic mass (Mb) of galaxies. The plot features data points in different colors representing various galaxy types, with lines indicating theoretical trends and empirical relations. — Some of the LCDM predictions discussed above shown together. The dotted line to the right of the data is the halo mass-velocity relation, which is the one thing we all agree LCDM predicts but which is observationally inaccessible. The grey band is a *Mo, Mao, & White*-type model with f_d = 0.025. The red dotted line is the model of *Trujillo-Gomez et al. (2011)*; the solid red line that of *Sales et al. (2017)* for V_max.

The models run together, more or less, for high mass galaxies. Thanks to observational selection effects, these are the objects we’ve always known about and matched our theories to. In order to test a theory, one wants to force it to make predictions in new regimes it wasn’t built for. Low mass galaxies do that, as do low surface brightness galaxies, which are often but not always low mass. MOND has done well for both, down to the ultrafaints we’re discussing here. LCDM does not yet explain those, or really any of the intermediate mass dwarfs.

What really disturbs me about LCDM models is their flexibility. It’s not just that they miss, it’s that it is possible to miss the data on either side of the BTFR. The older f_d = constant models predict velocities that are too low for low mass galaxies. The more recent abundance matching models predict velocities that are too high for low mass galaxies. I have no doubt that a model can be constructed that gets it right, because there is obviously enough flexibility to do pretty much anything. Adding new parameters until we get it right is an example of epicyclic thinking, as I’ve been pointing out for thirty years. I don’t know what could be worse for an idea like dark matter that is not falsifiable.
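The sense in which constant-f_d models run low can be seen with a little arithmetic. The sketch below is a deliberately crude stand-in for a Mo, Mao, & White-type model, resting on several labeled assumptions that are mine rather than the paper's. It takes the halo relation M₂₀₀ = V₂₀₀³/(10 G H₀) and sets the observed flat velocity equal to V₂₀₀. It adopts H₀ = 73 km/s/Mpc and a round BTFR normalization of 50 M_☉ km⁻⁴ s⁴. Only f_d = 0.025 comes from the figure caption above.

```python
G = 4.30091e-6      # kpc (km/s)^2 / Msun
H0 = 73.0 / 1000.0  # km/s/kpc, i.e. 73 km/s/Mpc (assumed)
F_D = 0.025         # baryonic mass as a fraction of halo mass (figure caption)
A_BTFR = 50.0       # Msun (km/s)^-4, assumed round BTFR normalization


def v_constant_fd(m_b):
    """Velocity if M_b = F_D * M_200 and V_f = V_200 (both are simplifications)."""
    m200 = m_b / F_D
    return (10.0 * G * H0 * m200) ** (1.0 / 3.0)


def v_btfr(m_b):
    """Velocity from a power-law BTFR, M_b = A_BTFR * V_f^4."""
    return (m_b / A_BTFR) ** 0.25


for m_b in (1e7, 1e9, 1e11):
    print(
        f"M_b = {m_b:.0e} Msun: constant f_d -> {v_constant_fd(m_b):6.1f} km/s, "
        f"BTFR -> {v_btfr(m_b):6.1f} km/s"
    )
```

With these numbers the two relations roughly agree for massive galaxies but part ways toward low mass, because one goes as V³ and the other as V⁴. At M_b = 10⁷ M_☉ the constant-f_d velocity is about half the BTFR value.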

We still haven’t come anywhere close to explaining the ultrafaints in either theory. In LCDM, we don’t even know if we should draw a curved line that catches them as if they’re in equilibrium, or start from a power-law BTFR and look for departures from that due to tidal effects. Both are possible in LCDM, both are plausible, as is some combination of both. I expect theorists will pick an option and argue about it indefinitely.

Tidal effects

The typical velocity dispersion of the ultrafaint dwarfs is too high for them to be in equilibrium in MOND. But there’s also pretty much no way these tiny things could be in equilibrium, being in the rough neighborhood dominated by our home, the cosmic gorilla. That by itself doesn’t make an explanation; we need to work out what happens to such things as they evolve dynamically under the influence of a pronounced external field. To my knowledge, this hasn’t been addressed in detail in MOND any more than in LCDM, though Brada & Milgrom addressed some of the relevant issues.

There is a difference in approach required for the two theories. In LCDM, we need to increase the resolution of simulations to see what happens to the tiniest of dark matter halos and their resident galaxies within the larger dark matter halos of giant galaxies. In MOND we have to simulate the evolution along the orbit of each unique individual. This is challenging on multiple levels, as each possible realization of a MOND theory requires its own code. Writing a simulation code for AQUAL requires a different numerical approach than QUMOND, and those are both modifications of gravity via the Poisson equation. We don’t know which might be closer to reality; heck, we don’t even know [yet] if MOND is a modification of gravity or inertia, the latter being even harder to code.

Cold dark matter is scale-free, so crudely I expect ultrafaint dwarfs in LCDM to do the same as larger dwarf satellites that have been simulated: their outer dark matter halos are gradually whittled away by tidal stripping for many Gyr. At first the stars are unaffected, but eventually so little dark matter is left that the stars start to be lost impulsively during pericenter passages. Though the dark matter is scale free, the stars and the baryonic physics that made them are not, so that’s where it gets tricky. The apparent dark-to-luminous mass ratio is huge, so one possibility is that the ultrafaints are in equilibrium despite their environment; they just made ridiculously few stars from the amount of mass available. That’s consistent with a wild extrapolation of abundance matching models, but how it comes about physically is less clear. For example, at some low mass, a galaxy would make so few stars that none are massive enough to result in a supernova, so there is no feedback, which is what is preventing too many stars from forming. Awkward. Alternately, the constant exposure to tidal perturbation might stir things up, with the velocity dispersion growing and stars getting stripped to form tidal streams, so they may have started as more massive objects. Or some combination of both, plus the evergreen possibility of things that don’t occur to me offhand.

Equilibrium for ultrafaint satellites is not an option in MOND, but tidal stirring and stripping is. As a thought experiment, let’s imagine what happens to a low mass dwarf typical of the field that falls towards the Milky Way from some large distance. Initially gas-rich, the first environmental effect that it is likely to experience is ram pressure stripping by the hot coronal gas around the Milky Way. That’s a baryonic effect that happens in either theory; it’s nothing to do with the effective law of gravity. A galaxy thus deprived of much of its mass will be out of equilibrium; its internal velocities will be typical of the original mass but the stripped mass is less. Consequently, its structure must adjust to compensate; perhaps dwarf Irregulars puff up and are transformed into dwarf Spheroidals in this way. Our notional infalling dwarf may have time to equilibrate to its new mass before being subject to strong tidal perturbation by the Milky Way, or it may not. If not, it will have characteristic internal velocities that are too high for its new mass, and reside above the BTFR. I doubt this suffices to explain [m]any of the ultrafaints, as their masses are so tiny that some stellar mass loss is also likely to have occurred.

Let’s suppose that our infalling dwarf has time to [approximately] equilibrate, or it simply formed nearby to begin with. Now it is a pressure supported system [more or less] on the BTFR. As it orbits the Milky Way, it feels an extra force from the external field. If it stays far enough out to remain in quasi-equilibrium in the EFE regime, then it will oscillate in size and velocity dispersion in phase with the strength of the external field it feels along its orbit.

If instead a satellite dips too close, it will be tidally disturbed and depart from equilibrium. The extra energy may stir it up, increasing its velocity dispersion. It doesn’t have the mass to sustain that, so stars will start to leak out. Tidal disruption will eventually happen, with the details depending on the initial mass and structure of the dwarf and on the eccentricity of its orbit, the distance of closest approach (pericenter), whether the orbit is prograde or retrograde relative to any angular momentum the dwarf may have… it’s complicated, so it is hard to generalize^##. Nevertheless, we (McGaugh & Wolf 2010) anticipated that “the deviant dwarfs [ultrafaints] should show evidence of tidal disruption while the dwarfs that adhere to the BTFR should not.” Unlike LCDM where most of the damage is done at closest approach, we anticipate for MOND that “stripping of the deviant dwarfs should be ongoing and not restricted to pericenter passage” because tides are stronger and there is no cocoon of dark matter to shelter the stars. The effect is still maximized at pericenter, it’s just not as impulsive as in some of the dark matter simulations I’ve seen.

This means that there should be streams of stars all over the sky. As indeed there are. For example:

A color-coded map of the northern sky displaying various stellar streams, indicated by labels such as 'Gaia-1*', 'Gaia-3*', and 'GD-1'. The color gradient represents velocity in kilometers per second, with colors ranging from blue for lower velocities to red for higher velocities. — *Stellar streams in the Milky Way identified using Gaia (Malhan et al. 2018).*

As a tidally influenced dwarf dissolves, the stars will leak out and form a trail. This happens in LCDM too, but there are differences in the rate, coherence, and symmetry of the resulting streams. Perhaps ultrafaint dwarfs are just the last dregs of the tidal disruption process. From this perspective, it hardly matters if they originated as external satellites or are internal star clusters: globular clusters native to the Milky Way should undergo a similar evolution.

Evolutionary tracks

Perhaps some of the ultrafaint dwarfs are the nuggets of disturbed systems that have suffered mass loss through tidal stripping. That may be the case in either LCDM or MOND, and has appealing aspects in either case – we went through all the possibilities in McGaugh & Wolf (2010). In MOND, the BTFR provides a reference point for what a stable system in equilibrium should do. That’s the starting point for the evolutionary tracks suggested here:

A graph plotting flat rotation speed (Vf) in km/s against baryonic mass (Mb) in solar masses. The data points include various galaxies represented as blue circles and green squares, with error bars indicating measurement uncertainty. A solid black line demonstrates the overall trend, while red curves suggest alternative theoretical predictions. — *BTFR with conceptual evolutionary tracks (red lines) for tidally-stirred ultrafaint dwarfs.*

Objects start in equilibrium on the BTFR. As they become subject to the external field, their velocity dispersions first decrease as they transition through the quasi-Newtonian regime. As tides kick in, stars are lost and stretched along the satellite’s orbit, so mass is lost but the apparent velocity dispersion increases as stars gradually separate and stretch out along a stream. Their relative velocities no longer represent a measure of the internal gravitational potential; rather than a cohesive dwarf satellite they’re more an association of stars in similar orbits around the Milky Way.

This is crudely what I imagine might be happening in some of the ultrafaint dwarfs that reside above the BTFR. Reality can be more complicated, and probably is. For example, objects that are not yet disrupted may oscillate around and below the BTFR before becoming completely unglued. Moreover, some individual ultrafaints probably are not real, while the data for others may suffer from systematic uncertainties. There’s a lot to sort out, and we’ve reached the point where the possibility of non-equilibrium effects cannot be ignored.

As a test of theories, the better course remains to look for new galaxies free from environmental perturbation. Ultrafaint dwarfs in the field, far from cosmic gorillas like the Milky Way, would be ideal. Hopefully many will be discovered in current and future surveys.

^!Other examples exist and continue to be discovered. More pertinent to my thinking is that the mass threshold at which reionization is supposed to suppress star formation has been a constantly moving goal post. To give an amusing anecdote, while I was junior faculty at the University of Maryland (so at least twenty years ago), Colin Norman called me up out of the blue. Colin is an expert on star formation, and had a burning question he thought I could answer. “Stacy,” he says as soon as I pick up, “what is the lowest mass star forming galaxy?” Uh, Hi, Colin. Off the cuff and totally unprepared for this inquiry, I said “um, a stellar mass of a few times 10⁷ solar masses.” Colin’s immediate response was to laugh long and loud, as if I had made the best nerd joke ever. When he regained his composure, he said “We know that can’t be true as reionization will prevent star formation in potential wells that small.” So, after this abrupt conversation, I did some fact-checking, and indeed, the number I had pulled out of my arse on the spot was basically correct, at that time. I also looked up the predictions, and of course Colin knew his business too; galaxies that small shouldn’t exist. Yet they do, and now the minimum known is two orders of magnitude lower in mass, with still no indication that a lower limit has been reached. So far, the threshold of our knowledge has been imposed by observational selection effects (low luminosity galaxies are hard to see), not by any discernible physics.

More recently, McQuinn et al. (2024) have made a study of the star formation histories of Leo P and a few similar galaxies that are near enough to see individual stars so as to work out the star formation rate over the course of cosmic history. They argue that there seems to be a pause in star formation after reionization, so a more nuanced version of the hypothesis may be that reionization did suppress star forming activity for a while, but these tiny objects were subsequently able to re-accrete cold gas and get started again. I find that appealing as a less simplistic thing that might have happened in the real universe, and not just a simple on/off switch that leaves only a fossil. However, it isn’t immediately clear to me that this more nuanced hypothesis should happen in LCDM. Once those baryons have evaporated, they’re gone, and it is far from obvious that they’ll ever come back to the weak gravity of such a small dark matter halo. It is also not clear to me that this interpretation, appealing as it is, is unique: the reconstructed star formation histories also look consistent with stochastic star formation, with fluctuations in the star formation rate being a matter of happenstance that have nothing to do with the epoch of reionization.

^#So how are ultrafaint dwarfs different from star clusters? Great question! Wish we had a great answer.

Some ultrafaints probably are star clusters rather than independent satellite galaxies. How do we tell the difference? Chiefly, the velocity dispersion: star clusters show no need for dark matter, while ultrafaint dwarfs generally appear to need a lot. This of course assumes that their measured velocity dispersions represent an equilibrium measure of their gravitational potential, which is what we’re questioning here, so the opportunity for circular reasoning is rife.

^$Rather than apply a strict luminosity cut, for convenience I’ve kept the same “not safe from tidal disruption” distinction that we’ve used before. Some of the objects in the 10⁵ – 10⁶ M_☉ range might belong more with the classical dwarfs than with the ultrafaints. This is a reminder that our nomenclature is terrible more than anything physically meaningful.

^&Astronomy is an observational science, not a laboratory science. We can only detect the photons nature sends our way. We cannot control all the potential systematics as can be done in an enclosed, finite, carefully controlled laboratory. That means there is always the potential for systematic uncertainties whose magnitude can be difficult to estimate, or sometimes to even be aware of, like how local variations impact Jeans analyses. This means we have to take our error bars with a grain of salt, often such a big grain as to make statistical tests unreliable: goodness of fit is only as meaningful as the error bars.

I say this because it seems to be the hardest thing for physicists to understand. I also see many younger astronomers turning the crank on fancy statistical machinery as if astronomical error bars can be trusted. Garbage in, garbage out.

^*This is an example of setting a parameter in a model “by hand.”

^**The transition to thinking in terms of the luminosity function rather than Tully-Fisher is so complete that the most recent, super-large, Euclid flagship simulation doesn’t even attempt to address the kinematics of individual galaxies while giving extraordinarily detailed and extensive information about their luminosity distributions. I can see why they’d do that – they want to focus on what the Euclid mission might observe – but it is also symptomatic of the growing tendency I’ve witnessed to just not talk about those pesky kinematics.

^%Halos in dark matter simulations tend to be rather triaxial, i.e., a 3D bloboid that is neither spherical like a soccer ball nor oblate like a frisbee nor prolate like an American football: each principal axis has a different length. If real halos were triaxial, it would lead to non-circular orbits in dark matter-dominated galaxies that are not observed.

The triaxiality of halos is a result from dark matter-only simulations. Personally, I suspect that the condensation of gas within a dark matter halo (presuming such things exist) during the process of galaxy formation rounds out the inner halo, making it nearly spherical where we are able to make measurements. So I don’t see this as necessarily a failure of LCDM, but rather an example of how more elaborate simulations that include baryonic physics are sometimes warranted. Sometimes. There’s a big difference between this process, which also compresses the halo (making it more dense when it already starts out too dense), and the various forms of feedback, which may or may not further alter the structure of the halo.

^{^}There are many failure modes in simulated rotation curves, the two most common being the cusp-core problem in dwarfs and sub-maximal disks in giants. It is common for the disks of bright spiral galaxies to be nearly maximal in the sense that the observed stars suffice to explain the inner rotation curve. They may not be completely maximal in this sense, but they come close for normal stellar populations. (Our own Milky Way is a good example.) In contrast, many simulations produce bright galaxies that are absurdly sub-maximal; EAGLE and SIMBA being two examples I remember offhand.

Another common problem is that LCDM simulations often don’t produce rotation curves that are as flat as observed. This was something I also found in my early attempts at model-building with dark matter halos. It is easy to fit a flat rotation curve given the data, but it is hard to predict a priori that rotation curves should be flat.

^!!Gravitational lensing indicates that rotation curves remain flat to even larger radii. However, these observations are only sensitive to galaxies more massive than those under discussion here. So conceivably there could be another coincidence wherein flatness persists for galaxies with M_b > 10¹⁰ M_☉, but not those with M_b < 10⁹ M_☉.

^!!!Many in the community seem to agree that it will surely work out.

^##I’ve tried to estimate dissolution timescales, but find the results wanting. For plausible assumptions, one finds timescales that seem plausible (a few Gyr) but with some minor fiddling one can also find results that are no-way-that’s-too-short (a few tens of millions of years), depending on the dwarf and its orbit. These are crude analytic estimates; I’m not satisfied that these numbers are particularly meaningful. Still, this is a worry with the tidal-stirring hypothesis: will perturbed objects persist long enough to be observed as they are? This is another reason we need detailed simulations tailored to each object.

^{*&^#}Note added after initial publication: While I was writing this, a nice paper appeared on exactly this issue of the star formation history of a good number of ultrafaint dwarfs. They find that 80% of the stellar mass formed 12.48 ± 0.18 Gyr ago, so 12.5 was a good guess. Formally, at the one sigma level, this is a little after reionization, but only a tiny bit, so close enough: the bulk of the stars formed long ago, like a classical globular cluster, and these ultrafaints are consistent with being fossils.

Intriguingly, there is a hint of an age difference by kinematic grouping, with things that have been in the Milky Way being the oldest, those on first infall being a little younger (but still very old), and those infalling with the Large Magellanic Cloud a tad younger still. If so, then there is more to the story than quenching by cosmic reionization.

They also show a nice collection of images so you can see more examples. The ellipses trace out the half-light radii, so you can see the proclivity for many (not all!) of these objects to be elongated, perhaps as a result of tidal perturbation:

**Figure 2** from Durbin et al. (2025): *Footprints of all HST observations (blue filled patches) overlaid on DSS2 imaging cutouts. Open black ellipses show the galaxy profiles at one half-light radius.*

The Deuterium-Lithium tension in Big Bang Nucleosynthesis

There are many tensions in the era of precision cosmology. The most prominent, at present, is the Hubble tension – the difference between traditional measurements, which consistently obtain H₀ = 73 km/s/Mpc, and the best fit* to the acoustic power spectrum of the cosmic microwave background (CMB) observed by Planck, H₀ = 67 km/s/Mpc. There are others of varying severity that are less widely discussed. In this post, I want to talk about a persistent tension in the baryon density implied by the measured primordial abundances of deuterium and lithium⁺. Unlike the tension in H₀, this problem is not nearly as widely discussed as it should be.

Framing

Part of the reason that this problem is not seen as an important tension has to do with the way in which it is commonly framed. In most discussions, it is simply the primordial lithium problem. Deuterium agrees with the CMB, so those must be right and lithium must be wrong. Once framed that way, it becomes a trivial matter specific to one untrustworthy (to cosmologists) observation. It’s a problem for specialists to sort out what went wrong with lithium: the “right” answer is otherwise known, so this tension is not real, making it unworthy of wider discussion. However, as we shall see, this might not be the right way to look at it.

It’s a bit like calling the acceleration discrepancy the dark matter problem. Once we frame it this way, it biases how we see the entire problem. Solving this problem becomes a matter of finding the dark matter. It precludes consideration of the logical possibility that the observed discrepancies occur because the force law changes on the relevant scales. This is the mental block I struggled mightily with when MOND first cropped up in my data; this experience makes it easy to see when other scientists succumb to it sans struggle.

Big Bang Nucleosynthesis (BBN)

I’ve talked about the cosmic baryon density here a lot, but I’ve never given an overview of BBN itself. That’s because it is well-established, and has been for a long time – I assume you, the reader, already know about it or are competent to look it up. There are many good resources for that, so I’ll give only as much of a sketch as the subsequent narrative needs – a sketch that will be both too little for the experts and too much for the subsequent narrative that most experts are unaware of.

Primordial nucleosynthesis occurs in the first few minutes after the Big Bang when the universe is the right temperature and density to be one big fusion reactor. The protons and available neutrons fuse to form helium and other isotopes of the light elements. Neutrons are slightly more massive and less numerous than protons to begin with. In addition, free neutrons decay with a half-life of roughly ten minutes, so are outnumbered by protons when nucleosynthesis happens. The vast majority of the available neutrons pair up with protons and wind up in ⁴He while most of the protons remain on their own as the most common isotope of hydrogen, ¹H. The resulting abundance ratio is one alpha particle for every dozen protons, or in terms of mass fractions^&, X_p = 3/4 hydrogen and Y_p = 1/4 helium. That is the basic composition with which the universe starts; heavy elements are produced subsequently in stars and supernova explosions.
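The bookkeeping in that paragraph is easy to check. Here is a minimal sketch, not a BBN calculation: it assumes every available neutron ends up in ⁴He, ignores the small neutron-proton mass difference and nuclear binding energy, and takes the neutron-to-proton ratio at nucleosynthesis to be about 1/7. That ratio is a commonly quoted round number, not something stated above, and the function name is purely illustrative.

```python
from fractions import Fraction

def helium_bookkeeping(n_over_p):
    """Return (Y_p, free protons per alpha particle), assuming all
    neutrons are locked up in 4He. Illustrative helper only."""
    n = Fraction(n_over_p)      # neutrons per proton
    p = Fraction(1)             # normalize to one proton
    n_alpha = n / 2             # each 4He nucleus takes two neutrons...
    free_protons = p - n        # ...and two protons; the rest stay as 1H
    total_nucleons = n + p      # mass counted in nucleon units
    y_p = 4 * n_alpha / total_nucleons
    return y_p, free_protons / n_alpha

# Assumed input: n/p = 1/7 at the time of nucleosynthesis
y_p, protons_per_alpha = helium_bookkeeping(Fraction(1, 7))
print(y_p, protons_per_alpha)
```

With n/p = 1/7 this gives Y_p = 1/4 and twelve free protons per alpha particle, matching the numbers quoted above.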

Though ¹H and ⁴He are by far the most common products of BBN, there are traces of other isotopes that emerge from BBN:

The time evolution of the relative numbers of light element isotopes through BBN. As the universe expands, nuclear reactions “freeze-out” and establish primordial abundances for the indicated species. The precise outcome depends on the baryon density, Ω_b. This plot illustrates a particular choice of *Ω_b*; different *Ω_b* result in observationally distinguishable abundances. (Figures like this are so ubiquitous in discussions of the early universe that I have not been able to identify the original citation for this particular version.)

After hydrogen and helium, the next most common isotope to emerge from BBN is deuterium, ²H. It is the first thing made (one proton plus one neutron) but most of it gets processed into ⁴He, so after a brief peak, its abundance declines. How much it declines is very sensitive to Ω_b: the higher the baryon density, the more deuterium gets gobbled up by helium before freeze-out. The following figure illustrates how the abundance of each isotope depends on Ω_b:

“Schramm diagram” adapted from Cyburt et al. (2003) showing the abundance of ⁴He by mass fraction (top) and the number relative to hydrogen of deuterium (D = ²H), helium-3, and lithium as a function of the baryon-to-photon ratio. We measure the photon density in the CMB, so this translates directly to the baryon density^$ *Ω_b*h² (top axis).

If we can go out and measure the primordial abundances of these various isotopes, we can constrain the baryon density.

The Baryon Density

It works! Each isotope provides an independent estimate of Ω_bh², and they agree pretty well. This was the first and for a long time the only over-constrained quantity in cosmology. So while I am going to quibble about the exact value of Ω_bh², I don’t doubt that the basic picture is correct. There are too many details we have to get right in the complex nuclear reaction chains coupled to the decreasing temperature of a universe expanding at the rate required during radiation domination for this to be an accident. It is an exquisite success of the standard Hot Big Bang cosmology, albeit not one specific to LCDM.

Getting at primordial, rather than current, abundances is an interesting observational challenge too involved to go into much detail here. Suffice it to say that it can be done, albeit to varying degrees of satisfaction. We can then compare the measured abundances to the theoretical BBN abundance predictions to infer the baryon density.

The Schramm diagram with measured abundances (orange boxes) for the isotopes of the light elements. The thickness of the box illustrates the uncertainty: tiny for deuterium and large for *⁴He* because of the large zoom on the axis scale. The lithium abundance could correspond to either low or high baryon density. ³He is omitted because its uncertainty is too large to provide a useful constraint.

Deuterium is considered the best baryometer because its relic abundance is very sensitive to Ω_bh²: a small change in baryon density corresponds to a large change in D/H. In contrast, ⁴He is a great confirmation of the basic picture – the primordial mass fraction has to come in very close to 1/4 – but the precise value is not very sensitive to Ω_bh². Most of the neutrons end up in helium no matter what, so it is hard to distinguish^# a few more from a few less. (Note the huge zoom on the linear scale for ⁴He. If we plotted it logarithmically with decades of range as we do the other isotopes, it would be a nearly flat line.) Lithium is annoying for being double-valued right around the interesting baryon density so that the observed lithium abundance can correspond to two values of Ω_bh². This behavior stems from the trade-off with ⁷Be, which is produced at a higher rate but decays to ⁷Li after a few months. For this discussion the double-valued ambiguity of lithium doesn’t matter, as the problem is that the deuterium abundance indicates Ω_bh² that is even higher than the higher branch of lithium.

BBN pre-CMB

The diagrams above and below show the situation in the 1990s before CMB estimates became available. Consideration of all the available data in the review of Walker et al. led to the value Ω_bh² = 0.0125 ± 0.0025. This value** was so famous that it was Known. It formed the basis of my predictions for the CMB for both LCDM and no-CDM. This prediction hinged on BBN being correct, and that we understood the experimental bounds on the baryon density. A few years after Walker’s work, Copi et al. provided the estimate⁺⁺ 0.009 < Ω_bh² < 0.02. Those were the extreme limits of the time, as illustrated by the green box below:

The baryon density as it was known before detailed observations of the acoustic power spectrum of the CMB. BBN was a mature subject before 1990; the massive reviews of Walker et al. and Copi et al. creak with the authority of a solved problem. The controversial tension at the time was between the high and low deuterium measurements from Hogan and Tytler, which were at the extreme ends of the ranges indicated by the bulk of the data in the reviews.

Up until this point, the constraints on BBN had come mostly from helium observations in nearby galaxies and lithium measurements in metal poor stars. It was only just then becoming possible to obtain high quality spectra of sufficiently high redshift quasars to see weak deuterium lines associated with strongly damped primary hydrogen absorption in intergalactic gas along the line of sight. This is great: deuterium is the most sensitive baryometer, the redshifts were high enough to be early in the history of the universe close to primordial times, and the gas was in the middle of intergalactic nowhere so shouldn’t be altered by astrophysical processes. These are ideal conditions, at least in principle.

First results were binary. Craig Hogan obtained a high deuterium abundance, corresponding to a low baryon density. Really low. From my Walker et al.-informed confirmation bias, too low. It was a brand new result, so promising but probably wrong. Then Tytler and his collaborators came up with the opposite result: low deuterium abundance corresponding to a high baryon density: Ω_bh² = 0.019 ± 0.001. That seemed pretty high at the time, but at least it was within the bound Ω_bh² < 0.02 set by Copi et al. There was a debate between these high/low deuterium camps that ended in a rare act of intellectual honesty by a cosmologist when Hogan^&& conceded. We seemed to have settled on the high-end of the allowed range, just under Ω_bh² = 0.02.

Enter the CMB

CMB data started to be useful for constraining the baryon density in 2000 and improved rapidly. By that point, LCDM was already well-established, and I had published predictions for both LCDM and no-CDM. In the absence of cold dark matter, one expects a damping spectrum, with each peak lower than the one before it. For the narrow (factor of two) Known range of possible baryon densities, all the no-CDM models run together to essentially the same first-to-second peak ratio.

Peak locations measured by WMAP in 2003 (points) compared to the a priori (1999) predictions of LCDM (red tone lines) and no-CDM (blue tone lines). Models are normalized in amplitude around the first peak.

Adding CDM into the mix adds a driver to the oscillations. This fights the baryonic damping: the CDM is like a parent pushing a swing while the baryons are the kid dragging his feet. This combination makes just about any pattern of peaks possible. Not all free parameters are made equal: the addition of a single free parameter, Ω_CDM, makes it possible to fit any plausible pattern of peaks. Without it (no-CDM means Ω_CDM = 0), only the damping spectrum is allowed.

For BBN as it was known at the time, the clear difference was in the relative amplitude^$$ of the first and second peaks. As can be seen above, the prediction for no-CDM was correct and that for LCDM was not. So we were done, right?

Of course not. To the CMB community, the only thing that mattered was the fit to the CMB power spectrum, not some obscure prediction based on BBN. Whatever the fit said was True; too bad for BBN if it didn’t agree.

The way to fit the unexpectedly small^## second peak was to crank up the baryon density. To do that, Tegmark & Zaldarriaga (2000) needed 0.022 < Ω_bh² < 0.040. That’s the first blue point below. This was the first time that I heard it suggested that the baryon density could be so high.

The baryon density from deuterium (red triangles) before and after (dotted vertical line) estimates from the CMB (blue points). The horizontal dotted line is the pre-CMB upper limit of *Copi et al.*

The astute reader will note that the CMB-fit 0.022 < Ω_bh² < 0.040 sits entirely outside the BBN bounds 0.009 < Ω_bh² < 0.02. So we’re done, right? Well, no – the community simply ignored the successful a priori prediction of the no-CDM scenario. That was certainly easier than wrestling with its implications, and no one seems to have paused to contemplate why the observed peak ratio came in exactly at the one unique value that it could obtain in the case of no-CDM.

For a few years, the attitude seemed to be that BBN was close but not quite right. As the CMB data improved, the baryon density came down, ultimately settling on Ω_bh² = 0.0224 ± 0.0001. Part of the reason for this decline from the high initial estimate is covariance. In this case, the tilt plays a role: the baryon density declined as n_s = 1 → 0.965 ± 0.004. Getting the second peak amplitude right takes a combination of both.

Now we’re back in the ballpark, almost: Ω_bh² = 0.0224 is not ridiculously far above the BBN limit Ω_bh² < 0.02. Close enough for Spergel et al. (2003) to say “The remarkable agreement between the baryon density inferred from D/H values and our [WMAP] measurements is an important triumph for the basic big bang model.” This was certainly true given the size of the error bars on both deuterium and the CMB at the time. It also elides^*** any mention of either helium or lithium or the fact that the new Known was not consistent with the previous Known. Ω_bh² = 0.0224 was always the ally; Ω_bh² = 0.0125 was always the enemy.

Note, however, that deuterium made a leap from below Ω_bh² = 0.02 to above 0.02 exactly when the CMB indicated that it should do so. They iterated to better agreement and pretty much stayed there. Hopefully that is the correct answer, but given the history of the field, I can’t help worrying about confirmation bias. I don’t know if that is what’s going on, but if it were, this convergence over time is what it would look like.

Lithium does not concur

Taking the deuterium results at face value, there really is excellent agreement with the LCDM fit to the CMB, so I have some sympathy for the desire to stop there. Deuterium is the best baryometer, after all. Helium is hard to get right at a precise enough level to provide a comparable constraint, and lithium, well, lithium is measured in stars. Stars are tiny, much smaller than galaxies, and we know those are too puny to simulate.

Spite & Spite (1982) [those are names, pronounced “speet”; we’re not talking about spiteful stars] discovered what is now known as the Spite plateau, a level of constant lithium abundance in metal poor stars, apparently indicative of the primordial lithium abundance. Lithium is a fragile nucleus; it can be destroyed in stellar interiors. It can also be formed as the fragmentation product of cosmic ray collisions with heavier nuclei. Both of these things go on in nature, making some people distrustful of any lithium abundance. However, the Spite plateau is a sort of safe zone where neither effect appears to dominate. The abundance of lithium observed there is indeed very much in the right ballpark to be a primordial abundance, so that’s the most obvious interpretation.

Lithium indicates a lowish baryon density. Modern estimates are in the same range as BBN of old; they have not varied systematically with time. There is no tension between lithium and pre-CMB deuterium, but it disagrees with LCDM fits to the CMB and with post-CMB deuterium. This tension is both persistent and statistically significant (Fields 2011 describes it as “4–5σ”).

*The baryon density from lithium (yellow symbols) over time. Stars are measurements in groups of stars on the Spite plateau; the square represents the approximate value from the ISM of the SMC.*

I’ve seen many models that attempt to fix the lithium abundance, e.g., by invoking enhanced convective mixing via <<mumble mumble>> so that lithium on the surface of stars is subject to destruction deep in the stellar interior in a previously unexpected way. This isn’t exactly satisfactory – it should result in a mess, not a well-defined plateau – and other attempts I’ve seen to explain away the problem do so with at least as much contrivance. All of these models appeared after lithium became a problem; they’re clearly motivated by the assumption bias that the CMB is correct so the discrepancy is specific to lithium so there must be something weird about stars that explains it.

Another way to illustrate the tension is to use Ω_bh² from the Planck fit to predict what the primordial lithium abundance should be. The Planck-predicted band is clearly higher than and offset from the stars of the Spite plateau. There should be a plateau, sure, but it’s in the wrong place.

The lithium abundance in metal poor stars (points), the interstellar medium of the Small Magellanic Cloud (green band), and the primordial lithium abundance expected for the best-fit Planck LCDM. For reference, *[Fe/H] = -3* means an iron abundance that is one one-thousandth that of the sun.

An important recent observation is that a similar lithium abundance is obtained in the metal poor interstellar gas of the Small Magellanic Cloud. That would seem to obviate any explanation based on stellar physics.

The Schramm diagram with the Planck CMB-LCDM value added (vertical line). This agrees well with deuterium measurements made after CMB data became available, but not with those before, nor with the measured abundance of lithium.

We can also illustrate the tension on the Schramm diagram. This version adds the best-fit CMB value and the modern deuterium abundance. These are indeed in excellent agreement, but they don’t intersect with lithium. The deuterium-lithium tension appears to be real, and comparable in significance to the H₀ tension.

So what’s the answer?

I don’t know. The logical options are

A systematic error in the primordial lithium abundance
A systematic error in the primordial deuterium abundance
Physics beyond standard BBN

I don’t like any of these solutions. The data for both lithium and deuterium are what they are. As astronomical observations, both are subject to the potential for systematic errors and/or physical effects that complicate their interpretation. I am also extremely reluctant to consider modifications to BBN. There are occasional suggestions to this effect, but it is a lot easier to break than it is to fix, especially for what is a fairly small disagreement in the absolute value of Ω_bh².

I have left the CMB off the list because it isn’t part of BBN: its constraint on the baryon density is real, but involves completely different physics. It also involves different assumptions, i.e., the LCDM model and all its invisible baggage, while BBN is just what happens to ordinary nucleons during radiation domination in the early universe. CMB fits are corroborative of deuterium only if we assume LCDM, which I am not inclined to accept: deuterium disagreed with the subsequent CMB data before it agreed. Whether that’s just progress or a sign of confirmation bias, I also don’t know. But I do know confirmation bias has bedeviled the history of cosmology, and as the H₀ debate shows, we clearly have not outgrown it.

The appearance of confirmation bias is augmented by the response time of each measured elemental abundance. Deuterium is measured using high redshift quasars; the community that does that work is necessarily tightly coupled to cosmology. Its response was practically instantaneous: as soon as the CMB suggested that the baryon density needed to be higher, conforming D/H measurements appeared. Indeed, I recall when that first high red triangle appeared in the literature, a colleague snarked to me “we can do that too!” In those days, those of us who had been paying attention were all shocked at how quickly Ω_bh² = 0.0125 ± 0.0025 was abandoned for literally double that value, Ω_bh² = 0.025 ± 0.001. That’s 4.6 sigma for those keeping score.
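For those keeping score at home, the 4.6 sigma can be reproduced in a couple of lines. This sketch assumes the two estimates are independent with Gaussian uncertainties that add in quadrature; that is my reading of how the number was obtained, and the function name is just for illustration.

```python
import math

def tension_sigma(a, sigma_a, b, sigma_b):
    """Difference between two measurements in units of their combined
    uncertainty, assuming independent Gaussian errors (quadrature sum)."""
    return abs(a - b) / math.hypot(sigma_a, sigma_b)

# Pre-CMB value vs. the doubled value quoted above
n_sigma = tension_sigma(0.0125, 0.0025, 0.025, 0.001)
print(round(n_sigma, 1))  # 4.6
```

The result is dominated by the larger of the two error bars, so it is the old ±0.0025 that sets the scale here.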

The primordial helium abundance is measured in nearby dwarf galaxies. That community is aware of cosmology, but not as strongly coupled to it. Estimates of the primordial helium abundance have drifted upwards over time, corresponding to higher implied baryon densities. It’s as if confirmation bias is driving things towards the same result, but on a timescale that depends on the sociological pressure of the CMB imperative.

**Fig. 8** from Steigman (2012) *showing the history of primordial helium mass fraction (Y_P) determinations as a function of time.*

I am not accusing anyone of trying to obtain a particular result. Confirmation bias can be a lot more subtle than that. There is an entire field of study of it in psychology. We “humans actively sample evidence to support prior beliefs” – none of us are immune to it.

In this case, how we sample evidence depends on the field we’re active in. Lithium is measured in stars. One can have a productive career in stellar physics while entirely ignoring cosmology; it is the least likely to be perturbed by edicts from the CMB community. The inferred primordial lithium abundance has not budged over time.

What’s your confirmation bias?

I try not to succumb to confirmation bias, but I know that’s impossible. The best I can do is change my mind when confronted with new evidence. This is why I went from being sure that non-baryonic dark matter had to exist to taking seriously MOND as the theory that predicted what I observed.

I do try to look at things from all perspectives. Here, the CMB has been a roller coaster. Putting on an LCDM hat, the location of the first peak came in exactly where it was predicted: this was strong corroboration of a flat FLRW geometry. What does it mean in MOND? No idea – MOND doesn’t make a prediction about that. The amplitude of the second peak came in precisely as predicted for the case of no-CDM. This was corroboration of the ansatz inspired by MOND, and the strongest possible CMB-based hint that we might be barking up the wrong tree with LCDM.

As an exercise, I went back and maxed out the baryon density as it was known before the second peak was observed. We already thought we knew LCDM parameters well enough to do this. We couldn’t. The amplitude of the second peak came as a huge surprise to LCDM; everyone acknowledged that at the time (if pressed; many simply ignored it). Nowadays this is forgotten, or people have gaslit themselves into believing this was expected all along. It was not.

**Fig. 45** from Famaey & McGaugh (2012): *WMAP data are shown with the a priori prediction of no-CDM (blue line) and the most favorable prediction that could have been made ahead of time for LCDM (red line).*

From the perspective of no-CDM, we don’t really care whether deuterium or lithium hits closer to the right baryon density. All plausible baryon densities predict essentially the same A_1:2 amplitude ratio. Once we admit CDM as a possibility, then the second peak amplitude becomes very sensitive to the mix of CDM and baryons. From this perspective, the lithium-indicated baryon density is unacceptable. That’s why it is important to have a test that is independent of the CMB. Both deuterium and lithium provide that, but they disagree about the answer.

Once we broke BBN to fit the second peak in LCDM, we were admitting (if not to ourselves) that the a priori prediction of LCDM had failed. Everything after that is a fitting exercise. There are enough free parameters in LCDM to fit any plausible power spectrum. Cosmologists are fond of saying there are thousands of independent multipoles, but that overstates the case: it doesn’t matter how finely we sample the wave pattern, it matters what the wave pattern is. That is not as over-constrained as it is made to sound. LCDM is, nevertheless, an excellent fit to the CMB data; the test then is whether the parameters of this fit are consistent with independent measurements. It was until it wasn’t; that’s why we face all these tensions now.

Despite the success of the prediction of the second peak, no-CDM gets the third peak wrong. It does so in a way that is impossible to fix short of invoking new physics. We knew that had to happen at some level; empirically that level occurs at L = 600. After that, it becomes a fitting exercise, just as it is in LCDM – only now, one has to invent a new theory of gravity in which to make the fit. That seems like a lot to ask, so while it remained as a logical possibility, LCDM seemed the more plausible explanation for the CMB if not dynamical data. From this perspective, that A_1:2 came out bang on the value predicted by no-CDM must just be one heck of a cosmic fluke. That’s easy to accept if you were unaware of the prediction or scornful of its motivation; less so if you were the one who made it.

Either way, the CMB is now beyond our ability to predict. It has become a fitting exercise, the chief issue being what paradigm in which to fit it. In LCDM, the fit follows easily enough; the question is whether the result agrees with other data: are these tensions mere hiccups in the great tradition of observational cosmology? Or are they real, demanding some new physics?

The widespread attitude among cosmologists is that it will be impossible to fit the CMB in any way other than LCDM. That is a comforting thought (it has to be CDM!) and for a long time seemed reasonable. However, it has been contradicted by the success of Skordis & Zlosnik (2021) using AeST, which can fit the CMB as well as LCDM.

AeST is a very important demonstration that one does not need dark matter to fit the CMB. One does need other fields⁺⁺⁺, so now the reality of those has to be examined. Where this show stops, nobody knows.

I’ll close by noting that the uniqueness claimed by the LCDM fit to the CMB is a property more correctly attributed to MOND in galaxies. It is less obvious that this is true because it is always possible to fit a dark matter model to data once presented with the data. That’s not science, that’s fitting French curves. To succeed, a dark matter model must “look like” MOND. It obviously shouldn’t do that, so modelers refuse to go there, and we continue to spin our wheels and dig the rut of our field deeper.

Note added in proof, as it were: I’ve been meaning to write about this subject for a long time, but hadn’t, in part because I knew it would be long and arduous. Being deeply interested in the subject, I had to slap myself repeatedly to refrain from spending even more time updating the plots with publication date as an axis: nothing has changed, so that would serve only to feed my OCD. Even so, it has taken a long time to write, which I mention because I had completed the vast majority of this post before the IAU announced on May 15 that Cooke & Pettini have been awarded the Gruber prize for their precision deuterium abundance. This is excellent work (it is one of the deuterium points in the relevant plot above), and I’m glad to see this kind of hard, real-astronomy work recognized.

The award of a prize is a recognition of meritorious work but is not a guarantee that it is correct. So this does not alter any of the concerns that I express here, concerns that I’ve expressed for a long time. It does make my OCD feel obliged to comment at least a little on the relevant observations, which is itself considerably involved, but I will tack on some brief discussion below, after the footnotes.

*These methods were in agreement before they were in tension, e.g., Spergel et al. (2003) state: “The agreement between the HST Key Project value and our [WMAP CMB] value, h = 0.72 ±0.05, is striking, given that the two methods rely on different observables, different underlying physics, and different model assumptions.”

⁺Here I mean the abundance of the primary isotope of lithium, ⁷Li. There is a different problem involving the apparent overabundance of ⁶Li. I’m not talking about that here; I’m talking about the different baryon densities inferred separately from the abundances of D/H and ⁷Li/H.

^&By convention, X, Y, and Z are the mass fractions of hydrogen, helium, and everything else. Since the universe starts from a primordial abundance of X_p = 3/4 and Y_p = 1/4, and stars are seen to have approximately that composition plus a small sprinkling of everything else (for the sun, Z ≈ 0.02), and since iron lines are commonly measured in stars to trace Z, astronomers fell into the habit of calling Z the metallicity even though oxygen is the third most common element in the universe today (by both number and mass). Since everything in the periodic table that isn’t hydrogen and helium is a small fraction of the mass, all the heavier elements are often referred to collectively as metals despite the unintentional offense to chemistry.

^$The factor of h² appears because of the definition of the critical density ρ_c = (3H₀²)/(8πG): Ω_b = ρ_b/ρ_c. The physics cares about the actual density ρ_b but Ω_bh² = 0.02 is a lot more convenient to write than ρ_b,now = 3.75 x 10^-31 g/cm³.
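The conversion in this footnote is simple to verify. The sketch below works in cgs units with rounded values of G and the megaparsec that I have supplied myself; the variable and function names are illustrative.

```python
import math

G_CGS = 6.674e-8         # gravitational constant, cm^3 g^-1 s^-2 (rounded)
CM_PER_MPC = 3.0857e24   # centimeters per megaparsec (rounded)

# H0 = 100 h km/s/Mpc expressed in s^-1 for h = 1
H100 = 100.0 * 1.0e5 / CM_PER_MPC

# Critical density for h = 1, so that rho_c = RHO_CRIT_H2 * h^2
RHO_CRIT_H2 = 3.0 * H100**2 / (8.0 * math.pi * G_CGS)

def baryon_density(omega_b_h2):
    """Physical baryon density today in g/cm^3 for a given Omega_b h^2."""
    return omega_b_h2 * RHO_CRIT_H2

rho_b = baryon_density(0.02)
print(f"{rho_b:.2e} g/cm^3")
```

Because Ω_b is multiplied by h², the unknown value of h drops out: the physical density depends only on the product Ω_bh².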

^#I’ve worked on helium myself, but was never able to do better than Y_p = 0.25 ± 0.01. This corroborates the basic BBN picture, but does not suffice as a precise measure of the baryon density. To do that, one must obtain a result accurate to the third place of decimals, as discussed in the exquisite works of Kris Davidson, Bernie Pagel, Evan Skillman, and their collaborators. It’s hard to do for both observational reasons and because a wealth of subtle atomic physics effects come into play at that level of precision – helium has multiple lines; their parent population levels depend on the ionization mechanism, the plasma temperature, its density, and fluorescence effects as well as abundance.

**The value reported by Walker et al. was phrased as Ω_bh₅₀² = 0.05 ± 0.01, where h₅₀ = H₀/(50 km/s/Mpc); translating this to the more conventional h = H₀/(100 km/s/Mpc) decreases these numbers by a factor of four and leads to the impression of more significant digits than were claimed. It is interesting to consider the psychological effect of this numerology. For example, the modern CMB best-fit value in this phrasing is Ω_bh₅₀² = 0.09, four sigma higher than the value Known from the combined assessment of the light isotope abundances. That seems like a tension – not just involving lithium, but the CMB vs. all of BBN. Amusingly, the higher baryon density needed to obtain a CMB fit assuming LCDM is close to the threshold where we might have gotten away without the dynamical need (Ω_m > Ω_b) for non-baryonic dark matter that motivated non-baryonic dark matter in the first place. (For further perspective at a critical juncture in the development of the field, see Peebles 1999).

The use of h₅₀ itself is an example of the confirmation bias I’ve mentioned before as prevalent at the time, that Ω_m = 1 and H₀ = 50 km/s/Mpc. I would love to be able to do the experiment of sending the older cosmologists who are now certain of LCDM back in time to share the news with their younger selves who were then equally certain of SCDM. I suspect their younger selves would ask their older selves at what age they went insane, if they didn’t simply beat themselves up.
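The numerology in the footnote on the Walker et al. value is just a factor of four, since h₅₀ = 2h. A quick sketch (helper names are illustrative) shows both directions of the conversion and the "four sigma" quoted above, which uses only the Walker et al. uncertainty:

```python
def h50_to_h100(omega_b_h50_sq):
    """Convert Omega_b * h50^2 to Omega_b * h^2, using h50 = 2h."""
    return omega_b_h50_sq / 4.0

def h100_to_h50(omega_b_h_sq):
    """Convert Omega_b * h^2 to Omega_b * h50^2."""
    return omega_b_h_sq * 4.0

walker_value = h50_to_h100(0.05)    # 0.0125
walker_error = h50_to_h100(0.01)    # 0.0025
cmb_h50 = h100_to_h50(0.0224)       # 0.0896, i.e. about 0.09
n_sigma = (cmb_h50 - 0.05) / 0.01   # about 4, in units of the Walker error

print(walker_value, walker_error, round(cmb_h50, 2), round(n_sigma, 1))
```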

⁺⁺Craig Copi is a colleague here at CWRU, so I’ve asked him about the history of this. He seemed almost apologetic, since the current “right” baryon density from the CMB now is higher than his upper limit, but that’s what the data said at the time. The CMB gives a more accurate value only once you assume LCDM, so perhaps BBN was correct in the first place.

^&&Or succumbed to peer pressure, as that does happen. I didn’t witness it myself, so don’t know.

^$$The absolute amplitude of the no-CDM model is too high in a transparent universe. Part of the prediction of MOND is that reionization happens early, causing the universe to be a tiny bit opaque. This combination came out just right for τ = 0.17, which was the original WMAP measurement. It also happens to be consistent with the EDGES cosmic dawn signal and the growing body of evidence from JWST.

^##The second peak was unexpectedly small from the perspective of CDM; it was both natural and expected in no-CDM. At the time, it was computationally expensive to calculate power spectra, so people had pre-computed coarse grids within which to hunt for best fits. The range covered by the grids was informed by extant knowledge, of which BBN was only one element. From a dynamical perspective, Ω_m > 0.2 was adopted as a hard limit that imposed an edge in the grids of the time. There was no possibility of finding no-CDM as the best fit because it had been excluded as a possibility from the start.

***Spergel et al. (2003) also say “the best-fit Ω_bh² value for our fits is relatively insensitive to cosmological model and dataset combination as it depends primarily on the ratio of the first to second peak heights (Page et al. 2003b)” which is of course the basis of the prediction I made using the baryon density as it was Known at the time. They make no attempt to test that prediction, nor do they cite it.

⁺⁺⁺I’ve heard some people assert that this is dark matter by a different name, so is a success of the traditional dark matter picture rather than of modified gravity. That’s not at all correct. It’s just stage three in the list of reactions to surprising results identified by Louis Agassiz.

All of the figures below are from Cooke & Pettini (2018), which I employ here to briefly illustrate how D/H is measured. This is the level of detail I didn’t want to get into for either deuterium or helium or lithium, which are comparably involved.

First, here is a spectrum of the quasar they observe, Q1243+307. The quasar itself is not the object of interest here, though quasars are certainly interesting! Instead, we’re looking at the absorption lines along the line of sight; the quasar is being used as a spotlight to illuminate the gas between it and us.

**Figure 1.** Final combined and flux-calibrated spectrum of Q1243+307 (black histogram) shown with the corresponding error spectrum (blue histogram) and zero level (green dashed line). The red tick marks above the spectrum indicate the locations of the Lyman series absorption lines of the sub-DLA at redshift z_abs = 2.52564. Note the exquisite signal-to-noise ratio (S/N) of the combined spectrum, which varies from S/N ≃ 80 near the Lyα absorption line of the sub-DLA (∼4300 Å) to S/N ≃ 25 at the Lyman limit of the sub-DLA, near 3215 Å in the observed frame.

The big hump around 4330 Å is Lyman α emission from the quasar itself. Lyα is the n = 2 to 1 transition of hydrogen, Lyβ is the n = 3 to 1 transition, and so on. The rest frame wavelength of Lyα is far into the ultraviolet at 1216 Å; we see it in the optical because the quasar is at redshift z = 2.558. The rest of the spectrum is continuum and emission lines from the quasar with absorption lines from stuff along the line of sight. Note that the red end of the spectrum at wavelengths longer than 4400 Å is mostly smooth with only the occasional absorption line. Blueward of 4300 Å, there is a huge jumble. This is not noise, this is the Lyα forest. Each of those lines is absorption from hydrogen in clouds at different distances, hence different redshifts, along the line of sight.
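As a quick check on the wavelengths quoted here, the following is a minimal sketch of the redshift arithmetic. It assumes the standard vacuum value of the hydrogen Lyman limit (911.753 Å) and the Rydberg formula for the series; the function names are illustrative, not anything from the paper.

```python
# Rest-frame Lyman series wavelengths from the Rydberg formula, shifted to the
# observed frame. The Lyman limit (911.753 Å) is the standard vacuum value.
LYMAN_LIMIT = 911.753  # Å, hydrogen series limit (n -> infinity)


def lyman_rest(n_upper):
    """Rest wavelength (Å) of the Lyman transition n_upper -> 1."""
    return LYMAN_LIMIT / (1.0 - 1.0 / n_upper**2)


def observed(rest_wavelength, z):
    """Observed wavelength of a line emitted or absorbed at redshift z."""
    return rest_wavelength * (1.0 + z)


z_qso = 2.558     # quasar emission redshift quoted in the text
z_abs = 2.52564   # sub-DLA absorption redshift from the Figure 1 caption

lya_emission = observed(lyman_rest(2), z_qso)
lya_absorber = observed(lyman_rest(2), z_abs)
limit_absorber = observed(LYMAN_LIMIT, z_abs)

print(f"Lyα emission from the quasar: {lya_emission:.0f} Å")
print(f"Lyα absorption in the sub-DLA: {lya_absorber:.0f} Å")
print(f"Lyman limit of the sub-DLA: {limit_absorber:.0f} Å")
```

The emission lands near the hump in the spectrum, and the absorber's Lyman limit lands near the 3215 Å quoted in the caption.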

Most of the clouds in the Lyα forest are ephemeral. The cross section for Lyα is huge so it takes very little hydrogen to gobble it up. Most of these lines represent very low column densities of neutral hydrogen gas. Once in a while though, one encounters a higher column density cloud that has enough hydrogen to be completely opaque to Lyα. These are damped Lyα systems. In damped systems, one can often spot the higher order Lyman lines (these are marked in red in the figure). It also means that there is enough hydrogen present to have a shot at detecting the slightly shifted version of Lyα of deuterium. This is where the abundance ratio D/H is measured.
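To put a number on "slightly shifted," here is a rough sketch of the isotope shift. It assumes only that line wavelengths scale inversely with the reduced mass of the electron-nucleus system, and it uses standard mass ratios; none of these values are taken from Cooke & Pettini (2018).

```python
# Isotope shift of deuterium Lyα relative to hydrogen Lyα.
# Wavelengths scale inversely with the reduced mass of the electron-nucleus
# system, so the heavier deuteron pulls its lines slightly to the blue.
C_KM_S = 299792.458      # speed of light, km/s
M_PROTON = 1836.15267    # proton mass in electron masses
M_DEUTERON = 3670.48297  # deuteron mass in electron masses
LYA_H = 1215.67          # Å, hydrogen Lyα rest wavelength


def reduced_mass(nucleus_mass):
    """Reduced mass of electron + nucleus, in electron masses."""
    return nucleus_mass / (1.0 + nucleus_mass)


ratio = reduced_mass(M_PROTON) / reduced_mass(M_DEUTERON)  # lambda_D / lambda_H
shift_km_s = (ratio - 1.0) * C_KM_S
shift_rest = (ratio - 1.0) * LYA_H
shift_observed = shift_rest * (1.0 + 2.52564)  # at the absorber redshift

print(f"velocity offset: {shift_km_s:.1f} km/s")
print(f"rest-frame offset: {shift_rest:.2f} Å")
print(f"observed-frame offset: {shift_observed:.2f} Å")
```

The deuterium line sits roughly 80 km/s blueward of the hydrogen line, which is about an ångström in the observed frame at this redshift.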

To measure D/H, one has not only to detect the lines, but also to model and subtract the continuum. This is a tricky business in the best of times, but here its importance is magnified by the huge difference between the primary Lyα line which is so strong that it is completely black and the deuterium Lyα line which is incredibly weak. A small error in the continuum placement will not matter to the measurement of the absorption by the primary line, but it could make a huge difference to that of the weak line. I won’t even venture to discuss the nonlinear difference between these limits due to the curve of growth.
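The asymmetry between the black line and the weak line can be illustrated with a toy calculation. This is a sketch under stated assumptions: the optical depths below are made-up values chosen to represent a weak line and a saturated one, and the weak line is treated in the linear regime where column density scales with optical depth. It is not the fitting procedure used by the authors.

```python
import math


def inferred_tau(true_tau, continuum_error):
    """Optical depth inferred when the continuum is misplaced.

    continuum_error is the fractional error in the adopted continuum, e.g.
    +0.01 means the adopted continuum sits 1% above the true one.
    """
    observed_flux = math.exp(-true_tau)  # in units of the true continuum
    normalized = observed_flux / (1.0 + continuum_error)
    return -math.log(normalized)


def inferred_depth(true_tau, continuum_error):
    """Fractional line depth, 1 - F/C, with the misplaced continuum."""
    return 1.0 - math.exp(-true_tau) / (1.0 + continuum_error)


weak_tau = 0.05      # toy value for a weak line (assumed, not from the paper)
strong_tau = 1000.0  # toy value for a line that is completely black

for err in (0.0, 0.01, 0.02):
    weak = inferred_tau(weak_tau, err)
    strong = inferred_depth(strong_tau, err)
    print(f"continuum +{err:.0%}: weak-line tau {weak:.4f} "
          f"({weak / weak_tau - 1:+.0%}), black-line depth {strong:.3f}")
```

In this toy case a 1% continuum error shifts the inferred strength of the weak line by about 20%, while the black line does not notice at all.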

**Figure 2.** Lyα profile of the absorption system at *z_abs = 2.52564* toward the quasar Q1243+307 (black histogram) overlaid with the best-fitting model profile (red line), continuum (long dashed blue line), and zero-level (short dashed green line). The top panels show the raw, extracted counts scaled to the maximum value of the best-fitting continuum model. The bottom panels show the continuum normalized flux spectrum. The label provided in the top left corner of every panel indicates the source of the data. The blue points below each spectrum show the normalized fit residuals, (data–model)/error, of all pixels used in the analysis, and the gray band represents a confidence interval of ±2σ. The S/N is comparable between the two data sets at this wavelength range, but it is markedly different near the high order Lyman series lines (see Figures 4 and 5). The red tick marks above the spectra in the bottom panels show the absorption components associated with the main gas cloud (Components 2, 3, 4, 5, 6, 8, and 10 in Table 2), while the blue tick marks indicate the fitted blends. Note that some blends are also detected in Lyβ–Lyε.

The above examples look pretty good. The authors make the necessary correction for the varying spectral sensitivity of the instrument, and take great care to simultaneously fit the emission of the quasar and the absorption. I don’t think they’ve done anything wrong; indeed, it looks like they did everything right – just as the people measuring lithium in stars have.

Still, as an experienced spectroscopist, there are some subtle details that make me queasy. There are two independent observations, which is awesome, and the data look almost exactly the same, a triumph of repeatability. The fitted models are nearly identical, but if you look closely, you can see the model cuts slightly differently along the left edge of the damped absorption around 4278 Å in the two versions of the spectrum, and again along the continuum towards the right edge.

These differences are small, so hopefully don’t matter. But what is the continuum, really? The model line goes through the data, because what else could one possibly do? But there is so much Lyα absorption, is that really continuum? Should the continuum perhaps trace the upper envelope of the data? A physical effect that I worry about is that weak Lyα is so ubiquitous, we never see the true continuum but rather continuum minus a tiny bit of extraordinarily weak (Gunn-Peterson) absorption. If the true continuum from the quasar is just a little higher, then the primary hydrogen absorption is unaffected but the weak deuterium absorption would go up a little. That means slightly higher D/H, which means lower Ω_bh², which is the direction in which the measurement would need to move to come into closer agreement with lithium.

Is the D/H measurement in error? I don’t know. I certainly hope not, and I see no reason to think it is. I do worry that it could be. The continuum level is one thing that could go wrong; there are others. My point is merely that we shouldn’t assume it has to be lithium that is in error.

An important check is whether the measured D/H ratio depends on metallicity or column density. It does not. There is no variation with metallicity as measured by the logarithmic oxygen abundance relative to solar (left panel below). Nor does it appear to depend on the amount of hydrogen in the absorbing cloud (right panel). In the early days of this kind of work there appeared to be a correlation, raising the specter of a systematic. That is not indicated here.

**Figure 6.** Our sample of seven high precision D/H measures (symbols with error bars); the green symbol represents the new measure that we report here. The weighted mean value of these seven measures is shown by the red dashed and dotted lines, which represent the 68% and 95% confidence levels, respectively. The left and right panels show the dependence of D/H on the oxygen abundance and neutral hydrogen column density, respectively. Assuming the Standard Model of cosmology and particle physics, the right vertical axis of each panel shows the conversion from D/H to the universal baryon density. This conversion uses the Marcucci et al. (2016) theoretical determination of the d(p,γ)³He cross-section. The dark and light shaded bands correspond to the 68% and 95% confidence bounds on the baryon density derived from the CMB (Planck Collaboration et al. 2016).

I’ll close by noting that Ω_bh² from this D/H measurement is indeed in very good agreement with the best-fit Planck CMB value. The question remains whether the physics assumed by that fit, baryons+non-baryonic cold dark matter+dark energy in a strictly FLRW cosmology, is the correct assumption to make.

Some more persistent cosmic tensions

I set out last time to discuss some of the tensions that persist in afflicting cosmic concordance, but didn’t get past the Hubble tension. Since then, I’ve come across more of that, e.g., Boubel et al (2024a), who use a variant of Tully-Fisher to obtain H₀ = 73.3 ± 2.1(stat) ± 3.5(sys) km/s/Mpc. Having done that sort of work, their systematic uncertainty term seemed large to me. I then came across Scolnic et al. (2024) who trace this issue back to one apparently erroneous calibration amongst many, and correct the results to H₀ = 76.3 ± 2.1(stat) ± 1.5(sys) km/s/Mpc. Boubel is an author of the latter paper, so apparently agrees with this revision. Fortunately they didn’t go all Sandage-de Vaucouleurs on us, but even so, this provides a good example of how fraught this field can get. It also demonstrates the opportunity for confirmation bias, as the revised numbers are almost exactly what we find ourselves. (New results coming soon!)

It’s a dang mess.

The Hubble tension is only the most prominent of many persistent tensions, so let’s wade into some of the rest.

The persistent tension in the amplitude of the power spectrum

The tension that cosmologists seem to stress about most after the Hubble tension is that in σ₈. σ₈ quantifies the amplitude of the power spectrum; it is a measure of the rms fluctuation in mass in spheres of 8 h⁻¹ Mpc. Historically, this scale was chosen because early work by Peebles & Yu (1970) indicated that this was the scale on which the rms contrast in galaxy numbers* is unity. This is also a handy dividing line between linear and nonlinear regimes. On much larger scales, the fluctuations are smaller (a giant sphere is closer to the average for the whole universe) so can be treated in the limit of linear perturbation theory. Individual galaxies are “small” by this standard, so can’t be treated⁺ so simply, which is the excuse many cosmologists use to run shrieking from discussing them.

As we progressed from wrapping our heads around an expanding universe to quantifying the large scale structure (LSS) therein, the power spectrum statistically describing LSS became part of the canonical set of cosmological parameters. I don’t myself consider it to be on par with the Big Two, the Hubble constant H₀ and the density parameter Ω_m, but many cosmologists do seem partial to it despite the lack of phase information. Consequently, any tension in the amplitude σ₈ garners attention.

The tension in σ₈ has been persistent insofar as I recall debates in the previous century where some kinds of data indicated σ₈ ~ 0.5 while other data preferred σ₈ ~ 1. Some of that tension was in underlying assumptions (SCDM before LCDM). Today, the difference is [mostly] between the Planck best-fit amplitude σ₈ = 0.811 ± 0.006 and various local measurements that typically yield 0.7something. For example, Karim et al. (2024) find low σ₈ for emission line galaxies, even after specifically pursuing corrections in a necessary dust model that pushed things in the right direction:

**Fig. 16** from Karim et al. (2024): *Estimates of σ₈ from emission line galaxies (red and blue), luminous red galaxies (grey), and Planck (green).*

As with so many cosmic parameters, there is degeneracy, in this case between σ₈ and Ω_m. Physically this happens because you get more power when you have more stuff (Ω_m), but the different tracers are sensitive to it in different ways. Indeed, if I put on a cosmology hat, I personally am not too worried about this tension – emission line galaxies are typically lower mass than luminous red galaxies, so one expects that there may be a difference in these populations. The Planck value is clearly offset from both, but doesn’t seem too far afield. We wouldn’t fret at all if it weren’t for Planck’s damnably small error bars.

This tension is also evident as a function of redshift. Here are measures of the combination of parameters fσ₈ = Ω_m(z)^γ σ₈ measured and compiled by Boubel et al (2024b):

**Fig. 16** from Boubel et al (2024b). *LCDM* matches the data for σ₈ = 0.74 (green line); the purple line is the expectation from Planck (σ₈ = 0.81). The inset shows the error ellipse, which is clearly offset from the Planck value (crossed lines), particularly for the GR^& value of γ = 0.55.

The line representing the Planck value σ₈ = 0.81 overshoots most of the low redshift data, particularly those with the smallest uncertainties. The green line has σ₈ = 0.74, so is a tad lower than Planck in the same sense as other low redshift measures. Again, the offset is modest, but it does look significant. The tension is persistent but not a show-stopper, so we generally shrug our shoulders and proceed as if it will inevitably work out.
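For a feel of how the two curves differ, here is a minimal sketch of fσ₈(z) in flat LCDM with the growth-index parameterization f = Ω_m(z)^γ. The choice of Ω_m = 0.315 for both curves, the trapezoid integration, and all function names are assumptions made for illustration; this is not the fit performed by Boubel et al.

```python
import math

GAMMA_GR = 0.55  # growth index for GR, as quoted in the caption


def omega_m_z(z, omega_m0):
    """Matter density parameter at redshift z in flat LCDM."""
    matter = omega_m0 * (1.0 + z) ** 3
    return matter / (matter + 1.0 - omega_m0)


def growth_rate(z, omega_m0, gamma=GAMMA_GR):
    """f(z) = Omega_m(z)^gamma."""
    return omega_m_z(z, omega_m0) ** gamma


def growth_factor(z, omega_m0, gamma=GAMMA_GR, steps=2000):
    """D(z)/D(0) from d ln D / d ln a = f, integrated with the trapezoid rule."""
    if z == 0:
        return 1.0
    dz = z / steps
    total = 0.0
    for i in range(steps + 1):
        zi = i * dz
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * growth_rate(zi, omega_m0, gamma) / (1.0 + zi)
    return math.exp(-total * dz)


def f_sigma8(z, sigma8_0, omega_m0=0.315, gamma=GAMMA_GR):
    """f(z) * sigma8(z), with sigma8(z) = sigma8_0 * D(z)/D(0)."""
    growth = growth_factor(z, omega_m0, gamma)
    return growth_rate(z, omega_m0, gamma) * sigma8_0 * growth


for z in (0.0, 0.5, 1.0):
    print(f"z = {z:.1f}: sigma8=0.81 gives {f_sigma8(z, 0.81):.3f}, "
          f"sigma8=0.74 gives {f_sigma8(z, 0.74):.3f}")
```

With everything else held fixed, the two curves differ by the constant factor 0.74/0.81 at every redshift, so the offset in the figure is a statement about amplitude rather than shape.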

The persistent tension in the cosmic mass density

A persistent tension that nobody seems to worry about is that in the density parameter Ω_m. Fits to the Planck CMB acoustic power spectrum currently peg Ω_m = 0.315±0.007, but as we’ve seen before, this covaries with the Hubble constant. Twenty years ago, WMAP indicated Ω_m = 0.24 and H₀ = 73, in good agreement with the concordance region of other measurements, both then and now. As with H₀, the tension is posed by the itty bitty uncertainties on the Planck fit.

Experienced cosmologists may be inclined to scoff at such tiny error bars. I was, so I’ve confirmed them myself. There is very little wiggle room to match the Planck data within the framework of the LCDM model. I emphasize that last bit because it is an assumption now so deeply ingrained that it is usually left unspoken. If we leave that part out, then the obvious interpretation is that Planck is correct and all measurements that disagree with it must suffer from some systematic error. This seems to be what most cosmologists believe at present. If we don’t leave that part out, perhaps because we’re aware of other possibilities so are not willing to grant this assumption, then the various tensions look like failures of a model that’s already broken. But let’s not go there today, and stay within the conventional framework.

There are lots of ways to estimate the gravitating mass density of the universe. Indeed, it was the persistent, early observation that the mass density Ω_m exceeded that in baryons, Ω_b, from big bang nucleosynthesis that got the non-baryonic dark matter show on the road: there appears to be something out there gravitating that’s not normal matter. This was the key observation that launched non-baryonic cold dark matter: if Ω_m > Ω_b, there has^% to be some kind of particle that is non-baryonic.

So what is Ω_m? Most estimates have spanned the range 0.2 < Ω_m < 0.4. In the 1980s and into the 1990s, this seemed close enough to Ω_m = 1, by the standards of cosmology, that most Inflationary cosmologists presumed it would work out to what Inflation predicted, Ω_m = 1 exactly. Indeed, I remember that community directing some rather vicious tongue-lashings at observers, castigating them to look harder: you will surely get Ω_m = 1 if you do it right, you fools. But despite the occasional claim to get this “right” answer, the vast majority of the evidence never pointed that way. As I’ve related before, an important step on the path to LCDM – probably the most important step – was convincing everyone that really Ω_m < 1.

Discerning between Ω_m = 0.2 and 0.3 is a lot more challenging than determining that Ω_m < 1, so we tend to treat either as acceptable. That’s not really fair in this age of precision cosmology. There are far too many estimates of the mass density to review here, so I’ll just note a couple of discrepant examples while also acknowledging that it is easy to find dynamical estimates that agree with Planck.

To give a specific example, Mohayaee & Tully (2005) obtained Ω_m = 0.22 ± 0.02 by looking at peculiar velocities in the local universe. This was consistent with other constraints at the time, including WMAP, but is 4.5σ from the current Planck value. That’s not quite the 5σ we arbitrarily define to be an undeniable difference, but it’s plenty significant.
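The 4.5σ figure follows from the numbers quoted above if the two uncertainties are treated as independent and Gaussian and added in quadrature. That treatment is an assumption of this sketch.

```python
import math


def tension_sigma(value_a, err_a, value_b, err_b):
    """Difference between two measurements in units of their combined error."""
    return abs(value_a - value_b) / math.hypot(err_a, err_b)


# Mohayaee & Tully (2005) versus the Planck fit, both as quoted in the text.
n_sigma = tension_sigma(0.22, 0.02, 0.315, 0.007)
print(f"Omega_m tension: {n_sigma:.1f} sigma")
```

Note that the Planck error bar contributes almost nothing to the denominator; the significance is set by the uncertainty on the local measurement.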

There have of course been other efforts to do this, and many of them lead to the same result, or sometimes even lower Ω_m. For example, Shaya et al. (2022) use the Numerical Action Method developed by Peebles to attempt to work out the motions of nearly 10,000 galaxies – not just their Hubble expansion, but their individual trajectories under the mutual influence of each other’s gravity and whatever else may be out there. The resulting deviations from a pure Hubble flow depend on how much mass is associated with each galaxy and whatever other density there is to perturb things.

**Fig. 4** from Shaya et al (2022): The gravitating mass density as a function of scale. After some local variations (hello Virgo cluster!), the data converge to Ω_m = 0.12. Reaching Ω_m = 0.24 requires an equal, additional amount of mass in “interhalo matter.” Even more mass would be required to reach the Planck value (red line added to original figure).

This result is in even greater tension with Planck than the earlier work by Mohayaee & Tully (2005). I find the need to invoke interhalo matter disturbing, since it acts as a pedestal in their analysis: extra mass density that is uniform everywhere. This is necessary so that it contributes to the global mass density Ω_m but does not contribute to perturbing the Hubble flow.

One can imagine mass that is uniformly distributed easily enough, but what bugs me is that dark matter should not do this. There is no magic segregation between dark matter that forms into halos that contain galaxies and dark matter that just hangs out in the intergalactic medium and declines to participate in any gravitational dynamics. That’s not an option available to it: if it gravitates, it should clump. To pull this off, we’d need to live in a universe made of two distinct kinds of dark matter: cold dark matter that clumps and a fluid that gravitates globally but does not clump, sort of an anti-dark energy.

Alternatively, we might live in an underdense region such that the local Ω_m is less than the global Ω_m. This is an idea that comes and goes for one reason or another, but it has always been hard to sustain. The convergence to low Ω_m looks pretty steady out to ~100 Mpc in the plot above; that’s a pretty big hole. Recall the non-linearity scale discussed above; this scale is a factor of ten larger so over/under-densities should typically be ±10%. This one is -60%, so I guess we’d have to accept that we’re not Copernican observers after all.
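The size of the required hole is simple arithmetic. The sketch below takes the Planck value as the global density, which is an assumption; a slightly lower global Ω_m gives a slightly smaller contrast.

```python
omega_local = 0.12    # converged value from Shaya et al. (2022), per the caption
omega_global = 0.315  # Planck fit quoted earlier in the text
typical_rms = 0.10    # rough ±10% scatter expected on this scale, per the text

contrast = omega_local / omega_global - 1.0
print(f"density contrast: {contrast:+.0%}")
print(f"in units of the typical fluctuation: {abs(contrast) / typical_rms:.1f}")
```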

The persistent tension in bulk flows

Once we get past the basic Hubble expansion, individual galaxies each have their own peculiar motion, and beyond that we have bulk flows. These have been around a long time. We obsessed a lot about them for a while with discoveries like the Great Attractor. It was weird; I remember some pundits talking about “plate tectonics” in the universe, like there were giant continents of galaxy superclusters wandering around in random directions relative to the frame of the microwave background. Many of us, including me, couldn’t grok this, so we chose not to sweat it.

There is no single problem posed by bulk flows^, and of course you can find those who argue they pose no problem at all. We are in motion relative to the cosmic (CMB) frame^$, but that’s just our Milky Way’s peculiar motion. The strange fact is that it’s not just us; the entirety of the local universe seems to have an unexpected peculiar motion. There are lots of ways to quantify this; here’s a summary table from Courtois et al (2025):

**Table 1** from Courtois et al (2025): *various attempts to measure the scale of dynamical homogeneity.*

As we look to large scales, we expect the universe to converge to homogeneity – that’s the Cosmological Principle, which is one of those assumptions that is so fundamental that we forget we made it. The same holds for dynamics – as we look to large scales, we expect the peculiar motions to average out, and converge to a pure Hubble flow. The table above summarizes our efforts to measure the scale on which this happens – or doesn’t. It also shows what we expect on the second line, “predicted LCDM,” where you can see the expected convergence in the declining bulk velocities as the scale probed increases. The third line is for “cosmic variance;” when you see these words it usually means something is amiss so in addition to the usual uncertainties we’re going to entertain the possibility that we live in an abnormal universe.

Like most people, I was comfortably ignoring this issue until recently, when we had a visit and a talk from one of the protagonists listed above, Richard Watkins (W23). One of the problems that challenge this sort of work is the need for a large sample of galaxies with complete sky coverage. That’s observationally challenging to obtain. Real data are heterogeneous; treating this properly demands a more sophisticated treatment than the usual top-hat or Gaussian approaches. Watkins described in detail what a better way could be, and patiently endured the many questions my colleagues and I peppered him with. This is hard to do right, which gives aid and comfort to the inclination to ignore it. After hearing his talk, I don’t think we should do that.

Panel from **Fig. 7** of Watkins et al. (2023): The magnitude of the bulk flow as a function of scale. The green points are the data and the red dashed line is the expectation of LCDM. The blue dotted line is an estimate of known systematic effects.

The data do not converge with increasing scale as expected. It isn’t just the local space density Ω_m that’s weird, it’s also the way in which things move. And “local” isn’t at all small here, with the effect persisting out beyond 300 Mpc for any plausible h = H₀/100.

This is formally a highly significant result, with the authors noting that “the probability of observing a bulk flow [this] large … is small, only about 0.015 per cent.” Looking at the figure above, I’d say that’s a fairly conservative statement. A more colloquial way of putting it would be “no way we gonna reconcile this!” That said, one always has to worry about systematics. They’ve made every effort to account for these, but there can always be unknown unknowns.
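For readers who think in sigmas, the quoted probability can be translated into a Gaussian equivalent. This conversion is a convention adopted for this sketch (two-sided, normal distribution), not something stated by Watkins et al.

```python
from statistics import NormalDist

p_value = 0.015 / 100.0  # "0.015 per cent", as quoted from Watkins et al. (2023)

# Two-sided Gaussian equivalent: the n for which P(|x| > n sigma) = p_value.
n_sigma = NormalDist().inv_cdf(1.0 - p_value / 2.0)
print(f"p = {p_value:.1e} corresponds to about {n_sigma:.1f} sigma (two-sided)")
```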

Mapping the Universe

It is only possible to talk about these things thanks to decades of effort to map the universe. One has to survey a large area of sky to identify galaxies in the first place, then do follow-up work to obtain redshifts from spectra. This has become big business, but to do what we’ve just been talking about, it is further necessary to separate peculiar velocities from the Hubble flow. To do that, we need to estimate distances by some redshift-independent method, like Tully-Fisher. Tully has been doing this his entire career, with the largest and most recent data product being Cosmicflows-4. Such data reveal not only large bulk flows, but extensive structure in velocity space:

The Laniakea supercluster of galaxies (Tully et al. 2014).

We have a long way to go to wrap our heads around all of this.

Persistent tensions persist

I’ve discussed a few of the tensions that persist in cosmic data. Whether these are mere puzzles or a mounting pile of anomalies is a matter of judgement. They’ve been around for a while, so it isn’t fair to suggest that all of the data are consistent with LCDM. Nevertheless, I hear exactly this asserted with considerable frequency. It’s as if the definition of all is perpetually shrinking to include only the data that meet the consistency criterion. Yet it’s the discrepant bits that are interesting for containing new information; we need to grapple with them if the field is to progress.

*This was well before my time, so I am probably getting some aspect of the history wrong or oversimplifying it in some gross way. Crudely speaking, if you randomly plop down spheres of this size, some will be found to contain the cosmic average number of galaxies, some twice that, some half that. That the modern value of σ₈ is close to unity means that Peebles got it basically right with the data that were available back then and that galaxy light very nearly traces mass, which is not guaranteed in a universe dominated by dark matter.

⁺It amazes me how pervasively “galaxies are complicated” is used as an excuse⁺⁺ to ignore all small scale evidence.

Not all of us are limited to working on the simplest systems. In this case, it doesn’t matter. The LCDM prediction here is that galaxies should be complicated because they are nonlinear. But the observation is that they are simple – so simple that they obey a single effective force law. That’s the contradiction right there, regardless of what flavor of complicated might come out of some high resolution simulation.

⁺⁺At one KITP conference I attended, a particle-cosmologist said during a discussion session, in all seriousness and with a straight face, “We should stop talking about rotation curves.” Because scientific truth is best revealed by ignoring the inconvenient bits. David Merritt remarked on this in his book A Philosophical Approach to MOND. He surveyed the available cosmology textbooks, and found that not a single one of them mentioned the acceleration scale in the data. I guess that would go some way to explaining why statements of basic observational facts are often met with stunned silence. What’s obvious and well-established to me is a wellspring of fresh if incredible news to them. I’d probably give them the stink-eye about the cosmological constant if I hadn’t been paying the slightest attention to cosmology for the past thirty years.

^&There is an elegant approach to parameterizing the growth of structure in theories that deviate modestly from GR. In this context, such theories are usually invoked as an alternative to dark energy, because it is socially acceptable to modify GR to explain dark energy but not dark matter. The curious hysteresis of that strange and seemingly self-contradictory attitude aside, this approach cannot be adapted to MOND because it assumes linearity while MOND is inherently nonlinear. My very crude, back-of-the-envelope expectation for MOND is very nearly constant γ ~ 0.4 (depending on the scale probed) out to high redshift. The bend we see in the conventional models around z ~ 0.6 will occur at z > 2 (and probably much higher) because structure forms fast in MOND. It is annoyingly difficult to put a more precise redshift on this prediction because it also depends on the unknown metric. So this is more of a hunch than a quantitative prediction. Still, it will be interesting to see if roughly constant fσ₈ persists to higher redshift.

^%The inference that non-baryonic dark matter has to exist assumes that gravity is normal in the sense taught to us by Newton and Einstein. If some other theory of gravity applies, then one has to reassess the data in that context. This is one of the first considerations I made of MOND in the cosmological context, finding Ω_m ≈ Ω_b.

^MOND is effective at generating large bulk flows.

^$Fun fact: you can type the name of a galaxy into NED (the NASA Extragalactic Database) and it will give you lots of information, including its recession velocity referenced to a variety of frames of reference and the corresponding distance from the Hubble law V = H₀D. Naively, you might think that the obvious choice of reference frame is the CMB. You’d be wrong. If you use this, you will get the wrong distance to the galaxy. Of all the choices available there, it consistently performs the worst as adjudicated by direct distance measurements (e.g., Cepheids).

NED used to provide a menu of choices for the value of H₀ to use. It says something about the social-tyranny of precision cosmology that it now defaults to the Planck value. If you use this, you will get the wrong distance to the galaxy. Even if the Planck H₀ turns out to be correct in some global sense, it does not work for real galaxies that are relatively near to us. That’s what it means to have all the “local” measurements based on direct distance measurements (e.g., Cepheids) consistently give a larger H₀.
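The size of the effect is easy to estimate from the Hubble law. In this sketch the Planck value H₀ = 67.4 km/s/Mpc is the published 2018 best fit (not quoted in the text), 73 km/s/Mpc stands in as a representative local value, and the velocity is a hypothetical nearby galaxy.

```python
H0_PLANCK = 67.4  # km/s/Mpc, Planck 2018 best fit (not quoted in the text)
H0_LOCAL = 73.0   # km/s/Mpc, representative local value


def hubble_distance(velocity_km_s, h0):
    """Distance in Mpc from the Hubble law V = H0 * D."""
    return velocity_km_s / h0


velocity = 1500.0  # km/s, a hypothetical nearby galaxy
d_planck = hubble_distance(velocity, H0_PLANCK)
d_local = hubble_distance(velocity, H0_LOCAL)
print(f"Planck H0: {d_planck:.1f} Mpc, local H0: {d_local:.1f} Mpc")
print(f"overestimate with Planck H0: {d_planck / d_local - 1:.1%}")
```

The fractional overestimate is just the ratio of the two Hubble constants, so it is the same for every galaxy regardless of its velocity.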

*Galaxies in the local universe are closer than they appear.* Photo by P.S. Pratheep, www.pratheep.com

Some persistent cosmic tensions

I took the occasion of the NEIU debate to refresh my knowledge of the status of some of the persistent tensions in cosmology. There wasn’t enough time to discuss those, so I thought I’d go through a few of them here. These issues tend to get downplayed or outright ignored when we hype LCDM’s successes.

When I teach cosmology, I like to have the students do a project in which they each track down a measurement of some cosmic parameter, and then report back on it. The idea, when I started doing this back in 1999, was to combine the different lines of evidence to see if we reach a consistent concordance cosmology. Below is an example from the 2002 graduate course at the University of Maryland. Does it all hang together? I ask the students to debate the pros and cons of the various lines of evidence.

The mass density parameter Ω_m = ρ_m/ρ_crit and the Hubble parameter h = H₀/(100 km/s/Mpc) from various constraints (colored lines) available in 2002. I later added the first (2003) WMAP result (box). The combination of results excludes the grey region; only the white portion is viable: this is the concordance region.
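To connect the two axes of this diagram to a physical density, recall that the critical density is ρ_crit = 3H₀²/(8πG). The sketch below evaluates it with standard values of the physical constants; the example values Ω_m = 0.3 and h = 0.7 are chosen purely for illustration.

```python
import math

G = 6.6743e-11        # gravitational constant, m^3 kg^-1 s^-2
MPC_IN_M = 3.0857e22  # metres per megaparsec
M_SUN = 1.989e30      # solar mass, kg


def critical_density(h):
    """rho_crit = 3 H0^2 / (8 pi G) in kg/m^3 for H0 = 100 h km/s/Mpc."""
    h0_si = 100.0 * h * 1.0e3 / MPC_IN_M  # H0 in 1/s
    return 3.0 * h0_si**2 / (8.0 * math.pi * G)


def matter_density(omega_m, h):
    """Mean matter density in solar masses per cubic Mpc."""
    return omega_m * critical_density(h) * MPC_IN_M**3 / M_SUN


print(f"rho_crit(h=0.7) = {critical_density(0.7):.2e} kg/m^3")
print(f"rho_m(0.3, 0.7) = {matter_density(0.3, 0.7):.2e} Msun/Mpc^3")
```

Because ρ_crit scales as h², a constraint on the physical density traces a curve of constant Ω_m h² across the diagram rather than a horizontal line.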

The concordance cosmology is the small portion of this diagram that was not ruled out. This is the way in which LCDM was established. Before we had either the CMB acoustic power spectrum or Type Ia supernovae, LCDM was pretty much a done deal based on a wide array of other astronomical evidence. It was the subsequent^α agreement of the Type Ia SN and the CMB that cemented the picture in place.

The implicit assumption in this approach is that we have identified the correct cosmology by process of elimination: whatever is left over must be the right answer. But what if nothing is left over?

I have long worried that we’ve painted ourselves into a corner: maybe the concordance window is merely the least unlikely spot before everything is excluded. Excluding everything would effectively falsify LCDM cosmology, if not the more basic picture of an expanding universe^% emerging from a hot big bang. Once one permits oneself to think this way, then it occurs to one that perhaps the reason we have to invoke the twin tooth fairies of dark matter and dark energy is to get FLRW to approximate some deeper, underlying theory.

Most cosmologists do not appear to contemplate this frightening scenario. And indeed, before we believe something so drastic, we have to have thoroughly debunked the standard picture – something rather difficult to do when 95% of it is invisible. It also means believing all the constraints that call the standard picture into question (hence why contradictory results experience considerably more scrutiny* than conforming results). The fact is that some results are more robust than others. The trick is deciding which to trust.^{^}

In the diagram above, the range of Ω_m from cluster mass-to-light ratios comes from some particular paper. There are hundreds of papers on this topic, if not thousands. I do not recall which one this particular illustration came from, but most of the estimates I’ve seen from the same method come in somewhat higher. So if we slide those green lines up, the allowed concordance window gets larger.

The practice of modern cosmology has necessarily been an exercise in judgement: which lines of evidence should we most trust? For example, there is a line up there for rotation curves. That was my effort to ask what combination of cosmological parameters led to dark matter halo densities that were tolerable to the rotation curve data of the time. Dense cosmologies give birth to dense dark matter halos, so everything above that line was excluded because those parameters cram too much dark matter into too little space. This was a pretty conservative limit at the time, but it is predicated on the insistence of theorists that dark matter halos had to have the NFW form predicted by dark matter-only simulations. Since that time, simulations including baryons have found any number of ways to alter the initial cusp. This in turn means that the constraint no longer applies as the halo might have been altered from its original, cosmology-predicted initial form. Whether the mechanisms that might cause such alterations are themselves viable becomes a separate question.

If we believed all of the available constraints, then there would be no window left and FLRW would already be ruled out. But not all of those data are correct, and some contradict each other, even absent the assumption of FLRW. So which do we believe? Finding one’s path in this field is like traipsing through an intellectual minefield full of hardened positions occupied by troops dedicated to this or that combination of parameters.

It is in every way an invitation to confirmation bias. The answer we get depends on how we weigh disparate lines of evidence. We are prone to give greater weight to lines of evidence that conform to our pre-established⁺ beliefs.

So, with that warning, let’s plunge ahead.

The modern Hubble tension

Gone but not yet forgotten are the Hubble wars between camps Sandage (H₀ = 50!) and de Vaucouleurs (H₀ = 100!). These were largely resolved early this century thanks to the Hubble Space Telescope Key Project on the distance scale. Obtaining this measurement was the major motivation to launch HST in the first place. Finally, this long-standing argument was settled: nearly everyone agreed that H₀ = 72 km/s/Mpc.

That agreement was long-lived by the standards of cosmology, but did not last forever. Here is an illustration of the time dependence of H₀ measurements this century, from Freedman (2021):

There are many illustrations like this; I choose this one because it looks great and seems to have become the go-to for illustrating the situation. Indeed, it seems to inform the attitude of many scientists close to but not directly involved in the H₀ debate. They seem to perceive this as a debate between Adam Riess and Wendy Freedman, who have become associated with the Cepheid and TRGB^$ calibrations, respectively. This is a gross oversimplification, as they are not the only actors on a very big stage^&. Even in this plot, the first Cepheid point is from Freedman’s HST Key Project. But this apparent dichotomy between calibrators and people seems to be how the subject is perceived by scientists who have neither time nor reason for closer scrutiny. Let’s scrutinize.

Fits to the acoustic power spectrum of the CMB agreed with astronomical measurements of H₀ for the first decade of the century. Concordance was confirmed. The current tension appeared with the first CMB data from Planck. Suddenly the grey band of the CMB best-fit no longer overlapped with the blue band of astronomical measurements. This came as a shock. Then a new (red) band appeared, distinguishing the “local” H₀ calibrated by the TRGB from that calibrated by Cepheids.

I think I mentioned that cosmology was an invitation to confirmation bias. If you put a lot of weight on CMB fits, as many cosmologists do, then it makes sense from that perspective that the TRGB measurement is the correct one and the Cepheid H₀ must be wrong. This is easy to imagine given the history of systematic errors that plagued the subject throughout the twentieth century. This confirmation bias makes one inclined to give more credence to the new^# TRGB calibration, which is only in modest tension with the CMB value. The narrative is then simplified to two astronomical methods that are subject to systematic uncertainty: one that agrees with the right answer and one that does not. Ergo, the Cepheid H₀ is in systematic error.

This narrative oversimplifies the matter to the point of being actively misleading, and the plot above abets this by focusing on only two of the many local measurements. There is no perfect way to do this, but I had a go at it last year. In the plot below, I cobbled together all the data I could without going ridiculously far back, but chose to show only one point per independent group: the most recent one available from each. The idea is that the same people don’t get new votes every time they tweak their result – that’s basically what is illustrated above. The most recent points from above are labeled Cepheids & TRGB (the date of the TRGB goes to the full Chicago-Carnegie paper, not Freedman’s summary paper where the above plot can be found). See McGaugh (2024) for the references.

When I first made this plot, I discovered that many measurements of the Hubble constant are not all that precise: the plot was an indecipherable forest of error bars. So I chose to make a cut at a statistical uncertainty of 3 km/s/Mpc: worse than that, the data are shown as open symbols sans error bars; better than that, the datum gets explicit illustration of both its statistical and systematic uncertainty. One could make other choices, but the point is that this choice paints a different picture from the choice made above. One of these local measurements is not like the others, inviting a different version of confirmation bias: the TRGB point is the outlier, so perhaps it is the one that is wrong.
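For anyone who wants to try the same sort of bookkeeping on their own compilation, here is a minimal sketch of the two selection rules just described: keep only the most recent measurement from each independent group, then split at a statistical uncertainty of 3 km/s/Mpc. This is not the code behind the plot. The class name, field names, and function names are hypothetical, and treating a datum at exactly 3 km/s/Mpc as "precise" is my assumption, since the text only says better or worse than the cut.

```python
from dataclasses import dataclass


@dataclass
class H0Measurement:
    """One published H0 value. All field names here are hypothetical."""
    group: str       # label for an independent group
    year: int        # publication year
    h0: float        # km/s/Mpc
    stat_err: float  # statistical uncertainty, km/s/Mpc


def latest_per_group(measurements):
    """Keep one point per group: the most recent one available from each."""
    latest = {}
    for m in measurements:
        if m.group not in latest or m.year > latest[m.group].year:
            latest[m.group] = m
    return list(latest.values())


def split_by_precision(measurements, max_stat_err=3.0):
    """Split at the statistical-uncertainty cut (3 km/s/Mpc by default).

    Returns (precise, imprecise). Precise data would be drawn with error
    bars and imprecise data as open symbols without them. A datum exactly
    at the cut is counted as precise, which is an assumption.
    """
    precise = [m for m in measurements if m.stat_err <= max_stat_err]
    imprecise = [m for m in measurements if m.stat_err > max_stat_err]
    return precise, imprecise


# Placeholder entries only: these numbers are invented, not real measurements.
demo = [
    H0Measurement("group A", 2019, 70.0, 4.0),
    H0Measurement("group A", 2023, 71.0, 2.0),
    H0Measurement("group B", 2021, 74.0, 5.0),
]
precise, imprecise = split_by_precision(latest_per_group(demo))
```

With the placeholder entries, the older "group A" point is dropped before the cut is applied, so a group that tweaks its result does not get a second vote. Changing `max_stat_err` is the quickest way to see how much the picture depends on where the cut is made.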

*Recent measurements of the Hubble constant (left) and the calibration of the baryonic Tully-Fisher relation (right) underpinning one of those measurements.*

I highlight the measurement our group made not to note that we’ve done this too so much as to highlight an underappreciated aspect of the apparent tension between Cepheid and TRGB calibrations. There are 50 galaxies that calibrate the baryonic Tully-Fisher relation, split nearly evenly between galaxies whose distance is known through Cepheids (blue points) and TRGB (red points). They give the same answer. There is no tension between Cepheids and the TRGB here.

Chasing this up, it appears to me that what happened was that Freedman’s group reanalyzed the data that calibrate the TRGB, and wound up with a slightly different answer. This difference does not appear to be in the calibration equation (the absolute magnitude of the tip of the red giant branch didn’t change that much), but in something to do with how the tip magnitude is extracted. Maybe, I guess? I couldn’t follow it all the way, and I got bad vibes reminding me of when I tried to sort through Sandage’s many corrections in the early ’90s. That doesn’t make it wrong, but the point is that the discrepancy is not between Cepheids and TRGB calibrations so much as it is between the TRGB as implemented by Freedman’s group and the TRGB as implemented by others. The depiction of the local Hubble constant debate as being between Cepheid and TRGB calibrations is not just misleading, it is wrong.

Can we get away from Cepheids and the TRGB entirely? Yes. The black points above are for megamasers and gravitational lensing. These are geometric methods that do not require intermediate calibrators like Cepheids at all. It’s straight trigonometry. Both indicate H₀ > 70. Which way is our confirmation bias leaning now?

The way these things are presented has an impact on scientific consensus. A fascinating experiment on this has been done in a recent conference report. Sometimes people poll conference attendees in an attempt to gauge consensus; this report surveys conference attendees “to take a snapshot of the attitudes of physicists working on some of the most pressing questions in modern physics.” One of the topics queried is the Hubble tension. Survey says:

*Table XII from arXiv:2503.15776 in which scientists at the 2024 conference* Black Holes Inside and Out *vote on their opinion about the most likely solution of the Hubble tension.*

First, a shout out to the 1/4 of scientists who expressed no opinion. That’s the proper thing to do when you’re not close enough to a subject to make a well-informed judgement. Whether one knows enough to do this is itself a judgement call, and we often let our arrogance override our reluctance to over-share ill-informed opinions.

Second, a shout out to the folks who did the poll for including a line for systematics in the CMB. That is a logical possibility, even if only 3 of the 72 participants took it seriously. This corroborates the impression I have that most physicists seem to think the CMB is perfect like some kind of holy scripture written in fire on the primordial sky, so must be correct and cannot be questioned, amen. That’s silly; systematics are always a possibility in any observation of the sky. In the case of the CMB, I suspect it is not some instrumental systematic but the underlying assumption of LCDM FLRW that is the issue; once one assumes that, then indeed, the best fit to the Planck data as published is H₀ = 67.4, with H₀ > 68 being right out. (I’ve checked.)

A red flag that the CMB is where the problem lies is the systematic variation of the best-fit parameters along the trench of minimum χ²:

*The time evolution of best-fit CMB cosmology parameters. These have steadily drifted away from the LCDM concordance window while the astronomical measurements that established it have not.*

I’ve shown this plot and variations for other choices of H₀ before, yet it never fails to come as a surprise when I show it to people who work closely on the subject. I’m gonna guess that extends to most of the people who participated in the survey above. Some red flags prove to be false alarms, some don’t, but one should at least be aware of them and take them into consideration when making a judgement like this.

The plurality (35%) of those polled selected “systematic error in supernova data” as the most likely cause of the Hubble tension. It is indeed a common attitude, as I mentioned above, that the Hubble tension is somehow a problem of systematic errors in astronomical data like back in the bad old days^** of Sandage & de Vaucouleurs.

Let’s unpack this a bit. First, the framing: systematic error in supernova data is not the issue. There may, of course, be systematic uncertainties in supernova data, but that’s not a contender for what is causing the apparent Hubble tension. The debate over the local value of H₀ is in the calibrators of supernovae. This is often expressed as a tension between Cepheid and TRGB calibrators, but as we’ve seen, even that is misleading. So posing the question this way is all kinds of revealing, including of some implicit confirmation bias. It’s like putting the right answer of a multiple choice question first and then making up some random alternatives.

So what do we learn from this poll for consensus? There is no overwhelming consensus, and the most popular choice appears to be ill-informed. This could be a meme. Tell me you’re not an expert on a subject by expressing an opinion as if you were.

The kicker here is that this was a conference on black hole physics. There seems to have been some fundamental gravitational and quantum physics discussed, which is all very interesting, but this is a community that is pretty far removed from the nitty-gritty of astronomical observations. There are many other polls reported in this conference report, many of them about esoteric aspects of black holes that I find interesting but would not myself venture an opinion on: it’s not my field. It appears that a plurality of participants at this particular conference might want to consider adopting that policy for fields beyond their own expertise.

I don’t want to be too harsh, but it seems like we are repeating the same mistakes we made in the 1980s. As I’ve related before, I came to astronomy from physics with the utter assurance that H₀ had to be 50. It was Known. Then I met astronomers who were actually involved in measuring H₀ and they were like, “Maybe it is ~80?” This hurt my brain. It could not be so! And yet they turned out to be correct within the uncertainties of the time. Today, similar strong opinions are being expressed by the same community (and sometimes by the same people) who were wrong then, so it wouldn’t surprise me if they are wrong now. Putting how they think things should be ahead of how they are is how they roll.

There are other tensions besides the Hubble tension, but I’ll get to them in future posts. This is enough for now.

^αAs I’ve related before, I date the genesis of concordance LCDM to the work of Ostriker & Steinhardt (1995), though there were many other contributions leading to it (e.g., Efstathiou et al. 1990). Certainly many of us anticipated that the Type Ia SN experiments would confirm or deny this picture. Since the issue of confirmation bias is ever-present in cosmic considerations, it is important to understand this context: the acceleration of the expansion rate that is often depicted as a novel discovery in 1998 was an expected result. So much so that at a conference in 1997 in Aspen I recall watching Michael Turner badger the SN presenters to Proclaim Lambda already. One of the representatives from the SN teams was Richard Ellis, who wasn’t having it: the SN data weren’t there yet even if the attitude was. Amusingly, I later heard Turner claim to have been completely surprised by the 1998 discovery, as if he hadn’t been pushing for it just the year before. Aspen is a good venue for discussion; I commented at the time that the need to rehabilitate the cosmological constant was a big stop sign in the sky. He glared at me, and I’ve been on his shit list ever since.

^%I will not be entertaining assertions that the universe is not expanding in the comments: that’s beyond the scope of this post.

*Every time a paper corroborating a prediction of MOND is published, the usual suspects get on social media to complain that the referee(s) who reviewed the paper must be incompetent. This is a classic case of admitting you don’t understand how the process works by disparaging what happened in a process to which you weren’t privy. Anyone familiar with the practice of refereeing will appreciate that the opposite is true: claims that seem extraordinary are consistently held to a higher standard.

^{^}Note that it is impossible to exclude the act of judgement. There are approaches to minimizing this in particular experiments, e.g., by doing a blind analysis of large scale structure data. But you’ve still assumed a paradigm in which to analyze those data; that’s a judgement call. It is also a judgement call to decide to believe only large scale data and ignore evidence below some scale.

⁺I felt this hard when MOND first cropped up in my data for low surface brightness galaxies. I remember thinking How can this stupid theory get any predictions right when there is so much evidence for dark matter? It took a while for me to realize that dark matter really meant mass discrepancies. The evidence merely indicates a problem, the misnomer presupposes the solution. I had been working so hard to interpret things in terms of dark matter that it came as a surprise that once I allowed myself to try interpreting things in terms of MOND I no longer had to work so hard: lots of observations suddenly made sense.

^$TRGB = Tip of the Red Giant Branch. Low metallicity stars reach a consistent maximum luminosity as they evolve up the red giant branch, providing a convenient standard candle.

^&Where the heck is Tully? He seldom seems to get acknowledged despite having played a crucial role in breaking the tyranny of H₀ = 50 in the 1970s, having published steadily on the topic, and leading a group that continues to provide accurate measurements to this day. Do physics-trained cosmologists even know who he is?

^#The TRGB was a well-established method before it suddenly appears on this graph. That it appears this way shortly after the CMB told us what answer we should get is a more worrisome potential example of confirmation bias, reminiscent of the situation with the primordial deuterium abundance.

^**Aside from the tension between the TRGB as implemented by Freedman’s group and the TRGB as implemented by others, I’m not aware of any serious hint of systematics in the calibration of the distance scale. Can it still happen? Sure! But people are well aware of the dangers and watch closely for them. At this juncture, there is ample evidence that we may indeed have gotten past this.

Ha! I knew the Riess reference off the top of my head, but lots of people have worked on this so I typed “hubble calibration not a systematic error” into Google to search for other papers only to have its AI overview confidently assert

The statement that Hubble calibration is not a systematic error is incorrect
Google AI

That gave me a good laugh. It’s bad enough when overconfident underachievers shout about this from the wrong peak of the Dunning-Kruger curve without AI adding its recycled opinion to the noise, especially since its “opinion” is constructed from the noise.

The best search engine for relevant academic papers is NASA ADS; putting the same text in the abstract box returns many hits that I’m not gonna wade through. (A well-structured ADS search doesn’t read so casually; apparently the same still applies to Google.)

NEIU debate: dark matter or modified gravity

As promised, the folks at NEIU have posted the video of my discussion with Scott Dodelson last week, so here you go:

I am in the midst of writing a related post on cosmic tensions, so hopefully I can post that soon as well.
