Non-equilibrium dynamics in galaxies that appear to have lots of dark matter: ultrafaint dwarfs

This is a long post. It started focused on ultrafaint dwarfs, but can’t avoid more general issues. In order to diagnose non-equilibrium effects, we have to have some expectation for what equilibrium would be. The Tully-Fisher relation is a useful empirical touchstone for that. How the Tully-Fisher relation comes about is itself theory-dependent. These issues are intertwined, so in addition to discussing the ultrafaints, I also review some of the many predictions for Tully-Fisher, and how our theoretical expectation for it has evolved (or not) over time.

In the last post, we discussed how non-equilibrium dynamics might make a galaxy look like it had less dark matter than similar galaxies. That pendulum swings both ways: sometimes non-equilibrium effects might stir up the velocity dispersion above what it would nominally be. Some galaxies where this might be relevant are the so-called ultrafaint dwarfs (not to be confused with ultradiffuse galaxies, which are themselves often dwarfs). I’ve talked about these before, but more keep being discovered, so an update seems timely.

Galaxies and ultrafaint dwarfs

It’s a big universe, so there’s a lot of awkward terminology, and the definition of an ultrafaint dwarf is somewhat debatable. Most often I see them defined as having an absolute magnitude limit MV > -8, which corresponds to a luminosity less than 100,000 suns. I’ve also seen attempts at something more physical, like being a “fossil” whose star formation was entirely before cosmic reionization, which ended way back at z ~ 6 so all the stars would be at least*&^# 12.5 Gyr old. While such physics-based definitions are appealing, these are often tied up with theoretical projection: the UV photons that reionized the universe should have evaporated the gas in small dark matter halos, so these tiny galaxies can only be fossils from before that time. This thinking pervades much of the literature despite it being obviously wrong, as counterexamples! exist. For example, Leo P is practically an ultrafaint dwarf by luminosity, but has ample gas (so a larger baryonic mass) and is currently forming stars.

A luminosity-based definition is good enough for us here; I don’t really care exactly where we make the cut. Note that ultrafaint is an appropriate moniker: a luminosity of $10^5\,L_\odot$ is tiny by galaxy standards. This is a low-grade globular cluster, and some ultrafaints are only a few hundred solar luminosities, which is barely even# a star cluster. At this level, one has to worry about stochastic effects in stellar evolution. If there are only a handful of stars, the luminosity of the entire system changes markedly as a single star evolves up the red giant branch. Consequently, our mapping from observed quantities to stellar mass is extremely dodgy. For consistency, to compare with brighter dwarfs, I’ve adopted the same boilerplate $M_*/L_V = 2\,M_\odot/L_\odot$. That makes for a fair comparison luminosity-to-luminosity, but the uncertainty in the actual stellar mass is ginormous.
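As a rough illustration of the numbers involved, here is a minimal sketch that converts an absolute V-band magnitude into a luminosity and then into a boilerplate stellar mass. The solar absolute magnitude of 4.83 is an assumed standard value, not something quoted in this post, and the function names are purely illustrative.

```python
# Assumed absolute V-band magnitude of the Sun (not a value from the post).
M_V_SUN = 4.83
# Boilerplate stellar mass-to-light ratio adopted in the text (solar units).
MASS_TO_LIGHT_V = 2.0


def luminosity_from_abs_mag(abs_mag_v):
    """V-band luminosity in solar units for a given absolute magnitude."""
    return 10.0 ** (-0.4 * (abs_mag_v - M_V_SUN))


def stellar_mass(abs_mag_v):
    """Stellar mass in solar masses, using the fixed mass-to-light ratio."""
    return MASS_TO_LIGHT_V * luminosity_from_abs_mag(abs_mag_v)


# A few illustrative magnitudes, from the ultrafaint cut down to tiny systems.
for mag in (-8.0, -5.0, -2.0):
    print(f"M_V = {mag:+.1f}: L_V ~ {luminosity_from_abs_mag(mag):.2e} L_sun, "
          f"M* ~ {stellar_mass(mag):.2e} M_sun")
```

With these assumptions the cut at M_V = -8 lands at roughly 10^5 solar luminosities, in line with the definition above. The stochastic effects described in the text are not modeled here; a single fixed mass-to-light ratio is exactly the simplification being warned about.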

It gets worse, as the ultrafaints that we know about so far are all very nearby satellites of the Milky Way. They are not discovered in the same way as other galaxies, where one plainly sees a galaxy on survey plates. For example, NGC 7757:

A faint galaxy in the night sky, surrounded by numerous distant star-like points.
The spiral galaxy NGC 7757 as seen on plates of the Palomar Sky Survey.

While bright, high surface brightness galaxies like NGC 7757 are easy to see, lower surface brightness galaxies are not. However, they can usually still be seen, if you know where to look:

A faint galaxy amidst numerous distant stars in a dark sky, illustrating the challenges of observing low surface brightness galaxies.
UGC 1230 as seen on the Palomar Sky Survey. It’s in the middle.

I like to use this pair as an illustration, as they’re about the same distance from us and about the same angular size on the sky – at least, once you crank up the gain for the low surface brightness UGC 1230:

Comparison of two astronomical images: the left side shows a spiral galaxy with visible structure and brightness, while the right side features a lower surface brightness galaxy, appearing more diffuse and less distinct.
Zoom in on deep CCD images of NGC 7757 (left) and UGC 1230 (right) with the contrast of the latter enhanced. The chief difference between the two is surface brightness – how spread out their stars are. They have a comparable physical diameter, they both have star forming regions that appear as knots in their spiral arms, etc. These galaxies are clearly distinct from the emptiness of the cosmic void around them, being examples of giant stellar systems that gave rise to the term “island universe.”

In contrast to objects that are obvious on the sky as independent island universes, ultrafaint dwarfs are often invisible to the eye. They are recognized as a subset of stars near each other on the sky that also share the same distance and direction of motion in a field that might otherwise be crowded with miscellaneous, unrelated stars. For example, here is Leo IV:

Wide field image of the Ultra-Faint Dwarf Galaxy Leo IV, featuring a zoomed-in view of its faint structure surrounded by numerous background stars and galaxies.
The ultrafaint dwarf Leo IV as identified by the Sloan Digital Sky Survey and the Hubble Space Telescope.

See it?

I don’t. I do see a number of background galaxies, including an edge-on spiral near the center of the square. Those are not the ultrafaint dwarf, which is some subset of the stars in this image. To decide which ones are potentially a part of such a dwarf, one examines the color magnitude diagram of all the stars to identify those that are consistent with being at the same distance, and assigns membership in a probabilistic way. It helps if one can also obtain radial velocities and/or proper motions for the stars to see which hang together – more or less – in phase space.

Part of the trick here is deciding what counts as hanging together. A strong argument in favor of these things residing in dark matter halos is that the velocity differences between the apparently-associated stars are too great for them to remain together for any length of time otherwise. This is essentially the same situation that confronted Zwicky in his observations of galaxies in clusters in the 1930s. Here are these objects that appear together in the sky, but they should fly apart unless bound together by some additional, unseen force. But perhaps some of these ultrafaints are not hanging together; they may be in the process of coming apart. Indeed, they may have so few stars because they are well down the path of dissolution.

Since one cannot see an ultrafaint dwarf in the same way as an island universe, I’ve heard people suggest that being bound by a dark matter halo be included in the definition of a galaxy. I see where they’re coming from, but find it unworkable. I know a galaxy when I see one. As did Hubble, as did thousands of other observers since, as can you when you look at the pictures above. It is absurd to make the definition of an object that is readily identifiable by visual inspection be contingent on the inferred presence of invisible stuff.

So are ultrafaints even galaxies? Yes and no. Some of the probabilistic identifications may be mere coincidences, not real objects. However, they can’t all be fakes, and I think that if you put them in the middle of intergalactic space, we would recognize them as galaxies – provided we could detect them at all. At present we can’t, but hopefully that situation will improve with the Rubin Observatory. In the meantime, what we have to work with are these fragmentary systems deep in the potential well of the seventy billion solar mass cosmic gorilla that is the Milky Way. We have to be cognizant that they might have gotten knocked around, as we can see in more massive systems like the Sagittarius dwarf. Of course, if they’ve gotten knocked around too much, then they shouldn’t be there at all. So how do these systems evolve under the influence of a cosmic gorilla?

Let’s start by looking at the size-mass diagram, as we did before. Ultrafaint dwarfs extend this relation to much lower mass, and also to rather small sizes – some approaching those of star clusters. They approximately follow a line of constant surface density, $\sim 0.1\,M_\odot\,\mathrm{pc}^{-2}$ (dotted line).

A graph illustrating the size-mass relationship of galaxies, plotting effective radius (Re) against stellar mass (M*). Black squares represent data points of larger galaxies, while green squares indicate ultrafaint dwarfs. The dotted line suggests a correlation between size and mass.
The size and stellar mass of Local Group dwarfs as discussed previously, with the addition of ultrafaint dwarfs$ (small gray squares).

This looks weird to me. All other types of galaxies scatter all over the place in this diagram. The ultrafaints are unique in following a tight line in the size-mass plane, and one that follows a line of constant surface brightness. Every element of my observational experience screams that this is likely to be an artifact. Given how these “galaxies” are identified as the loose association of a handful of stars, it is easy to imagine that this trend might be an artifact of how we define the characteristic size of a system that is essentially invisible. It might also arise for physical reasons to do with the cosmic gorilla; i.e., it is a consequence of dynamical evolution. So maybe this correlation is real, but the warning lights that it is not are flashing red.

The Baryonic Tully-Fisher relation as a baseline

Ideally, we would measure accelerations to test theories, particularly MOND. Here, we would need to use the size to estimate the acceleration, but I straight up don’t believe these sizes are physically meaningful. The stellar mass, dodgy as it is, seems robust by comparison. So we’ll proceed as if we know that much – which we don’t, really – but let’s at least try.

With the stellar mass (there is no gas in these things), we are halfway to constructing the baryonic Tully-Fisher relation (BTFR), which is the simplest test of the dynamics that we can make with the available data. The other quantity we need is the characteristic circular speed of the gravitational potential. For rotating galaxies, that is the flat rotation speed, Vf. For pressure supported dwarfs, what is usually measured is the velocity dispersion σ. We’ve previously established that for brighter dwarfs in the Local Group, a decent approximation is Vf = 2σ, so we’ll start by assuming that this should apply to the ultrafaints as well. This allows us to plot the BTFR:

A scatter plot showing the relationship between velocity (Vf in km/s) and baryonic mass (Mb in solar masses), with data points represented by different shapes and colors for various galaxy types.
The baryonic mass and characteristic circular speeds of both rotationally supported galaxies (circles) and pressure supported dwarfs (squares). The colored points follow the same baryonic Tully-Fisher relation (BTFR), but the data for low mass ultrafaint dwarfs (gray squares) flattens out, having nearly the same characteristic speed over several decades in mass.

The BTFR is an empirical relation of the form $V_f \sim M_b^{1/4}$ over about six decades in mass. Somewhere around the ultrafaint scale, this no longer appears to hold, with the observed velocity flattening out to become approximately constant for these lowest mass galaxies. I’m not sure this is real, as there are many practical caveats to interpreting the observations. Measuring stellar velocities is straightforward but demanding at this level of accuracy. There are many potential systematics, pretty much all of which cause the intrinsic velocity dispersion to be overestimated. For example, observations made with multislit masks tend to return larger dispersions than observations of the same object with fibers. That’s likely because it is hard to build a mask so well that all of the stars perfectly hit the centers of the slitlets assigned to them; offsets within the slit shift the spectrum in a way that artificially adds to the apparent velocity dispersion. Fibers are less efficient in their throughput, but have the virtue of blending the input light in a way that precludes this particular systematic. Another concern is physical – some of the stars that are observed are presumably binaries, and some of the velocity will be due to motion within the binary pair and nothing to do with the gravitational potential of the larger system. This can be addressed with repeated observations to see if some velocities change, but it is hard to do that for each and every system, especially when it is way more fun to discover and explore new systems than follow up on the same one over and over and over again.
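To see why even modest systematics matter at this level, here is a minimal sketch that adds spurious velocity contributions to an intrinsic dispersion. It assumes the contributions are independent and add in quadrature, which is a simplification on my part, and every number in it is hypothetical rather than a measurement.

```python
import math


def observed_dispersion(sigma_intrinsic, *sigma_extra):
    """Combine an intrinsic dispersion with extra terms in quadrature (km/s).

    Assumes independent, roughly Gaussian contributions (an assumption).
    """
    return math.sqrt(sigma_intrinsic ** 2 + sum(s ** 2 for s in sigma_extra))


def implied_mass_inflation(sigma_intrinsic, sigma_obs, slope=4.0):
    """Factor by which the mass implied by M ~ V^slope is overestimated."""
    return (sigma_obs / sigma_intrinsic) ** slope


# Hypothetical values in km/s, chosen only for illustration.
sigma_true = 2.0      # intrinsic dispersion set by the potential
sigma_binaries = 1.5  # unrecognized binary orbital motion
sigma_slit = 1.5      # slit-centering offsets in a multislit mask

sigma_obs = observed_dispersion(sigma_true, sigma_binaries, sigma_slit)
print(f"observed dispersion: {sigma_obs:.2f} km/s")
print(f"implied mass inflated by: {implied_mass_inflation(sigma_true, sigma_obs):.1f}x")
```

Because the relation goes as the fourth power of velocity, a dispersion that is inflated by tens of percent shifts a point by a large factor in the mass it would correspond to, which is part of why individual objects deserve suspicion.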

There are lots of other things that can go wrong. At some level, some of them probably do – that’s the nature of observational astronomy&. While it seems likely that some of the velocity dispersions are systematically overestimated, it seems unlikely that all of them are. Let’s proceed as if the bulk of the data is telling us something, even if we treat individual objects with suspicion.

MOND

MOND makes a clear prediction for the BTFR of isolated galaxies: the baryonic mass goes as the fourth power of the flat rotation speed. Contrary to Newtonian expectation, this holds irrespective of surface brightness, which is what attracted my attention to the theory in the first place. So how does it do here?

A graph depicting the relationship between the flat rotation speed (Vf in km/s) and the baryonic mass (Mb in solar masses), showing data points for various galaxies, including ultrafaint dwarfs highlighted with unique markers.
The same data as above with the addition of the line predicted by MOND (Milgrom 1983).

Low surface density means low acceleration, so low surface brightness galaxies would make great tests of MOND if they were isolated. Oh, right – they already did. Repeatedly. MOND also correctly predicted the velocities of low mass, gas-rich dwarfs that were unknown when the prediction was made. These are highly nontrivial successes of the theory.

The ultrafaints we’re discussing here are not isolated, so they do not provide the clean tests that isolated galaxies provide. However, galaxies subject to external fields should have low velocities relative to the BTFR, while the ultrafaints have higher velocities. They’re on the wrong side of the relation! Taking this at face value (i.e., assuming equilibrium), MOND fails here.
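To make "the wrong side of the relation" concrete, here is a minimal sketch of the isolated-galaxy MOND prediction, Vf^4 = G Mb a0, compared with the characteristic speed Vf = 2σ used in this post. The value a0 = 1.2e-10 m/s² is the commonly quoted one and is an assumption here, and the example dwarf is hypothetical, not a real object.

```python
G_SI = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN_KG = 1.989e30   # solar mass, kg
A0_SI = 1.2e-10       # assumed MOND acceleration scale, m s^-2


def mond_flat_velocity(m_baryon_msun):
    """Flat rotation speed in km/s for an isolated galaxy in MOND."""
    v4 = G_SI * m_baryon_msun * M_SUN_KG * A0_SI  # m^4 s^-4
    return v4 ** 0.25 / 1.0e3


def characteristic_speed(sigma_kms):
    """Vf = 2 sigma, the approximation adopted in the text."""
    return 2.0 * sigma_kms


# Hypothetical ultrafaint: 1e4 solar masses of stars, sigma = 4 km/s.
mass, sigma = 1.0e4, 4.0
predicted = mond_flat_velocity(mass)
observed = characteristic_speed(sigma)
print(f"MOND (isolated): {predicted:.1f} km/s, observed proxy: {observed:.1f} km/s")
```

The external field of the Milky Way is not included; as noted above, it should push the prediction lower still, so an object like this hypothetical one would sit above the relation either way.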

Whenever MOND has a problem, it is widely seen as a success of dark matter. In my experience, this is rarely true: observations that are problematic for MOND usually don’t make sense in terms of dark matter either. For each observational test we also have to check how LCDM fares.

LCDM

How LCDM fares is often hard to judge because its predictions for the same phenomena are not always clear. Different people predict different things for the same theory. There have been lots of LCDM-based predictions made for both dwarf satellite galaxies and the Tully-Fisher relation. Too many, in fact – it is a practical impossibility to examine them all. Nevertheless, some common themes emerge if we look at enough examples.

The halo mass-velocity relation

The most basic prediction of LCDM is that the mass of a dark matter halo scales with the cube of the circular velocity of a test particle at the virial radius (conventionally taken to be the radius R200 that encompasses an average density 200 times the critical density of the universe; if that sounds like gobbledygook to you, just read “halo” for “200”): $M_{200} \sim V_{200}^3$. This is a very basic prediction that everyone seems to agree on.

There is a tiny problem with testing this prediction: it refers to the dark matter halo that we cannot see. In order to test it, we have to introduce some scaling factors to relate the dark to the light. Specifically, Mb = fd M200 and Vf = fv V200, where fd is the observed fraction of mass in baryons and fv relates the observed flat velocity to the circular speed of our notional test particle at the virial radius. The obvious assumptions to make are that fd is a constant (perhaps as much as but not more than the cosmic baryon fraction of 16%) and fv is close to unity. The latter requirement stems from the need for dark matter to explain the amplitude of the flat rotation speed, but fv could be slightly different; plausible values lie in the range 0.9 < fv < 1.4. Values larger than one indicate a rotation curve that declines before the virial radius is reached, which is the natural expectation for NFW halos.
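For reference, the cubic scaling and the role of the two scaling factors can be written out explicitly. This is a sketch under the standard spherical-overdensity definition, taking the present-day Hubble constant $H_0$ to set the critical density (an assumption here; the text does not specify an epoch):

```latex
\begin{align}
M_{200} &= \frac{4\pi}{3} R_{200}^3 \,\bigl(200\,\rho_{\rm crit}\bigr),
  \qquad \rho_{\rm crit} = \frac{3 H_0^2}{8\pi G} \\
V_{200}^2 &= \frac{G M_{200}}{R_{200}}
  \;\Rightarrow\; R_{200} = \frac{V_{200}}{10 H_0} \\
M_{200} &= \frac{V_{200}^3}{10\, G H_0} \\
M_b &= f_d M_{200} = \frac{f_d}{f_v^3}\,\frac{V_f^3}{10\, G H_0}
\end{align}
```

The last line is where the observable quantities enter: everything that distinguishes the predicted relation from a pure cube is hidden in the combination $f_d f_v^{-3}$.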

Here is a worked example with fd = 0.025 and fv = 1:

A graph depicting the relationship between the flat rotation speed (Vf) in kilometers per second and the baryonic mass (Mb) in solar masses. The data points are shown with various markers, including gray squares, green squares, and blue circles, each representing different galaxy types, along with error bars. A solid gray line indicates a trend, while a dotted line marks a theoretical lower bound.
The same data as above with the addition of the nominal prediction of LCDM. The dotted line is the halo mass-circular velocity relation; the gray band is a simple model with fd = 0.025 and fv = 1 (e.g., Mo, Mao, & White 1998).

I have illustrated the model with a fat grey line because fd = 0.025 is an arbitrary choice* I made to match the data. It could be more, it could be less. The detected baryon fraction can be anything up to the cosmic value, fd < fb = 0.16, as not all of the baryons available in a halo cool and condense into cold gas that forms visible stars. That’s fine; there’s no requirement that all of the baryons have to become readily observable, but there is also no reason to expect all halos to cool exactly the same fraction of baryons. Naively one would expect at least some variation in fd from halo to halo, so there could and probably should be a lot of scatter: the gray line could easily be a much wider band than depicted.
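The arithmetic behind a line like this can be sketched in a few lines. The snippet below assumes the standard spherical-overdensity result M200 = V200^3 / (10 G H0) with H0 = 70 km/s/Mpc; the Hubble constant is my assumption for illustration, not necessarily the value behind the plotted band, and the function names are hypothetical.

```python
G_KPC = 4.30091e-6        # G in kpc (km/s)^2 / M_sun
H0_KMS_PER_MPC = 70.0     # assumed Hubble constant
H0_KMS_PER_KPC = H0_KMS_PER_MPC / 1000.0


def halo_mass(v200_kms):
    """M200 in solar masses for a given V200 in km/s."""
    return v200_kms ** 3 / (10.0 * G_KPC * H0_KMS_PER_KPC)


def baryonic_mass(vf_kms, f_d=0.025, f_v=1.0):
    """Mb = fd * M200 with Vf = fv * V200, as defined in the text."""
    return f_d * halo_mass(vf_kms / f_v)


for vf in (20.0, 50.0, 100.0, 200.0):
    print(f"Vf = {vf:5.0f} km/s -> Mb ~ {baryonic_mass(vf):.2e} M_sun")
```

Changing f_d slides the whole line up or down in mass without altering its slope of 3, which is why a single favorable choice can only match the data over a limited range.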

In addition to the rather arbitrary value of fd, this reasoning also predicts a Tully-Fisher relation with the wrong slope. Picking a favorable value of fd only matches the data over a narrow range of mass. It was nevertheless embraced for many years by many people. Selection effects bias samples to bright galaxies. Consequently, the literature is rife with TF samples dominated by galaxies with $M_b > 10^{10}\,M_\odot$ (the top right corner of the plot above); with so little dynamic range, a slope of 3 looks fine. Once you look outside that tiny box, it does not look fine.

Personally, I think a slope of 3 is an oversimplification. That is the prediction for dark matter halos; there can be effects that vary systematically with mass. An obvious one is adiabatic compression, the effect by which baryons drag some dark matter along with them as they settle to the center of their halos. This increases fv by an amount that depends on the baryonic surface density. Surface density correlates with mass, so I would nominally expect higher velocities in brighter galaxies; this drives up the slope. There are various estimates of this effect; typically one gets a slope like 3.3, not the observed 4. Worse, it predicts an additional effect: at a given mass, galaxies of higher surface brightness should also have higher velocity. Surface brightness should be a second parameter in the Tully-Fisher relation, but this is not observed.

The easiest way to reconcile the predicted and observed slopes is to make fd a function of mass. Since $M_b = f_d M_{200}$ and $M_{200} \sim V_{200}^3$, $M_b \sim f_d V_{200}^3$. Adopting fv = 1 for simplicity, $M_b \sim V_f^4$ follows if $f_d \sim V_f$. Problem solved, QED.

There are [at least] two problems with this argument. One is that the scaling fd ~ Vf must hold perfectly without introducing any scatter. This is a fine-tuning problem: we need one parameter to vary precisely with another, unrelated parameter. There is no good reason to expect this; we just have to insert the required dependence by hand. This is much worse than choosing an arbitrary value for fd: now we’re making it a rolling fudge factor to match whatever we need it to. We can make it even more complicated by invoking some additional variation in fv, but this just makes the fine-tuning worse as the product $f_d f_v^{-3}$ has to vary just so. Another problem is that we’re doing all this to adjust the prediction of one theory (LCDM) to match that of a different theory (MOND). It is never a good sign when we have to do that, whether we admit it or not.
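The size of the required adjustment can be estimated with a short sketch. Equating Mb = Vf^4 / (G a0) with Mb = fd fv^-3 Vf^3 / (10 G H0) gives fd = 10 H0 fv^3 Vf / a0. Both the halo normalization (a standard spherical-overdensity result) and the values H0 = 70 km/s/Mpc and a0 = 1.2e-10 m/s² are assumptions made here for illustration.

```python
KM_IN_M = 1.0e3
MPC_IN_M = 3.0857e22
A0_SI = 1.2e-10                        # assumed acceleration scale, m s^-2
H0_SI = 70.0 * KM_IN_M / MPC_IN_M      # assumed Hubble constant, s^-1
COSMIC_BARYON_FRACTION = 0.16          # upper limit quoted in the text


def required_fd(vf_kms, f_v=1.0):
    """Baryon fraction needed for a slope-3 halo relation to mimic slope 4."""
    return 10.0 * H0_SI * f_v ** 3 * (vf_kms * KM_IN_M) / A0_SI


for vf in (10.0, 30.0, 100.0, 300.0):
    fd = required_fd(vf)
    print(f"Vf = {vf:5.0f} km/s -> fd ~ {fd:.4f} "
          f"({fd / COSMIC_BARYON_FRACTION:.1%} of the cosmic fraction)")
```

Under these assumptions fd has to track Vf linearly over more than a decade in velocity, with no scatter, and the cubic dependence on f_v shows how sensitively any variation in that factor would have to be compensated.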

Abundance matching

The reasoning leading to a slope 3 Tully-Fisher relation assumes a one-to-one relation between baryonic and halo mass (fd = constant). This is an eminently reasonable assumption. We spent a couple of decades trying to avoid having to break this assumption. Once we do so and make fd a freely variable parameter, then it can become a rolling fudge factor that can be adjusted to fit anything. Everyone agrees that is Bad. However, it might be tolerable if there is an independent way of estimating this variation. Rather than make fd just be what we need it to be as described above, we can instead estimate it with abundance matching.

Abundance matching comes from equating the observed number density of galaxies as a function of mass with the number density of dark matter halos. This process gives fd, or at least the stellar fraction, f*, which is close to fd for bright galaxies. Critically, it provides a way to assign dark matter halo masses to galaxies independently of their kinematics. This replaces an arbitrary, rolling fudge factor with a predictive theory.

Abundance matching models generically introduce curvature into the prediction for the BTFR. This stems from the mismatch in the shape of the galaxy stellar mass function (a Schechter function) and the dark halo mass function (a power law on galaxy scales). This leads to a bend in relations that map between visible and dark mass.
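A toy version of the procedure shows where the bend comes from. The sketch below matches cumulative number densities of a Schechter stellar mass function and a pure power-law halo mass function, then converts the result into a local slope d log M* / d log V200 using V200 ~ Mh^(1/3). All parameter values are made up for illustration; they are not fits from this post or from any published model, and stellar mass stands in for baryonic mass.

```python
import math

# Illustrative parameters only (assumptions, not fitted values).
PHI_STAR = 5.0e-3   # Schechter normalization, Mpc^-3
M_CHAR = 5.0e10     # Schechter knee, M_sun
ALPHA = -1.3        # Schechter faint-end slope
N0 = 3.0e-3         # number density of halos above M0, Mpc^-3
M0 = 1.0e12         # reference halo mass, M_sun
BETA = 0.9          # cumulative halo mass function: n(>Mh) = N0 (Mh/M0)^-BETA


def schechter_per_dex(m_star):
    """Galaxy number density per dex of stellar mass."""
    x = m_star / M_CHAR
    return math.log(10.0) * PHI_STAR * x ** (ALPHA + 1.0) * math.exp(-x)


def cumulative_galaxy_density(log_m_min, log_m_max=13.0, step=0.005):
    """n(>M*) from trapezoidal integration in log10 mass."""
    n_steps = int(round((log_m_max - log_m_min) / step))
    total = 0.0
    for i in range(n_steps):
        lo = log_m_min + i * step
        hi = lo + step
        total += 0.5 * step * (schechter_per_dex(10.0 ** lo)
                               + schechter_per_dex(10.0 ** hi))
    return total


def matched_halo_mass(m_star):
    """Halo mass with the same cumulative number density as M*."""
    n_gal = cumulative_galaxy_density(math.log10(m_star))
    return M0 * (n_gal / N0) ** (-1.0 / BETA)


def local_slope(m_star, dlog=0.1):
    """d log M* / d log V200, assuming V200 ~ Mh^(1/3)."""
    lo = matched_halo_mass(m_star / 10.0 ** (dlog / 2.0))
    hi = matched_halo_mass(m_star * 10.0 ** (dlog / 2.0))
    return 3.0 * dlog / math.log10(hi / lo)


for m in (1.0e7, 1.0e9, 1.0e11):
    print(f"M* = {m:.0e}: Mh ~ {matched_halo_mass(m):.2e}, "
          f"slope ~ {local_slope(m):.1f}")
```

The slope is steep at low mass, shallow at high mass, and only passes through values near 3 or 4 in between; no single power law comes out of this kind of matching.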

The transition from the $M \sim V^3$ reasoning to abundance matching occurred gradually, but became pronounced circa 2010. There are many abundance matching models; I already faced the problem of the multiplicity of LCDM predictions when I wrote a lengthy article on the BTFR in 2012. To get specific, let’s start with an example from then, the model of Trujillo-Gomez et al. (2011):

Scatter plot showing the relationship between gravitational potential flat rotation speed (Vf in km/s) and baryonic mass (Mb in solar masses). The plot features varying data points marked with blue circles, green squares, and gray squares, indicating different galaxy types or observational methods. A red curve is drawn, illustrating an empirical relationship fitting the data.
The same data as above with the addition of the line predicted by LCDM in the model of Trujillo-Gomez et al. (2011).

One thing Trujillo-Gomez et al. (2011) say in their abstract is “The data present a clear monotonic LV relation from ∼50 km s$^{-1}$ to ∼500 km s$^{-1}$, with a bend below ∼80 km s$^{-1}$”. By LV they mean luminosity-velocity, i.e., the regular Tully-Fisher relation. The bend they note is real; that’s what happens when you consider only the starlight and ignore the gas. The bend goes away if you include that gas. This was already known at the time – our original BTFR paper from 2000 has nearly a thousand citations, so it isn’t exactly obscure. Ignoring the gas is a choice that makes no sense empirically but makes a lot of sense from the perspective of LCDM simulations. By 2010, these had become reasonably good at matching the numbers of stars observed in galaxies, but the gas properties of simulated galaxies remained, hmmmmmmm, wanting. It makes sense to utilize the part that works. It makes less sense to pretend that this bend is something physically meaningful rather than an artifact of ignoring the gas. The pressure-supported dwarfs are all star dominated, so this distinction doesn’t matter here, and they follow the BTFR, not the stars-only version.

An old problem in galaxy formation theory is how to calibrate the number density of dark matter halos to that of observed galaxies. For a long time, a choice that people made was to match either the luminosity function or the kinematics. These didn’t really match up, so there was occasional discussion of the virtues and vices of the “luminosity function calibration” vs. the “Tully-Fisher calibration.” These differed by a factor of ~2. This tension remains with us. Mostly simulations have opted to adopt the luminosity function calibration, updated and rebranded as abundance matching. Again, this makes sense from the perspective of LCDM simulations, because the number density of dark matter halos is something that simulations can readily quantify while the kinematics of individual galaxies are much harder to resolve**.

The nonlinear relation between stellar mass and halo mass obtained from abundance matching inevitably introduces curvature into the corresponding Tully-Fisher relation predicted by such models. That’s what you see in the curved line of Trujillo-Gomez et al. (2011) above. They weren’t the first to obtain such a result, and they certainly weren’t the last: this is a feature of LCDM with abundance matching, not a bug.

The line of Trujillo-Gomez et al. (2011) matches the data pretty well at intermediate masses. It diverges to higher velocities at both small and large galaxy masses. I’ve written about this tension at high masses before; it appears to be real, but let’s concentrate on low masses here. At low masses, the velocity of galaxies with $M_b < 10^8\,M_\odot$ appears to be overestimated. But the divergence between model and reality has just begun, and it is hard to resolve small things in simulations, so this doesn’t seem too bad. Yet.

Moving ahead, there are the “Latte” simulations of Wetzel et al. (2016) that use the well-regarded FIRE code to look specifically at simulated dwarfs, both isolated and satellites – specifically satellites of Milky Way-like systems. (Milky Way. Latte. Get it? Nerd humor.) So what does that find?

A graph displaying the relationship between circular velocity (Vf in km/s) and baryonic mass (Mb in solar masses), featuring various data points distinguished by shape and color, including gray squares, green squares, orange triangles, and blue circles to represent different types of galaxies.
The same data as above with the addition of simulated dwarfs (orange triangles) from the Latte LCDM simulation of Wetzel et al. (2016), specifically the simulated satellites in the top panel of their Fig. 3. Note that we plot Vf = 2σ for pressure supported systems, both real and simulated.

The individual simulated dwarf satellites of Wetzel et al. (2016) follow the extrapolation of the line predicted by Trujillo-Gomez et al. (2011). To first order, it is the same result to higher resolution (i.e., smaller galaxy mass). Most of the simulated objects have velocity dispersions that are higher than observed in real galaxies. Intriguingly, there are a couple of simulated objects with $M_* \sim 5 \times 10^6\,M_\odot$ that fall nicely among the data where there are both star-dominated and gas-rich galaxies. However, these two are exceptions; the rule appears to be characteristic speeds that are higher than observed.

The lowest mass simulated satellite objects begin to approach the ultrafaint regime, but resolution continues to be an issue: they’re not really there yet. This hasn’t precluded many people from assuming that dark matter will work where MOND fails, which seems like a heck of a presumption given that MOND has been consistently more successful up until that point. Where MOND underpredicts the characteristic velocity of ultrafaints, LCDM hasn’t yet made a clear prediction, and it overpredicts velocities for objects of slightly larger mass. Ain’t no theory covering itself in glory here, but this is a good example where objects that are a problem for MOND are also a problem for dark matter, and it seems likely that non-equilibrium dynamics play a role in either case.

Comparing apples with apples

A persistent issue with comparing simulations to reality is extracting comparable measures. Where circular velocities are measured from velocity fields in rotating galaxies and estimated from measured velocity dispersions in pressure supported galaxies, the most common approach to deriving rotation curves from simulated objects is to sum up particles in spherical shells and assume $V^2 = GM/R$. These are not the same quantities. They should be proxies for one another, but equality holds only in the limit of isotropic orbits in spherical symmetry. Reality is messier than that, and simulations aren’t that simple either%.
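The spherical-shell estimate is simple enough to sketch in a few lines. The function below is a hypothetical illustration of that approach, not code from any simulation group: it sorts particles by radius, accumulates the enclosed mass, and applies V^2 = GM/R.

```python
import math

G_KPC = 4.30091e-6  # G in kpc (km/s)^2 / M_sun


def spherical_circular_velocity(radii_kpc, masses_msun):
    """Return sorted radii and V(r) = sqrt(G M(<=r) / r) in km/s.

    Treats the mass distribution as spherically symmetric, which is exactly
    the assumption questioned in the text.
    """
    enclosed = 0.0
    out_r, out_v = [], []
    for r, m in sorted(zip(radii_kpc, masses_msun)):
        enclosed += m
        if r > 0.0:
            out_r.append(r)
            out_v.append(math.sqrt(G_KPC * enclosed / r))
    return out_r, out_v


# Toy input: a central concentration plus a few outlying particles.
radii = [0.5, 1.0, 2.0, 4.0, 8.0]
masses = [1.0e9, 5.0e8, 5.0e8, 2.0e8, 1.0e8]
for r, v in zip(*spherical_circular_velocity(radii, masses)):
    print(f"r = {r:4.1f} kpc: V_circ ~ {v:5.1f} km/s")
```

Nothing in this calculation knows about disk geometry, pressure support, or non-circular motions, which is why the number it returns is only a proxy for what an observer would measure from a velocity field.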

Sales et al. (2017) make the effort to make a better comparison between what is observed given how it is observed, and what the simulations would show for that quantity. Others have made a similar effort; a common finding is that the apparent rotation speeds of simulated gas disks do not trace the gravitational potential as simply as GM/R. That’s no surprise, but most simulated rotation curves do not look like those of real galaxies^, so the comparison is not straightforward. Those caveats aside, Sales et al. (2017) are doing the right thing in trying to make an apples-to-apples comparison between simulated and observed quantities. They extract from simulations a quantity Vout that is appropriate for comparison with what we observe in the outer parts of rotation curves. So here is the resulting prediction for the BTFR:

A graph plotting the baryonic mass (Mb in solar masses) against the characteristic flat rotation speed (Vf in km/s) for various galaxies, showing a curve that describes the baryonic Tully-Fisher relation. The scatter points include different types of galaxies, with green squares indicating specific categories.
The same data as above with the addition of the line predicted by LCDM in the model of Sales et al. (2017), specifically the formula for Vout in their Table 2 which is their proxy for the observable rotation speed.

That’s pretty good. It still misses at high masses (those two big blue points at the top are Andromeda and the Milky Way) and it still bends away from the data at low masses where there are both star-dominated and gas-rich galaxies. (There are a lot more examples of the latter that I haven’t used here because the plot gets overcrowded.) Despite the overshoot, the use of an observable aspect of the simulations gets closer to the data, and the prediction flattens out in the same qualitative sense. That’s good, so one might see cause for hope that this problem is simply a matter of making a fair comparison between simulations and data. We should also be careful not to over-interpret it: I’ve simply plotted the formula they give; the simulations to which they fit it surely do not resolve ultrafaint dwarfs, so really the line should stop at some appropriate mass scale.

Nevertheless, it makes sense to look more closely at what is observed vs. what is simulated. This has recently been done in greater detail by Ruan et al. (2025). They consider two simulations that implement rather different feedback; both wind up producing rotating, gas rich dwarfs that actually fall on the BTFR.

Scatter plot illustrating the baryonic Tully-Fisher relation, showing the relationship between characteristic circular velocity (Vf) and baryonic mass (Mb) for various galaxy types, including data points for ultrafaint dwarfs.
The same data as above with the addition of simulated dwarfs of Ruan et al. (2025), specifically from the top right panel of their Fig. 6. The orange circles are their “massives” and the red triangles the “marvels” (the distinction refers to different feedback models).

Finally some success after all these years! Looking at this, it is tempting to declare victory: problem solved. It was just a matter of doing the right simulation all along, and making an apples-to-apples comparison with the data.

That sounds too good to be true. Is it repeatable in other simulations? What works now that didn’t before?

These are high resolution simulations, but they still don’t resolve ultrafaints. We’re talking here about gas-rich dwarfs. That’s also an important topic, so let’s look more closely. What works now is in the apples-to-apples assessment: what we would measure for Vout is less than Vmax (related to V200) of the halo:

A graph displaying two panels: the top panel shows the relation between the ratio of mid-outward velocity to maximum velocity (Vout, mid / Vmax, mid) and the logarithm of baryonic mass (Mbar), with data points represented as circles and triangles. The bottom panel illustrates the relationship between the ratio of outer radius to maximum radius (Rout, mid / Rmax, mid) and the logarithm of baryonic mass, also featuring similar data points.
Two panels from Fig. 7 of Ruan et al. (2025) showing the ratio of the velocity we might observe relative to the characteristic circular velocity of the halo (top) and the ratio of the radii where these occur (bottom).

The treatment of cold gas in simulations has improved. In these simulations, Vout(Rout) is measured where the gas surface density falls to 1 M☉ pc⁻², which is typical of many observations. But the true rotation curve is still rising for objects with Mb < a few × 10⁸ M☉; it has not yet reached a value that is characteristic of the halo. So the apparent velocity is low, even if the dark matter halos are doing basically the same thing as before:

Graph showing the baryonic Tully-Fisher relation, with velocity Vf (km/s) plotted against baryonic mass Mb (solar masses). Data points include various galaxies and dwarf galaxies, with error bars indicating measurement uncertainties. A red line represents the best-fit relation.
As above, but with the addition of the true Vmax (small black dots) of the simulated halos discussed by Ruan et al. (2025), which follow the relation of Sales et al. (2017) (line for Vmax in their Table 2).
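
To get a feel for how large this effect can be, here is a minimal sketch that assumes the halo follows a pure NFW profile. That is an assumption made purely for illustration: the simulated halos of Ruan et al. include baryons and feedback and need not follow it. For an NFW halo the circular speed peaks at about 2.16 scale radii, so the ratio Vout/Vmax depends only on how far inside Rmax the measurement is made. The function names and sample radii are hypothetical.

```python
import math

# Radius of peak circular speed for an NFW halo, in units of the scale radius r_s.
X_MAX = 2.16258


def _nfw_shape(x):
    """Dimensionless V_c^2 profile of an NFW halo, with x = r / r_s (x > 0)."""
    return (math.log(1.0 + x) - x / (1.0 + x)) / x


def vout_over_vmax(rout_over_rmax):
    """Ratio V(R_out) / V_max for an assumed NFW halo, given R_out / R_max."""
    x = X_MAX * rout_over_rmax
    return math.sqrt(_nfw_shape(x) / _nfw_shape(X_MAX))


# Illustrative radii only; these are not values read off any figure.
for ratio in (0.1, 0.25, 0.5, 1.0):
    print(f"R_out/R_max = {ratio:4.2f} -> V_out/V_max = {vout_over_vmax(ratio):.2f}")
```

In this toy picture, a rotation curve traced to only a quarter of Rmax reaches a bit over 80% of Vmax, and the shortfall grows quickly the further inside Rmax the last measured point lies.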

I have mixed feelings about this. On the one hand, there are many dwarf galaxies with rising rotation curves that we don’t see flatten out, so it is easy to imagine they might keep going up, and I find it plausible that this is what we would find if we looked harder. So plausible that I’ve spent a fair amount of time doing exactly this. Not all observations terminate at 1 M☉ pc⁻², and whenever we push further out, we see the same damn thing over and over: the rotation curve flattens out and stays flat!!. That’s been my anecdotal experience; getting beyond that systematically is the point of the MHONGOOSE survey. This was constructed to detect much lower atomic gas surface densities, and routinely detects gas at the 0.1 M☉ pc⁻² level where Ruan et al. suggest we should see something closer to Vmax. So far, we don’t.

I don’t want to sound too negative, because how we map what we predict in simulations to what we measure in observations is a serious issue. But it seems a bit of a stretch for a low-scatter power law BTFR to be the happenstance of observational sensitivity that cuts in at a convenient mass scale. So far, we see no indication of that in more sensitive observations. I’ll certainly let you know if that changes.

Survey says…

At this juncture, we’ve examined enough examples that the reader can appreciate my concern that LCDM models can predict rather different things. What does the theory really predict? We can’t really test it until we agree what it should do!!!.

I thought it might be instructive to combine some of the models discussed above. It is.

Graph illustrating the correlation between the characteristic flat rotation speed (Vf) and baryonic mass (Mb) of galaxies. The plot features data points in different colors representing various galaxy types, with lines indicating theoretical trends and empirical relations.
Some of the LCDM predictions discussed above shown together. The dotted line to the right of the data is the halo mass-velocity relation, which is the one thing we all agree LCDM predicts but which is observationally inaccessible. The grey band is a Mo, Mao, & White-type model with fd = 0.025. The red dotted line is the model of Trujillo-Gomez et al. (2011); the solid red line that of Sales et al. (2017) for Vmax.

The models run together, more or less, for high mass galaxies. Thanks to observational selection effects, these are the objects we’ve always known about and matched our theories to. In order to test a theory, one wants to force it to make predictions in new regimes it wasn’t built for. Low mass galaxies do that, as do low surface brightness galaxies, which are often but not always low mass. MOND has done well for both, down to the ultrafaints we’re discussing here. LCDM does not yet explain those, or really any of the intermediate mass dwarfs.

What really disturbs me about LCDM models is their flexibility. It’s not just that they miss, it’s that it is possible to miss the data on either side of the BTFR. The older fd = constant models predict velocities that are too low for low mass galaxies. The more recent abundance matching models predict velocities that are too high for low mass galaxies. I have no doubt that a model can be constructed that gets it right, because there is obviously enough flexibility to do pretty much anything. Adding new parameters until we get it right is an example of epicyclic thinking, as I’ve been pointing out for thirty years. I don’t know what could be worse for an idea like dark matter that is not falsifiable.

We still haven’t come anywhere close to explaining the ultrafaints in either theory. In LCDM, we don’t even know if we should draw a curved line that catches them as if they’re in equilibrium, or start from a power-law BTFR and look for departures from that due to tidal effects. Both are possible in LCDM, both are plausible, as is some combination of both. I expect theorists will pick an option and argue about it indefinitely.

Tidal effects

The typical velocity dispersion of the ultrafaint dwarfs is too high for them to be in equilibrium in MOND. But there’s also pretty much no way these tiny things could be in equilibrium, being in the rough neighborhood dominated by our home, the cosmic gorilla. That by itself doesn’t make an explanation; we need to work out what happens to such things as they evolve dynamically under the influence of a pronounced external field. To my knowledge, this hasn’t been addressed in detail in MOND any more than in LCDM, though Brada & Milgrom addressed some of the relevant issues.

There is a difference in approach required for the two theories. In LCDM, we need to increase the resolution of simulations to see what happens to the tiniest of dark matter halos and their resident galaxies within the larger dark matter halos of giant galaxies. In MOND we have to simulate the evolution along the orbit of each unique individual. This is challenging on multiple levels, as each possible realization of a MOND theory requires its own code. Writing a simulation code for AQUAL requires a different numerical approach than QUMOND, and those are both modifications of gravity via the Poisson equation. We don’t know which might be closer to reality; heck, we don’t even know [yet] if MOND is a modification of gravity or inertia, the latter being even harder to code.

Cold dark matter is scale-free, so crudely I expect ultrafaint dwarfs in LCDM to do the same as larger dwarf satellites that have been simulated: their outer dark matter halos are gradually whittled away by tidal stripping for many Gyr. At first the stars are unaffected, but eventually so little dark matter is left that the stars start to be lost impulsively during pericenter passages. Though the dark matter is scale free, the stars and the baryonic physics that made them are not, so that’s where it gets tricky. The apparent dark-to-luminous mass ratio is huge, so one possibility is that the ultrafaints are in equilibrium despite their environment; they just made ridiculously few stars from the amount of mass available. That’s consistent with a wild extrapolation of abundance matching models, but how it comes about physically is less clear. For example, at some low mass, a galaxy would make so few stars that none are massive enough to result in a supernova, so there is no feedback, which is what is preventing too many stars from forming. Awkward. Alternately, the constant exposure to tidal perturbation might stir things up, with the velocity dispersion growing and stars getting stripped to form tidal streams, so they may have started as more massive objects. Or some combination of both, plus the evergreen possibility of things that don’t occur to me offhand.

Equilibrium for ultrafaint satellites is not an option in MOND, but tidal stirring and stripping is. As a thought experiment, let’s imagine what happens to a low mass dwarf typical of the field that falls towards the Milky Way from some large distance. Initially gas-rich, the first environmental effect that it is likely to experience is ram pressure stripping by the hot coronal gas around the Milky Way. That’s a baryonic effect that happens in either theory; it’s nothing to do with the effective law of gravity. A galaxy thus deprived of much of its mass will be out of equilibrium; its internal velocities will be typical of the original mass but the stripped mass is less. Consequently, its structure must adjust to compensate; perhaps dwarf Irregulars puff up and are transformed into dwarf Spheroidals in this way. Our notional infalling dwarf may have time to equilibrate to its new mass before being subject to strong tidal perturbation by the Milky Way, or it may not. If not, it will have characteristic internal velocities that are too high for its new mass, and reside above the BTFR. I doubt this suffices to explain [m]any of the ultrafaints, as their masses are so tiny that some stellar mass loss is also likely to have occurred.

Let’s suppose that our infalling dwarf has time to [approximately] equilibrate, or it simply formed nearby to begin with. Now it is a pressure supported system [more or less] on the BTFR. As it orbits the Milky Way, it feels an extra force from the external field. If it stays far enough out to remain in quasi-equilibrium in the EFE regime, then it will oscillate in size and velocity dispersion in phase with the strength of the external field it feels along its orbit.

If instead a satellite dips too close, it will be tidally disturbed and depart from equilibrium. The extra energy may stir it up, increasing its velocity dispersion. It doesn’t have the mass to sustain that, so stars will start to leak out. Tidal disruption will eventually happen, with the details depending on the initial mass and structure of the dwarf and on the eccentricity of its orbit, the distance of closest approach (pericenter), whether the orbit is prograde or retrograde relative to any angular momentum the dwarf may have… it’s complicated, so it is hard to generalize##. Nevertheless, we (McGaugh & Wolf 2010) anticipated that “the deviant dwarfs [ultrafaints] should show evidence of tidal disruption while the dwarfs that adhere to the BTFR should not.” Unlike LCDM where most of the damage is done at closest approach, we anticipate for MOND that “stripping of the deviant dwarfs should be ongoing and not restricted to pericenter passage” because tides are stronger and there is no cocoon of dark matter to shelter the stars. The effect is still maximized at pericenter; it’s just not as impulsive as in some of the dark matter simulations I’ve seen.

This means that there should be streams of stars all over the sky. As indeed there are. For example:

A color-coded map of the northern sky displaying various stellar streams, indicated by labels such as 'Gaia-1*', 'Gaia-3*', and 'GD-1'. The color gradient represents velocity in kilometers per second, with colors ranging from blue for lower velocities to red for higher velocities.
Stellar streams in the Milky Way identified using Gaia (Malhan et al. 2018).

As a tidally influenced dwarf dissolves, the stars will leak out and form a trail. This happens in LCDM too, but there are differences in the rate, coherence, and symmetry of the resulting streams. Perhaps ultrafaint dwarfs are just the last dregs of the tidal disruption process. From this perspective, it hardly matters if they originated as external satellites or are internal star clusters: globular clusters native to the Milky Way should undergo a similar evolution.

Evolutionary tracks

Perhaps some of the ultrafaint dwarfs are the nuggets of disturbed systems that have suffered mass loss through tidal stripping. That may be the case in either LCDM or MOND, and has appealing aspects in either case – we went through all the possibilities in McGaugh & Wolf (2010). In MOND, the BTFR provides a reference point for what a stable system in equilibrium should do. That’s the starting point for the evolutionary tracks suggested here:

A graph plotting flat rotation speed (Vf) in km/s against baryonic mass (Mb) in solar masses. The data points include various galaxies represented as blue circles and green squares, with error bars indicating measurement uncertainty. A solid black line demonstrates the overall trend, while red curves suggest alternative theoretical predictions.
BTFR with conceptual evolutionary tracks (red lines) for tidally-stirred ultrafaint dwarfs.

Objects start in equilibrium on the BTFR. As they become subject to the external field, their velocity dispersions first decrease as they transition through the quasi-Newtonian regime. As tides kick in, stars are lost and stretched along the satellite’s orbit, so mass is lost but the apparent velocity dispersion increases as stars gradually separate and stretch out along a stream. Their relative velocities no longer represent a measure of the internal gravitational potential; rather than a cohesive dwarf satellite they’re more an association of stars in similar orbits around the Milky Way.

This is crudely what I imagine might be happening in some of the ultrafaint dwarfs that reside above the BTFR. Reality can be more complicated, and probably is. For example, objects that are not yet disrupted may oscillate around and below the BTFR before becoming completely unglued. Moreover, some individual ultrafaints probably are not real, while the data for others may suffer from systematic uncertainties. There’s a lot to sort out, and we’ve reached the point where the possibility of non-equilibrium effects cannot be ignored.

As a test of theories, the better course remains to look for new galaxies free from environmental perturbation. Ultrafaint dwarfs in the field, far from cosmic gorillas like the Milky Way, would be ideal. Hopefully many will be discovered in current and future surveys.


!Other examples exist and continue to be discovered. More pertinent to my thinking is that the mass threshold at which reionization is supposed to suppress star formation has been a constantly moving goal post. To give an amusing anecdote, while I was junior faculty at the University of Maryland (so at least twenty years ago), Colin Norman called me up out of the blue. Colin is an expert on star formation, and had a burning question he thought I could answer. “Stacy,” he says as soon as I pick up, “what is the lowest mass star forming galaxy?” Uh, Hi, Colin. Off the cuff and totally unprepared for this inquiry, I said “um, a stellar mass of a few times 10⁷ solar masses.” Colin’s immediate response was to laugh long and loud, as if I had made the best nerd joke ever. When he regained his composure, he said “We know that can’t be true as reionization will prevent star formation in potential wells that small.” So, after this abrupt conversation, I did some fact-checking, and indeed, the number I had pulled out of my arse on the spot was basically correct, at that time. I also looked up the predictions, and of course Colin knew his business too; galaxies that small shouldn’t exist. Yet they do, and now the minimum known is two orders of magnitude lower in mass, with still no indication that a lower limit has been reached. So far, the threshold of our knowledge has been imposed by observational selection effects (low luminosity galaxies are hard to see), not by any discernible physics.

More recently, McQuinn et al. (2024) have made a study of the star formation histories of Leo P and a few similar galaxies that are near enough to see individual stars so as to work out the star formation rate over the course of cosmic history. They argue that there seems to be a pause in star formation after reionization, so a more nuanced version of the hypothesis may be that reionization did suppress star forming activity for a while, but these tiny objects were subsequently able to re-accrete cold gas and get started again. I find that appealing as a less simplistic thing that might have happened in the real universe, and not just a simple on/off switch that leaves only a fossil. However, it isn’t immediately clear to me that this more nuanced hypothesis should happen in LCDM. Once those baryons have evaporated, they’re gone, and it is far from obvious that they’ll ever come back to the weak gravity of such a small dark matter halo. It is also not clear to me that this interpretation, appealing as it is, is unique: the reconstructed star formation histories also look consistent with stochastic star formation, with fluctuations in the star formation rate being a matter of happenstance that have nothing to do with the epoch of reionization.

#So how are ultrafaint dwarfs different from star clusters? Great question! Wish we had a great answer.

Some ultrafaints probably are star clusters rather than independent satellite galaxies. How do we tell the difference? Chiefly, the velocity dispersion: star clusters show no need for dark matter, while ultrafaint dwarfs generally appear to need a lot. This of course assumes that their measured velocity dispersions represent an equilibrium measure of their gravitational potential, which is what we’re questioning here, so the opportunity for circular reasoning is rife.

$Rather than apply a strict luminosity cut, for convenience I’ve kept the same “not safe from tidal disruption” distinction that we’ve used before. Some of the objects in the 10⁵ – 10⁶ M☉ range might belong more with the classical dwarfs than with the ultrafaints. This is a reminder that our nomenclature is terrible more than anything physically meaningful.

&Astronomy is an observational science, not a laboratory science. We can only detect the photons nature sends our way. We cannot control all the potential systematics as can be done in an enclosed, finite, carefully controlled laboratory. That means there is always the potential for systematic uncertainties whose magnitude can be difficult to estimate, or sometimes to even be aware of, like how local variations impact Jeans analyses. This means we have to take our error bars with a grain of salt, often such a big grain as to make statistical tests unreliable: goodness of fit is only as meaningful as the error bars.

I say this because it seems to be the hardest thing for physicists to understand. I also see many younger astronomers turning the crank on fancy statistical machinery as if astronomical error bars can be trusted. Garbage in, garbage out.

*This is an example of setting a parameter in a model “by hand.”

**The transition to thinking in terms of the luminosity function rather than Tully-Fisher is so complete that the most recent, super-large, Euclid flagship simulation doesn’t even attempt to address the kinematics of individual galaxies while giving extraordinarily detailed and extensive information about their luminosity distributions. I can see why they’d do that – they want to focus on what the Euclid mission might observe – but it is also symptomatic of the growing tendency I’ve witnessed to just not talk about those pesky kinematics.

%Halos in dark matter simulations tend to be rather triaxial, i.e., a 3D bloboid that is neither spherical like a soccer ball nor oblate like a frisbee nor prolate like an American football: each principal axis has a different length. If real halos were triaxial, it would lead to non-circular orbits in dark matter-dominated galaxies that are not observed.

The triaxiality of halos is a result from dark matter-only simulations. Personally, I suspect that the condensation of gas within a dark matter halo (presuming such things exist) during the process of galaxy formation rounds out the inner halo, making it nearly spherical where we are able to make measurements. So I don’t see this as necessarily a failure of LCDM, but rather an example of how more elaborate simulations that include baryonic physics are sometimes warranted. Sometimes. There’s a big difference between this process, which also compresses the halo (making it more dense when it already starts out too dense), and the various forms of feedback, which may or may not further alter the structure of the halo.

^There are many failure modes in simulated rotation curves, the two most common being the cusp-core problem in dwarfs and sub-maximal disks in giants. It is common for the disks of bright spiral galaxies to be nearly maximal in the sense that the observed stars suffice to explain the inner rotation curve. They may not be completely maximal in this sense, but they come close for normal stellar populations. (Our own Milky Way is a good example.) In contrast, many simulations produce bright galaxies that are absurdly sub-maximal; EAGLE and SIMBA being two examples I remember offhand.

Another common problem is that LCDM simulations often don’t produce rotation curves that are as flat as observed. This was something I also found in my early attempts at model-building with dark matter halos. It is easy to fit a flat rotation curve given the data, but it is hard to predict a priori that rotation curves should be flat.

!!Gravitational lensing indicates that rotation curves remain flat to even larger radii. However, these observations are only sensitive to galaxies more massive than those under discussion here. So conceivably there could be another coincidence wherein flatness persists for galaxies with Mb > 10¹⁰ M☉, but not those with Mb < 10⁹ M☉.

!!!Many in the community seem to agree that it will surely work out.

##I’ve tried to estimate dissolution timescales, but find the results wanting. For plausible assumptions, one finds timescales that seem plausible (a few Gyr) but with some minor fiddling one can also find results that are no-way that’s-too-short (a few tens of millions of years), depending on the dwarf and its orbit. These are crude analytic estimates; I’m not satisfied that these numbers are particularly meaningful. Still, this is a worry with the tidal-stirring hypothesis: will perturbed objects persist long enough to be observed as they are? This is another reason we need detailed simulations tailored to each object.


*&^#Note added after initial publication: While I was writing this, a nice paper appeared on exactly this issue of the star formation history of a good number of ultrafaint dwarfs. They find that 80% of the stellar mass formed 12.48 ± 0.18 Gyr ago, so 12.5 was a good guess. Formally, at the one sigma level, this is a little after reionization, but only a tiny bit, so close enough: the bulk of the stars formed long ago, like a classical globular cluster, and these ultrafaints are consistent with being fossils.

Intriguingly, there is a hint of an age difference by kinematic grouping, with things that have been in the Milky Way being the oldest, those on first infall being a little younger (but still very old), and those infalling with the Large Magellanic Cloud a tad younger still. If so, then there is more to the story than quenching by cosmic reionization.

They also show a nice collection of images so you can see more examples. The ellipses trace out the half-light radii, so you can see the proclivity for many (not all!) of these objects to be elongated, perhaps as a result of tidal perturbation:

Figure 2 from Durbin et al. (2025): Footprints of all HST observations (blue filled patches) overlaid on DSS2 imaging cutouts. Open black ellipses show the galaxy profiles at one half-light radius.

Non-equilibrium dynamics in galaxies that appear to lack dark matter: ultradiffuse galaxies

Previously, we discussed non-equilibrium dynamics in tidal dwarf galaxies. These are the result of interactions between giant galaxies that are manifestly a departure from equilibrium, a circumstance that makes TDGs potentially a decisive test to distinguish between dark matter and MOND, and simultaneously precludes confident application of that test. There are other galaxies for which I suspect non-equilibrium dynamics may play a role, among them some (not all) of the so-called ultradiffuse galaxies (UDGs).

UDGs

The term UDG has been adopted for galaxies below a certain surface brightness threshold with a size (half-light radius) in excess of 1.5 kpc (van Dokkum et al. 2015). I find the stipulation about the size to be redundant, as surface brightness* is already a measure of diffuseness. But OK, whatever, these things are really spread out. That means they should be good tests of MOND like low surface brightness galaxies before them: their low stellar surface densities mean** that they should be in the regime of low acceleration and evince large mass discrepancies when isolated. It also makes them susceptible to the external field effect (EFE) in MOND when they are not isolated, and perhaps also to tidal disruption.

To give some context, here is a plot of the size-mass relation for Local Group dwarf spheroidals. Typically they have masses comparable to globular clusters, but much larger sizes – a few hundred parsecs instead of just a few. As with more massive galaxies, these pressure supported dwarfs are all over the place – at a given mass, some are large while others are relatively compact. All but the one most massive galaxy in this plot are in the MOND regime. For convenience, I’ll refer to the black points labelled with names as UDGs+.

The size (radius encompassing half of the total light) and stellar mass of Local Group dwarf spheroidals (green points selected by McGaugh et al. 2021 to be relatively safe from external perturbation) along with two more Local Group dwarfs that are subject to the EFE (Crater 2 and Antlia 2) and the two UDGs NGC 1052-DF2 and DF4. Dotted lines show loci of constant surface density. For reference, the solar neighborhood has ~40 M☉ pc⁻²; the centers of high surface brightness galaxies frequently exceed 1,000 M☉ pc⁻².

The UDGs are big and diffuse. This makes them susceptible to the EFE and tidal effects. The lower the density of a system, the easier it is for external systems to mess with it. The ultimate example is something that gets so close to a dominant central mass that it gets tidally disrupted. That can happen conventionally; the stronger effective force of MOND increases tidal effects. Indeed, there is only a fairly narrow regime between the isolated case and tidally-induced disequilibrium where the EFE modifies the internal dynamics in a quasi-static way.

The trouble is the s-word: static. In order to test theories, we assume that the dynamical systems we observe are in equilibrium. Though often a good assumption, it doesn’t always hold. If we forget we made the assumption, we might think we’ve falsified a theory when all we’ve done is discover a system that is out of equilibrium. The universe is a very dynamic place – the whole thing is expanding, after all – so we need to be wary of static thinking.

Equilibrium MOND formulae

That said, let’s indulge in some static thinking. An isolated, pressure supported galaxy in the MOND regime will have an equilibrium velocity dispersion

$$\sigma_{\rm iso} = \left(\frac{4}{81}\, G M a_0\right)^{1/4}$$

where M is the mass (the stellar mass in the case of a gas-free dwarf spheroidal), G is Newton’s constant, and a0 is Milgrom’s acceleration constant. The number 4/81 is a geometrical factor that assumes we’re observing a spherical system with isotropic orbits, neither of which is guaranteed even in the equilibrium case, and deviations from this idealized situation are noticeable. Still, this is as simple as it gets: if you know the mass, you can predict the characteristic speed at which stars move. Mass is all that matters: we don’t care about the radius as we must with Newton (v² = GM/r); the only other quantities are constants of nature.
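
As a rough illustration of how little goes into this prediction, here is a minimal sketch of the isolated case, taking the fourth power of the dispersion to be (4/81) G M a0 as described above. The numerical value a0 = 1.2e-10 m/s² and the function name are assumptions for the example; the text itself does not quote a value for a0.

```python
# Galactic units: km/s for speeds, kpc for lengths, solar masses for mass.
G = 4.30091e-6                    # Newton's constant in kpc (km/s)^2 / Msun
A0 = 1.2e-10 * 3.0857e19 * 1e-6   # assumed a0 = 1.2e-10 m/s^2, converted to (km/s)^2 / kpc


def sigma_iso(mass_msun):
    """Equilibrium velocity dispersion (km/s) of an isolated, spherical,
    isotropic system deep in the MOND regime: sigma^4 = (4/81) G M a0."""
    return (4.0 / 81.0 * G * mass_msun * A0) ** 0.25


# The only input is the mass; no radius appears anywhere.
for mass in (1e5, 1e6, 1e7):
    print(f"M = {mass:.0e} Msun -> sigma_iso = {sigma_iso(mass):.1f} km/s")
```

Because the dispersion scales as the fourth root of the mass, a factor of 16 in mass only doubles the predicted velocity dispersion.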

But what do we mean by isolated? In MOND, it is that the internal acceleration of the system, gin, exceeds that from external sources, gex: gin > gex. For a pressure supported dwarf, gin ≈ 3σ²/r (so here the size of the dwarf does matter, as does the location of a star within it), while the external field from a giant host galaxy would be gex = Vf²/D where Vf is the flat rotation speed stipulated by the baryonic mass of the host and D is the distance from the host to the dwarf satellite. The distance is not a static quantity. As a dwarf orbits its host, D will vary by an amount that depends on the eccentricity of the orbit, and the external field will vary with it, so it is possible to have an orbit in which a dwarf satellite dips in and out of the EFE regime. Many Local Group dwarfs straddle the line gin ≈ gex, and it takes time to equilibrate, so static thinking can go awry.
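
The comparison of internal and external accelerations can be sketched in a few lines. This is a static snapshot at each distance, which is exactly the kind of static thinking the text warns about, so treat it as bookkeeping rather than dynamics. The dwarf (σ = 5 km/s, r = 0.3 kpc), the host speed of 200 km/s, the sampled distances, and the value of a0 are all hypothetical choices for illustration.

```python
# Units: km/s for speeds, kpc for lengths, so accelerations are in (km/s)^2 / kpc.
A0 = 1.2e-10 * 3.0857e19 * 1e-6   # assumed a0 = 1.2e-10 m/s^2, about 3703 (km/s)^2 / kpc


def g_internal(sigma, r):
    """Internal acceleration of a pressure supported dwarf, g_in ~ 3 sigma^2 / r."""
    return 3.0 * sigma**2 / r


def g_external(v_flat, distance):
    """External field of a host with flat rotation speed v_flat at a given distance."""
    return v_flat**2 / distance


def regime(sigma, r, v_flat, distance):
    """Label the regime by comparing internal and external accelerations."""
    if g_internal(sigma, r) > g_external(v_flat, distance):
        return "isolated"
    return "EFE"


# Hypothetical dwarf and host, sampled at a few points along an eccentric orbit.
SIGMA, R_DWARF, V_HOST = 5.0, 0.3, 200.0
for d in (60.0, 120.0, 240.0):
    g_in = g_internal(SIGMA, R_DWARF) / A0
    g_ex = g_external(V_HOST, d) / A0
    print(f"D = {d:5.0f} kpc: g_in = {g_in:.3f} a0, g_ex = {g_ex:.3f} a0 -> "
          f"{regime(SIGMA, R_DWARF, V_HOST, d)}")
```

With these made-up numbers the same dwarf counts as isolated near apocenter and sits in the EFE regime closer in, which is the dipping in and out described above.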

It is possible to define a sample of Local Group dwarfs that have sufficiently high internal accelerations (but also in the MOND regime with gex < gin ≪ a0) that we can pretend they are isolated, and the above equation applies. Such dwarfs should& fall on the BTFR, which they do:

The baryonic Tully-Fisher relation (BTFR) including pressure supported dwarfs (green points) with their measured velocity dispersions matched to the flat rotation speeds of rotationally supported galaxies (blue points) via the prescription of McGaugh et al. (2021). The large blue points are rotators in the Local Group (with Andromeda and the Milky Way up near the top); smaller points are spirals with direct distance measurements (Schombert et al. 2020). The Local Group dwarfs assessed to be safe from external perturbation are on the BTFR (for Vf = 2σ); Crater 2 and the UDGs near NGC 1052 are not.

In contrast, three of the four UDGs considered here do not fall on the BTFR. Should they?

Conventionally, in terms of dark matter, probably they should. There is no reason for them to deviate from whatever story we make up to explain the BTFR for everything else. That they do means we have to make up a separate story for them. I don’t want to go deeply into this here since the cold dark matter model doesn’t really explain the observed BTFR in the first place. But even accepting that it does so after invoking feedback (or whatever), does it tolerate deviants? In a broad sense, yes: since it doesn’t require the particular form of the BTFR that’s observed, it is no problem to deviate from it. In a more serious sense, no: if one comes up with a model that explains the small scatter of the BTFR, it is hard to make that same model defy said small scatter. I know, I’ve tried. Lots. One winds up with some form of special pleading in pretty much any flavor of dark matter theory on top of whatever special pleading we invoked to explain the BTFR in the first place. This is bad, but perhaps not as bad as it seems once one realizes that not everything has to be in equilibrium all the time.

In MOND, the BTFR is absolute – for isolated systems in equilibrium. In the EFE regime, galaxies can and should deviate from it even if they are in equilibrium. This always goes in the sense of having a lower characteristic velocity for a given mass, so below the line in the plot. To get above the line would require being out of equilibrium through some process that inflates velocities (if systematic errors are not to blame, which also sometimes happens).

The velocity dispersion in the EFE regime (gin < gex ≪ a0) is slightly more complicated than this isolated case:

This is just like Newton except the effective value of the gravitational constant is modified. It gets a boost^ by how far the system is in the MOND regime: Geff ≈ G(a0/gex). An easy way to tell which regime an object is in is to calculate both velocity dispersions σiso and σefe: the smaller one is the one that applies#. An upshot of this is that systems in the EFE regime should deviate from the BTFR to the low velocity side. The amplitude of the deviation depends on the system and the EFE: both the size and mass matter, as does gex. Indeed, if an object is on an eccentric orbit, then the velocity dispersion can vary with the EFE as the distance of the satellite from its host varies, so over time the object would trace out some variable path in the BTFR plane.

Three of the four UDGs fall off the BTFR, so that sounds mostly right, qualitatively. Is it? Yes, for Crater 2, but not really for the others. Even for Crater 2 it is only a partial answer, as non-equilibrium effects may play a role. This gets involved for Crater 2, then more so for the others, so let’s start with Crater 2.

Crater 2 – the velocity dispersion

The velocity dispersion of Crater 2 was correctly predicted a priori by the formula for σefe above. It is a tiny number, 2 km/s, and that’s what was subsequently observed. Crater 2 is very low mass, ~3 × 10⁵ M☉, which is barely a globular cluster, but it is even more spread out than the typical dwarf spheroidal, having an effective surface density of only ~0.05 M☉ pc⁻². If it were isolated, MOND predicts that it would have a higher velocity dispersion – all of 4 km/s. That’s what it would take to put it on the BTFR above. The seemingly modest difference between 2 and 4 km/s makes for a clear offset. But despite its substantial current distance from the Milky Way (~ 120 kpc), Crater 2 is so low surface density that it is still subject to the external field effect, which lowers its equilibrium velocity dispersion. Unlike isolated galaxies, it should be offset from the BTFR according to MOND.
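The regime test described above (compute both σiso and σefe, keep the smaller) can be sketched numerically. This is only an illustration under stated assumptions: the isolated estimator is taken to be σiso = (4/81 · G M a0)^(1/4), and the EFE estimator is taken to be the Newtonian-like form σefe = sqrt(Geff M / (3 r)) with Geff ≈ G(a0/gex). The prefactor 1/3 and the use of a 3D half-light radius r are assumptions, and gex is approximated as V²/D for a host with a flat rotation speed V. The half-light radius (1.4 kpc) and host speed (200 km/s) are hypothetical inputs, not values quoted in this post.

```python
import math

G = 4.30091e-6   # Newton's constant in kpc (km/s)^2 / Msun
A0 = 3.7e3       # a0 ~ 1.2e-10 m/s^2 converted to (km/s)^2 / kpc

def sigma_iso(mass):
    """Isolated deep-MOND dispersion in km/s (assumed form: (4/81 G M a0)^(1/4))."""
    return (4.0 / 81.0 * G * mass * A0) ** 0.25

def sigma_efe(mass, r_half, g_ex):
    """EFE-regime dispersion in km/s (assumed form: sqrt(Geff M / (3 r_half)))."""
    g_eff = G * (A0 / g_ex)          # Geff ~ G (a0 / gex)
    return math.sqrt(g_eff * mass / (3.0 * r_half))

def applicable_sigma(mass, r_half, g_ex):
    """Return (sigma, regime): the smaller of the two estimates is the one that applies."""
    iso = sigma_iso(mass)
    efe = sigma_efe(mass, r_half, g_ex)
    return (iso, "isolated") if iso <= efe else (efe, "EFE")

# Crater 2-like inputs: mass and distance as quoted in the text;
# r_half and v_host are hypothetical values for illustration only.
mass = 3e5            # Msun
distance = 120.0      # kpc
r_half = 1.4          # kpc (assumed)
v_host = 200.0        # km/s (assumed)
g_ex = v_host**2 / distance   # external field in (km/s)^2 / kpc

sigma, regime = applicable_sigma(mass, r_half, g_ex)
print(f"sigma_iso = {sigma_iso(mass):.1f} km/s")
print(f"sigma_efe = {sigma_efe(mass, r_half, g_ex):.1f} km/s")
print(f"applies: {regime}, sigma = {sigma:.1f} km/s")
```

With these inputs the isolated estimate lands near 4 km/s and the EFE estimate near 2 km/s, so the EFE value is the one that applies. Since mass scales as the fourth power of velocity on the BTFR, a factor of two in velocity dispersion corresponds to a factor of 16 in mass, which is why such a small difference in km/s is a clear offset.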

LCDM struggles to explain the low mass end of the BTFR because it predicts a halo mass-circular speed relation Mhalo ~ Vhalo³ that differs from the observed Mb ~ Vf⁴. A couple of decades ago, it looked like massive galaxies might be consistent with the lower power-law, but that anticipates higher velocities for small systems. The low velocity dispersion of Crater 2 is thus doubly weird in LCDM. Its internal velocities are too small not just once – the BTFR is already lower than was expected – but twice, being below even that.

An object with a large radial extent like Crater 2 probes far out into its notional dark matter halo, making the nominal prediction$ of LCDM around 17 km/s, albeit with a huge expected scatter. Even if we can explain the low mass end of the BTFR and its unnaturally low scatter in LCDM, we now have to explain this exception to it – an exception that is natural in MOND, but is on the wrong side of the probability distribution for LCDM. That’s one of the troubles with tuning LCDM to mimic MOND: if you succeed in explaining the first thing, you still fail to anticipate the other. There is no EFE% in LCDM, no reason to anticipate that σefe applies rather than σiso, and no reason to expect via feedback that this distinction has anything to do with the dynamical accelerations gin and gex.

But wait – this is a post about non-equilibrium dynamics. That can happen in LCDM too. Indeed, one expects that satellite galaxies suffer tidal effects in the field of their giant host. The primary effect is that the dark matter subhalos in which dwarf satellites reside are stripped from the outside in. Their dark matter becomes part of the large halo of the host. But the stars are well-cocooned in the inner cusp of the NFW halo which is more robust than the outskirts of the subhalo, so the observable velocity dispersion barely evolves until most of the dark mass has been stripped away. Eventually, the stars too get stripped, forming tidal streams. Most of the damage occurs during pericenter passage when satellites are closest to their host. What’s left is no longer in equilibrium, with the details depending on the initial conditions of the dwarf on infall, the orbit, the number of pericenter passages, etc., etc.

What does not come out of this process is Crater 2 – at least not naturally. It has stars very far out – these should get stripped outright if the subhalo has been eviscerated to the point where its velocity dispersion is only 2 km/s. This tidal limitation has been noted by Errani et al.: “the large size of kinematically cold ‘feeble giant’ satellites like Crater 2 or Antlia 2 cannot be explained as due to tidal effects alone in the Lambda Cold Dark Matter scenario.” To save LCDM, we need something extra, some additional special pleading on top of non-equilibrium tidal effects, which is why I previously referred to Crater 2 as the Bullet Cluster of LCDM: an observation so problematic that it amounts to a falsification.

Crater 2 – the orbit

We held a workshop on dwarf galaxies on CWRU’s campus in 2017 where issues pertaining to both dark matter and MOND were discussed. The case of Crater 2 was one of the things discussed, and it was included in the list of further tests for both theories (see above links). Basically the expectation in LCDM is that most subhalo orbits are radial (highly eccentric), so that is likely to be the case for Crater 2. In contrast, the ultradiffuse blob that is Crater 2 would not survive a close passage by the Milky Way given the strong tidal force exerted by MOND, so the expectation was for a more tangential (quasi-circular) orbit that keeps it at a safe distance.

Subsequently, it became possible to constrain orbits with Gaia data. The exact orbit depends on the gravitational potential of the Milky Way, which isn’t perfectly known. However, several plausible choices of the global potential give an eccentricity around 0.6. That’s not exactly radial, but it’s pretty far from circular, placing the pericenter around 30 kpc. That’s much closer than its current distance, and well into the regime where it should be tidally disrupted in MOND. No way it survives such a close passage!
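As a rough consistency check on these numbers, the pericenter and eccentricity together imply an apocenter. The sketch below assumes the usual definition e = (r_apo − r_peri)/(r_apo + r_peri); whether the cited orbit fits use exactly this definition is an assumption, and the function name is hypothetical.

```python
def apocenter_from_pericenter(r_peri, ecc):
    """Apocenter distance for e = (r_apo - r_peri) / (r_apo + r_peri)."""
    if not 0.0 <= ecc < 1.0:
        raise ValueError("eccentricity must satisfy 0 <= e < 1 for a bound orbit")
    return r_peri * (1.0 + ecc) / (1.0 - ecc)

r_peri = 30.0   # kpc, pericenter quoted in the text
ecc = 0.6       # eccentricity quoted in the text
r_apo = apocenter_from_pericenter(r_peri, ecc)
print(f"apocenter ~ {r_apo:.0f} kpc")
```

With these inputs the apocenter comes out near 120 kpc, comparable to the current distance of Crater 2. If the host's rotation curve is roughly flat, so that gex scales as 1/D, the external field at pericenter would be about four times stronger than what the dwarf experiences today.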

So which is it? MOND predicted the correct velocity dispersion, which LCDM struggles to explain. Yet the orbit is reasonable in LCDM, but incompatible with MOND.

Simulations of dwarf satellites

It occurs to me that we might be falling victim to static thinking somewhere. We talked about the impact of tides on dark matter halos a bit above. What should we expect in MOND?

The first numerical simulations of dwarf galaxies orbiting a giant host were conducted by Brada & Milgrom (2000). Their work is specific to the Aquadratic Lagrangian (AQUAL) theory proposed by Bekenstein & Milgrom (1984). This was the first demonstration that it was possible to write a version of MOND that conserved momentum and energy. Since then, a number of different approaches have been demonstrated. These can be subtly different, so it is challenging to know which (if any) is correct. Sorting that out is well beyond the scope of this post, so let’s stick to what we can learn from Brada & Milgrom.

Brada & Milgrom followed the evolution of low surface density dwarfs of a range of masses as they orbited a giant host galaxy. One thing they found was that the behavior of the numerical model could deviate from the analytic expectation of quasi-equilibrium enshrined in the equations above. For an eccentric orbit, the external field varies with distance from the host. If there is enough time to respond to this, the change can be adiabatic (reversible), and the static approximation may be close enough. However, as the external field varies more rapidly and/or the dwarf is more fragile, the numerical solution departs from the simple analytic approximation. For example:

Fig. 2 of Brada & Milgrom (2000): showing the numerically calculated (dotted line) variation of radius (left) and characteristic velocity (right) for a dwarf on a mildly eccentric orbit (peri- and apocenter of roughly 60 and 90 kpc, respectively, for a Milky Way-like host). Also shown is the variation in the EFE as the dwarf’s distance from the host varies (solid line). Dwarfs go through a breathing mode of increasing/decreasing size and decreasing/increasing velocity dispersion in phase with the orbit. If this process is adiabatic, it tracks the solid line and the static EFE approximation holds. This is not always the case in the simulation, so applying our usual assumption of dynamical equilibrium will result in an error stipulated by the difference between the dotted and solid lines. The amplitude of this error depends on the size, mass, and orbital history of each and every dwarf satellite.

As long as the behavior is adiabatic, the dwarf can be stable indefinitely even as it goes through periodic expansion and contraction in phase with the orbit. Departure from adiabaticity means that every passage will be different. Some damage will be done on the first passage, more on the second, and so on. As a consequence, reality will depart from our simple analytic expectations.
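One crude way to gauge whether a dwarf has time to respond adiabatically is to compare its internal crossing time, r/σ, with the orbital timescale. The sketch below is a generic heuristic, not the criterion used by Brada & Milgrom; the function names, the input values, and any threshold one might apply to the ratio are assumptions for illustration.

```python
KPC_PER_KMS_IN_GYR = 0.9778   # 1 kpc / (1 km/s) expressed in Gyr

def crossing_time_gyr(radius_kpc, sigma_kms):
    """Internal crossing time r / sigma in Gyr."""
    return KPC_PER_KMS_IN_GYR * radius_kpc / sigma_kms

def timescale_ratio(radius_kpc, sigma_kms, orbital_time_gyr):
    """Ratio of internal crossing time to orbital time; small values favor an adiabatic response."""
    return crossing_time_gyr(radius_kpc, sigma_kms) / orbital_time_gyr

# Hypothetical inputs: a diffuse dwarf (1 kpc, 2 km/s) versus a compact one (0.3 kpc, 8 km/s),
# both on an orbit with an assumed 2 Gyr timescale.
diffuse = timescale_ratio(1.0, 2.0, 2.0)
compact = timescale_ratio(0.3, 8.0, 2.0)
print(f"diffuse dwarf ratio: {diffuse:.2f}")
print(f"compact dwarf ratio: {compact:.3f}")
```

For the diffuse object the internal clock is within a factor of a few of the orbital clock, so it has less time to readjust as the external field changes. That is the sense in which low surface density dwarfs are the fragile ones.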

I was aware of this when I made the prediction for the velocity dispersion of Crater 2, and hedged appropriately. Indeed, I worried that Crater 2 should already be out of equilibrium. Nevertheless, I took solace in two things: first, the orbital timescale is long, over a Gyr, so departures from the equilibrium prediction might not have had time to make a dramatic difference. Second, this expectation is consistent with the slow evolution of the characteristic velocity for the most Crater 2-like, m=1 model of Brada & Milgrom (bottom track in the right panel below):

Fig. 4 of Brada & Milgrom (2000): The variation of the size and characteristic velocity of dwarf models of different mass. The more massive models approximate the adiabatic limit, which gradually breaks down for the lowest mass models. In this example, the m = 1 and 2 models explode, with the scale size growing gradually without recovering.

What about the size? That is not constant except for the most massive (m=16) model. The m=3 and 4 models recover, albeit not adiabatically. The m=4 model almost returns to its original size, but the m=3 model has puffed up after one orbit. The m=1 and 2 models explode.

One can see this by eye. The continuous growth in radii of the lower mass models is obvious. If one looks closely, one can also see the expansion then contraction of the heavier models.

Fig. 5 of Brada & Milgrom (2000): AQUAL numerical simulations of dwarf satellites orbiting a more massive host galaxy. The parameter m describes the mass and effective surface density of the satellite; all the satellites are in the MOND regime and subject to the external field of the host galaxy, which exceeds their internal accelerations. In dimensionless simulation units, m = 5 × 10⁻⁵, which for a satellite of the Milky Way corresponds roughly to a stellar mass of 3 × 10⁶ M☉. For real dwarf satellite galaxies, the scale size is also relevant, but the sequence of m above suffices to illustrate the increasingly severe effects of the external field as m decreases.

The current size of Crater 2 is unusual. It is very extended for its mass. If the current version of Crater 2 has a close passage with the Milky Way, it won’t survive. But we know it already had a close passage, so it should be expanding now as a result. (I did discuss the potential for non-equilibrium effects.) Knowing now that there was a pericenter passage in the (not exactly recent) past, we need to imagine running back the clock on the simulations. It would have been smaller in the past, so maybe it started with a normal size, and now appears so large because of its pericenter passage. The dynamics predict something like that; it is static thinking to assume it was always thus.

The dotted line shows a possible evolutionary track for Crater 2 as it expands after pericenter passage. Its initial condition would have been amongst the other dwarf spheroidals. It could also have lost some mass in the process, so any of the green low-mass dwarfs might be similar to the progenitor.

This is a good example of a phenomenon I’ve encountered repeatedly with MOND. It predicts something right, but seems to get something else wrong. If we’re already sure it is wrong, we stop there and never think further. But when one bothers to follow through on what the theory really predicts, more often than not the apparently problematic observation is in fact what we should have expected in the first place.

DF2 and DF4

DF2 and DF4 are two UDGs in the vicinity of the giant galaxy NGC 1052. They have very similar properties, and are practically identical in terms of having the same size and mass within the errors. They are similar to Crater 2 in that they are larger than other galaxies of the same mass.

When it was first discovered, NGC 1052-DF2 was portrayed as a falsification of MOND. On closer examination, had I known about it, I could have used MOND to correctly predict its velocity dispersion, just like the dwarfs of Andromeda. This seemed like yet another case where the initial interpretation contrary to MOND melted away to actually be a confirmation. At this point, I’ve seen literally hundreds^^ of cases like that. Indeed, this particular incident made me realize that there would always be new cases like that, so I decided to stop spending my time addressing every single case.

Since then, DF2 has been the target of many intensive observing campaigns. Apparently it is easier to get lots of telescope time to observe a single object that might have the capacity to falsify MOND than it is to get a more modest amount to study everything else in the universe. That speaks volumes about community priorities and the biases that inform them. At any rate, there is now lots more data on this one object. In some sense there is too much – there has been an active debate in the literature over the best distance determination (which affects the mass) and the most accurate velocity dispersion. Some of these combinations are fine with MOND, but others are not. Let’s consider the worst case scenario.

In the worst case scenario, both DF2 and DF4 are too far from NGC 1052 for its current EFE to have much impact, and they have relatively low velocity dispersions for their luminosity, around 8 km/s, so they fall below the BTFR. Worse for MOND is that this is about what one expects from Newton for the stars alone. Consequently, these galaxies are sometimes referred to as being “dark matter free.” That’s a problem for MOND, which predicts a larger velocity dispersion for systems in equilibrium.

Perhaps we are falling prey to static thinking, and these objects are not in equilibrium. While their proximity to neighboring galaxies and the EFE to which they are presently exposed depends on the distance, which is disputed, it is clear that they live in a rough neighborhood with lots of more massive galaxies that could have bullied them in a close passage at some point in the past. Looking at Fig. 4 of Brada & Milgrom above, I see that galaxies whacked out of equilibrium not only expand in radius, potentially explaining the unusually large sizes of these UDGs, but they also experience a period during which their velocity dispersion is below the equilibrium value. The amplitude of the dip in these simulations is about right to explain the appearance of being dark-matter-free.

It is thus conceivable that DF2 and DF4 (the two are nearly identical in the relevant respects) suffered some sort of interaction that perturbed them into their current state. Their apparent absence of a mass discrepancy and the apparent falsification of MOND that follows therefrom might simply be a chimera of static thinking.

Make no mistake: this is a form of special pleading. The period of depressed velocity dispersion does not last indefinitely, so we have to catch them at a somewhat special time. How special depends on the nature of the interaction and its timescale. This can be long in intergalactic space (Gyrs), so it may not be crazy special, but we don’t really know how special. To say more, we would have to do detailed simulations to map out the large parameter space of possibilities for these objects.

I’d be embarrassed for MOND to have to make this kind of special pleading if we didn’t also have to do it for LCDM. A dwarf galaxy being dark matter free in LCDM shouldn’t happen. Galaxies form in dark matter halos; it is very hard to get rid of the dark matter while keeping the galaxy. The most obvious way to do it, in rare cases, is through tidal disruption, though one can come up with other possibilities. These amount to the same sort of special pleading we’re contemplating on behalf of MOND.

Recently, Tang et al. (2024) argue that DF2 and DF4 are “part of a large linear substructure of dwarf galaxies that could have been formed from a high-velocity head-on encounter of two gas-rich galaxies” which might have stripped the dark matter while leaving the galactic material. That sounds… unlikely. Whether it is more or less unlikely than what it would take to preserve MOND is hard to judge. It appears that we have to indulge in some sort of special pleading no matter what: it simply isn’t natural for galaxies to lack dark matter in a universe made of dark matter, just as it is unnatural for low acceleration systems to not manifest a mass discrepancy in MOND. There is no world model in which these objects make sense.

Tang et al. (2024) also consider a number of other possibilities, which they conveniently tabulate:

Table 3 from Tang et al. (2024).

There are many variations on awkward hypotheses for how these particular UDGs came to be in LCDM. They’re all forms of special pleading. Even putting on my dark matter hat, most sound like crazy talk to me. (Stellar feedback? Really? Is there anything it cannot do?) It feels like special pleading on top of special pleading; it’s special pleading all the way down. All we have left to debate is which form of special pleading seems less unlikely than the others.

I don’t find this debate particularly engaging. Something weird happened here. What that might be is certainly of interest, but I don’t see how we can hope to extract from it a definitive test of world models.

Antlia 2

The last of the UDGs in the first plot above is Antlia 2, which I now regret including – not because it isn’t interesting, but because this post is getting exhausting. Certainly to write, perhaps to read.

Antlia 2 is on the BTFR, which is ordinarily normal. In this case it is weird in MOND, as the EFE should put it off the BTFR. The observed velocity dispersion is 6 km/s, but the static EFE formula predicts it should only be 3 km/s. This case should be like Crater 2.

First, I’d like to point out that, as an observer, it is amazing to me that we can seriously discuss the difference between 3 and 6 km/s. These are tiny numbers by the standard of the field. The more strident advocates of cold dark matter used to routinely assume that our rotation curve observations suffered much larger systematic errors than that in order to (often blithely) assert that everything was OK with cuspy halos so who are you going to believe, our big, beautiful simulations or those lying data?

I’m not like that, so I do take the difference seriously. My next question, whenever MOND is a bit off like this, is what does LCDM predict?

I’ll wait.

Well, no, I won’t, because I’ve been waiting for thirty years, and the answer, when there is one, keeps changing. The nominal answer, as best I can tell, is ~20 km/s. As with Crater 2, the large scale size of this dwarf means it should sample a large portion of its dark matter halo, so the expected characteristic speed is much higher than 6 km/s. So while the static MOND prediction may be somewhat off here, the static LCDM expectation fares even worse.

This happens a lot. Whenever I come across a case that doesn’t make sense in MOND, it usually doesn’t make sense in dark matter either.

In this case, the failure of the static-case prediction is apparently caused by tidal perturbation. Like Crater 2, Antlia 2 may have a large half-light radius because it is expanding in the way seen in the simulations of Brada & Milgrom. But it appears to be a bit further down that path, with member stars stretched out along the orbital path. They start to trace a small portion of a much deeper gravitational potential, so the apparent velocity dispersion goes up in excess of the static prediction.

Fig. 9 from Ji et al. (2021) showing tidal features in Antlia 2 considering the effects of the Milky Way alone (left panel) and of the Milky Way and the Large Magellanic Cloud together (central panel) along with the position-velocity diagram from individual stars (right panel). The object is clearly not the isotropic, spherical cow presumed by the static equation for the velocity dispersion. Indeed, it is elongated as would be expected from tidal effects, with individual member stars apparently leaking out.

This is essentially what I inferred must be happening in the ultrafaint dwarfs of the Milky Way. There is no way that these tiny objects deep in the potential well of the Milky Way escape tidal perturbation%% in MOND. They may be stripped of their stars and their velocity dispersions may get tidally stirred up. Indeed, Antlia 2 looks very much like the MOND prediction for the formation of tidal streams from such dwarfs made by McGaugh & Wolf (2010). Unlike dark matter models in which stars are first protected, then lost in pulses during pericenter passages, the stronger tides of MOND combined with the absence of a protective dark matter cocoon means that stars leak out gradually all along the orbit of the dwarf. The rate is faster when the external field is stronger at pericenter passage, but the mass loss is more continuous. This is a good way to make long stellar streams, which are ubiquitous in the stellar halo of the Milky Way.

So… so what?

It appears that aspects of the observations of the UDGs discussed here that seem problematic for MOND may not be as bad for the theory as they at first seem. Indeed, it appears that the noted problems may instead be a consequence of the static assumptions we usually adopt to do the analysis. The universe is a dynamic place, so we know this assumption does not always hold. One has to judge each case individually to assess whether this is reasonable or not.

In the cases of Crater 2 and Antlia 2, yes, the stranger aspects of the observations fit well with non-equilibrium effects. Indeed, the unusually large half-light radii of these low mass dwarfs may well be a result of expansion after tidal perturbation. That this might happen was specifically anticipated for Crater 2, and Antlia 2 fits the bill described by McGaugh & Wolf (2010) as anticipated by the simulations of Brada & Milgrom (2000) even though it was unknown at the time.

In the cases of DF2 and DF4, it is less clear what is going on. I’m not sure which data to believe, and I want to refrain from cherry-picking, so I’ve discussed the worst-case scenario above. But the data don’t make a heck of a lot of sense in any world view; the many hypotheses made in the dark matter context seem just as contrived and unlikely as a tidally-induced, temporary dip in the velocity dispersion that might happen in MOND. I don’t find any of these scenarios to be satisfactory.

This is a long post, and we have only discussed four galaxies. We should bear in mind that the vast majority of galaxies do as predicted by MOND; a few discrepant cases are always to be expected in astronomy. That MOND works at all is a problem for the dark matter paradigm: that it would do so was not anticipated by any flavor of dark matter theory, and there remains no satisfactory explanation of why MOND appears to happen in a universe made of dark matter. These four galaxies are interesting cases, but they may be an example of missing the forest for the trees.


*As it happens, the surface brightness threshold adopted in the definition of UDGs is exactly the same as I suggested for VLSBGs (very low surface brightness galaxies: McGaugh 1996), once the filter conversions have been made. At the time, this was the threshold of our knowledge, and I and other early pioneers of LSB galaxies were struggling to convince the community that such things might exist. Up until that time, the balance of opinion was that they did not, so it is gratifying to see that they do.

**This expectation is specific to MOND; it doesn’t necessarily hold in dark matter where the acceleration in the central regions of diffuse galaxies can be dominated by the cusp of the dark matter halo. These were predicted to exceed what is observed, hence the cusp-core problem.

+Measuring by surface brightness, Crater 2 and Antlia 2 are two orders of magnitude more diffuse than the prototypical ultradiffuse galaxies DF2 and DF4. Crater 2 is not quite large enough to count as a UDG by the adopted size definition, but Antlia 2 is. So does that make it super-ultra diffuse? Would it even be astronomy without terrible nomenclature?

&I didn’t want to use a MOND-specific criterion in McGaugh et al. (2021) because I was making a more general point, so the green points are overly conservative from the perspective of the MOND isolation criterion: there are more dwarfs for which this works. Indeed, we had great success in predicting velocity dispersions in exactly this fashion in McGaugh & Milgrom (2013a, 2013b). And XXVIII was a case not included above that we highlighted as a great test of MOND, being low mass (~4 × 10⁵ M☉) but still qualifying as isolated, and its dispersion came in (6.6 +2.9/−2.1 km/s in one measurement, 4.9 ± 1.6 km/s in another) as predicted a priori (4.3 +0.8/−0.7 km/s). Hopefully the Rubin Observatory will discover many more similar objects that are truly isolated; these will be great additional tests, though one wonders how much more piling-on needs to be done.

^This is an approximation that is reasonable for the small accelerations involved. More generally we have Geff = G/μ(|gex+gin|/a0) where μ is the MOND interpolation function and one takes the vector sum of all relevant accelerations.
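To see how the approximation relates to the general expression, one can evaluate Geff/G = 1/μ(x) for a specific interpolation function. The choice μ(x) = x/(1+x) below is an assumption made for illustration (no particular μ is specified here), as are the sample values of x = |gex+gin|/a0.

```python
def mu_simple(x):
    """An assumed interpolation function: mu -> x for x << 1, mu -> 1 for x >> 1."""
    return x / (1.0 + x)

def geff_over_g(x):
    """General boost Geff/G = 1/mu(x), with x = |g_ex + g_in| / a0."""
    return 1.0 / mu_simple(x)

def geff_over_g_deep(x):
    """Low-acceleration approximation Geff/G ~ 1/x = a0 / g."""
    return 1.0 / x

for x in (0.01, 0.1, 1.0):
    print(f"x = {x}: general {geff_over_g(x):.1f}, approx {geff_over_g_deep(x):.1f}")
```

For this particular μ the two expressions differ by exactly one unit of G, so the approximation is good to about 1% at x = 0.01 and about 10% at x = 0.1, and it fails as x approaches unity.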

#This follows because the boost from MOND is limited by how far into the low acceleration regime an object is. If the EFE is important, the boost will be less than in the isolated case. As we said in 2013, “the case that reports the lower velocity dispersion is always the formally correct one.” I mention it again here because apparently people are good at scraping equations from papers without reading the associated instructions, so one gets statements like “the theory does not specify precisely when the EFE formula should replace the isolated MOND prediction.” Yes it does. We told you precisely when the EFE formula should replace the isolated formula. It is when it reports the lower velocity dispersion. We also noted this as the reason for not giving σefe in the tables in cases it didn’t apply, so there were multiple flags. It took half a dozen coauthors to not read that. I’d hate to see how their Ikea furniture turned out.

$As often happens with LCDM, there are many nominal predictions. One common theme is that “Despite spanning four decades in luminosity, dSphs appear to inhabit halos of comparable peak circular velocity.” So nominally, one would expect a faint galaxy like Crater 2 to have a similar velocity dispersion to a much brighter one like Fornax, and the luminosity would have practically no power to predict the velocity dispersion, contrary to what we observe in the BTFR.

%There is the 2-halo term – once you get far enough from the center of a dark matter halo (the 1-halo term), there are other halos out there. These provide additional unseen mass, so can boost the velocity. The EFE in MOND has the opposite effect, and occurs for completely different physical reasons, so they’re not at all the same.

^^For arbitrary reasons of human psychology, the threshold many physicists set for “always happens” is around 100 times. That is, if a phenomenon is repeated 100 times, it is widely presumed to be a general rule. That was the threshold Vera Rubin hit when convincing the community that flat rotation curves were the general rule, not just some peculiar cases. That threshold has also been hit and exceeded by detailed MOND fits to rotation curves, and it seems to be widely accepted that this is the general rule even if many people deny the obvious implications. By now, it is also the case for apparent exceptions to MOND ceasing to be exceptions as the data improve. Unfortunately, people tend to stop listening at what they want to hear (in this case, “falsifies MOND”) and fail to pay attention to further developments.

%%It is conceivable that the ultrafaint dwarfs might elude tidal disruption in dark matter models if they reside in sufficiently dense dark matter halos. This seems unlikely given the obvious tidal effects on much more massive systems like the Sagittarius dwarf and the Magellanic Clouds, but it could in principle happen. Indeed, if one calculates the mass density from the observed velocity dispersion, one infers that they do reside in dense dark matter halos. In order to do this calculation, we are obliged to assume that the objects are in equilibrium. This is, of course, a form of static thinking: the possibility of tidal stirring that enhances the velocity dispersion above the equilibrium value is excluded by assumption. The assumption of equilibrium is so basic that it is easy to unwittingly engage in circular reasoning. I know, as I did exactly that myself to begin with.

Non-equilibrium dynamics in galaxies that appear to lack dark matter: tidal dwarf galaxies

There are a number of galaxies that have been reported to lack dark matter. This is weird in a universe made of dark matter. It is also weird in MOND, which (if true) is what causes the inference of dark matter. So how can this happen?

In most cases, it doesn’t. These claims not only don’t make sense in either context, they are simply wrong. I don’t want to sound too harsh, as I’ve come close to making the same mistake myself. The root cause of this mistake is often a form of static thinking in dynamic situations: the assumption that the here and now is always a representative test. The basic assumption we have to make to interpret observed velocities in terms of mass is that systems are in (or close to) gravitational equilibrium so that the kinetic energy is a measure of the gravitational potential. In most places, this is a good assumption, so we tend to forget we even made it.

However, no assumption is ever perfect. For example, Gaia has revealed a wealth of subtle non-equilibrium effects in the Milky Way. These are not so large as to invalidate the basic inference of the mass discrepancy, but neither can they be entirely ignored. Even maintaining the assumption of a symmetric but non-smooth mass profile in equilibrium complicates the analysis.

Since the apparent absence of dark matter is unexpected in either theory, one needs to question the assumptions whenever this inference is made. There is one situation in which it is expected, so let’s consider that special case:

Tidal dwarf galaxies

Most dwarf galaxies are primordial – they are the way they are because they formed that way. However, it is conceivable that some dwarfs may form in the tidal debris of collisions between large galaxies. These are tidal dwarf galaxies (TDGs). Here are some examples of interacting systems containing candidate TDGs:

Fig. 1 from Lelli et al. (2015): images of interacting systems with TDG candidates noted in yellow.

I say candidate TDGs because it is hard to be sure a particular object is indeed tidal in origin. A good argument can be made that TDGs require such special conditions to form that perhaps they should not be able to form at all. As debris in tidal arms is being flung about in the (~ 200 km/s) potential well of a larger system, it is rather challenging for material to condense into a knot with a much smaller potential well (< 50 km/s). It can perhaps happen if the material in the tidal stream is both lumpy (to provide a seed to condense on) and sufficiently comoving (i.e., the tidal shear of the larger system isn’t too great), so maybe it happens on rare occasions. One way to distinguish TDGs from primordial dwarfs is metallicity: typical primordial dwarfs have low metallicity while TDGs have the higher metallicity of the giant system that is the source of the parent material.

A clean test of hypotheses

TDGs provide an interesting test of dark matter and MOND. In the vast majority of dark matter models, dark matter halos are dynamically hot, quasi-spherical systems with the particles that compose the dark matter (whatever it is) on eccentric, randomly oriented orbits that sum to a big, messy blob. Arguably it has to be this way in order to stabilize the disks of spiral galaxies. In contrast, the material that composes the tidal tails in which TDGs form originates in the baryonic material of the dynamically cold spiral disks where orbits are nearly circular in the same direction in the same thin plane. The phase space – the combination of position x,y,z and momentum vx,vy,vz – of disk and halo couldn’t be more different. This means that when two big galaxies collide or have a close interaction, everything gets whacked and the two components go their separate ways. Starting in orderly disks, the stars and gas make long, coherent tidal tails. The dark matter does not. The expectation from these basic phase space considerations is consistent with detailed numerical simulations.

We now have a situation in which the dark matter has been neatly segregated from the luminous matter. Consequently, if TDGs are able to form, they must do it only* with baryonic mass. The ironic prediction of a universe dominated by dark matter is that TDGs should be devoid of dark matter.

In contrast, one cannot “turn off” the force law in MOND. MOND can boost the formation of TDGs in the first place, but if said TDGs wind up in the low acceleration regime, they must evince a mass discrepancy. So the ironic prediction here is that if MOND is true but we analyze the data in ignorance of it, we would infer that TDGs do have dark matter.

Got that? Dark matter predicts TDGs with no dark matter. MOND predicts TDGs that look like they do have dark matter. That’s not confusing at all.

Clean in principle, messy in practice

Tests of these predictions have a colorful history. Bournaud et al. (2007) did a lovely job of combining simulations with observations of the Seashell system (NGC 5291 above) and came to a striking conclusion: the rotation curves of TDGs exceeded that expected for the baryons alone:

Fig. 2 from Bournaud et al. (2007) showing the rotation curves for the three TDGs identified in the image above.

This was a strange, intermediary result. TDGs had more dark matter than the practically zero expected in LCDM, but less than comparable primordial dwarfs as expected in MOND. That didn’t make sense in either theory. They concluded that there must be a component of some other kind of dark matter that was not the traditional dark halo, but rather part of the spiral disk to begin with, perhaps unseen baryons in the form of very cold molecular gas.

Gentile et al. (2007) reexamined the situation, and concluded that the inclinations could be better constrained. When this was done, the result was more consistent with the prediction of MOND and the baryonic Tully-Fisher relation (BTFR; see their Fig. 2).

Fig. 1 from Gentile et al. (2007): Rotation curve data (full circles) of the 3 tidal dwarf galaxies (Bournaud et al. 2007). The lower (red) curves are the Newtonian contribution Vbar of the baryons (and its uncertainty, indicated as dotted lines). The upper (black) curves are the MOND prediction and its uncertainty (dotted lines). The top panels have as an implicit assumption (following Bournaud et al.) an inclination angle of 45 degrees. In the middle panels the inclination is a free parameter, and the bottom panels show the fits made with the first estimate for the external field effect (EFE).

Clearly there was room for improvement, both in data quality and quantity. We decided to have a go at it ourselves, ultimately leading to Lelli et al. (2015), which is the source of the pretty image above. We reanalyzed the Seashell system, along with some new TDG candidates.

Making sense of these data is not easy. TDG candidates are embedded in tidal features. It is hard to know where the dwarf ends and the tidal stream begins, or even to be sure there is a clear distinction. Here is an example of the northern knot in the Seashell system:

Fig. 5 from Lelli et al. (2015): Top panels: optical image (left), total H I map (middle), and H I velocity field (right). The dashed ellipse corresponds to the disc model described in Sect. 5.1. The cross and dashed line illustrate the kinematical centre and major axis, respectively. In the bottom-left corner, we show the linear scale (optical image) and the H I beam (total H I map and velocity field) as given in Table 6. In the total H I map, contours are at ~4.5, 9, 13.5, 18, and 22.5 M☉ pc⁻². Bottom panels: position-velocity diagrams obtained from the observed cube (left), model cube (middle), and residual cube (right) along the major and minor axes. Solid contours range from 2σ to 8σ in steps of 1σ. Dashed contours range from −2σ to −4σ in steps of −1σ. The horizontal and vertical lines correspond to the systemic velocity and dynamical centre, respectively.

Both the distribution of gas and the velocities along the tidal tail often blend smoothly across TDG candidates, making it hard to be sure they have formed a separate system. In the case above, I can see what we think is the velocity field of the TDG alone (contained by the ellipse in the upper right panel), but is that really an independent system that has completely decoupled from the tidal material from which it formed? Definite maybe!

Federico Lelli did amazing work to sort through these difficult-to-interpret data. At the end of the day, he found that there was no need for dark matter in any of these TDG candidates. The amplitude of the apparent circular speed was consistent with the enclosed mass of baryons.

Figs. 11 and 13 from Lelli et al. (2015): the enclosed dynamical-to-baryonic mass ratio (left) and baryonic Tully-Fisher relation (right). TDGs (red points) are consistent with a mass ratio of unity: the observed baryons suffice; no dark matter is inferred. Contrary to Gentile et al., this manifests as a clear offset from the BTFR followed by normal galaxies.

Taken at face value, this absence of dark matter is a win for a universe made of dark matter and a falsification of MOND.

So we were prepared to say that, and did, but as Federico checked the numbers, it occurred to him to check the timescales. Mergers like this happen over the course of a few hundred million years, maybe a billion. The interactions we observe are ongoing; just how far into the process are they? Have the TDGs had time to settle down into dynamical equilibrium? That is the necessary assumption built into the mass ratio plotted above: the dynamical mass assumes the measured speed is that of a test particle in an equilibrium orbit. But these systems are manifestly not in equilibrium, at least on large scales. Maybe the TDGs have had time to settle down?
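To make that assumption concrete, here is a minimal sketch of the equilibrium mass estimate, M = V²R/G, and the resulting dynamical-to-baryonic mass ratio. The function names and the input numbers are hypothetical placeholders chosen for illustration, not values from Lelli et al. (2015), and the formula only means what we want it to mean if the gas really is on circular, equilibrium orbits.

```python
# Newton's constant in galaxy-friendly units: kpc (km/s)^2 / Msun.
G_KPC_KMS2_PER_MSUN = 4.30091e-6


def dynamical_mass_msun(v_circ_kms: float, radius_kpc: float) -> float:
    """Enclosed mass implied by a circular equilibrium orbit: M = V^2 R / G."""
    return v_circ_kms ** 2 * radius_kpc / G_KPC_KMS2_PER_MSUN


def mass_ratio(v_circ_kms: float, radius_kpc: float,
               baryonic_mass_msun: float) -> float:
    """Dynamical-to-baryonic mass ratio; ~1 means the baryons suffice."""
    return dynamical_mass_msun(v_circ_kms, radius_kpc) / baryonic_mass_msun


# Hypothetical dwarf: V = 20 km/s at R = 5 kpc with 4e8 Msun of baryons.
example_ratio = mass_ratio(20.0, 5.0, 4.0e8)
print(f"M_dyn / M_bar = {example_ratio:.2f}")
```

A ratio near unity is what "no dark matter needed" looks like in this kind of calculation, but only to the extent that the equilibrium assumption holds.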

We can ask how long it takes to make an orbit at the observed speed, which is low by the standards of such systems (hence their offset from Tully-Fisher). To quote from the conclusions of the paper,

These [TDG] discs, however, have orbital times ranging from ~1 to ~3 Gyr, which are significantly longer than the TDG formation timescales (≲1 Gyr). This raises the question as to whether TDGs have had enough time to reach dynamical equilibrium.

Lelli et al. (2015)

So no, not really. We can’t be sure the velocities are measuring the local potential well as we want them to do. A particle should have had time to go around and around a few times to settle down in a new equilibrium configuration; here they’ve made 1/3, maybe 1/2 of one orbit. Things have not had time to settle down, so there’s not really a good reason to expect that the dynamical mass calculation is reliable.
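The timescale argument is simple enough to sketch: the orbital time is the circumference divided by the speed, t = 2πR/V. The radius, speed, and age below are hypothetical round numbers picked to land in the range quoted above, not measurements from the paper.

```python
import math

# 1 kpc / (1 km/s) expressed in Gyr (a unit conversion, not a measurement).
KPC_PER_KMS_IN_GYR = 0.9778


def orbital_time_gyr(radius_kpc: float, v_circ_kms: float) -> float:
    """Time to complete one circular orbit: t = 2 pi R / V."""
    return 2.0 * math.pi * radius_kpc / v_circ_kms * KPC_PER_KMS_IN_GYR


def orbits_completed(age_gyr: float, radius_kpc: float,
                     v_circ_kms: float) -> float:
    """How many orbits fit into the time since the system formed."""
    return age_gyr / orbital_time_gyr(radius_kpc, v_circ_kms)


# Hypothetical TDG: R = 5 kpc, V = 20 km/s, formed 0.5 Gyr ago.
t_orb = orbital_time_gyr(5.0, 20.0)
n_orb = orbits_completed(0.5, 5.0, 20.0)
print(f"orbital time ~ {t_orb:.1f} Gyr, orbits completed ~ {n_orb:.2f}")
```

With slow rotation speeds like these, the orbital time exceeds a Gyr, so a young system has completed only a fraction of one orbit.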

It would help to study older TDGs, as these would presumably have had time to settle down. We know of a few candidates, but as systems age, it becomes harder to gauge how likely they are to be legitimate TDGs. When you see a knot in a tidal arm, the odds seem good. If there has been time for the tidal stream to dissipate, it becomes less clear. So if such a thing turns out to need dark matter, is that because it is a TDG doing as MOND predicted, or just a primordial dwarf we mistakenly guessed was a TDG?

We gave one of these previously unexplored TDG candidates to a grad student. After much hard work combining observations from both radio and optical telescopes, she has demonstrated that it isn’t a TDG at all, in either paradigm. The metallicity is low, just as it should be for a primordial dwarf. Apparently it just happens to be projected along a tidal tail where it looks like a decent candidate TDG.

This further illustrates the trials and tribulations we encounter in trying to understand our vast universe.


*One expects cold dark matter halos to have subhalos, so it seems wise to suspect that perhaps TDGs condense onto these. Phase space says otherwise. It is not sufficient for tidal debris to intersect the location of a subhalo, the material must also “dock” in velocity space. Since tidal arms are being flung out at the speed that is characteristic of the giant system, the potential wells of the subhalos are barely speed bumps. They might perturb streams, but the probability of them being the seeds onto which TDGs condense is small: the phase space just doesn’t match up for the same reasons the baryonic and dark components get segregated in the first place. TDGs are one galaxy formation scenario the baryons have to pull off unassisted.

The minimum acceleration in intergalactic space

A strange and interesting aspect of MOND is the External Field Effect (EFE). If physics is strictly local, it doesn’t matter what happens outside of an experimental apparatus, only inside it. Examples of gravitational experiments include an Eötvös-style apparatus in a laboratory or a dwarf galaxy in space: in each case, test masses/stars respond to each other’s gravity.

The MOND force depends on the acceleration from all sources; it is not strictly local. Consequently, the results of a gravitational experiment depend on the environment in which it happens. An Eötvös experiment sitting in a laboratory on the surface of the Earth feels the one gee of acceleration due to the Earth and remains firmly in the Newtonian regime no matter how small an inter-particle acceleration experimenters achieve within the apparatus. This is the way in which MOND breaks the strong equivalence principle (but not the weak or Einstein equivalence principle).

A dwarf galaxy in the depths of intergalactic space behaves differently from an otherwise identical dwarf that is the satellite of a giant galaxy. In the isolated case, only the dwarf’s internal acceleration matters. For the typical low surface brightness dwarf galaxy, the internal acceleration gin due to self-gravity is deep in the MOND regime (gin < a0). In contrast, a dwarf satellite with the same internal acceleration gin is also subject to an external orbital acceleration gex around the host that may be comparable to or even greater than its internal acceleration. Both of those accelerations matter, so the isolated case (gin < a0) is deeper in the MOND regime and will evince a larger acceleration discrepancy than when the same dwarf is proximate to a giant galaxy and in the EFE regime (gin < gin+gex < a0)*. This effect is observed in the dwarfs of the Local Group.
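As a rough illustration of the difference, one can use the scaling noted in the footnote to this paragraph: the apparent discrepancy goes like a0/g, with g = gin for an isolated dwarf and g = gin + gex for a satellite. This is a scalar sketch under that simple scaling only, not a real MOND calculation. The function name, the input accelerations, and the floor at unity (no discrepancy in the Newtonian regime) are my assumptions for illustration.

```python
A0 = 1.2  # MOND acceleration scale in Angstrom s^-2, as quoted in the text


def apparent_discrepancy(g_in: float, g_ex: float = 0.0, a0: float = A0) -> float:
    """Rough scaling of the mass discrepancy, ~ a0 / (g_in + g_ex).

    Floored at 1 (assumption): above a0 the dynamics are Newtonian and
    no discrepancy is inferred. Vector cancellation is ignored.
    """
    return max(1.0, a0 / (g_in + g_ex))


# Hypothetical dwarf with internal acceleration 0.1 Angstrom s^-2.
isolated = apparent_discrepancy(0.1)
satellite = apparent_discrepancy(0.1, g_ex=0.3)
print(f"isolated: ~{isolated:.0f}x, satellite: ~{satellite:.0f}x")
```

The same dwarf appears to need several times less dark matter when it sits in the external field of a giant host.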

The same effect holds everywhere in the universe. There should be a minimum acceleration due to the net effect of everything: galaxies, clusters, filaments in the intergalactic medium (IGM); anything and everything add up to a nonzero acceleration everywhere. I first attempted to estimate this myself in McGaugh & de Blok (1998), obtaining ~0.026 Å s⁻², which is about 2% of a0 (1.2 Å s⁻²). This is a tiny fraction of a tiny number, but it is practically+ never zero: it’s as low as you can go, an effective minimum acceleration experienced even in the darkest depths of intergalactic space.

One can do better nowadays. The community has invested a lot in galaxy surveys; one can use those to construct a map of the acceleration the observed baryonic mass predicts in MOND. We did this in Chae et al. (2021) using a number of surveys. This gets us more than just a mean number as I guestimated in 1998, but also a measure of its variation.

Here is a map of the expected Newtonian acceleration across the sky for different ranges of distance from us. Blue is low acceleration; yellow higher. Glossing over some minor technical details, the corresponding MONDian acceleration is basically the square root (a0 gN)^1/2, so 2% of a0 corresponds to log(eN) = -3.4 in the following plots, where eN is the Newtonian environmental acceleration: what Newton would predict for the visible galaxies alone.
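The conversion is a one-liner, sketched here for anyone who wants to check it. I am assuming eN is expressed in units of a0 (which is how I read the plots) and using the deep-MOND square-root relation from the text; the function name is just a label for this example.

```python
import math


def log_eN_from_mond_fraction(g_mond_over_a0: float) -> float:
    """Deep-MOND limit: g = sqrt(a0 * gN), so gN / a0 = (g / a0)^2.

    Returns log10 of the Newtonian acceleration in units of a0.
    """
    return 2.0 * math.log10(g_mond_over_a0)


# The round 2% figure, and the 1998 estimate of 0.026 out of a0 = 1.2.
print(f"{log_eN_from_mond_fraction(0.02):.2f}")
print(f"{log_eN_from_mond_fraction(0.026 / 1.2):.2f}")
```

Squaring a small fraction makes it much smaller, which is why a 2% MONDian acceleration maps to a Newtonian value several tenths of a dex below 10⁻³ a0.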

Figure 4 from Chae et al. (2021): All-sky distributions of the environmental field acceleration gNe,env from 2M++ galaxies and MCXC clusters in Mollweide projection and equatorial coordinates averaged across various distance ranges. The locations of SPARC galaxies with independent estimates of gNe from RC fits are shown as points with color reflecting stronger (red) or weaker (blue) EFE and with the opacity of each point increasing with its accuracy.

The EFE imposes an effect on all objects, even giant galaxies, which we were trying to estimate in Chae et al. (2021) – hence the dots in the above maps. Each of those dots is a galaxy for which we had made an estimate of the EFE from its effect on the rotation curve. This is a subtle effect that is incredibly hard to constrain, but there is a signal when all galaxies are considered statistically in aggregate. It does look like the EFE is at work, but we can’t yet judge whether its variation from place to place matches the predicted map. Still, we obtained values for the acceleration in intergalactic space that are in the same realm as my crude early estimate.

Here’s another way to look at it. The acceleration is plotted as a function of distance, with the various colors corresponding to different directions on the sky. So where above we’re looking at maps of the sky in different distance bins, here we’re looking as a function of distance but relying on the color bar to indicate different directions. There is a fair amount of variation: some places have more structure and others less with a corresponding variation in the acceleration field.

Figure 5 from Chae et al. (2021): Variation of eN,env with distance for the galaxies in the NSA and Karachentsev catalogs. Individual galaxies are color-coded by right ascension (RA). The black lines show the mean trend (solid) and standard deviation (dashed) in bins of distance. This plot assumes the “max clustering” model for the missing baryons (see Figure 6, below).

Different catalogs have been used to map the structure here, and the answer comes out pretty much the same, except for one little (big) detail: how clustered are the baryons? The locations of the galaxies have been well mapped, so we can turn that into a map of their gravitational field. But we also know that galaxies are not the majority of the baryons. So where are the rest? Are they clustered like galaxies, or spread uniformly through intergalactic space?

When we did this, we knew a lot of baryons were in the IGM, but it really wasn’t clear how clustered they might be. So we took two limiting cases by assuming (1) all the baryons were as clustered as the galaxies or (2) not clustered at all, just a uniform background. This makes a difference since a uniform background, being uniform, doesn’t contribute. There’s as much force from this direction as that, and it cancels itself out, leading to a lower overall amplitude for the environmental acceleration field.

Figure 6 from Chae et al. (2021): Variation of eN,env with distance for the SPARC galaxies within the NSA footprint. The “max clustering” model (blue) assumes that missing baryons are effectively coincident with observed structures, while the “no clustering” model (orange) distributes them uniformly in space. See Section 3.2.1 for details.

That’s where the new result reported last time comes in. We now know that the missing baryons were all in the IGM. Indeed, the split is 1/4 clustered, 3/4 not. So something closer to (2), the “no clustering” limit above. That places the minimum acceleration in intergalactic space around log(eN) = -3.5, which is very close to the 2% of a0 that I estimated last century.

The diffuse IGM is presumably not perfectly uniform. There are large scale filaments and walls around giant voids. This structure will contribute variations in the local minimum acceleration, as visualized in this MOND structure formation simulation by Llinares (2008):

Figures 7 & 9 of Llinares (2008): The simulated density field (left) and modulus (right) of the MONDian force |∇ΦM | at z = 0 normalized by g2a0. For values above 1 the particles are in the Newtonian regime whereas values below 1 indicate the MOND regime.

Surveys for fast radio bursts are very sensitive to variations in the free electron density along the line of sight. Consequently, they can be used to map out structure in the IGM. The trick is that we need to cover lots of the sky with them – the denser the tracers, the better. That means discovering lots of them all over the sky, a task the DSA-110 was built to do.

I sure hope NSF continues to fund it.


*I write gin < gin+gex for simplicity, but strictly speaking the acceleration is a vector quantity so it is possible for the orientation of gin and gex to oppose one another so that their vector sum cancels out. This doesn’t happen often, but in periodic orbits it will always happen at some moment, with further interesting consequences. The more basic point is that the amplitude of the discrepancy scales with the ratio a0/g: the lower the acceleration g, the bigger the discrepancy from Newton – or, equivalently, the more dark matter we appear to need. The discrepancy of the isolated case a0/gin is larger than the discrepancy of the non-isolated case a0/(gin+gex) just because gin+gex > gin.

+A test of MOND using Lyman-alpha clouds was proposed by Aguirre et al. (2001). These tiny puffs of intergalactic gas have very low internal accelerations, so should evince much larger discrepancies than observed. Or at least that was their initial interpretation, until I pointed out that the EFE from large scale structures would be the dominant effect. They argued it was still a problem, albeit a much smaller one than initially estimated. I don’t think it is a problem at all, because the amplitude of the EFE is so uncertain. Indeed, they made an estimate of the EFE at the relevant redshifts that depended on the rate of structure formation being conventional, which it is not in MOND. Lyman-alpha clouds are entirely consistent with MOND when one takes into account the more rapid growth of structure.

The baryons are mostly in the intergalactic medium. Mostly.

My colleague Jim Schombert pointed out a nifty new result published in Nature Astronomy which you probably can’t access so here is a link to what looks to be the preliminary version. The authors use the Deep Synoptic Array (DSA) to discover some new Fast Radio Bursts (FRBs), many of which are apparently in galaxies at large enough distances to provide an interesting probe of the intervening intergalactic medium (IGM).

There is lots that’s new and cool here. The DSA-110 is able to localize FRBs well enough to figure out where they are, which is an interesting challenge and impressive technological accomplishment. FRBs themselves remain something of a mystery. They are observed as short (typically millisecond), high intensity pulses of very low frequency radio emission, typically 1,400 MHz or less. What causes these pulses isn’t entirely clear, but they might be produced in the absurdly intense magnetic fields around some neutron stars.

FRBs are intrinsically luminous – lots of energy packed into a short burst – so can be detected from cosmological distances. The trick is to find them (blink and miss it!) and also to localize them on the sky. That’s challenging to do at these frequencies well enough to uniquely associate them with optical sources like candidate host galaxies. To quote from their website, “DSA-110 is a radio interferometer purpose-built for fast radio burst (FRB) detection and direct localization.” It was literally made to do this.

Connor et al. analyze dozens of known FRBs and report nine new ones, covering enough of the sky to probe an interesting cosmological volume. Host galaxies with known redshifts define a web of pencil-beam probes – the paths that the radio waves have to traverse to get here. Low frequency radio waves are incredibly useful as a probe of the intervening space because they are sensitive to the density of intervening electrons, providing a measure of how many there are between us and each FRB.

Most of intergalactic space is so empty that the average density of matter is orders of magnitude lower than the best vacuum we can achieve in the laboratory. But there is some matter there, and of course intergalactic space is huge, so even low densities might add up to a lot. This provides a good way to find out how much.

The speed of light is the ultimate speed limit, in a vacuum. When propagating through a medium like glass or water, the effective speed of light is reduced by the index of refraction. For low frequency radio waves, the exceedingly low density of free electrons in the IGM suffices to slow them down a bit. This effect, quantified by the dispersion measure, is frequency dependent. It usually comes up in the context of pulsars for which the width of their pulses is spread by the effect, but it works for any radio source with appropriate observable frequencies, like FRBs. The dispersion measure tells us the product of the distance and the density traversed along the line of sight to the source, so is usually expressed in typical obscure astronomical fashion as pc cm⁻³. This is really a column density, the number per square cm, but with host galaxies of known redshift the distance is known independently and we get a measure of the average electron volume density along the line of sight.
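Here is a minimal sketch of that unit juggling. It treats space as static and Euclidean, ignoring the (1+z) factors a real cosmological analysis needs as well as the contributions of the Milky Way and the host galaxy, so take it as bookkeeping rather than the method of Connor et al. The example dispersion measure and distance are hypothetical.

```python
CM_PER_PC = 3.0857e18  # centimeters in one parsec


def column_density_cm2(dm_pc_cm3: float) -> float:
    """Convert a dispersion measure in pc cm^-3 to electrons per cm^2."""
    return dm_pc_cm3 * CM_PER_PC


def mean_electron_density_cm3(dm_pc_cm3: float, distance_pc: float) -> float:
    """Average electron density along the sightline: n_e = DM / d."""
    return dm_pc_cm3 / distance_pc


# Hypothetical burst: DM = 300 pc cm^-3 from a host 1 Gpc (1e9 pc) away.
print(f"{column_density_cm2(300.0):.2e} electrons per cm^2")
print(f"{mean_electron_density_cm3(300.0, 1.0e9):.1e} electrons per cm^3")
```

The odd-looking unit is convenient precisely because dividing by a distance in parsecs gives a density in cm⁻³ directly.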

That’s it. That by itself provides a good measure of the density of intergalactic matter. The IGM is highly ionized, with a neutral fraction < 10⁻⁴, so counting electrons is the same as counting atoms. (Not every nucleus is hydrogen, so they adopt 0.875 electrons per baryon to account for the neutrons in helium and heavier elements. We know the neutral fraction is low in the IGM because hydrogen is incredibly opaque to ultraviolet radiation: absorption would easily be seen, yet there is no Gunn-Peterson trough until z > 6.) This leads to a baryon density of Ωb h² = 0.025 ± 0.003, which is 5% of the critical density for a reasonable Hubble parameter of h = 0.7.
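The step from the physical baryon density to a fraction of the critical density is just division by h². A quick check, using only the numbers quoted above (the function name is mine):

```python
def omega_b(omega_b_h2: float, h: float) -> float:
    """Baryon density as a fraction of critical, given Omega_b h^2 and h."""
    return omega_b_h2 / h ** 2


central = omega_b(0.025, 0.7)
uncertainty = omega_b(0.003, 0.7)  # the error bar scales the same way
print(f"Omega_b = {central:.3f} +/- {uncertainty:.3f}")
```

This holds h fixed at 0.7; a different Hubble parameter shifts the fraction accordingly.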

This solves the cosmic missing baryon problem. There had been an order of magnitude discrepancy when most of the baryons we knew about were in stars. It gradually became clear that many of the baryons were in various forms of tenuous plasma in the space between galaxies, for example in the Lyman alpha forest, but these didn’t account for everything so a decade ago a third of the baryons expected from BBN were still unaccounted for in the overall baryon budget. Now that checksum is complete. Indeed, if anything, we now have a small (if not statistically significant) baryon surplus+.

Here is a graphic representing the distribution of baryons among the various reservoirs. Connor et al. find that the fraction in the intergalactic medium is fIGM = 0.76 +0.10/-0.11. Three quarters of the baryons are Out There, spread incredibly thin throughout the vastness of cosmic space, with an absolute density of a few x 10⁻³¹ g cm⁻³, which is of order one atom per cubic meter. Most of the atoms are hydrogen, so “normal” for most of the universe is one proton and one electron in a box a meter or so across rather than the 10⁻¹⁰ m occupied by a bound hydrogen atom. That’s a whole lot of empty.
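For the curious, the absolute density follows from the numbers already quoted plus the standard critical density, 1.878 x 10⁻²⁹ h² g cm⁻³. This is a back-of-the-envelope sketch that treats every baryon as a hydrogen atom; the function names are mine.

```python
RHO_CRIT_H2_G_CM3 = 1.878e-29  # critical density divided by h^2, in g cm^-3
M_PROTON_G = 1.6726e-24        # proton mass in grams
CM3_PER_M3 = 1.0e6


def igm_mass_density_g_cm3(omega_b_h2: float, f_igm: float) -> float:
    """Mass density of IGM baryons: f_IGM * (Omega_b h^2) * (rho_crit / h^2)."""
    return f_igm * omega_b_h2 * RHO_CRIT_H2_G_CM3


def atoms_per_m3(mass_density_g_cm3: float) -> float:
    """Number density, counting each baryon as one hydrogen atom."""
    return mass_density_g_cm3 / M_PROTON_G * CM3_PER_M3


rho_igm = igm_mass_density_g_cm3(0.025, 0.76)
print(f"rho_IGM ~ {rho_igm:.1e} g cm^-3, ~ {atoms_per_m3(rho_igm):.1f} atoms per m^3")
```

This comes out at a few tenths of an atom per cubic meter, the same order of magnitude as the figure quoted above.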

Connor et al. assess that about 3/4 of all baryons are in the intergalactic medium (IGM), give or take 10% – the side bars illustrate the range of uncertainty. Many of the remaining baryons are in other forms of space plasma associated with but not in galaxies: the intracluster medium (ICM) of rich clusters, the intragroup medium (IGroupM) of smaller groups, and the circumgalactic medium (CGM) associated with individual galaxies. All the stars in all the galaxies add up to less than 10%, and the cold (non-ionized) atomic and molecular gas in galaxies comprises about 1% of the baryons.

The other reservoirs of baryons pale in comparison to the IGM. Most are still in some form of diffuse space plasma, like the intracluster media of clusters of galaxies and groups of galaxies, or associated with but not in individual galaxies (the circumgalactic medium). These distinctions are a bit fuzzy, as are the uncertainties on each component, especially the CGM (fCGM = 0.08 +0.07/-0.06). This leaves some room for a lower overall baryon density, but not much.

Connor et al. get some constraint on the CGM by looking at the increase in the dispersion measure for FRBs with sight-lines that pass close to intervening galaxies vs. those that don’t. This shows that there does seem to be some extra gas associated with such galaxies, but not enough to account for all the baryons that should be associated with their dark matter halos. So the object-by-object checksum of how the baryons are partitioned remains problematic, and I hope to have more to say about it in the near future. Connor et al. argue that some of the baryons have to have been blown entirely out of their original dark matter halos by feedback; they can’t all be lurking there or there would be less dispersion measure from the general IGM between us and relatively nearby galaxies where there is no intervening CGM*.

The baryonic content of visible galaxies – the building blocks of the universe that most readily meet the eye – is less than 10% of the total baryon density. Most of that is in stars and their remnants, which contain about 5% of the baryons, give or take a few percent stemming from the uncertainty in the stellar initial mass function. The cold gas – both neutral atomic gas and the denser molecular gas from which stars form – only adds up to about 1% of all baryons. What we see most readily is only a fraction of what’s out there, even when restricting our consideration to normal matter: mostly the baryons are in the IGM. Mostly.

The new baryon inventory is now in good agreement with big bang nucleosynthesis: Ωb h² = 0.025 ± 0.003 is consistent with Ωb h² = 0.0224 ± 0.0001 from Planck CMB fits. It is more consistent with this and the higher baryon density favored by deuterium than it is with lithium, but isn’t accurate enough to exclude the latter. Irrespective of this important detail, I feel better that the third of the baryons that used to be missing (or perhaps not there at all) are now accounted for. The agreement of the checksum of the baryon inventory with the density of baryons consistent with BBN is an encouraging success of this deeply fundamental aspect of the hot big bang cosmology.
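To put a number on “consistent”: a simple way to compare two measurements is to divide their difference by the quadrature sum of their uncertainties. This sketch assumes independent Gaussian errors, which is my simplification and not a claim about how either result was derived.

```python
import math


def tension_sigma(value_a: float, err_a: float,
                  value_b: float, err_b: float) -> float:
    """Difference between two measurements in units of the combined 1-sigma error."""
    return abs(value_a - value_b) / math.hypot(err_a, err_b)


# FRB baryon inventory vs. the Planck CMB fit, as quoted in the text.
n_sigma = tension_sigma(0.025, 0.003, 0.0224, 0.0001)
print(f"difference ~ {n_sigma:.1f} sigma")
```

The two values differ by less than one sigma, with the FRB uncertainty dominating the error budget.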


+Looking at their equation 2, there is some degeneracy between the baryon density Ωb and the fraction of ionized baryons Out There. Lower Ωb would mean a higher baryon fraction in the diffuse ionized state. This is already large, so there is only a little room to trade off between the two.

*What counts as CGM is a bit dicey. Putting on a cosmology hat, the definition Connor et al. adopt involving a range of masses of dark matter halos appropriate for individual galaxies is a reasonable one, and it makes sense to talk about the baryon fraction of those objects relative to the cosmic value, of which they fall short (fgas = 0.35 +0.30/-0.25 in individual galaxies where f* < 0.35: these don’t add up to unity). Switching to MOND, the notional association of the CGM with the virial radii of host dark matter halos is meaningless, so it doesn’t matter if the gas in the vicinity of galaxies was once part of them and got blown out or simply never accreted in the first place. In LCDM we require at least some blow out to explain the sub-cosmic baryon fractions, while in MOND I’m inclined to suspect that the dominant process is non-accretion due to inefficient galaxy formation. Of course, the universe may indulge in a mix of both physical effects, in either paradigm!

%Unlike FLRW cosmology, there is no special scale defined by the critical density; a universe experiencing the MOND force-law will ultimately recollapse whatever its density, at least in the absence of something that acts like anti-gravity (i.e., dark energy). In retrospect, this is a more satisfactory solution of the flatness problem than Inflation, as there is nothing surprising about the observed density being what it is. There is no worry about it being close to but not quite equal to the critical density since the critical density is no longer a special scale.

( There are none )

Currently, English is the lingua franca of science. It wasn’t always that way, and there’s no reason to expect it always will be. A century ago, all the great physicists who wanted to be part of the quantum revolution went to study in Germany. “Been to Germany” was a desirable bragging point on a cv. Then this little thing called WWII happened, and the gravitational center of physics research, and science more generally, moved to the United States. Now “Been to America” is a bragging point for a German cv.

American Science – the world’s gold standard

The post-war success of American science wasn’t happenstance, it was an outcome of intentional government policy. Investment in science research was seen as an essential element of national security. It also became a phenomenal engine for the growth of knowledge and technology that underpins many essential elements of modern society that we’ve come to take for granted but shouldn’t, like this here internet*. The relatively modest investments (as a fraction of the federal budget) that made this possible have been repaid many times over in economic growth.

Part of the way in which the federal government has invested in science over the past 75 years is through research grants from agencies like NSF, NIH, and NASA awarded to individual scientists via their university employers. This has created a web of interconnected success: grants fund the science, develop new technologies and facilities, train new scientists, help support the environment that makes this possible (including universities), and enable a society where science thrives. American leadership in science seems to be taken for granted, but it only happens with effort and investment. The past three quarters of a century give a clear answer to whether this investment is worthwhile: Absolutely YES.

A legitimate question is what level of investment is appropriate. America’s scientific leadership has been slipping because other nations have witnessed our success and many have taken steps to replicate it. That’s good. But if one wants to maintain leadership for all the value that provides, or even remain competitive, one needs to invest more, not less.

Instead, the budget currently before congress can only be described as a rampage of draconian budget reductions. NASA science is down 47%; NSF 56%. Even NIH, the core agency for research that impacts medicine that we all rely on at some point, is down 37%. Heck, a military unit is considered destroyed if it suffers 30% casualties; these cuts are deeper. This is how you destroy something while pretending not to do so. Rather than simply murder American science outright, the “big, beautiful bill” drags it behind the woodshed, ties it up, thrashes it half to death, and leaves it to bleed out, killing it slowly enough to preserve plausible deniability.

This is a prescription to abandon American leadership in science:

This is all being done in the name of rooting out fraud, waste and abuse. This is an excuse, an assertion without merit. In other words, pure, unadulterated political bullshit.

I’ve worked closely with NSF and NASA. NSF is incredibly efficient – an achievement made largely in response to years of congressional complaint. Funny how the same congresspeople keep complaining even after the agency has done everything they asked. NASA is less efficient, but that’s largely on the side that funds crewed spaceflight, which is super expensive if you don’t want to routinely explode. The science-funding side of NASA is basically pocket change.

Whether any of this research spending is wasteful depends on your value system. But there is no fraud to speak of, nor abuse. Grant budgets are closely scrutinized at many levels. Success rates are low (typically 20% before the cuts; they’re projected to be 7% afterwards. One might as well shoot dice.) The issue is not that fraudulent grants get funded, it is that there isn’t enough funding to support all the excellent proposals. One could literally double** the funding of the science agencies and there would still be meritorious grant proposals that went unfunded.
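To put those success rates in perspective, here is a minimal sketch of the dice analogy. It assumes, purely for illustration, that each submission is an independent trial with a fixed success probability, which real review panels certainly are not; the function names are hypothetical.

```python
def expected_submissions(p_success):
    """Mean number of proposals needed per award (geometric distribution)."""
    return 1.0 / p_success


def chance_of_any_award(p_success, n_tries):
    """Probability of at least one award in n_tries independent attempts."""
    return 1.0 - (1.0 - p_success) ** n_tries


# Success rates quoted in the text; independence between tries is an assumption.
for label, p in [("before cuts", 0.20), ("after cuts", 0.07)]:
    print(f"{label}: ~{expected_submissions(p):.1f} proposals per award, "
          f"{chance_of_any_award(p, 3):.0%} chance of any award in 3 tries")
```

Under these assumptions, one award per five proposals becomes roughly one per fourteen.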

Personal Experience so far in 2025

I thought I would share some personal experience with how this has been unfolding, both as a member of a research university where I sit on university-wide committees that oversee such things, and as an individual scientist.

Overhead

In February, the Trump administration announced that the overhead rate for NIH grants would be limited to 15%. This is an odd-sounding technicality to most people, so first some background. I didn’t invent the federal grant system, and I do think there are some ways in which it could be improved. But this is the system that has developed, and changing it constructively would require lengthy study and consideration, not the sudden jolt that is being applied.

When a scientist like myself applies for a grant, we mostly focus on the science we want to do. But part of the process is making a budget: what will it cost to achieve the scientific goals? This usually involves funding for junior researchers (grad students and postdocs), money for laboratory equipment or travel to facilities like observatories, and in the system we have, partial funding for the PI (principal investigator). How much salary funding the PI is supposed to obtain from grants varies by field; for the physical sciences it is usually two or three months of summer*** salary.

For my colleagues in the School of Medicine, the average salary support from grants is around 50%; in some departments it is as high as 70%. So cuts to NIH funding are a big deal, even the overhead rate. Overhead is the amount of support provided to the university to keep the lights on, the buildings open, for safe and modern laboratories, administrative support, etc. – all the ecological support necessary to maintain a thriving research environment. Each university negotiates its overhead rate separately with one of the federal funding agencies; there are only a handful of federal employees who know how to do this, as it involves complicated formulae for laboratory space and all sorts of other factors affecting operations. The typical overhead rate is ~50%, so for every two dollars of direct spending (e.g., grad student salary), another dollar**** goes to the university to keep things running. This has gradually become an essential portion of the overall budget of universities over the years, so cuts to the overhead rate are de facto cuts to everything a university does.
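As a rough illustration of that arithmetic, the sketch below applies a flat overhead rate to direct costs. The 50% figure is the typical rate mentioned above and 15% is the announced NIH cap. Treating overhead as a simple percentage of all direct spending is a simplifying assumption, since the negotiated rates involve more complicated formulae.

```python
def overhead(direct_dollars, rate):
    """Overhead paid to the university, modeled as a flat fraction of direct costs."""
    return direct_dollars * rate


direct = 2.00                      # two dollars of direct spending
typical = overhead(direct, 0.50)   # typical negotiated rate (~50%)
capped = overhead(direct, 0.15)    # announced cap (15%)
lost_fraction = 1.0 - capped / typical

print(f"typical: ${typical:.2f}, capped: ${capped:.2f}, "
      f"overhead lost: {lost_fraction:.0%}")
```

In this toy model the dollar of overhead that accompanies two dollars of direct spending shrinks to thirty cents.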

The CWRU School of Medicine is a very successful research college. Its cancer research group is particularly renowned, including the only scientists on campus who rank ahead of yours truly in impact according to the Stanford-Elsevier science-wide author databases of standardized citation indicators. It is a large part of the overall campus research effort and is largely funded by NIH. The proposed cut to the overhead rate to 15% would correspond to a $54 million reduction in the university’s annual budget (about 6% of the total, if I recall right).

Not many organizations can gracefully miss $54 million, so this prospect caused much consternation. There were lawsuits (by many universities, not just us), injunctions, petitions by the government to change venue so as to dodge the injunctions, and so far, no concrete action. So spending on existing grants continued as normal, for now. There was guarded optimism in our administration that we’d at least get through the fiscal year without immediate tragedy.

Then another insidious thing started to happen. NIH simply ceased disbursing new grants. Sure, you can spend on existing grants. You can apply for new grants and some of you will even be successful – on paper. We just won’t send you the money. There were administrative hijinx to achieve this end that are too complicated to bother explaining; the administration is very creative at bending/reinterpreting/making up rules to obtain the outcome they want. They did eventually start slow-walking some new grants, so again giving the appearance of normality while in practice choking off an important funding source. In the long run, that’s a bigger deal than the overhead rate. It doesn’t matter what the overhead rate is if it is a percentage of zero.

Now maybe there is some better way to fund science, and it shouldn’t be the role of the federal government. OK, so what would that be? It would be good planning to have a replacement system in place before trashing the existing one. But no one is doing that. Private foundations cannot possibly pick up the slack. So will my colleagues in the School of Medicine suffer 50% salary cuts? Most people couldn’t handle that, but their dean is acting like it’s a possibility.

From the outside, the current situation may look almost normal but it is not. There is no brilliant plan to come up with some better funding scheme. Things will crash soon if not all at once. I expect our university – and many across the country – to be forced to take draconian budget action of their own. Not today, not tomorrow, but soon. What that looks like I don’t know, but I don’t see how it fails to include mass layoffs. Aside from the human cost that obviously entails, it also means we can’t do as much in either research or education. Since this is happening nation-wide, we will all be reduced as a consequence.

As a nation, this is choosing to fail.

My own recent experience with grants

I can’t begin to describe how difficult it is to write a successful grant. There is so much that goes into it; it’s like cramming everything I’ve ever written in this blog into 15 pages without leaving anything out. You don’t dare leave anything out because if you leave out obscure reference X you can be sure the author of X will be on the panel and complain that you’re unaware of important result X. More importantly, every talented colleague I have – and there are many – is doing the same thing, competing for the same shrinking pot. It’s super competitive, and has been for so long that I’ve heard serious suggestions of simply drawing proposals at random, lottery style. Strange as this sounds, this procedure would be more fair than the multiple-jeopardy merit evaluation we have at present: if a proposal almost succeeds one year and a panel tells you to just improve this one thing, next year a different panel may hate that one thing and ask for something different. Feedback from panels used to be extremely useful; now it is just a list of prefab excuses for why you got rejected again.

NSF

I’ve mostly worked with NSF and NASA. I had an NSF proposal that was really well received in 2023; the review was basically “we would have funded this if we had enough money but we didn’t and something else edged you out.” This happens a lot, so I resubmitted it last year. Same result. There was a time when you could expect to succeed through perseverance; that time already seemed to have ended and dissolved into a crap shoot even before the proposed cuts.

In the good old days of which I hear tell, but entirely before my time, NSF had something called an accomplishment-based renewal. Basically you could get a continuation of your grant as long as you were doing Good Things. I never experienced that; all my grants have been a standard three years and done. Getting new grants means writing an entirely new proposal and all the work that entails. It’s exhausting and can be a distraction from actually doing the science. But the legacy of accomplishment-based renewals lives on; as part of the fifteen pages of an NSF grant, you are required to spend five saying what great things you accomplished with previous funding. For me, as it relates to the most recent proposal, that’d be SPARC.

SPARC has been widely used as a database. It is in great demand by the community. So great that when our web server was down recently for the better part of a week for some extensive updates, I immediately got a stack of email asking where was it and when would it be back? The SPARC data paper has been cited over 600 times; the Radial Acceleration Relation based on it over 500. Those are Babe Ruth numbers, easily in the top percentile of citation rates. These are important results, and the data are clearly data the community want. The new proposal would have provided that and more, a dozen-fold, but apparently that’s not good enough.

NASA

While waiting to hear of that predictable disappointment, I tried to rally for NASA ROSES. These Research Opportunities in Space and Earth Science are traditionally announced on Valentine’s Day. ROSES on Valentine’s Day? Get it? Yuk, yuk. I didn’t, until it didn’t happen at the appointed time. There were any number of announcements from different parts of NASA saying different things, mostly to the effect of “any day now.” So in March, I logged into my NSPIRES account to see what was available. Here’s the screenshot:

NASA proposals due within 30 days of March 25, 2025.

OK, those are the dregs from last year: the last of the proposal opportunities from ROSES 2024. The program appropriate for my project already passed; I’m looking for the 2025 edition. So let’s filter for future opportunities:

( There are none )

OK. Clearly NASA is going through some things. Let’s all just take a chill pill and come back and check on them three months later:

Huh, same result: future opportunities? ( There are none ) Who coulda guessed? It’s like it’s a feature rather than a bug.

Maybe NASA will get around to slow-walking grants like NIH. But there will be a lot less money at whatever rate it gets doled out – to the manifest detriment of science in the United States and everyone everywhere who is interested in science in general and astrophysics in particular.

The bottom line

Make no mistake, the cuts congress***** and the administration intend to make to US science agencies are so severe that they amount to a termination of science as we’ve come to know it. It is a willful abandonment of American leadership in scientific endeavors. It is culture-war hatred for nerds and eggheads rendered as public policy. The scientific endeavor in the US is already suffering, and it will get much worse. There will be some brain drain, but I’m more concerned with the absence of brain nourishment. We risk murdering the careers of a generation of aspiring scientists.

I am reminded of what I said in the acknowledgements of my own Ph.D. thesis many years ago:

As I recall the path that has brought me here, I am both amazed and appalled by the amount of time, effort, and energy I have put into the production of this document. But this pales in comparison to the amount of tolerance and support (both moral and financial) required to bring a person to this point. It is difficult to grasp the depth and breadth of community commitment the doctoral process requires, let alone acknowledge all who contribute to its successful completion.

S. McGaugh, Ph.D. thesis, 1992

Was that investment not worthwhile? I think it was. But it will be impossible for an aspiring young American like me to do science the way I have done. The career path is already difficult; in future it looks like the opportunity simply won’t exist.

Science is a tiny piece of American greatness that the Trump administration – with the active help of republicans in congress and a corrupt, partisan Supreme Court – has idly tossed in the bonfire. I have confined my comments to what I know directly from my own experience. Millions upon millions of Americans are currently experiencing other forms of malignant maladministration. It’s as if competent government matters after all.

In the longer term, a likely result of the current perfidy is not just a surrender of American leadership, but that the lingua franca of science moves on from English to some other language that is less hostile to it.

I hate politics and have no interest in debating it. I’m not the one who chose to suddenly undo decades of successful bipartisan science policy in a way that has a very direct negative impact on the country, my field, and me personally. Since politics invites divisive argument, the comments section will not be open.


*I don’t know who doesn’t know this, but the internet was developed by universities and the NSF. It grew out of previous efforts by the military (DARPAnet) and private industry (DECnet), but what we now know as the internet was pioneered by academic scientists funded by NSF. I’ve sometimes seen this period (1985 – 1995) referred to as NSFnet to distinguish it from the internet after it was made available to the public in 1995. But that’s not what we called it back then; we called it the internet. That’s what it was; that’s where the name came from.

I’ve been on the internet since 1987. I personally was against sharing it with the public for selfish reasons. As a scientist, I was driving truckloads of data+ along narrow lanes of limited bandwidth; I didn’t want to share the road with randos sharing god knows what. That greedy people (e.g., Mark Zuckerberg) would fence off parts of the fruits of public investment and profit by gatekeeping who could see what and harvesting gobs of personal data had not occurred to me as something that would be allowed.

I relate this bit of personal experience because I’ve seen a lot of tech bros try to downplay the role of NSF and claim its successes by asserting that they invented the internet. They did not; they merely colonized and monetized it. It was invented by scientists to share data.

+I once personally choked what is now known as arXiv by submitting a preprint with galaxy images larger than the system could handle at the time. The submission set off a doom-loop of warning emails that throttled things for many hours before I succeeded in killing all the guilty unix processes. That’s why the comments of that preprint have a link (long since defunct) to a version of the paper on the Institute of Astronomy’s local server.


**I’m old enough to remember, not all that long ago, when there was a bipartisan commitment to double science funding. That didn’t happen. It really did have widespread bipartisan support, but the science budget is a tiny portion of discretionary spending which itself is a tiny portion of the overall federal budget. The effort got lost in reconciliation.


***I would prefer a system that is less focused on the individual PI; it is a very American-social Darwinism approach to get you to compete by dangling the carrot of more pay. But that carrot long ago evolved into a stick; getting grants is a de facto job requirement, not merely an occasional success. Overall I can’t complain; I’ve been very successful, managing to remain fully funded over the course of my career, up until very recently. Now my grants are finished so my salary is down 25%. In the current environment I don’t expect to see that again.


****Is this a fair rate? I have no idea – not my specialty. But we recently had external consultants brought in to review our expenses; I think the board of trustees expected to identify wasteful spending that could be cut, and that was certainly the attitude the consultants brought in with them. After actually reviewing everything, their report was “Geez, this operation is super-efficient; there’s no fat to cut and really the whole operation should cost more than it does.” While that’s specific to my college, it seems to me to be a pretty accurate depiction of NSF as well.


*****The republicans are pushing through this poisonous budget with a one seat majority in the House of Representatives. One. Seat. It literally could not be closer to a 50/50 split. So don’t go thinking “Americans voted for this.” Americans couldn’t be more divided.

Sad to think how much tragedy could be averted if a single republican in congress grew a spine and put country before party.

The Deuterium-Lithium tension in Big Bang Nucleosynthesis

There are many tensions in the era of precision cosmology. The most prominent, at present, is the Hubble tension – the difference between traditional measurements, which consistently obtain H0 = 73 km/s/Mpc, and the best fit* to the acoustic power spectrum of the cosmic microwave background (CMB) observed by Planck, H0 = 67 km/s/Mpc. There are others of varying severity that are less widely discussed. In this post, I want to talk about a persistent tension in the baryon density implied by the measured primordial abundances of deuterium and lithium+. Unlike the tension in H0, this problem is not nearly as widely discussed as it should be.

Framing

Part of the reason that this problem is not seen as an important tension has to do with the way in which it is commonly framed. In most discussions, it is simply the primordial lithium problem. Deuterium agrees with the CMB, so those must be right and lithium must be wrong. Once framed that way, it becomes a trivial matter specific to one untrustworthy (to cosmologists) observation. It’s a problem for specialists to sort out what went wrong with lithium: the “right” answer is otherwise known, so this tension is not real, making it unworthy of wider discussion. However, as we shall see, this might not be the right way to look at it.

It’s a bit like calling the acceleration discrepancy the dark matter problem. Once we frame it this way, it biases how we see the entire problem. Solving this problem becomes a matter of finding the dark matter. It precludes consideration of the logical possibility that the observed discrepancies occur because the force law changes on the relevant scales. This is the mental block I struggled mightily with when MOND first cropped up in my data; this experience makes it easy to see when other scientists succumb to it sans struggle.

Big Bang Nucleosynthesis (BBN)

I’ve talked about the cosmic baryon density here a lot, but I’ve never given an overview of BBN itself. That’s because it is well-established, and has been for a long time – I assume you, the reader, already know about it or are competent to look it up. There are many good resources for that, so I’ll give only as much of a sketch as the subsequent narrative requires – a sketch that will be both too little for the experts and too much for the subsequent narrative that most experts are unaware of.

Primordial nucleosynthesis occurs in the first few minutes after the Big Bang when the universe is the right temperature and density to be one big fusion reactor. The protons and available neutrons fuse to form helium and other isotopes of the light elements. Neutrons are slightly more massive and less numerous than protons to begin with. In addition, free neutrons decay with a half-life of roughly ten minutes, so are outnumbered by protons when nucleosynthesis happens. The vast majority of the available neutrons pair up with protons and wind up in 4He while most of the protons remain on their own as the most common isotope of hydrogen, 1H. The resulting abundance ratio is one alpha particle for every dozen protons, or in terms of mass fractions&, Xp = 3/4 hydrogen and Yp = 1/4 helium. That is the basic composition with which the universe starts; heavy elements are produced subsequently in stars and supernova explosions.
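The bookkeeping behind those fractions can be checked in a few lines. This is a minimal sketch that assumes every nucleon has the same mass (ignoring binding energy and the proton-neutron mass difference) and neglects the trace isotopes; the function names are made up for the example.

```python
from fractions import Fraction


def mass_fractions(n_alpha, n_proton):
    """Return (X, Y): hydrogen and helium mass fractions for a mix of
    free protons and alpha particles, counting nucleons as equal masses."""
    helium_nucleons = 4 * n_alpha
    total_nucleons = helium_nucleons + n_proton
    return (Fraction(n_proton, total_nucleons),
            Fraction(helium_nucleons, total_nucleons))


def neutron_to_proton(n_alpha, n_proton):
    """Neutron-to-proton number ratio implied by the same mix."""
    neutrons = 2 * n_alpha              # each alpha holds two neutrons
    protons = 2 * n_alpha + n_proton    # bound protons plus free protons
    return Fraction(neutrons, protons)


X_p, Y_p = mass_fractions(n_alpha=1, n_proton=12)
print(X_p, Y_p)                   # 3/4 1/4
print(neutron_to_proton(1, 12))   # 1/7
```

One alpha particle per dozen protons therefore corresponds to one neutron for every seven protons at the time of nucleosynthesis.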

Though 1H and 4He are by far the most common products of BBN, there are traces of other isotopes that emerge from BBN:

The time evolution of the relative numbers of light element isotopes through BBN. As the universe expands, nuclear reactions “freeze-out” and establish primordial abundances for the indicated species. The precise outcome depends on the baryon density, Ωb. This plot illustrates a particular choice of Ωb; different Ωb result in observationally distinguishable abundances. (Figures like this are so ubiquitous in discussions of the early universe that I have not been able to identify the original citation for this particular version.)

After hydrogen and helium, the next most common isotope to emerge from BBN is deuterium, 2H. It is the first thing made (one proton plus one neutron) but most of it gets processed into 4He, so after a brief peak, its abundance declines. How much it declines is very sensitive to Ωb: the higher the baryon density, the more deuterium gets gobbled up by helium before freeze-out. The following figure illustrates how the abundance of each isotope depends on Ωb:

“Schramm diagram” adapted from Cyburt et al (2003) showing the abundance of 4He by mass fraction (top) and the number relative to hydrogen of deuterium (D = 2H), helium-3, and lithium as a function of the baryon-to-photon ratio. We measure the photon density in the CMB, so this translates directly to the baryon density$ Ωbh2 (top axis).

If we can go out and measure the primordial abundances of these various isotopes, we can constrain the baryon density.
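For readers who want to move between the two axes of the Schramm diagram, here is a small sketch of the conversion. It assumes the commonly quoted approximation η10 ≈ 273.9 Ωbh2, where η10 is the baryon-to-photon ratio in units of 10^-10. The exact coefficient depends on the adopted CMB temperature and helium fraction, so treat the constant and the function names as illustrative rather than definitive.

```python
# Approximate conversion factor (an assumption; see the caveat above).
ETA10_PER_OMEGA_B_H2 = 273.9


def eta10_from_omega_b_h2(omega_b_h2):
    """Baryon-to-photon ratio (in units of 1e-10) for a given Omega_b h^2."""
    return ETA10_PER_OMEGA_B_H2 * omega_b_h2


def omega_b_h2_from_eta10(eta10):
    """Inverse conversion: Omega_b h^2 for a given eta10."""
    return eta10 / ETA10_PER_OMEGA_B_H2


# Two baryon densities quoted later in this post.
for omega in (0.0125, 0.0224):
    print(f"Omega_b h^2 = {omega:.4f}  ->  eta10 ~ {eta10_from_omega_b_h2(omega):.2f}")
```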

The Baryon Density

It works! Each isotope provides an independent estimate of Ωbh2, and they agree pretty well. This was the first and for a long time the only over-constrained quantity in cosmology. So while I am going to quibble about the exact value of Ωbh2, I don’t doubt that the basic picture is correct. There are too many details we have to get right in the complex nuclear reaction chains coupled to the decreasing temperature of a universe expanding at the rate required during radiation domination for this to be an accident. It is an exquisite success of the standard Hot Big Bang cosmology, albeit not one specific to LCDM.

Getting at primordial, rather than current, abundances is an interesting observational challenge too involved to go into much detail here. Suffice it to say that it can be done, albeit to varying degrees of satisfaction. We can then compare the measured abundances to the theoretical BBN abundance predictions to infer the baryon density.

The Schramm diagram with measured abundances (orange boxes) for the isotopes of the light elements. The thickness of the box illustrates the uncertainty: tiny for deuterium and large for 4He because of the large zoom on the axis scale. The lithium abundance could correspond to either low or high baryon density. 3He is omitted because its uncertainty is too large to provide a useful constraint.

Deuterium is considered the best baryometer because its relic abundance is very sensitive to Ωbh2: a small change in baryon density corresponds to a large change in D/H. In contrast, 4He is a great confirmation of the basic picture – the primordial mass fraction has to come in very close to 1/4 – but the precise value is not very sensitive to Ωbh2. Most of the neutrons end up in helium no matter what, so it is hard to distinguish# a few more from a few less. (Note the huge zoom on the linear scale for 4He. If we plotted it logarithmically with decades of range as we do the other isotopes, it would be a nearly flat line.) Lithium is annoying for being double-valued right around the interesting baryon density so that the observed lithium abundance can correspond to two values of Ωbh2. This behavior stems from the trade-off with 7Be which is produced at a higher rate but decays to 7Li after a few months. For this discussion the double-valued ambiguity of lithium doesn’t matter, as the problem is that the deuterium abundance indicates Ωbh2 that is even higher than the higher branch of lithium.

BBN pre-CMB

The diagrams above and below show the situation in the 1990s before CMB estimates became available. Consideration of all the available data in the review of Walker et al. led to the value Ωbh2 = 0.0125 ± 0.0025. This value** was so famous that it was Known. It formed the basis of my predictions for the CMB for both LCDM and no-CDM. This prediction hinged on BBN being correct, and that we understood the experimental bounds on the baryon density. A few years after Walker’s work, Copi et al. provided the estimate++ 0.009 < Ωbh2 < 0.02. Those were the extreme limits of the time, as illustrated by the green box below:

The baryon density as it was known before detailed observations of the acoustic power spectrum of the CMB. BBN was a mature subject before 1990; the massive reviews of Walker et al. and Copi et al. creak with the authority of a solved problem. The controversial tension at the time was between the high and low deuterium measurements from Hogan and Tytler, which were at the extreme ends of the ranges indicated by the bulk of the data in the reviews.

Up until this point, the constraints on BBN had come mostly from helium observations in nearby galaxies and lithium measurements in metal poor stars. It was only just then becoming possible to obtain high quality spectra of sufficiently high redshift quasars to see weak deuterium lines associated with strongly damped primary hydrogen absorption in intergalactic gas along the line of sight. This is great: deuterium is the most sensitive baryometer, the redshifts were high enough to be early in the history of the universe close to primordial times, and the gas was in the middle of intergalactic nowhere so shouldn’t be altered by astrophysical processes. These are ideal conditions, at least in principle.

First results were binary. Craig Hogan obtained a high deuterium abundance, corresponding to a low baryon density. Really low. From my Walker et al.-informed confirmation bias, too low. It was a brand new result, so promising but probably wrong. Then Tytler and his collaborators came up with the opposite result: low deuterium abundance corresponding to a high baryon density: Ωbh2 = 0.019 ± 0.001. That seemed pretty high at the time, but at least it was within the bound Ωbh2 < 0.02 set by Copi et al. There was a debate between these high/low deuterium camps that ended in a rare act of intellectual honesty by a cosmologist when Hogan&& conceded. We seemed to have settled on the high-end of the allowed range, just under Ωbh2 = 0.02.

Enter the CMB

CMB data started to be useful for constraining the baryon density in 2000 and improved rapidly. By that point, LCDM was already well-established, and I had published predictions for both LCDM and no-CDM. In the absence of cold dark matter, one expects a damping spectrum, with each peak lower than the one before it. For the narrow (factor of two) Known range of possible baryon densities, all the no-CDM models run together to essentially the same first-to-second peak ratio.

Peak locations measured by WMAP in 2003 (points) compared to the a priori (1999) predictions of LCDM (red tone lines) and no-CDM (blue tone lines). Models are normalized in amplitude around the first peak.

Adding CDM into the mix adds a driver to the oscillations. This fights the baryonic damping: the CDM is like a parent pushing a swing while the baryons are the kid dragging his feet. This combination makes just about any pattern of peaks possible. Not all free parameters are made equal: the addition of a single free parameter, ΩCDM, makes it possible to fit any plausible pattern of peaks. Without it (no-CDM means ΩCDM = 0), only the damping spectrum is allowed.

For BBN as it was known at the time, the clear difference was in the relative amplitude$$ of the first and second peaks. As can be seen above, the prediction for no-CDM was correct and that for LCDM was not. So we were done, right?

Of course not. To the CMB community, the only thing that mattered was the fit to the CMB power spectrum, not some obscure prediction based on BBN. Whatever the fit said was True; too bad for BBN if it didn’t agree.

The way to fit the unexpectedly small## second peak was to crank up the baryon density. To do that, Tegmark & Zaldarriaga (2000) needed 0.022 < Ωbh2 < 0.040. That’s the first blue point below. This was the first time that I heard it suggested that the baryon density could be so high.

The baryon density from deuterium (red triangles) before and after (dotted vertical line) estimates from the CMB (blue points). The horizontal dotted line is the pre-CMB upper limit of Copi et al.

The astute reader will note that the CMB-fit 0.022 < Ωbh2 < 0.040 sits entirely outside the BBN bounds 0.009 < Ωbh2 < 0.02. So we’re done, right? Well, no – the community simply ignored the successful a priori prediction of the no-CDM scenario. That was certainly easier than wrestling with its implications, and no one seems to have paused to contemplate why the observed peak ratio came in exactly at the one unique value that it could obtain in the case of no-CDM.

For a few years, the attitude seemed to be that BBN was close but not quite right. As the CMB data improved, the baryon density came down, ultimately settling on Ωbh2 = 0.0224 ± 0.0001. Part of the reason for this decline from the high initial estimate is covariance. In this case, the tilt plays a role: the baryon density declined as ns = 1 → 0.965 ± 0.004. Getting the second peak amplitude right takes a combination of both.

Now we’re back in the ballpark, almost: Ωbh2 = 0.0224 is not ridiculously far above the BBN limit Ωbh2 < 0.02. Close enough for Spergel et al. (2003) to say “The remarkable agreement between the baryon density inferred from D/H values and our [WMAP] measurements is an important triumph for the basic big bang model.” This was certainly true given the size of the error bars on both deuterium and the CMB at the time. It also elides*** any mention of either helium or lithium or the fact that the new Known was not consistent with the previous Known. Ωbh2 = 0.0224 was always the ally; Ωbh2 = 0.0125 was always the enemy.
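To make “back in the ballpark, almost” concrete, here is a minimal sketch comparing the numbers quoted above. It uses only the ranges and central values from the text and ignores the uncertainties, so it is a back-of-the-envelope comparison rather than a statistical test; the variable and function names are made up for the example.

```python
def overlap(a, b):
    """Return the overlapping part of two (low, high) ranges, or None if disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low < high else None


bbn_copi = (0.009, 0.020)        # pre-CMB bounds from Copi et al.
cmb_first_fit = (0.022, 0.040)   # Tegmark & Zaldarriaga (2000)
cmb_final = 0.0224               # value the CMB fits settled on
bbn_walker = 0.0125              # central value from Walker et al.

print("overlap of first CMB fit with BBN bounds:",
      overlap(bbn_copi, cmb_first_fit))
print(f"final CMB value vs. BBN upper limit: {cmb_final / bbn_copi[1] - 1:+.0%}")
print(f"final CMB value vs. Walker et al.: {cmb_final / bbn_walker:.2f}x")
```

The first CMB fit does not overlap the old bounds at all, while the final value sits about 12% above the old upper limit and a bit less than twice the old central value.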

Note, however, that deuterium made a leap from below Ωbh2 = 0.02 to above 0.02 exactly when the CMB indicated that it should do so. They iterated to better agreement and pretty much stayed there. Hopefully that is the correct answer, but given the history of the field, I can’t help worrying about confirmation bias. I don’t know if that is what’s going on, but if it were, this convergence over time is what it would look like.

Lithium does not concur

Taking the deuterium results at face value, there really is excellent agreement with the LCDM fit to the CMB, so I have some sympathy for the desire to stop there. Deuterium is the best baryometer, after all. Helium is hard to get right at a precise enough level to provide a comparable constraint, and lithium, well, lithium is measured in stars. Stars are tiny, much smaller than galaxies, and we know those are too puny to simulate.

Spite & Spite (1982) [those are names, pronounced “speet”; we’re not talking about spiteful stars] discovered what is now known as the Spite plateau, a level of constant lithium abundance in metal poor stars, apparently indicative of the primordial lithium abundance. Lithium is a fragile nucleus; it can be destroyed in stellar interiors. It can also be formed as the fragmentation product of cosmic ray collisions with heavier nuclei. Both of these things go on in nature, making some people distrustful of any lithium abundance. However, the Spite plateau is a sort of safe zone where neither effect appears to dominate. The abundance of lithium observed there is indeed very much in the right ballpark to be a primordial abundance, so that’s the most obvious interpretation.

Lithium indicates a lowish baryon density. Modern estimates are in the same range as BBN of old; they have not varied systematically with time. There is no tension between lithium and pre-CMB deuterium, but lithium disagrees with LCDM fits to the CMB and with post-CMB deuterium. This tension is both persistent and statistically significant (Fields 2011 describes it as “4–5σ”).

The baryon density from lithium (yellow symbols) over time. Stars are measurements in groups of stars on the Spite plateau; the square represents the approximate value from the ISM of the SMC.

I’ve seen many models that attempt to fix the lithium abundance, e.g., by invoking enhanced convective mixing via <<mumble mumble>> so that lithium on the surface of stars is subject to destruction deep in the stellar interior in a previously unexpected way. This isn’t exactly satisfactory – it should result in a mess, not a well-defined plateau – and other attempts I’ve seen to explain away the problem do so with at least as much contrivance. All of these models appeared after lithium became a problem; they’re clearly motivated by the assumption bias that the CMB is correct so the discrepancy is specific to lithium so there must be something weird about stars that explains it.

Another way to illustrate the tension is to use Ωbh2 from the Planck fit to predict what the primordial lithium abundance should be. The Planck-predicted band is clearly higher than and offset from the stars of the Spite plateau. There should be a plateau, sure, but it’s in the wrong place.

The lithium abundance in metal poor stars (points), the interstellar medium of the Small Magellanic Cloud (green band), and the primordial lithium abundance expected for the best-fit Planck LCDM. For reference, [Fe/H] = -3 means an iron abundance that is one one-thousandth that of the sun.

An important recent observation is that a similar lithium abundance is obtained in the metal poor interstellar gas of the Small Magellanic Cloud. That would seem to obviate any explanation based on stellar physics.

The Schramm diagram with the Planck CMB-LCDM value added (vertical line). This agrees well with deuterium measurements made after CMB data became available, but not with those before, nor with the measured abundance of lithium.

We can also illustrate the tension on the Schramm diagram. This version adds the best-fit CMB value and the modern deuterium abundance. These are indeed in excellent agreement, but they don’t intersect with lithium. The deuterium-lithium tension appears to be real, and comparable in significance to the H0 tension.

So what’s the answer?

I don’t know. The logical options are

  • A systematic error in the primordial lithium abundance
  • A systematic error in the primordial deuterium abundance
  • Physics beyond standard BBN

I don’t like any of these solutions. The data for both lithium and deuterium are what they are. As astronomical observations, both are subject to the potential for systematic errors and/or physical effects that complicate their interpretation. I am also extremely reluctant to consider modifications to BBN. There are occasional suggestions to this effect, but it is a lot easier to break than it is to fix, especially for what is a fairly small disagreement in the absolute value of Ωbh2.

I have left the CMB off the list because it isn’t part of BBN: its constraint on the baryon density is real, but involves completely different physics. It also involves different assumptions, i.e., the LCDM model and all its invisible baggage, while BBN is just what happens to ordinary nucleons during radiation domination in the early universe. CMB fits are corroborative of deuterium only if we assume LCDM, which I am not inclined to accept: deuterium disagreed with the subsequent CMB data before it agreed. Whether that’s just progress or a sign of confirmation bias, I also don’t know. But I do know confirmation bias has bedeviled the history of cosmology, and as the H0 debate shows, we clearly have not outgrown it.

The appearance of confirmation bias is augmented by the response time of each measured elemental abundance. Deuterium is measured using high redshift quasars; the community that does that work is necessarily tightly coupled to cosmology. Its response was practically instantaneous: as soon as the CMB suggested that the baryon density needed to be higher, conforming D/H measurements appeared. Indeed, I recall when that first high red triangle appeared in the literature, a colleague snarked to me “we can do that too!” In those days, those of us who had been paying attention were all shocked at how quickly Ωbh2 = 0.0125 ± 0.0025 was abandoned for literally double that value, Ωbh2 = 0.025 ± 0.001. That’s 4.6 sigma for those keeping score.
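For those who do want to keep score, the 4.6 sigma figure can be reproduced by treating the two quoted values as independent Gaussian estimates and adding their uncertainties in quadrature. That treatment is an assumption on my part about how the number was obtained, and the function name below is purely illustrative:

```python
import math

def tension_sigma(x1, err1, x2, err2):
    """Separation of two measurements in units of their combined
    uncertainty, assuming independent Gaussian errors (quadrature sum)."""
    return abs(x2 - x1) / math.hypot(err1, err2)

# Values quoted in the text: the old BBN baryon density vs. its replacement.
sigma = tension_sigma(0.0125, 0.0025, 0.025, 0.001)
print(f"{sigma:.1f} sigma")
```

The larger error bar on the older value dominates the denominator, so the significance is set mostly by how well the original number was thought to be known.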

The primordial helium abundance is measured in nearby dwarf galaxies. That community is aware of cosmology, but not as strongly coupled to it. Estimates of the primordial helium abundance have drifted upwards over time, corresponding to higher implied baryon densities. It’s as if confirmation bias is driving things towards the same result, but on a timescale that depends on the sociological pressure of the CMB imperative.

Fig. 8 from Steigman (2012) showing the history of primordial helium mass fraction (YP) determinations as a function of time.

I am not accusing anyone of trying to obtain a particular result. Confirmation bias can be a lot more subtle than that. There is an entire field of study of it in psychology. We “humans actively sample evidence to support prior beliefs” – none of us are immune to it.

In this case, how we sample evidence depends on the field we’re active in. Lithium is measured in stars. One can have a productive career in stellar physics while entirely ignoring cosmology; it is the least likely to be perturbed by edicts from the CMB community. The inferred primordial lithium abundance has not budged over time.

What’s your confirmation bias?

I try not to succumb to confirmation bias, but I know that’s impossible. The best I can do is change my mind when confronted with new evidence. This is why I went from being sure that non-baryonic dark matter had to exist to taking seriously MOND as the theory that predicted what I observed.

I do try to look at things from all perspectives. Here, the CMB has been a roller coaster. Putting on an LCDM hat, the location of the first peak came in exactly where it was predicted: this was strong corroboration of a flat FLRW geometry. What does it mean in MOND? No idea – MOND doesn’t make a prediction about that. The amplitude of the second peak came in precisely as predicted for the case of no-CDM. This was corroboration of the ansatz inspired by MOND, and the strongest possible CMB-based hint that we might be barking up the wrong tree with LCDM.

As an exercise, I went back and maxed out the baryon density as it was known before the second peak was observed. We already thought we knew LCDM parameters well enough to do this. We couldn’t. The amplitude of the second peak came as a huge surprise to LCDM; everyone acknowledged that at the time (if pressed; many simply ignored it). Nowadays this is forgotten, or people have gaslit themselves into believing this was expected all along. It was not.

Fig. 45 from Famaey & McGaugh (2012): WMAP data are shown with the a priori prediction of no-CDM (blue line) and the most favorable prediction that could have been made ahead of time for LCDM (red line).

From the perspective of no-CDM, we don’t really care whether deuterium or lithium hits closer to the right baryon density. All plausible baryon densities predict essentially the same A1:2 amplitude ratio. Once we admit CDM as a possibility, then the second peak amplitude becomes very sensitive to the mix of CDM and baryons. From this perspective, the lithium-indicated baryon density is unacceptable. That’s why it is important to have a test that is independent of the CMB. Both deuterium and lithium provide that, but they disagree about the answer.

Once we broke BBN to fit the second peak in LCDM, we were admitting (if not to ourselves) that the a priori prediction of LCDM had failed. Everything after that is a fitting exercise. There are enough free parameters in LCDM to fit any plausible power spectrum. Cosmologists are fond of saying there are thousands of independent multipoles, but that overstates the case: it doesn’t matter how finely we sample the wave pattern, it matters what the wave pattern is. That is not as over-constrained as it is made to sound. LCDM is, nevertheless, an excellent fit to the CMB data; the test then is whether the parameters of this fit are consistent with independent measurements. It was until it wasn’t; that’s why we face all these tensions now.

Despite the success of the prediction of the second peak, no-CDM gets the third peak wrong. It does so in a way that is impossible to fix short of invoking new physics. We knew that had to happen at some level; empirically that level occurs at L = 600. After that, it becomes a fitting exercise, just as it is in LCDM – only now, one has to invent a new theory of gravity in which to make the fit. That seems like a lot to ask, so while it remained as a logical possibility, LCDM seemed the more plausible explanation for the CMB if not dynamical data. From this perspective, that A1:2 came out bang on the value predicted by no-CDM must just be one heck of a cosmic fluke. That’s easy to accept if you were unaware of the prediction or scornful of its motivation; less so if you were the one who made it.

Either way, the CMB is now beyond our ability to predict. It has become a fitting exercise, the chief issue being which paradigm to fit it in. In LCDM, the fit follows easily enough; the question is whether the result agrees with other data: are these tensions mere hiccups in the great tradition of observational cosmology? Or are they real, demanding some new physics?

The widespread attitude among cosmologists is that it will be impossible to fit the CMB in any way other than LCDM. That is a comforting thought (it has to be CDM!) and for a long time seemed reasonable. However, it has been contradicted by the success of Skordis & Zlosnik (2021) using AeST, which can fit the CMB as well as LCDM.

CMB power spectrum observed by Planck fit by AeST (Skordis & Zlosnik 2021).

AeST is a very important demonstration that one does not need dark matter to fit the CMB. One does need other fields+++, so now the reality of those has to be examined. Where this show stops, nobody knows.

I’ll close by noting that the uniqueness claimed by the LCDM fit to the CMB is a property more correctly attributed to MOND in galaxies. It is less obvious that this is true because it is always possible to fit a dark matter model to data once presented with the data. That’s not science, that’s fitting French curves. To succeed, a dark matter model must “look like” MOND. It obviously shouldn’t do that, so modelers refuse to go there, and we continue to spin our wheels and dig the rut of our field deeper.

Note added in proof, as it were: I’ve been meaning to write about this subject for a long time, but hadn’t, in part because I knew it would be long and arduous. Being deeply interested in the subject, I had to slap myself repeatedly to refrain from spending even more time updating the plots with publication date as an axis: nothing has changed, so that would serve only to feed my OCD. Even so, it has taken a long time to write, which I mention because I had completed the vast majority of this post before the IAU announced on May 15 that Cooke & Pettini have been awarded the Gruber prize for their precision deuterium abundance. This is excellent work (it is one of the deuterium points in the relevant plot above), and I’m glad to see this kind of hard, real-astronomy work recognized.

The award of a prize is a recognition of meritorious work but is not a guarantee that it is correct. So this does not alter any of the concerns that I express here, concerns that I’ve expressed for a long time. It does make my OCD feel obliged to comment at least a little on the relevant observations, which is itself considerably involved, but I will tack on some brief discussion below, after the footnotes.

*These methods were in agreement before they were in tension, e.g., Spergel et al. (2003) state: “The agreement between the HST Key Project value and our [WMAP CMB] value, h = 0.72 ±0.05, is striking, given that the two methods rely on different observables, different underlying physics, and different model assumptions.”

+Here I mean the abundance of the primary isotope of lithium, 7Li. There is a different problem involving the apparent overabundance of 6Li. I’m not talking about that here; I’m talking about the different baryon densities inferred separately from the abundances of D/H and 7Li/H.

&By convention, X, Y, and Z are the mass fractions of hydrogen, helium, and everything else. Since the universe starts from a primordial abundance of Xp = 3/4 and Yp = 1/4, and stars are seen to have approximately that composition plus a small sprinkling of everything else (for the sun, Z ≈ 0.02), and since iron lines are commonly measured in stars to trace Z, astronomers fell into the habit of calling Z the metallicity even though oxygen is the third most common element in the universe today (by both number and mass). Since everything in the periodic table that isn’t hydrogen and helium is a small fraction of the mass, all the heavier elements are often referred to collectively as metals despite the unintentional offense to chemistry.

$The factor of h2 appears because of the definition of the critical density ρc = (3H0²)/(8πG): Ωb = ρb/ρc. The physics cares about the actual density ρb but Ωbh2 = 0.02 is a lot more convenient to write than ρb,now = 3.75 × 10⁻³¹ g/cm³.
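The conversion in this footnote is easy to check numerically. The sketch below uses rounded textbook values for G and the megaparsec (my choices, not taken from any of the papers discussed), so expect agreement with the quoted density only to within rounding:

```python
import math

G_CGS = 6.674e-8         # gravitational constant, cm^3 g^-1 s^-2
KM_PER_MPC = 3.0857e19   # kilometres in one megaparsec

def critical_density(h):
    """rho_c = 3 H0^2 / (8 pi G) in g/cm^3, for H0 = 100 h km/s/Mpc."""
    hubble_per_sec = 100.0 * h / KM_PER_MPC   # km/s/Mpc converted to 1/s
    return 3.0 * hubble_per_sec ** 2 / (8.0 * math.pi * G_CGS)

def baryon_density(omega_b_h2):
    """Physical baryon density today in g/cm^3. The h^2 in Omega_b h^2
    cancels the h^2 in rho_c, so rho_c evaluated at h = 1 is all we need."""
    return omega_b_h2 * critical_density(1.0)

rho_b = baryon_density(0.02)
print(f"{rho_b:.2e} g/cm^3")
```

With these constants the result comes out near 3.76 × 10⁻³¹ g/cm³, which matches the value above to the precision of the inputs. The point of the exercise is that no value of h has to be chosen to get a physical density out of Ωbh2.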

#I’ve worked on helium myself, but was never able to do better than Yp = 0.25 ± 0.01. This corroborates the basic BBN picture, but does not suffice as a precise measure of the baryon density. To do that, one must obtain a result accurate to the third place of decimals, as discussed in the exquisite works of Kris Davidson, Bernie Pagel, Evan Skillman, and their collaborators. It’s hard to do for both observational reasons and because a wealth of subtle atomic physics effects come into play at that level of precision – helium has multiple lines; their parent population levels depend on the ionization mechanism, the plasma temperature, its density, and fluorescence effects as well as abundance.

**The value reported by Walker et al. was phrased as Ωbh502 = 0.05 ± 0.01, where h50 = H0/(50 km/s/Mpc); translating this to the more conventional h = H0/(100 km/s/Mpc) decreases these numbers by a factor of four and leads to the impression of more significant digits than were claimed. It is interesting to consider the psychological effect of this numerology. For example, the modern CMB best-fit value in this phrasing is Ωbh502 = 0.09, four sigma higher than the value Known from the combined assessment of the light isotope abundances. That seems like a tension – not just involving lithium, but the CMB vs. all of BBN. Amusingly, the higher baryon density needed to obtain a CMB fit assuming LCDM is close to the threshold where we might have gotten away without the dynamical need (Ωm > Ωb) for non-baryonic dark matter that motivated non-baryonic dark matter in the first place. (For further perspective at a critical juncture in the development of the field, see Peebles 1999).

The use of h50 itself is an example of the confirmation bias I’ve mentioned before as prevalent at the time, that Ωm = 1 and H0 = 50 km/s/Mpc. I would love to be able to do the experiment of sending the older cosmologists who are now certain of LCDM back in time to share the news with their younger selves who were then equally certain of SCDM. I suspect their younger selves would ask their older selves at what age they went insane, if they didn’t simply beat themselves up.
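The factor of four between the h50 and h phrasings in this footnote follows directly from h50 = 2h. A minimal sketch of the bookkeeping, with helper names invented for the purpose:

```python
def h50_to_h100(value_h50):
    """Convert Omega_b * h50^2 to Omega_b * h^2.
    h50 = H0/50 and h = H0/100, so h50 = 2h and h50^2 = 4 h^2."""
    return value_h50 / 4.0

def h100_to_h50(value_h100):
    """Inverse conversion: Omega_b * h^2 to Omega_b * h50^2."""
    return 4.0 * value_h100

# The Walker et al. value as phrased in the footnote.
center, err = 0.05, 0.01
print(h50_to_h100(center), h50_to_h100(err))

# The modern CMB value quoted in the main text, in the older phrasing.
cmb_h50 = h100_to_h50(0.0224)
n_sigma = (cmb_h50 - center) / err
print(round(cmb_h50, 2), round(n_sigma, 1))
```

Written this way, 0.05 ± 0.01 becomes 0.0125 ± 0.0025, which is where the apparent extra significant digits come from, and the modern value lands about four of the old error bars above the old central value.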

++Craig Copi is a colleague here at CWRU, so I’ve asked him about the history of this. He seemed almost apologetic, since the current “right” baryon density from the CMB now is higher than his upper limit, but that’s what the data said at the time. The CMB gives a more accurate value only once you assume LCDM, so perhaps BBN was correct in the first place.

&&Or succumbed to peer pressure, as that does happen. I didn’t witness it myself, so don’t know.

$$The absolute amplitude of the no-CDM model is too high in a transparent universe. Part of the prediction of MOND is that reionization happens early, causing the universe to be a tiny bit opaque. This combination came out just right for τ = 0.17, which was the original WMAP measurement. It also happens to be consistent with the EDGES cosmic dawn signal and the growing body of evidence from JWST.

##The second peak was unexpectedly small from the perspective of CDM; it was both natural and expected in no-CDM. At the time, it was computationally expensive to calculate power spectra, so people had pre-computed coarse grids within which to hunt for best fits. The range covered by the grids was informed by extant knowledge, of which BBN was only one element. From a dynamical perspective, Ωm > 0.2 was adopted as a hard limit that imposed an edge in the grids of the time. There was no possibility of finding no-CDM as the best fit because it had been excluded as a possibility from the start.

***Spergel et al. (2003) also say “the best-fit Ωbh2 value for our fits is relatively insensitive to cosmological model and dataset combination as it depends primarily on the ratio of the first to second peak heights (Page et al. 2003b)” which is of course the basis of the prediction I made using the baryon density as it was Known at the time. They make no attempt to test that prediction, nor do they cite it.

+++I’ve heard some people assert that this is dark matter by a different name, so is a success of the traditional dark matter picture rather than of modified gravity. That’s not at all correct. It’s just stage three in the list of reactions to surprising results identified by Louis Agassiz.

All of the figures below are from Cooke & Pettini (2018), which I employ here to briefly illustrate how D/H is measured. This is the level of detail I didn’t want to get into for either deuterium or helium or lithium, which are comparably involved.

First, here is a spectrum of the quasar they observe, Q1243+307. The quasar itself is not the object of interest here, though quasars are certainly interesting! Instead, we’re looking at the absorption lines along the line of sight; the quasar is being used as a spotlight to illuminate the gas between it and us.

Figure 1. Final combined and flux-calibrated spectrum of Q1243+307 (black histogram) shown with the corresponding error spectrum (blue histogram) and zero level (green dashed line). The red tick marks above the spectrum indicate the locations of the Lyman series absorption lines of the sub-DLA at redshift zabs = 2.52564. Note the exquisite signal-to-noise ratio (S/N) of the combined spectrum, which varies from S/N ≃ 80 near the Lyα absorption line of the sub-DLA (∼4300 Å) to S/N ≃ 25 at the Lyman limit of the sub-DLA, near 3215 Å in the observed frame.

The big hump around 4330 Å is Lyman α emission from the quasar itself. Lyα is the n = 2 to 1 transition of hydrogen, Lyβ is the n = 3 to 1 transition, and so on. The rest frame wavelength of Lyα is far into the ultraviolet at 1216 Å; we see it here at a redshift of z = 2.558. The rest of the spectrum is continuum and emission lines from the quasar with absorption lines from stuff along the line of sight. Note that the red end of the spectrum at wavelengths longer than 4400 Å is mostly smooth with only the occasional absorption line. Blueward of 4300 Å, there is a huge jumble. This is not noise, this is the Lyα forest. Each of those lines is absorption from hydrogen in clouds at different distances, hence different redshifts, along the line of sight.
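The wavelengths quoted here and in the Figure 1 caption all follow from λ_obs = λ_rest (1 + z). As a quick sanity check, using standard rest wavelengths for hydrogen Lyα and the Lyman limit (these two constants are my additions; the redshifts are the ones given above):

```python
LYA_REST = 1215.67       # hydrogen Lyman-alpha rest wavelength, Angstrom
LYLIMIT_REST = 911.75    # hydrogen Lyman limit, Angstrom

def observed_wavelength(rest_angstrom, z):
    """Wavelength at which a rest-frame feature is observed at redshift z."""
    return rest_angstrom * (1.0 + z)

z_qso = 2.558      # quasar emission redshift quoted in the text
z_abs = 2.52564    # sub-DLA absorption redshift from the Figure 1 caption

lya_emission = observed_wavelength(LYA_REST, z_qso)
lya_absorber = observed_wavelength(LYA_REST, z_abs)
limit_absorber = observed_wavelength(LYLIMIT_REST, z_abs)

print(f"quasar Lya emission : {lya_emission:.0f} A")
print(f"sub-DLA Lya         : {lya_absorber:.0f} A")
print(f"sub-DLA Lyman limit : {limit_absorber:.0f} A")
```

The emission peak lands near the hump in the spectrum, the absorber's Lyα falls a few tens of Ångströms blueward of it, and the Lyman limit comes out close to the 3215 Å quoted in the caption.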

Most of the clouds in the Lyα forest are ephemeral. The cross section for Lyα is huge so it takes very little hydrogen to gobble it up. Most of these lines represent very low column densities of neutral hydrogen gas. Once in a while though, one encounters a higher column density cloud that has enough hydrogen to be completely opaque to Lyα. These are damped Lyα systems. In damped systems, one can often spot the higher order Lyman lines (these are marked in red in the figure). It also means that there is enough hydrogen present to have a shot at detecting the slightly shifted version of Lyα of deuterium. This is where the abundance ratio D/H is measured.
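To give a sense of how slight the deuterium shift is: in the simple Bohr picture the transition wavelength scales inversely with the reduced mass of the electron-nucleus system, so the heavier deuteron pulls Lyα a little to the blue. The sketch below uses standard mass ratios and ignores finer atomic-physics corrections, so treat it as an estimate of the size of the effect rather than a precise line list:

```python
C_KMS = 299792.458          # speed of light, km/s
MP_OVER_ME = 1836.15267     # proton / electron mass ratio
MD_OVER_ME = 3670.48297     # deuteron / electron mass ratio
LYA_H = 1215.67             # hydrogen Lyman-alpha rest wavelength, Angstrom

def wavelength_scale(nucleus_over_electron):
    """Wavelength is proportional to 1/mu, with mu = m_e / (1 + m_e/M),
    so it scales as (1 + m_e/M)."""
    return 1.0 + 1.0 / nucleus_over_electron

# Fractional wavelength difference of deuterium relative to hydrogen.
ratio = wavelength_scale(MD_OVER_ME) / wavelength_scale(MP_OVER_ME)
shift_kms = C_KMS * (ratio - 1.0)   # equivalent velocity offset
shift_ang = LYA_H * (ratio - 1.0)   # rest-frame wavelength offset

print(f"D Lya offset: {shift_kms:.1f} km/s, {shift_ang:.3f} A")
```

The offset works out to roughly 80 km/s to the blue, about a third of an Ångström in the rest frame. That is why the deuterium feature sits on the blue wing of the saturated hydrogen line, and why the placement of the continuum and the velocity structure of the cloud matter so much.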

To measure D/H, one has not only to detect the lines, but also to model and subtract the continuum. This is a tricky business in the best of times, but here its importance is magnified by the huge difference between the primary Lyα line which is so strong that it is completely black and the deuterium Lyα line which is incredibly weak. A small error in the continuum placement will not matter to the measurement of the absorption by the primary line, but it could make a huge difference to that of the weak line. I won’t even venture to discuss the nonlinear difference between these limits due to the curve of growth.

Figure 2. Lyα profile of the absorption system at zabs = 2.52564 toward the quasar Q1243+307 (black histogram) overlaid with the best-fitting model profile (red line), continuum (long dashed blue line), and zero-level (short dashed green line). The top panels show the raw, extracted counts scaled to the maximum value of the best-fitting continuum model. The bottom panels show the continuum normalized flux spectrum. The label provided in the top left corner of every panel indicates the source of the data. The blue points below each spectrum show the normalized fit residuals, (data–model)/error, of all pixels used in the analysis, and the gray band represents a confidence interval of ±2σ. The S/N is comparable between the two data sets at this wavelength range, but it is markedly different near the high order Lyman series lines (see Figures 4 and 5). The red tick marks above the spectra in the bottom panels show the absorption components associated with the main gas cloud (Components 2, 3, 4, 5, 6, 8, and 10 in Table 2), while the blue tick marks indicate the fitted blends. Note that some blends are also detected in Lyβ–Lyε.

The above examples look pretty good. The authors make the necessary correction for the varying spectral sensitivity of the instrument, and take great care to simultaneously fit the emission of the quasar and the absorption. I don’t think they’ve done anything wrong; indeed, it looks like they did everything right – just as the people measuring lithium in stars have.

Still, as an experienced spectroscopist, there are some subtle details that make me queasy. There are two independent observations, which is awesome, and the data look almost exactly the same, a triumph of repeatability. The fitted models are nearly identical, but if you look closely, you can see the model cuts slightly differently along the left edge of the damped absorption around 4278 Å in the two versions of the spectrum, and again along the continuum towards the right edge.

These differences are small, so hopefully don’t matter. But what is the continuum, really? The model line goes through the data, because what else could one possibly do? But there is so much Lyα absorption, is that really continuum? Should the continuum perhaps trace the upper envelope of the data? A physical effect that I worry about is that weak Lyα is so ubiquitous, we never see the true continuum but rather continuum minus a tiny bit of extraordinarily weak (Gunn-Peterson) absorption. If the true continuum from the quasar is just a little higher, then the primary hydrogen absorption is unaffected but the weak deuterium absorption would go up a little. That means slightly higher D/H, which means lower Ωbh2, which is the direction in which the measurement would need to move to come into closer agreement with lithium.

Is the D/H measurement in error? I don’t know. I certainly hope not, and I see no reason to think it is. I do worry that it could be. The continuum level is one thing that could go wrong; there are others. My point is merely that we shouldn’t assume it has to be lithium that is in error.

An important check is whether the measured D/H ratio depends on metallicity or column density. It does not. There is no variation with metallicity as measured by the logarithmic oxygen abundance relative to solar (left panel below). Nor does it appear to depend on the amount of hydrogen in the absorbing cloud (right panel). In the early days of this kind of work there appeared to be a correlation, raising the specter of a systematic. That is not indicated here.

Figure 6. Our sample of seven high precision D/H measures (symbols with error bars); the green symbol represents the new measure that we report here. The weighted mean value of these seven measures is shown by the red dashed and dotted lines, which represent the 68% and 95% confidence levels, respectively. The left and right panels show the dependence of D/H on the oxygen abundance and neutral hydrogen column density, respectively. Assuming the Standard Model of cosmology and particle physics, the right vertical axis of each panel shows the conversion from D/H to the universal baryon density. This conversion uses the Marcucci et al. (2016) theoretical determination of the d(p,γ)3He cross-section. The dark and light shaded bands correspond to the 68% and 95% confidence bounds on the baryon density derived from the CMB (Planck Collaboration et al. 2016).

I’ll close by noting that Ωbh2 from this D/H measurement is indeed in very good agreement with the best-fit Planck CMB value. The question remains whether the physics assumed by that fit, baryons+non-baryonic cold dark matter+dark energy in a strictly FLRW cosmology, is the correct assumption to make.

Some more persistent cosmic tensions

I set out last time to discuss some of the tensions that persist in afflicting cosmic concordance, but didn’t get past the Hubble tension. Since then, I’ve come across more of that, e.g., Boubel et al (2024a), who use a variant of Tully-Fisher to obtain H0 = 73.3 ± 2.1(stat) ± 3.5(sys) km/s/Mpc. Having done that sort of work, their systematic uncertainty term seemed large to me. I then came across Scolnic et al. (2024) who trace this issue back to one apparently erroneous calibration amongst many, and correct the results to H0 = 76.3 ± 2.1(stat) ± 1.5(sys) km/s/Mpc. Boubel is an author of the latter paper, so apparently agrees with this revision. Fortunately they didn’t go all Sandage-de Vaucouleurs on us, but even so, this provides a good example of how fraught this field can get. It also demonstrates the opportunity for confirmation bias, as the revised numbers are almost exactly what we find ourselves. (New results coming soon!)

It’s a dang mess.

The Hubble tension is only the most prominent of many persistent tensions, so let’s wade into some of the rest.

The persistent tension in the amplitude of the power spectrum

The tension that cosmologists seem to stress about most after the Hubble tension is that in σ8. σ8 quantifies the amplitude of the power spectrum; it is a measure of the rms fluctuation in mass in spheres of 8h-1 Mpc. Historically, this scale was chosen because early work by Peebles & Yu (1970) indicated that this was the scale on which the rms contrast in galaxy numbers* is unity. This is also a handy dividing line between linear and nonlinear regimes. On much larger scales, the fluctuations are smaller (a giant sphere is closer to the average for the whole universe) so can be treated in the limit of linear perturbation theory. Individual galaxies are “small” by this standard, so can’t be treated+ so simply, which is the excuse many cosmologists use to run shrieking from discussing them.

As we progressed from wrapping our heads around an expanding universe to quantifying the large scale structure (LSS) therein, the power spectrum statistically describing LSS became part of the canonical set of cosmological parameters. I don’t myself consider it to be on par with the Big Two, the Hubble constant H0 and the density parameter Ωm, but many cosmologists do seem partial to it despite the lack of phase information. Consequently, any tension in the amplitude σ8 garners attention.

The tension in σ8 has been persistent insofar as I recall debates in the previous century where some kinds of data indicated σ8 ~ 0.5 while other data preferred σ8 ~ 1. Some of that tension was in underlying assumptions (SCDM before LCDM). Today, the difference is [mostly] between the Planck best-fit amplitude σ8 = 0.811 ± 0.006 and various local measurements that typically yield 0.7something. For example, Karim et al. (2024) find low σ8 for emission line galaxies, even after specifically pursuing corrections in a necessary dust model that pushed things in the right direction:

Fig. 16 from Karim et al. (2024): Estimates of σ8 from emission line galaxies (red and blue), luminous red galaxies (grey), and Planck (green).

As with so many cosmic parameters, there is degeneracy, in this case between σ8 and Ωm. Physically this happens because you get more power when you have more stuff (Ωm), but the different tracers are sensitive to it in different ways. Indeed, if I put on a cosmology hat, I personally am not too worried about this tension – emission line galaxies are typically lower mass than luminous red galaxies, so one expects that there may be a difference in these populations. The Planck value is clearly offset from both, but doesn’t seem too far afield. We wouldn’t fret at all if it weren’t for Planck’s damnably small error bars.

This tension is also evident as a function of redshift. Here are measures of the combination of parameters fσ8 = Ωm(z)^γ σ8 measured and compiled by Boubel et al (2024b):

Fig. 16 from Boubel et al (2024b). LCDM matches the data for σ8 = 0.74 (green line); the purple line is the expectation from Planck (σ8 = 0.81). The inset shows the error ellipse, which is clearly offset from the Planck value (crossed lines), particularly for the GR& value of γ = 0.55.

The line representing the Planck value σ8 = 0.81 overshoots most of the low redshift data, particularly those with the smallest uncertainties. The green line has σ8 = 0.74, so is a tad lower than Planck in the same sense as other low redshift measures. Again, the offset is modest, but it does look significant. The tension is persistent but not a show-stopper, so we generally shrug our shoulders and proceed as if it will inevitably work out.
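To see how the two curves in such a plot relate, here is a rough sketch of fσ8(z) with the growth rate parameterized as f = Ωm(z)^γ. It assumes a flat LCDM background with radiation neglected, takes Ωm = 0.315 for both curves (the actual green line in the figure may use a different value), and scales σ8 with the linear growth factor obtained by integrating f. All function names are mine:

```python
import math

def omega_m_z(z, omega_m0):
    """Matter density parameter at redshift z in flat LCDM (no radiation)."""
    a3 = (1.0 + z) ** 3
    return omega_m0 * a3 / (omega_m0 * a3 + 1.0 - omega_m0)

def growth_rate(z, omega_m0, gamma=0.55):
    """f = Omega_m(z)^gamma; gamma = 0.55 is the GR value cited in the caption."""
    return omega_m_z(z, omega_m0) ** gamma

def growth_factor(z, omega_m0, gamma=0.55, steps=2000):
    """D(z)/D(0) from d ln D / d ln a = f, i.e.
    ln D = -integral_0^z f(z') / (1 + z') dz', via the trapezoid rule."""
    if z == 0:
        return 1.0
    dz = z / steps
    total = 0.0
    for i in range(steps):
        z0, z1 = i * dz, (i + 1) * dz
        y0 = growth_rate(z0, omega_m0, gamma) / (1.0 + z0)
        y1 = growth_rate(z1, omega_m0, gamma) / (1.0 + z1)
        total += 0.5 * dz * (y0 + y1)
    return math.exp(-total)

def f_sigma8(z, sigma8_0, omega_m0=0.315, gamma=0.55):
    """f(z) * sigma8(z), with sigma8(z) = sigma8_0 * D(z)/D(0)."""
    return growth_rate(z, omega_m0, gamma) * sigma8_0 * growth_factor(z, omega_m0, gamma)

for z in (0.0, 0.5, 1.0):
    print(z, round(f_sigma8(z, 0.81), 3), round(f_sigma8(z, 0.74), 3))
```

Under these assumptions the two curves differ by the constant factor 0.74/0.81 at every redshift, so the offset between them is purely one of amplitude. Letting Ωm or γ vary changes the shape of the curve as well, which is what the error ellipse in the inset is exploring.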

The persistent tension in the cosmic mass density

A persistent tension that nobody seems to worry about is that in the density parameter Ωm. Fits to the Planck CMB acoustic power spectrum currently peg Ωm = 0.315±0.007, but as we’ve seen before, this covaries with the Hubble constant. Twenty years ago, WMAP indicated Ωm = 0.24 and H0 = 73, in good agreement with the concordance region of other measurements, both then and now. As with H0, the tension is posed by the itty bitty uncertainties on the Planck fit.

Experienced cosmologists may be inclined to scoff at such tiny error bars. I was, so I’ve confirmed them myself. There is very little wiggle room to match the Planck data within the framework of the LCDM model. I emphasize that last bit because it is an assumption now so deeply ingrained that it is usually left unspoken. If we leave that part out, then the obvious interpretation is that Planck is correct and all measurements that disagree with it must suffer from some systematic error. This seems to be what most cosmologists believe at present. If we don’t leave that part out, perhaps because we’re aware of other possibilities so are not willing to grant this assumption, then the various tensions look like failures of a model that’s already broken. But let’s not go there today, and stay within the conventional framework.

There are lots of ways to estimate the gravitating mass density of the universe. Indeed, it was the persistent, early observation that the mass density Ωm exceeded that in baryons, Ωb, from big bang nucleosynthesis that got the non-baryonic dark matter show on the road: there appears to be something out there gravitating that’s not normal matter. This was the key observation that launched non-baryonic cold dark matter: if Ωm > Ωb, there has% to be some kind of particle that is non-baryonic.

So what is Ωm? Most estimates have spanned the range 0.2 < Ωm < 0.4. In the 1980s and into the 1990s, this seemed close enough to Ωm = 1, by the standards of cosmology, that most Inflationary cosmologists presumed it would work out to what Inflation predicted, Ωm = 1 exactly. Indeed, I remember that community directing some rather vicious tongue-lashings at observers, castigating them to look harder: you will surely get Ωm = 1 if you do it right, you fools. But despite the occasional claim to get this “right” answer, the vast majority of the evidence never pointed that way. As I’ve related before, an important step on the path to LCDM – probably the most important step – was convincing everyone that really Ωm < 1.

Discerning between Ωm = 0.2 and 0.3 is a lot more challenging than determining that Ωm < 1, so we tend to treat either as acceptable. That’s not really fair in this age of precision cosmology. There are far too many estimates of the mass density to review here, so I’ll just note a couple of discrepant examples while also acknowledging that it is easy to find dynamical estimates that agree with Planck.

To give a specific example, Mohayaee & Tully (2005) obtained Ωm = 0.22 ± 0.02 by looking at peculiar velocities in the local universe. This was consistent with other constraints at the time, including WMAP, but is 4.5σ from the current Planck value. That’s not quite the 5σ we arbitrarily define to be an undeniable difference, but it’s plenty significant.
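The 4.5σ figure can be reproduced with the usual back-of-the-envelope convention of dividing the difference by the two uncertainties added in quadrature. This is a sketch that assumes independent Gaussian errors; the helper name is made up for illustration.

```python
import math

def tension_sigma(value_a, err_a, value_b, err_b):
    """Difference between two measurements in units of their combined
    uncertainty, assuming independent Gaussian errors."""
    return abs(value_a - value_b) / math.hypot(err_a, err_b)

# Values quoted in the text: Mohayaee & Tully (2005) vs. the Planck fit.
n_sigma = tension_sigma(0.22, 0.02, 0.315, 0.007)
print(f"{n_sigma:.1f} sigma")
```

The uncertainty on the peculiar velocity result dominates the denominator; the Planck error bar barely matters.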

There have of course been other efforts to do this, and many of them lead to the same result, or sometimes even lower Ωm. For example, Shaya et al. (2022) use the Numerical Action Method developed by Peebles to attempt to work out the motions of nearly 10,000 galaxies – not just their Hubble expansion, but their individual trajectories under the mutual influence of each other’s gravity and whatever else may be out there. The resulting deviations from a pure Hubble flow depend on how much mass is associated with each galaxy and whatever other density there is to perturb things.

Fig. 4 from Shaya et al (2022): The gravitating mass density as a function of scale. After some local variations (hello Virgo cluster!), the data converge to Ωm = 0.12. Reaching Ωm = 0.24 requires an equal, additional amount of mass in “interhalo matter.” Even more mass would be required to reach the Planck value (red line added to original figure).

This result is in even greater tension with Planck than the earlier work by Mohayaee & Tully (2005). I find the need to invoke interhalo matter disturbing, since it acts as a pedestal in their analysis: extra mass density that is uniform everywhere. This is necessary so that it contributes to the global mass density Ωm but does not contribute to perturbing the Hubble flow.

One can imagine mass that is uniformly distributed easily enough, but what bugs me is that dark matter should not do this. There is no magic segregation between dark matter that forms into halos that contain galaxies and dark matter that just hangs out in the intergalactic medium and declines to participate in any gravitational dynamics. That’s not an option available to it: if it gravitates, it should clump. To pull this off, we’d need to live in a universe made of two distinct kinds of dark matter: cold dark matter that clumps and a fluid that gravitates globally but does not clump, sort of an anti-dark energy.

Alternatively, we might live in an underdense region such that the local Ωm is less than the global Ωm. This is an idea that comes and goes for one reason or another, but it has always been hard to sustain. The convergence to low Ωm looks pretty steady out to ~100 Mpc in the plot above; that’s a pretty big hole. Recall the non-linearity scale discussed above; this scale is a factor of ten larger so over/under-densities should typically be ±10%. This one is -60%, so I guess we’d have to accept that we’re not Copernican observers after all.
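For reference, the size of the hole follows from the ratio of local to global density. The sketch below assumes the comparison is between the converged Ωm = 0.12 of Shaya et al. and the Planck Ωm = 0.315; a slightly different global value shifts the answer by a few percent.

```python
def density_contrast(omega_local, omega_global):
    """Fractional over/under-density of a region relative to the mean."""
    return omega_local / omega_global - 1.0

delta = density_contrast(0.12, 0.315)  # local value vs. Planck (assumed pairing)
print(f"density contrast: {delta:+.0%}")
```

This lands close to the -60% quoted above, roughly six times the ±10% fluctuation expected on this scale.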

The persistent tension in bulk flows

Once we get past the basic Hubble expansion, individual galaxies each have their own peculiar motion, and beyond that we have bulk flows. These have been around a long time. We obsessed a lot about them for a while with discoveries like the Great Attractor. It was weird; I remember some pundits talking about “plate tectonics” in the universe, like there were giant continents of galaxy superclusters wandering around in random directions relative to the frame of the microwave background. Many of us, including me, couldn’t grok this, so we chose not to sweat it.

There is no single problem posed by bulk flows^, and of course you can find those who argue they pose no problem at all. We are in motion relative to the cosmic (CMB) frame$, but that’s just our Milky Way’s peculiar motion. The strange fact is that it’s not just us; the entirety of the local universe seems to have an unexpected peculiar motion. There are lots of ways to quantify this; here’s a summary table from Courtois et al (2025):

Table 1 from Courtois et al (2025): various attempts to measure the scale of dynamical homogeneity.

As we look to large scales, we expect the universe to converge to homogeneity – that’s the Cosmological Principle, which is one of those assumptions that is so fundamental that we forget we made it. The same holds for dynamics – as we look to large scales, we expect the peculiar motions to average out, and converge to a pure Hubble flow. The table above summarizes our efforts to measure the scale on which this happens – or doesn’t. It also shows what we expect on the second line, “predicted LCDM,” where you can see the expected convergence in the declining bulk velocities as the scale probed increases. The third line is for “cosmic variance;” when you see these words it usually means something is amiss so in addition to the usual uncertainties we’re going to entertain the possibility that we live in an abnormal universe.

Like most people, I was comfortably ignoring this issue until recently, when we had a visit and a talk from one of the protagonists listed above, Richard Watkins (W23). One of the problems that challenge this sort of work is the need for a large sample of galaxies with complete sky coverage. That’s observationally challenging to obtain. Real data are heterogeneous; treating this properly demands a more sophisticated treatment than the usual top-hat or Gaussian approaches. Watkins described in detail what a better way could be, and patiently endured the many questions my colleagues and I peppered him with. This is hard to do right, which gives aid and comfort to the inclination to ignore it. After hearing his talk, I don’t think we should do that.

Panel from Fig. 7 of Watkins et al. (2023): The magnitude of the bulk flow as a function of scale. The green points are the data and the red dashed line is the expectation of LCDM. The blue dotted line is an estimate of known systematic effects.

The data do not converge with increasing scale as expected. It isn’t just the local space density Ωm that’s weird, it’s also the way in which things move. And “local” isn’t at all small here, with the effect persisting out beyond 300 Mpc for any plausible h = H0/100.

This is formally a highly significant result, with the authors noting that “the probability of observing a bulk flow [this] large … is small, only about 0.015 per cent.” Looking at the figure above, I’d say that’s a fairly conservative statement. A more colloquial way of putting it would be “no way we gonna reconcile this!” That said, one always has to worry about systematics. They’ve made every effort to account for these, but there can always be unknown unknowns.

Mapping the Universe

It is only possible to talk about these things thanks to decades of effort to map the universe. One has to survey a large area of sky to identify galaxies in the first place, then do follow-up work to obtain redshifts from spectra. This has become big business, but to do what we’ve just been talking about, it is further necessary to separate peculiar velocities from the Hubble flow. To do that, we need to estimate distances by some redshift-independent method, like Tully-Fisher. Tully has been doing this his entire career, with the largest and most recent data product being Cosmicflows-4. Such data reveal not only large bulk flows, but extensive structure in velocity space:

The Laniakea supercluster of galaxies (Tully et al. 2014).

We have a long way to go to wrap our heads around all of this.

Persistent tensions persist

I’ve discussed a few of the tensions that persist in cosmic data. Whether these are mere puzzles or a mounting pile of anomalies is a matter of judgement. They’ve been around for a while, so it isn’t fair to suggest that all of the data are consistent with LCDM. Nevertheless, I hear exactly this asserted with considerable frequency. It’s as if the definition of all is perpetually shrinking to include only the data that meet the consistency criterion. Yet it’s the discrepant bits that are interesting for containing new information; we need to grapple with them if the field is to progress.

*This was well before my time, so I am probably getting some aspect of the history wrong or oversimplifying it in some gross way. Crudely speaking, if you randomly plop down spheres of this size, some will be found to contain the cosmic average number of galaxies, some twice that, some half that. That the modern value of σ8 is close to unity means that Peebles got it basically right with the data that were available back then and that galaxy light very nearly traces mass, which is not guaranteed in a universe dominated by dark matter.


+It amazes me how pervasively “galaxies are complicated” is used as an excuse++ to ignore all small scale evidence.

Not all of us are limited to working on the simplest systems. In this case, it doesn’t matter. The LCDM prediction here is that galaxies should be complicated because they are nonlinear. But the observation is that they are simple – so simple that they obey a single effective force law. That’s the contradiction right there, regardless of what flavor of complicated might come out of some high resolution simulation.

++At one KITP conference I attended, a particle-cosmologist said during a discussion session, in all seriousness and with a straight face, “We should stop talking about rotation curves.” Because scientific truth is best revealed by ignoring the inconvenient bits. David Merritt remarked on this in his book A Philosophical Approach to MOND. He surveyed the available cosmology textbooks, and found that not a single one of them mentioned the acceleration scale in the data. I guess that would go some way to explaining why statements of basic observational facts are often met with stunned silence. What’s obvious and well-established to me is a wellspring of fresh if incredible news to them. I’d probably give them the stink-eye about the cosmological constant if I hadn’t been paying the slightest attention to cosmology for the past thirty years.


&There is an elegant approach to parameterizing the growth of structure in theories that deviate modestly from GR. In this context, such theories are usually invoked as an alternative to dark energy, because it is socially acceptable to modify GR to explain dark energy but not dark matter. The curious hysteresis of that strange and seemingly self-contradictory attitude aside, this approach cannot be adapted to MOND because it assumes linearity while MOND is inherently nonlinear. My very crude, back-of-the-envelope expectation for MOND is very nearly constant γ ~ 0.4 (depending on the scale probed) out to high redshift. The bend we see in the conventional models around z ~ 0.6 will occur at z > 2 (and probably much higher) because structure forms fast in MOND. It is annoyingly difficult to put a more precise redshift on this prediction because it also depends on the unknown metric. So this is more of a hunch than a quantitative prediction. Still, it will be interesting to see if roughly constant fσ8 persists to higher redshift.


%The inference that non-baryonic dark matter has to exist assumes that gravity is normal in the sense taught to us by Newton and Einstein. If some other theory of gravity applies, then one has to reassess the data in that context. This is one of the first considerations I made of MOND in the cosmological context, finding Ωm ≈ Ωb.


^MOND is effective at generating large bulk flows.


$Fun fact: you can type the name of a galaxy into NED (the NASA Extragalactic Database) and it will give you lots of information, including its recession velocity referenced to a variety of frames of reference and the corresponding distance from the Hubble law V = H0D. Naively, you might think that the obvious choice of reference frame is the CMB. You’d be wrong. If you use this, you will get the wrong distance to the galaxy. Of all the choices available there, it consistently performs the worst as adjudicated by direct distance measurements (e.g., Cepheids).

NED used to provide a menu of choices for the value of H0 to use. It says something about the social-tyranny of precision cosmology that it now defaults to the Planck value. If you use this, you will get the wrong distance to the galaxy. Even if the Planck H0 turns out to be correct in some global sense, it does not work for real galaxies that are relatively near to us. That’s what it means to have all the “local” measurements based on direct distance measurements (e.g., Cepheids) consistently give a larger H0.
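To see how much the choice of H0 matters, here is a small sketch of the Hubble-law distance D = V/H0 for the Planck value (67.4) and a local-style value (73). The recession velocity is a made-up illustrative number, not a real galaxy, and the linear law only applies at low redshift.

```python
def hubble_distance_mpc(velocity_km_s, h0_km_s_mpc):
    """Distance in Mpc from the linear Hubble law V = H0 * D."""
    return velocity_km_s / h0_km_s_mpc

velocity = 1500.0  # km/s, hypothetical recession velocity for illustration
for label, h0 in (("Planck fit", 67.4), ("local-style value", 73.0)):
    distance = hubble_distance_mpc(velocity, h0)
    print(f"H0 = {h0} ({label}): D = {distance:.1f} Mpc")
```

For the same velocity, the smaller Planck H0 always returns the larger distance, which is the sense of the caption below.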

Galaxies in the local universe are closer than they appear. Photo by P.S. Pratheep, www.pratheep.com

Some persistent cosmic tensions

I took the occasion of the NEIU debate to refresh my knowledge of the status of some of the persistent tensions in cosmology. There wasn’t enough time to discuss those, so I thought I’d go through a few of them here. These issues tend to get downplayed or outright ignored when we hype LCDM’s successes.

When I teach cosmology, I like to have the students do a project in which they each track down a measurement of some cosmic parameter, and then report back on it. The idea, when I started doing this back in 1999, was to combine the different lines of evidence to see if we reach a consistent concordance cosmology. Below is an example from the 2002 graduate course at the University of Maryland. Does it all hang together? I ask the students to debate the pros and cons of the various lines of evidence.

The mass density parameter Ωm = ρm/ρcrit and the Hubble parameter h = H0/(100 km/s/Mpc) from various constraints (colored lines) available in 2002. I later added the first (2003) WMAP result (box). The combination of results excludes the grey region; only the white portion is viable: this is the concordance region.

The concordance cosmology is the small portion of this diagram that was not ruled out. This is the way in which LCDM was established. Before we had either the CMB acoustic power spectrum or Type Ia supernovae, LCDM was pretty much a done deal based on a wide array of other astronomical evidence. It was the subsequentα agreement of the Type Ia SN and the CMB that cemented the picture in place.
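The two axes of that diagram are tied together through the critical density. As a sketch, the standard expression ρcrit = 3H0²/(8πG) (a textbook result, not something derived in this post) converts a pair (Ωm, h) into a physical matter density. The constants are rounded and the function names are illustrative.

```python
import math

G = 6.67430e-11      # gravitational constant, m^3 kg^-1 s^-2
MPC_IN_M = 3.0857e22 # metres per megaparsec (rounded)

def critical_density(h):
    """Critical density in kg/m^3 for H0 = 100 h km/s/Mpc."""
    h0_si = h * 100.0 * 1.0e3 / MPC_IN_M  # H0 converted to 1/s
    return 3.0 * h0_si ** 2 / (8.0 * math.pi * G)

def matter_density(omega_m, h):
    """Physical matter density in kg/m^3 implied by (Omega_m, h)."""
    return omega_m * critical_density(h)

if __name__ == "__main__":
    print(f"rho_crit(h = 1) = {critical_density(1.0):.3e} kg/m^3")
    print(f"rho_m(0.24, 0.73) = {matter_density(0.24, 0.73):.3e} kg/m^3")
```

Because ρcrit scales as h², a constraint on the physical density traces a curve rather than a horizontal line in the Ωm-h plane.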

The implicit assumption in this approach is that we have identified the correct cosmology by process of elimination: whatever is left over must be the right answer. But what if nothing is left over?

I have long worried that we’ve painted ourselves into a corner: maybe the concordance window is merely the least unlikely spot before everything is excluded. Excluding everything would effectively falsify LCDM cosmology, if not the more basic picture of an expanding universe% emerging from a hot big bang. Once one permits oneself to think this way, then it occurs to one that perhaps the reason we have to invoke the twin tooth fairies of dark matter and dark energy is to get FLRW to approximate some deeper, underlying theory.

Most cosmologists do not appear to contemplate this frightening scenario. And indeed, before we believe something so drastic, we have to have thoroughly debunked the standard picture – something rather difficult to do when 95% of it is invisible. It also means believing all the constraints that call the standard picture into question (hence why contradictory results experience considerably more scrutiny* than conforming results). The fact is that some results are more robust than others. The trick is deciding which to trust.^

In the diagram above, the range of Ωm from cluster mass-to-light ratios comes from some particular paper. There are hundreds of papers on this topic, if not thousands. I do not recall which one this particular illustration came from, but most of the estimates I’ve seen from the same method come in somewhat higher. So if we slide those green lines up, the allowed concordance window gets larger.

The practice of modern cosmology has necessarily been an exercise in judgement: which lines of evidence should we most trust? For example, there is a line up there for rotation curves. That was my effort to ask what combination of cosmological parameters led to dark matter halo densities that were tolerable to the rotation curve data of the time. Dense cosmologies give birth to dense dark matter halos, so everything above that line was excluded because those parameters cram too much dark matter into too little space. This was a pretty conservative limit at the time, but it is predicated on the insistence of theorists that dark matter halos had to have the NFW form predicted by dark matter-only simulations. Since that time, simulations including baryons have found any number of ways to alter the initial cusp. This in turn means that the constraint no longer applies as the halo might have been altered from its original, cosmology-predicted initial form. Whether the mechanisms that might cause such alterations are themselves viable becomes a separate question.

If we believed all of the available constraints, then there is no window left and FLRW is already ruled out. But not all of those data are correct, and some contradict each other, even absent the assumption of FLRW. So which do we believe? Finding one’s path in this field is like traipsing through an intellectual mine field full of hardened positions occupied by troops dedicated to this or that combination of parameters.

H0 = 100! No, repent you fools, H0 = 50! (Comic by Paul North)

It is in every way an invitation to confirmation bias. The answer we get depends on how we weigh disparate lines of evidence. We are prone to give greater weight to lines of evidence that conform to our pre-established+ beliefs.

So, with that warning, let’s plunge ahead.

The modern Hubble tension

Gone but not yet forgotten are the Hubble wars between camps Sandage (H0 = 50!) and de Vaucouleurs (H0 = 100!). These were largely resolved early this century thanks to the Hubble Space Telescope Key Project on the distance scale. Obtaining this measurement was the major motivation to launch HST in the first place. Finally, this long standing argument was resolved: nearly everyone agreed that H0 = 72 km/s/Mpc.

That agreement was long-lived by the standards of cosmology, but did not last forever. Here is an illustration of the time dependence of H0 measurements this century, from Freedman (2021):

There are many illustrations like this; I choose this one because it looks great and seems to have become the go-to for illustrating the situation. Indeed, it seems to inform the attitude of many scientists close to but not directly involved in the H0 debate. They seem to perceive this as a debate between Adam Riess and Wendy Freedman, who have become associated with the Cepheid and TRGB$ calibrations, respectively. This is a gross oversimplification, as they are not the only actors on a very big stage&. Even in this plot, the first Cepheid point is from Freedman’s HST Key Project. But this apparent dichotomy between calibrators and people seems to be how the subject is perceived by scientists who have neither time nor reason for closer scrutiny. Let’s scrutinize.

Fits to the acoustic power spectrum of the CMB agreed with astronomical measurements of H0 for the first decade of the century. Concordance was confirmed. The current tension appeared with the first CMB data from Planck. Suddenly the grey band of the CMB best-fit no longer overlapped with the blue band of astronomical measurements. This came as a shock. Then a new (red) band appears, distinguishing between the “local” H0 calibrated by the TRGB from that calibrated by Cepheids.

I think I mentioned that cosmology was an invitation to confirmation bias. If you put a lot of weight on CMB fits, as many cosmologists do, then it makes sense from that perspective that the TRGB measurement is the correct one and the Cepheid H0 must be wrong. This is easy to imagine given the history of systematic errors that plagued the subject throughout the twentieth century. This confirmation bias makes one inclined to give more credence to the new# TRGB calibration, which is only in modest tension with the CMB value. The narrative is then simplified to two astronomical methods that are subject to systematic uncertainty: one that agrees with the right answer and one that does not. Ergo, the Cepheid H0 is in systematic error.

This narrative oversimplifies the matter to the point of being actively misleading, and the plot above abets this by focusing on only two of the many local measurements. There is no perfect way to do this, but I had a go at it last year. In the plot below, I cobbled together all the data I could without going ridiculously far back, but chose to show only one point per independent group, the most recent one available from each, the idea being that the same people don’t get new votes every time they tweak their result – that’s basically what is illustrated above. The most recent points from above are labeled Cepheids & TRGB (the date of the TRGB goes to the full Chicago-Carnegie paper, not Freedman’s summary paper where the above plot can be found). See McGaugh (2024) for the references.

When I first made this plot, I discovered that many measurements of the Hubble constant are not all that precise: the plot was an indecipherable forest of error bars. So I chose to make a cut at a statistical uncertainty of 3 km/s/Mpc: worse than that, the data are shown as open symbols sans error bars; better than that, the datum gets explicit illustration of both its statistical and systematic uncertainty. One could make other choices, but the point is that this choice paints a different picture from the choice made above. One of these local measurements is not like the others, inviting a different version of confirmation bias: the TRGB point is the outlier, so perhaps it is the one that is wrong.

Recent measurements of the Hubble constant (left) and the calibration of the baryonic Tully-Fisher relation (right) underpinning one of those measurements.
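The selection behind the left panel can be sketched in a few lines: keep the most recent result from each independent group, then split at a statistical uncertainty of 3 km/s/Mpc. The entries below are placeholders invented for illustration, NOT real measurements, and treating exactly 3 as "precise" is my assumption.

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    group: str       # independent group that made the measurement
    year: int        # publication year
    h0: float        # km/s/Mpc
    stat_err: float  # statistical uncertainty, km/s/Mpc

def latest_per_group(measurements):
    """Keep only the most recent measurement from each group."""
    latest = {}
    for m in measurements:
        if m.group not in latest or m.year > latest[m.group].year:
            latest[m.group] = m
    return list(latest.values())

def split_by_precision(measurements, max_stat_err=3.0):
    """Separate points that get full error bars from open symbols."""
    precise = [m for m in measurements if m.stat_err <= max_stat_err]
    imprecise = [m for m in measurements if m.stat_err > max_stat_err]
    return precise, imprecise

# Placeholder data only; these are not published values.
sample = [
    Measurement("group A", 2019, 70.0, 2.0),
    Measurement("group A", 2022, 71.0, 1.5),
    Measurement("group B", 2021, 69.0, 4.0),
]
precise, imprecise = split_by_precision(latest_per_group(sample))
print([m.group for m in precise], [m.group for m in imprecise])
```

Moving the cut, or letting each group vote more than once, changes which points dominate the picture, which is the point made above.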

I highlight the measurement our group made not to note that we’ve done this too so much as to highlight an underappreciated aspect of the apparent tension between Cepheid and TRGB calibrations. There are 50 galaxies that calibrate the baryonic Tully-Fisher relation, split nearly evenly between galaxies whose distance is known through Cepheids (blue points) and TRGB (red points). They give the same answer. There is no tension between Cepheids and the TRGB here.

Chasing this up, it appears to me that what happened was that Freedman’s group reanalyzed the data that calibrate the TRGB, and wound up with a slightly different answer. This difference does not appear to be in the calibration equation (the absolute magnitude of the tip of the red giant branch didn’t change that much), but in something to do with how the tip magnitude is extracted. Maybe, I guess? I couldn’t follow it all the way, and I got bad vibes reminding me of when I tried to sort through Sandage’s many corrections in the early ’90s. That doesn’t make it wrong, but the point is that the discrepancy is not between Cepheids and TRGB calibrations so much as it is between the TRGB as implemented by Freedman’s group and the TRGB as implemented by others. The depiction of the local Hubble constant debate as being between Cepheid and TRGB calibrations is not just misleading, it is wrong.

Can we get away from Cepheids and the TRGB entirely? Yes. The black points above are for megamasers and gravitational lensing. These are geometric methods that do not require intermediate calibrators like Cepheids at all. It’s straight trigonometry. Both indicate H0 > 70. Which way is our confirmation bias leaning now?

The way these things are presented has an impact on scientific consensus. A fascinating experiment on this has been done in a recent conference report. Sometimes people poll conference attendees in an attempt to gauge consensus; this report surveys conference attendees “to take a snapshot of the attitudes of physicists working on some of the most pressing questions in modern physics.” One of the topics queried is the Hubble tension. Survey says:

Table XII from arXiv:2503.15776 in which scientists at the 2024 conference Black Holes Inside and Out vote on their opinion about the most likely solution of the Hubble tension.

First, a shout out to the 1/4 of scientists who expressed no opinion. That’s the proper thing to do when you’re not close enough to a subject to make a well-informed judgement. Whether one knows enough to do this is itself a judgement call, and we often let our arrogance override our reluctance to over-share ill-informed opinions.

Second, a shout out to the folks who did the poll for including a line for systematics in the CMB. That is a logical possibility, even if only 3 of the 72 participants took it seriously. This corroborates the impression I have that most physicists seem to think the CMB is perfect like some kind of holy scripture written in fire on the primordial sky, so must be correct and cannot be questioned, amen. That’s silly; systematics are always a possibility in any observation of the sky. In the case of the CMB, I suspect it is not some instrumental systematic but the underlying assumption of LCDM FLRW that is the issue; once one assumes that, then indeed, the best fit to the Planck data as published is H0 = 67.4, with H0 > 68 being right out. (I’ve checked.)

A red flag that the CMB is where the problem lies is the systematic variation of the best-fit parameters along the trench of minimum χ²:

The time evolution of best-fit CMB cosmology parameters. These have steadily drifted away from the LCDM concordance window while the astronomical measurements that established it have not.

I’ve shown this plot and variations for other choices of H0 before, yet it never fails to come as a surprise when I show it to people who work closely on the subject. I’m gonna guess that extends to most of the people who participated in the survey above. Some red flags prove to be false alarms, some don’t, but one should at least be aware of them and take them into consideration when making a judgement like this.

The plurality (35%) of those polled selected “systematic error in supernova data” as the most likely cause of the Hubble tension. It is indeed a common attitude, as I mentioned above, that the Hubble tension is somehow a problem of systematic errors in astronomical data like back in the bad old days** of Sandage & de Vaucouleurs.

Let’s unpack this a bit. First, the framing: systematic error in supernova data is not the issue. There may, of course, be systematic uncertainties in supernova data, but that’s not a contender for what is causing the apparent Hubble tension. The debate over the local value of H0 is in the calibrators of supernovae. This is often expressed as a tension between Cepheid and TRGB calibrators, but as we’ve seen, even that is misleading. So posing the question this way is all kinds of revealing, including of some implicit confirmation bias. It’s like putting the right answer of a multiple choice question first and then making up some random alternatives.

So what do we learn from this poll for consensus? There is no overwhelming consensus, and the most popular choice appears to be ill-informed. This could be a meme. Tell me you’re not an expert on a subject by expressing an opinion as if you were.

The kicker here is that this was a conference on black hole physics. There seems to have been some fundamental gravitational and quantum physics discussed, which is all very interesting, but this is a community that is pretty far removed from the nitty-gritty of astronomical observations. There are many other polls reported in this conference report, many of them about esoteric aspects of black holes that I find interesting but would not myself venture an opinion on: it’s not my field. It appears that a plurality of participants at this particular conference might want to consider adopting that policy for fields beyond their own expertise.

I don’t want to be too harsh, but it seems like we are repeating the same mistakes we made in the 1980s. As I’ve related before, I came to astronomy from physics with the utter assurance that H0 had to be 50. It was Known. Then I met astronomers who were actually involved in measuring H0 and they were like, “Maybe it is ~80?” This hurt my brain. It could not be so! and yet they turned out to be correct within the uncertainties of the time. Today, similar strong opinions are being expressed by the same community (and sometimes by the same people) who were wrong then, so it wouldn’t surprise me if they are wrong now. Putting how they think things should be ahead of how they are is how they roll.

There are other tensions besides the Hubble tension, but I’ll get to them in future posts. This is enough for now.


αAs I’ve related before, I date the genesis of concordance LCDM to the work of Ostriker & Steinhardt (1995), though there were many other contributions leading to it (e.g., Efstathiou et al. 1990). Certainly many of us anticipated that the Type Ia SN experiments would confirm or deny this picture. Since the issue of confirmation bias is ever-present in cosmic considerations, it is important to understand this context: the acceleration of the expansion rate that is often depicted as a novel discovery in 1998 was an expected result. So much so that at a conference in 1997 in Aspen I recall watching Michael Turner badger the SN presenters to Proclaim Lambda already. One of the representatives from the SN teams was Richard Ellis, who wasn’t having it: the SN data weren’t there yet even if the attitude was. Amusingly, I later heard Turner claim to have been completely surprised by the 1998 discovery, as if he hadn’t been pushing for it just the year before. Aspen is a good venue for discussion; I commented at the time that the need to rehabilitate the cosmological constant was a big stop sign in the sky. He glared at me, and I’ve been on his shit list ever since.

%I will not be entertaining assertions that the universe is not expanding in the comments: that’s beyond the scope of this post.

*Every time a paper corroborating a prediction of MOND is published, the usual suspects get on social media to complain that the referee(s) who reviewed the paper must be incompetent. This is a classic case of admitting you don’t understand how the process works by disparaging what happened in a process to which you weren’t privy. Anyone familiar with the practice of refereeing will appreciate that the opposite is true: claims that seem extraordinary are consistently held to a higher standard.

^Note that it is impossible to exclude the act of judgement. There are approaches to minimizing this in particular experiments, e.g., by doing a blind analysis of large scale structure data. But you’ve still assumed a paradigm in which to analyze those data; that’s a judgement call. It is also a judgement call to decide to believe only large scale data and ignore evidence below some scale.

+I felt this hard when MOND first cropped up in my data for low surface brightness galaxies. I remember thinking How can this stupid theory get any predictions right when there is so much evidence for dark matter? It took a while for me to realize that dark matter really meant mass discrepancies. The evidence merely indicates a problem, the misnomer presupposes the solution. I had been working so hard to interpret things in terms of dark matter that it came as a surprise that once I allowed myself to try interpreting things in terms of MOND I no longer had to work so hard: lots of observations suddenly made sense.

$TRGB = Tip of the Red Giant Branch. Low metallicity stars reach a consistent maximum luminosity as they evolve up the red giant branch, providing a convenient standard candle.

&Where the heck is Tully? He seldom seems to get acknowledged despite having played a crucial role in breaking the tyranny of H0 = 50 in the 1970s, having published steadily on the topic ever since, and leading a group that continues to provide accurate measurements to this day. Do physics-trained cosmologists even know who he is?

#The TRGB was a well-established method before it suddenly appears on this graph. That it appears this way shortly after the CMB told us what answer we should get is a more worrisome potential example of confirmation bias, reminiscent of the situation with the primordial deuterium abundance.

**Aside from the tension between the TRGB as implemented by Freedman’s group and the TRGB as implemented by others, I’m not aware of any serious hint of systematics in the calibration of the distance scale. Can it still happen? Sure! But people are well aware of the dangers and watch closely for them. At this juncture, there is ample evidence that we may indeed have gotten past this.

Ha! I knew the Riess reference off the top of my head, but lots of people have worked on this, so I typed “hubble calibration not a systematic error” into Google to search for other papers, only to have its AI overview confidently assert:

The statement that Hubble calibration is not a systematic error is incorrect

Google AI

That gave me a good laugh. It’s bad enough when overconfident underachievers shout about this from the wrong peak of the Dunning-Kruger curve without AI adding its recycled opinion to the noise, especially since its “opinion” is constructed from the noise.

The best search engine for relevant academic papers is NASA ADS; putting the same text in the abstract box returns many hits that I’m not gonna wade through. (A well-structured ADS search doesn’t read so casually; apparently the same still applies to Google.)