The local missing baryon problem

Last time, we started talking about the data in the recent paper The Baryonic Mass-Halo Mass Relation of Extragalactic Systems. Here, we’ll put on our dark matter hat, and use the data to make an accounting of the mass – both the dark matter and the baryons in all their various forms. From this conventional perspective we will obtain a method for relating what we see to what we don’t. In the context of LCDM cosmology, this provides an alternative approach to abundance matching. It also provides a test: are the two consistent?

The conventional picture we have in mind is a baryonic galaxy residing in a dark matter halo bathed in a background of intergalactic matter.

**Fig. 1** of McGaugh et al. (2026): Conceptual elements of a galaxy: the stars (yellow/blue) and atomic gas (green) of NGC 6946 (Spitzer 3.6µ and 21 cm data: F. Walter et al. 2008) are shown embedded in an extended dark matter halo (black). The dark matter density decreases continuously with radius so the halo has no hard edge, but for convenience we adopt the common convention that the radius r₂₀₀ marks the boundary of the dark matter halo and the dividing line between the circumgalactic medium (CGM) and the intergalactic medium (IGM; orange). The stars and atomic gas illustrated here appear within r < 20 kpc while r₂₀₀ ≈ 220 kpc (not shown to scale).

I’ve talked here about the stars and gas a lot because that’s what we see. These are the essential components that define a galaxy and comprise the mass that correlates with rotation velocity to make the baryonic Tully-Fisher relation (BTFR). I’ve talked a bit about the stuff between the galaxies, the intergalactic medium (IGM), but I don’t think I’ve previously had cause to talk much about the circumgalactic medium (CGM). As the name implies, this is gas in the vicinity of a galaxy, but not in the galaxy itself – at least not the part we can readily see. In the notional picture above, the distinction between the CGM and the IGM is the boundary of the dark matter halo that nominally demarcates gravitationally bound from unbound material.

Notional is doing a lot of work here. There’s a lot of gas in the IGM, and some of it is certainly in the vicinity of galaxies, so in that regard counts as circum-galactic. But there’s no hard and fast distinction between these components just as there’s no hard edge to a dark matter halo. Our brains don’t like that, so we impose notional boundaries and proceed as if these are meaningful.

Proceeding thus, we expect our dark matter halo* to contain its fair share of the cosmic baryon fraction, f_b = M_b/M₂₀₀ = 0.157 according to the Planck flavor of LCDM cosmology. We can test this by adding up all the baryons and comparing that to the total mass enclosed by r₂₀₀. This is straightforward for the stars and gas we see, but not for the stuff we don’t see – both dark matter and the gas in the CGM.

There are some measurements of the CGM, but these tend to be statistical in nature (if we stack data for a bunch of galaxies, we sorta see something), not the precise, individual, galaxy-by-galaxy measurements that we have for the stars and atomic gas. The stars and atomic gas are the mass in the extended Tully-Fisher relations we discussed previously, and are the bulk of the normal material in the galaxies we see. The bulk of the CGM lies at much larger radii, beyond the stars and atomic gas, but within the notional edge of the dark matter halo, as depicted above. Since we don’t measure it directly in individual galaxies, we’re gonna leave the mass of the CGM as an open question rather than something to be included in the sum of known baryonic mass.

The situation is even murkier for the dark matter, which we don’t see at all, so we don’t have a good way to measure the “total” mass of dark matter halos. This isn’t even a well-defined quantity in principle since halos are not expected to have a hard edge. Conventionally, we adopt as the notional edge the radius r₂₀₀ within which the mean enclosed density is two hundred times the cosmic critical density. There are obscure historical reasons for this choice that I do not have the patience to describe. One could make other choices, arguably better choices, but r₂₀₀ is the most common choice used in the literature so we’ll stick with it here. The halo mass is the mass enclosed by this radius, M₂₀₀. If one goes through the math, it turns out that the circular speed of a test particle, V₂₀₀, orbiting at r₂₀₀ scales with the Hubble parameter [h = H₀/(100 km/s/Mpc)] such that V₂₀₀ = h r₂₀₀ when V₂₀₀ is in km/s and r₂₀₀ is in kpc. The dynamical mass (rV²/G) can then be written

M_{200} = (3.3 \times 10^5\;\mathrm{M}_{\odot}\,\mathrm{km}^{-3}\,\mathrm{s}^3)\,V_{200}^3.
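For anyone who wants to check the coefficient, here is a minimal sketch of the arithmetic. It assumes G = 4.30091 × 10⁻⁶ kpc (km/s)² M_☉⁻¹ and h = 0.7. The value of h is an assumption here (the text does not state it), chosen because it reproduces the quoted 3.3 × 10⁵. The function names are illustrative only.

```python
# Newton's constant in units of kpc (km/s)^2 / Msun (assumed value).
G_KPC = 4.30091e-6


def r200_kpc(v200_kms, h=0.7):
    """Notional halo radius in kpc, from V200 = h * r200 (V in km/s, r in kpc)."""
    return v200_kms / h


def m200_msun(v200_kms, h=0.7):
    """Dynamical mass r V^2 / G evaluated at r200, in solar masses."""
    return r200_kpc(v200_kms, h) * v200_kms**2 / G_KPC


# Coefficient of V200^3: the mass in Msun for V200 = 1 km/s.
coefficient = m200_msun(1.0)
print(f"{coefficient:.2e} Msun per (km/s)^3")  # 3.32e+05 for h = 0.7

# A halo with r200 = 220 kpc (the value quoted in the Fig. 1 caption)
# has V200 = h * r200 = 154 km/s under the same assumption for h.
print(f"M200 = {m200_msun(154.0):.2e} Msun")
```

Because the coefficient scales as 1/h, other reasonable choices of the Hubble parameter shift it by only a few percent.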

That is a lot of huffing and puffing to get a way to relate the halo mass to something we can (kinda sorta) measure. The flat rotation velocity V_f has always been taken as the signature of the dark matter halo. One therefore expects V₂₀₀ ~ V_f. Indeed, these quantities cannot differ by much if dark matter is what explains flat rotation curves. However, the notional radius of the dark matter halo where V₂₀₀ occurs is much larger, by roughly an order of magnitude or more, than the radius where V_f is measured. So they need not be identical, depending on the halo model. So to relate what we measure to what we’d like to know we define a little ol’ fudge factor, f_v, such that:

V_f = f_v V_{200}

If a rotation curve stays flat indefinitely (as our empirical experience suggests), f_v = 1. If instead dark matter halos behave as they should in LCDM, then the rotation speed should gradually decline as we approach the halo’s edge so that f_v > 1. How much greater?

One way to estimate the fudge factor f_v is to fit dark matter halo models to data. This process does not directly measure V₂₀₀, but it does provide an estimate of that quantity based on the data available at smaller radii. One can do this for as many halo models as one has the patience to consider. For example, here are the results for two common halo models, the traditional pseudo-isothermal halo first adopted to explain flat rotation curves and the CDM-expected NFW halo:

**Figure 2** from McGaugh et al. (2026): The observed flat velocity V_f as it relates to the fitted V₂₀₀ for pseudo-isothermal (left panel) and NFW (right panel) halos (Li et al. 2020). Filled points have formal uncertainties <20% in V₂₀₀; open points are less accurate fits. The solid line shows V_f = V₂₀₀. The gray line in the right panel shows Equation (2a) of Katz et al. (2019), which corresponds roughly to f_v ≈ 1.4.

The result for pseudo-isothermal halos is consistent with f_v = 1, as expected – this model was adopted to make flat rotation curves. There is nevertheless some scatter. This typically happens because the observed rotation is not observed to be flat over a large enough range of radii to enforce flatness further out (as often happens in dwarf galaxies) or because the stars account for so much of the mass over the observed range that the inferred dark matter component is still rising (as often happens in bright, high surface brightness galaxies). This sort of haziness is inevitable when one only measures the inner few percent of the notional virial radius.

The result for NFW halos is approximately f_v = 1.4, albeit with a lot more scatter. This happens for the same reasons as above, with the additional problem that the dark matter profile in real galaxies rarely looks like NFW. Of all the many halo models considered by Li et al. (2020), NFW consistently performs the worst. One is forcing a fit of a function that would rather not. One signature of this misfit is the occurrence of very large V₂₀₀ for dwarf galaxies with small V_f. Taken literally, this would mean that some of the smallest dwarf galaxies reside in dark matter halos that outweigh those of giants like the Milky Way. This seems absurd, and it is. For example, by this approach, the dwarf galaxy NGC 3109 residing just outside the Local Group outweighs the Local Group and both its giants, Andromeda and the Milky Way, put together. But it is pretty clear from the local velocity field that the entire Local Group is not orbiting this little dwarf.

The estimation of huge V₂₀₀ for galaxies with small V_f happens because of the cusp-core problem. The density cusp of the NFW profile predicts a curved shape for the inner rotation curve while the data show a more gradual, quasi-linear rise. Any decent fitting program will realize that it can make a curve look like a straight line if it stretches it out enough, so it does exactly this by making the halo very large. That sorta fits the data, but it makes no physical sense. Between this systematic effect and the large scatter induced by the other effects discussed above, one is better off inferring V₂₀₀ from V_f with a fixed fudge factor. So we’ll do that, leaving the exact value of f_v as an open question, but noting that for most objects it almost certainly resides in the narrow range

1 \le f_v \le 1.4.
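To see what that range of f_v means for the halo mass, here is a small sketch that combines V_f = f_v V₂₀₀ with the M₂₀₀–V₂₀₀ relation given earlier. The helper name and the example velocity of 200 km/s are illustrative assumptions, not values from the paper.

```python
# Coefficient of the M200-V200 relation quoted in the text, Msun (km/s)^-3.
M200_COEFF = 3.3e5


def halo_mass_from_vflat(v_flat_kms, f_v=1.0):
    """Halo mass M200 in Msun inferred from the flat velocity via V200 = V_f / f_v."""
    v200 = v_flat_kms / f_v
    return M200_COEFF * v200**3


v_flat = 200.0  # km/s, an illustrative bright spiral
for f_v in (1.0, 1.4):
    print(f"f_v = {f_v}: M200 = {halo_mass_from_vflat(v_flat, f_v):.2e} Msun")

# The inferred mass scales as f_v^-3, so the full range 1 <= f_v <= 1.4
# corresponds to a factor of 1.4^3 (about 2.7) in halo mass.
print(halo_mass_from_vflat(v_flat, 1.0) / halo_mass_from_vflat(v_flat, 1.4))
```

So the "narrow range" in velocity is a bit less narrow in mass, which is why the choice of f_v slides the inferred baryon fractions up and down.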

That’s a lot of words to say the observed flat rotation speed gives us our best kinematic estimator of the dark matter halo mass. In this context, bear in mind the small scatter in the extended Tully-Fisher relations. This contrasts with the large scatter seen in the fits above. This strongly implies that V_f is more closely tied to the underlying mass^{^} than are the model-specific halo fits to the entire rotation curve. That might seem counterintuitive given that V_f is only a portion of the rotation curve (albeit a well-defined portion). However, it makes more sense when one considers that rotation curve fits must consider the contribution of stars as well as dark matter. Since the stellar mass-to-light ratio is never perfectly known, there is a degeneracy between the two that contributes to the scatter seen above. That variation is not real, it’s just an artifact of the fitting procedure. But when we get to large radii, beyond the confounding effects of the stellar population, the signature of the dominant mass becomes apparent in the flat rotation speed.

We saw above that we expect the halo mass M₂₀₀ to correlate with V₂₀₀. We observe that baryonic mass M_b correlates with the flat rotation velocity V_f. The natural assumption is that the stuff we see is proportional to the total (mostly dark) mass while the observed flat velocity is a property of the halo. Hence M_b ~ M₂₀₀ and V_f ~ V₂₀₀. This simple argument has been the basis for many papers claiming to explain the Tully-Fisher relation over the course of many years. This would be entirely satisfactory if it weren’t so completely wrong.

Here we need to introduce another fudge factor, m_b, that relates the mass we see to the halo that spawned each galaxy:

M_b = m_b\,M_{200}

The obvious assumption is that m_b is a constant for all galaxies, in which case Tully-Fisher follows because M_b ~ M₂₀₀ ~ V₂₀₀³ and V₂₀₀ ~ V_f. The wee problem is that this predicts a Tully-Fisher relation with slope 3: M_b ~ V_f³ when we observe one with slope 4: M_b ~ V_f⁴. In order to reconcile these two, our new fudge factor cannot be a constant. Worse, we need to fine tune it to transform the predicted power law into the observed one: m_b ~ V_f. That… doesn’t make any sense.
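The slope mismatch can be made concrete with a few lines of arithmetic. This sketch takes the BTFR normalization A = 50 M_☉ km⁻⁴ s⁴ quoted elsewhere in this document and the M₂₀₀–V₂₀₀ coefficient from above, and computes the m_b each galaxy would need. The function name and the sample velocities are illustrative assumptions.

```python
A_BTFR = 50.0        # Msun km^-4 s^4, BTFR normalization
M200_COEFF = 3.3e5   # Msun (km/s)^-3, M200-V200 coefficient
F_B = 0.157          # cosmic baryon fraction


def required_baryon_fraction(v_flat_kms, f_v=1.0):
    """m_b = M_b / M200 needed to turn the slope-3 halo relation into the slope-4 BTFR."""
    m_b_observed = A_BTFR * v_flat_kms**4
    m200 = M200_COEFF * (v_flat_kms / f_v) ** 3
    return m_b_observed / m200


for v_flat in (50.0, 100.0, 200.0):
    m_b = required_baryon_fraction(v_flat)
    print(f"V_f = {v_flat:5.0f} km/s: m_b = {m_b:.4f} ({m_b / F_B:.1%} of f_b)")
```

Doubling V_f doubles the required m_b, which is the m_b ~ V_f tuning described above.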

We can refrain from thinking and plunge ahead to simply plot the baryon fraction. While we’re at it, let’s also plot the stellar mass fraction m_* = M_*/M₂₀₀ because that is more commonly discussed in the literature. (Often stellar masses are available for galaxies without the corresponding gas mass measurements.) These fractions have to be increasing functions of circular velocity, or equivalently, mass (m_b ~ V_f ~ M_b^(1/4)):

**Figure 4** from McGaugh et al. (2026): The stellar mass fraction as a function of stellar mass (top) and the baryonic mass fraction as a function of baryonic mass (bottom). Data and symbols as in Figure 3 with the additional distinction that large squares in the top panel represent the sum of the stellar mass of all galaxies in a group or cluster while small squares are the stellar mass of the brightest galaxy only. The horizontal line is the cosmic baryon fraction f_b = 0.157 (Planck Collaboration et al. 2020). The colored lines in the top panels show the stellar mass–halo mass relations from abundance matching given by B. P. Moster et al. (2013; dashed–dotted green line), P. S. Behroozi et al. (2013; dashed–triple dotted pink line), and A. V. Kravtsov et al. (2018; red dashed line). The black line in the lower panel is m_b = f_b tanh(M_b/M₀)^1/4 where f_b is the cosmic baryon fraction (0.157) and M₀ = 5 x 10¹³ M_☉.

To be specific, I’ve computed the halo mass assuming f_v = 1. Different assumptions just slide the data up and down; the trend persists. This is discussed more in the paper if you’re interested in such details.

This gives a nifty way to relate what we can see to what we can’t. There’s a simple formula:

m_b = f_b \tanh\left(\frac{M_b}{M_0}\right)^{1/4}

where f_b = 0.157 is the cosmic baryon fraction and M₀ = 5 x 10¹³ M_☉ is the scale where the function bends, transitioning from the M_b ~ V_f⁴ of the BTFR that holds over most of the mass range to the m_b = f_b of rich galaxy clusters. The precise value of the turnover mass is not well constrained, as it happens in the one place that is not well sampled by the available data. Indeed, there is nothing special about the functional form; it is simply a choice that transitions nicely from one regime to the other. There’s no physics in it^&. Still, this is a useful way to estimate the halo mass of pretty much any extragalactic object just by summing up its observed baryonic mass.
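As a sketch of how one might use the formula in practice, the snippet below turns an observed baryonic mass into a halo mass estimate. It assumes the exponent applies to the hyperbolic tangent, i.e. m_b = f_b [tanh(M_b/M₀)]^(1/4), which is the literal reading of the expression above; the low-mass limit m_b ~ M_b^(1/4) does not depend on that reading. The function names are illustrative.

```python
import math

F_B = 0.157   # cosmic baryon fraction
M0 = 5e13     # Msun, turnover mass of the fit


def baryon_fraction(m_b_msun):
    """Fitted m_b = M_b / M200, assuming the form f_b * [tanh(M_b / M0)]^(1/4)."""
    return F_B * math.tanh(m_b_msun / M0) ** 0.25


def halo_mass(m_b_msun):
    """Halo mass M200 in Msun implied by the observed baryonic mass."""
    return m_b_msun / baryon_fraction(m_b_msun)


# Example: the Milky Way baryonic mass quoted in this post, 7e10 Msun.
m_b_mw = 7e10
print(f"m_b  = {baryon_fraction(m_b_mw):.3f}")
print(f"M200 = {halo_mass(m_b_mw):.1e} Msun")

# Rich clusters: the fraction saturates at the cosmic value.
print(f"m_b(1e16 Msun) = {baryon_fraction(1e16):.3f}")
```

Since the halo masses behind the fit were computed with f_v = 1, estimates from this sketch inherit that assumption.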

Indeed, this kinematic mass-matching relation is better than the widely used abundance matching relations in that it has less scatter. Abundance matching generally relies on stellar mass; that results in more scatter for the same reasons discussed for Tully-Fisher. This is particularly apparent at the low mass end of the top panel above, where galaxies of the same circular velocity (halo mass) have very different stellar masses. This goes away when baryonic mass is used instead.

There is reasonable agreement between abundance matching and kinematics at intermediate masses. The lines representing various abundance matching relations parallel the kinematic data. The offsets that are apparent can be cured by an appropriate choice of f_v. Always a free parameter to the rescue there is.

At the high mass end, things go amiss again. Partly this is because abundance matching relations reference the stellar mass of the “central” galaxy. The picture is that each halo contains one central galaxy with many satellite galaxies in subhalos, so what matters is the stellar mass of the central. This is overly simplistic: galaxy clusters are messy, the brightest galaxy isn’t necessarily at the center, and most have substructure with multiple groups rather than a single hierarchy. Besides that, the stellar mass tells you little about the halo mass without further environmental context: a galaxy with M_* ~ 4 x 10¹¹ M_☉ could reside in halo masses spanning a couple of orders of magnitude.

Setting aside the issue of centrals, there is a serious tension for individual high mass galaxies. The stellar mass fraction suggested by kinematics keeps going up where that of abundance matching turns over. This is due to the linearity of the Tully-Fisher relation compared to the knee in the Schechter function shape of the stellar mass function. The two don’t match up, as discussed previously. This same tension has long been with us; in the ’90s we were concerned with the difference between “the luminosity function normalization” and “the Tully-Fisher normalization.” This tension never went away. Still, the tension between abundance matching and kinematics doesn’t seem tragic, and might be remedied with some appropriate finagling of both the baryon fraction and the velocity fudge factor.

But where are all the baryons? They’re all accounted for in clusters, which reach the cosmic baryon fraction. But in no other system is the checksum complete. There is a missing baryon problem locally in each and every dark matter halo below the cluster scale. To confound matters further, there is a fine-tuning problem: the amount of missing baryons scales precisely with the amount of observed baryons.

The logarithmic plot above may understate the magnitude of the problem. To clarify this, we can plot the ratio of missing-to-observed baryons on a linear scale, at least in part:

**Figure 7** from McGaugh et al. (2026): The ratio of missing-to-observed baryonic mass as a function of baryonic mass. Data and symbols are the same as above. The ratio is linear in the bottom half of the diagram, then switches to logarithmic in the top half. Spiral galaxies are shown twice: once with f_v = 1.0 (solid blue circles) and again with f_v = 1.4 (small open circles). The Milky Way is the yellow point at the top of the gray band, which shows the range from zero CGM to that required to explain all of the locally missing baryons when f_v = 1. Stars represent the CGM measurements of Milky Way–mass galaxies by Miller & Bregman (2015), Bregman et al. (2022), and Zhang et al. (2026) from bottom to top. These suffice to explain the missing baryons provided that f_v ≈ 1.4. This explanation becomes progressively less plausible for lower mass galaxies.

The scatter blows up when we plot linear ratios; this is an artifact of error propagation. Nevertheless, it is helpful to see that the local missing baryon problem is not subtle. It is already a factor of ~2 for groups and ~3 for bright galaxies. It’s not as if we’ve misplaced a few percent of the baryons. Most of the baryons that should be associated with galaxy dark matter halos are not in evidence.

This problem has been known for a while, but doesn’t seem to be acknowledged to be a problem. Not all baryons need condense down into the central galaxy; some might be left behind, still mixed in with the dark matter halo. The widespread assumption seems to be that the missing baryons are probably in the CGM.

Accounting for the missing baryons with gas in the CGM almost works in bright galaxies like the Milky Way where we need “only” a factor of a few. Recent estimates suggest that the CGM is comparable in mass to the stars, or even somewhat more. These are very uncertain, as this mass is dispersed in diffuse gas over an enormous volume, and the total mass estimates often involve large extrapolations: the CGM is detected most readily near the central galaxy, but most of its implied mass is way far out near r₂₀₀. Accepting these estimates at face value leads to the star symbols in the plot above. This makes the checksum complete provided the halo is not too massive, as happens if f_v ≈ 1.4. This is what we expect for NFW halos, so it might work out if those were viable. However, there is a bigger issue.

The local missing baryon problem gets progressively worse for lower mass galaxies. For 10¹⁰ M_☉ galaxies – not all that much smaller than the Milky Way (M_b = 7 x 10¹⁰ M_☉) – the problem isn’t a factor of two or three: there are ~6 baryons missing for every one that is observed. For 10⁹ M_☉ galaxies, the deficit is an order of magnitude. For even lower mass galaxies, the difference is so large we have to abandon the linear plot lest the interesting parts for bright galaxies get scrunched into invisibility. By the time we get to small dwarf galaxies of 10⁶ M_☉, the ratio of missing-to-observed baryons approaches 100:1. It is not plausible to imagine that the CGM of dwarf galaxies explains this deficit. (And yes, we’ve looked.)
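The trend can be sketched from the scaling relations alone. The snippet below idealizes every galaxy as sitting exactly on M_b = A V_f⁴ with A = 50 M_☉ km⁻⁴ s⁴ and on M₂₀₀ = 3.3 × 10⁵ V₂₀₀³, then asks how many baryons the halo should hold at the cosmic fraction. This is an assumption-laden simplification (pure power laws, no scatter), so it will not reproduce the plotted data point for point; the function name is illustrative.

```python
F_B = 0.157          # cosmic baryon fraction
A_BTFR = 50.0        # Msun km^-4 s^4, BTFR normalization
M200_COEFF = 3.3e5   # Msun (km/s)^-3, M200-V200 coefficient


def missing_to_observed(m_b_msun, f_v=1.0):
    """Ratio of missing to observed baryons for a galaxy lying on the BTFR."""
    v_flat = (m_b_msun / A_BTFR) ** 0.25          # km/s, from M_b = A V_f^4
    m200 = M200_COEFF * (v_flat / f_v) ** 3       # Msun, with V200 = V_f / f_v
    expected_baryons = F_B * m200                 # cosmic share of the halo
    return (expected_baryons - m_b_msun) / m_b_msun


for m_b in (1e6, 1e9, 1e10, 7e10):
    r_10 = missing_to_observed(m_b, 1.0)
    r_14 = missing_to_observed(m_b, 1.4)
    print(f"M_b = {m_b:.0e} Msun: missing/observed = {r_10:.1f} (f_v=1.0), {r_14:.1f} (f_v=1.4)")
```

The ratio rises steadily toward low mass for either choice of f_v, which only rescales the deficit without removing the trend.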

A common explanation for this variation is that low mass dark matter halos have shallower potential wells, so have a harder time holding onto their baryons. Supernovae can drive material out of galaxies; these go off with the same energy regardless of the galaxy they’re in so they may be more effective at blowing baryons out of lower mass systems. There is sufficient energy (IF properly^% distributed) to completely unbind the baryons, so they might wind up in the IGM, defeating any hope of completing the checksum. This is the sort of argument that sounds clever but fails to address the real problem. The difficulty isn’t just ridding ourselves of these meddlesome baryons, it is getting rid of exactly the right amount each and every time.

As awkward as it is to realize that most of the baryons that should be in low mass halos are not in evidence, it is not difficult to imagine ways in which this might happen, like the aforementioned supernova-driven galactic winds. The more dire aspect of the problem is the fine-tuning. Galaxies of the same observed baryonic mass are always missing the same amount of baryons, whether that’s a factor of 2 or 10 or 100. If the visible parts of a dwarf galaxy are only 1% of the available baryons, you’d expect a lot of scatter. Sometimes a halo of that mass might have 2% or even 3% of its baryons condense to the parts we see. That would show up in the scatter in a way it does not: galaxies of the same circular velocity (halo mass) have the same baryonic mass every time. They don’t vary by factors of two (or more). So while we can build models that make the baryon fraction just so, the fact that we can write a simple equation for it with practically zero scatter is profoundly uncomfortable.
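To put numbers on "factors of two," it helps to convert to the logarithmic units (dex) in which scatter is usually quoted. The sketch below compares the offsets from the hypothetical 2% and 3% cases with the intrinsic scatter of about 0.11 dex quoted for the extended BTFR elsewhere in this document. The helper name is illustrative.

```python
import math


def dex_offset(factor):
    """Logarithmic offset in dex corresponding to a multiplicative factor."""
    return math.log10(factor)


# A halo that condenses 2% or 3% of its baryons instead of 1%
# would sit this far off the relation in baryonic mass:
for factor in (2, 3):
    print(f"factor of {factor}: {dex_offset(factor):.2f} dex")

# For comparison, an intrinsic scatter of 0.11 dex is a factor of only
print(f"0.11 dex: factor of {10 ** 0.11:.2f}")
```

A factor of two is 0.30 dex, nearly three times the quoted intrinsic scatter, so such variations would stand out plainly if they were common.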

An extra bit of weirdness is that in LCDM, galaxies are built hierarchically by merging small objects into large ones. This poses a teleological problem. Consider a small halo at high redshift. If it remains alone, then it will contain a dwarf galaxy at low redshift that has a low baryon fraction. But if it merges into a larger system, then by the current time that larger system has to have a larger baryon fraction. In effect, a low mass halo has to know where it will end up some billions of years in the future. Will it remain alone and unmerged? Better blow out all those baryons! Will it merge into a larger system? Better hang on to the right amount of baryons. Does that system merge into a still larger object? Hope it held onto even more baryons, in exactly the right amount at every step along dozens of mergers.

I can imagine all this happening in a stochastic fashion with the net result being that more massive systems wind up with a higher baryon fraction, at least on average. I cannot give credence to this process resulting in the small observed scatter. As people are always telling me, “galaxies are complicated.” Indeed, they should be – in LCDM. But in reality they’re not! They obey simple scaling laws, laws that do not follow naturally from LCDM.

The local missing baryon problem encapsulates one of the fine-tuning problems that has never been satisfactorily explained. This alone would be considered fatal for most theories. For LCDM, it is just another problem to be addressed through the eternal tweaking of models and simulations.

*Strictly speaking, M₂₀₀ refers to all mass within r₂₀₀, baryons as well as dark matter. I’m going to call it halo mass anyway, because that’s what we mean, the baryons are a small fraction of the total, and because that’s what everybody does in the literature. If we make some other choice for the definition of the mass of the halo, M_Δ, then the inferred baryon fraction of an object scales by M₂₀₀/M_Δ. The cosmic baryon fraction does not care what choice we make, so the implicit assumption is that one asymptotes to the cosmic fraction if one gets far enough out, irrespective of what r_Δ we adopt. While this is a sensible assumption – individual objects must merge into the larger cosmos at some point – there is no guarantee that the universe cooperates. For example, the baryon fraction in galaxies declines with increasing radius, but that in galaxy clusters increases with radius. I’ve seen hints that it doesn’t really settle down to the cosmic (or any particular) value. These are only hints – considerable extrapolation is involved – so we’ll ignore this inconvenience and assume that the baryon fractions of individual objects do in fact converge to the cosmic value far enough out.

^{^}It makes the most sense if the underlying total mass is the observed baryonic mass.

^&I made a very similar fit in McGaugh et al. (2010) but didn’t publish it because there was no physics in it. Since then the field has been awash in abundance matching relations that were similarly fit sans physics. There has been much ink spilled justifying it post-facto with feedback, but I have refrained from this exercise in intellectual onanism.

^%It is common to assume in simulations that a large fraction (50 – 100%) of the energy from supernovae is returned to the surrounding gas. This process is not resolved in cosmological simulations: all the energy return happens as part of the “subgrid” physics, so the feedback efficiency is set, in practice, to make things work out as well as possible.

Observationally, most of the SN energy finds its way out along the path of least resistance where the density of the surrounding gas is smallest (“chimneys”). This process couples to the surrounding gas with only a few percent efficiency.

Extended Tully-Fisher relations

Previously I had alluded to some of the major projects I’ve been working on. One has come to fruition and can be found on the arXiv and in the Astrophysical Journal^&. It has taken many years to assemble the data in this paper, during which time the models purporting to explain some of it have evolved considerably while consistently failing to address the real problems they raise. There is a lot to explore, so it will take more than one post.

Here I start with the empirical basis: the stellar mass and baryonic Tully-Fisher relations. The Tully-Fisher relation was originally discovered as a relation between luminosity and linewidth in rotationally supported galaxies – spirals and irregulars. It immediately proved useful as an extragalactic distance indicator. As such, it was instrumental in breaking the impasse in the Hubble constant^* debate (back when it was 50 vs. 100, not 67 vs. 73), and it remains useful in this role.

Physically, the obvious interpretation was that luminosity is a proxy for stellar mass and linewidth^{*^} is a proxy for rotation speed. This is correct. Of the various rotation speeds one can define and measure, the one that works best, in terms of minimizing the scatter in the relation, is the flat rotation speed measured in the outer parts of extended rotation curves. See Stark et al. (2009) and Trachternach et al. (2009) for further examples. The scatter is basically a function of data quality.

On the mass axis, converting measured flux to luminosity to mass is a bit dicier, as we need to know the distance for the first step and the stellar mass-to-light ratio for the second. There is inevitably some intrinsic scatter in the mass-to-light ratio of a stellar population. While I don’t doubt that luminosity is a proxy for stellar mass, improving on it is hard to do: there are many instances in which simply assuming a straight mapping of light to mass can be as effective as applying fancier population models. We might^{^} finally be getting past that, so it is worth discussing a bit.

The procedure to convert starlight into stellar mass involves the construction of stellar population models that use the color(s) or spectral energy distribution of a galaxy to infer the types of stars that make the light. This is a long-argued subject; suffice it to say there are a number of points where it can go wrong. The most obvious is the IMF, the initial spectrum of masses with which stars are born. Most of the light we see from galaxies is produced by their higher mass stars, which are disproportionately bright (there is a steep scaling of stellar luminosity with mass). But most of the mass is locked up in low mass stars that contribute little to the total luminosity. So we are, in effect, using the light of the few to represent the mass of the many. That would go badly wrong if we don’t know the relative mix, i.e., the shape of the IMF. This has been the subject of much research, and over many decades has been narrowed down pretty well. While I hope that this is almost settled, the specter of the IMF lurks as a menace to all stellar mass determinations.

There is a lot else we need to know to build a stellar population model. This includes such essentials as the spectra of individual stars of each and every type and stellar evolution as a function of mass and composition including exotic phases like the asymptotic giant branch. There are a lot of places where this can go badly wrong, and sometimes^{^%} does. So I wouldn’t say we know how to do this perfectly, but we have become pretty good at it.

Converting light to mass suffices to plot the stellar mass Tully-Fisher relation. That accounts for most of the baryonic mass of high mass spirals, but it ignores the mass of the interstellar gas. This can be appreciable in lower mass systems. Indeed, the standard issue dwarf galaxy in the field is more gas than stars:

**Figure 1** from McGaugh et al. (2019): The gas and stellar masses of rotating galaxies. Blue points are galaxies in the SPARC database (Lelli et al. 2016b) and the gas rich galaxies discussed by McGaugh (2012). The location of the Milky Way is noted in red (McGaugh 2016): it is a typical bright spiral. Grey points are the sample of Bradford et al. (2015). The line is the line of equality where M_* = M_g.

With measurements of mass and rotation speed, we can construct the Tully-Fisher relation:

**Figure 4** from McGaugh et al. (2019): The stellar mass (left) and baryonic Tully-Fisher relation (right). Data from Lelli et al. (2016b) and McGaugh (2012) are shown as blue points if both axes are measured with at least 20% accuracy; less accurate data are shown in grey. The latter include cases for which the rotation curve does not extend far enough to measure V_f, in which case the last measured point is used. These cases are systematically offset to lower velocity. Inclination uncertainties and distance errors also contribute to the scatter. The better the data, the tighter the relation. *The location of the Milky Way is noted in red (you are here).*

The stellar mass Tully-Fisher relation is a good correlation by the standards of extragalactic astronomy. The majority of studies in the literature are restricted to massive^% galaxies, mostly those with M_* > 10¹⁰ M_☉ where stars dominate the baryonic mass budget so the omission of gas is not obvious. As we look to lower masses, the relation bends and the scatter increases. That this happens right where gas starts to become important to the mass budget suggests that we’re missing an important component, and voila – a nice, continuous relation that is linear in log space is restored when we plot the baryonic mass M_b = M_* + M_g. Indeed, the data are consistent with a simple power law

M_b = A \, V_f^4

with A = 50 M_☉ km⁻⁴ s⁴. The intercept A has consistently been measured within 10% of this value over the past couple of decades. That this is an integer power law so that the intercept has real physical units is intriguing. That doesn’t happen in most astronomical scaling laws, which are usually more happenstance, like the mass-luminosity relation for main sequence stars.
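The power law is simple enough to use as a calculator in either direction. Here is a minimal sketch; the function names are illustrative, and the Milky Way baryonic mass of 7 x 10¹⁰ M_☉ used in the example is the value quoted elsewhere in this document.

```python
A_BTFR = 50.0  # Msun km^-4 s^4


def baryonic_mass(v_flat_kms):
    """Baryonic mass in Msun predicted by the BTFR, M_b = A * V_f^4."""
    return A_BTFR * v_flat_kms**4


def flat_velocity(m_b_msun):
    """Flat rotation speed in km/s implied by the BTFR for a given baryonic mass."""
    return (m_b_msun / A_BTFR) ** 0.25


print(f"V_f = 100 km/s  ->  M_b = {baryonic_mass(100.0):.1e} Msun")
print(f"M_b = 7e10 Msun ->  V_f = {flat_velocity(7e10):.0f} km/s")
```

Because of the fourth power, a 10% uncertainty in A changes the predicted velocity at fixed mass by only about 2.5%.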

Why limit ourselves to rotationally supported galaxies? Let’s plot every known type of gravitationally bound extragalactic object, from the smallest ultrafaint dwarfs to the largest clusters of galaxies. Note that I’ve flipped the axes to accommodate the huge dynamic range in baryonic mass, roughly twelve (12) orders of magnitude. This is like having gnats at one end of the scale and blue whales at the other. On that scale, a person is a regular galaxy like the Milky Way.

**Figure 3** from McGaugh et al. (2026): Extended Tully-Fisher relations plotting the flat-equivalent circular velocity of extragalactic systems as a function of stellar mass (top panel) and baryonic mass (bottom panel). Data for rotationally supported galaxies are depicted by circles; squares represent pressure supported systems. The blue circles are galaxies with directly measured distances, V_f from rotation curves, and stellar masses from WISE photometry from Duey et al. (2026, in preparation). Green circles are gas-rich galaxies (M_g > M_*; Stark et al. 2009; Trachternach et al. 2009; Bernstein-Cooper et al. 2014; McNichols et al. 2016; Iorio et al. 2017; Namumba et al. 2025; Xu et al. 2025) not already in Duey et al. (2026). Yellow points are Local Group galaxies, both spirals and dwarfs (McGaugh et al. 2021); gray squares are ultrafaint dwarfs (Lelli et al. 2017). Lensing results for early- and late-type galaxies (Mistele et al. 2024a) are shown as pink squares and magenta circles, respectively. Red squares are clusters of galaxies (Mistele et al. 2025), and purple squares are groups of galaxies (McGaugh et al. 2026). The orange line is the BTFR fit only to rotating galaxies over a more limited range (about three orders of magnitude in baryonic mass, from M_b ~ 4 x 10⁸ to 4 x 10¹¹ M_☉) by McGaugh (2005).

One improvement from twenty years ago, aside from the greater number of objects and the increase in dynamic range, is the accuracy of the mass measurements. I tried a number of prescriptions for the stellar mass-to-light ratio in McGaugh (2005), which resulted in a range of possible slopes. Now we just use the stellar mass from precise population models (Duey et al. 2025) and recover my best estimate from back then. The room to dodge the obvious conclusion about the slope of the relation by complaining about the choice of stellar mass estimator – a popular course of action back then – is gone. Another technical issue we’ve spent a lot of effort working on is how to put all these very different systems on the same scale of V_f. I won’t elaborate on this here: if you’re interested in that level of detail, you can go read the paper and references therein. If we got this wrong, it would add to the scatter in the relation, and/or create offsets between different types of data.

Both of the extended Tully-Fisher relations, that in stellar mass (top panel) and that in baryonic mass (bottom panel, the extended BTFR), are good correlations. That in baryonic mass is clearly better in the sense that it is tighter over a larger dynamic range. From small dwarf galaxies (M_b ~ 5 x 10⁵ M_☉) to groups of galaxies (5 x 10¹² M_☉), the data are consistent with a single power law (M_b ~ V_f⁴) for all systems with remarkably little scatter. Outside this range, the data for both the lowest and the highest mass systems deviate from a straight line towards higher flat velocity at a given mass. I don’t put much credence in the smallest systems as I think there is little chance that their measured velocity dispersions are representative of their equilibrium gravitational potential. For all practical purposes, our knowledge runs out as we hit the regime of ultrafaint^# dwarfs. The deviations of the most massive systems, clusters of galaxies, are more difficult to dismiss.

Restricting our attention for the moment to the range where a single power law suffices to describe the data, we note that there is not much scatter in the BTFR. Some of it is from random uncertainties; these dominate most studies and lead to a lot more scatter than seen here: these data are very good. We can account for the known observational errors and subtract off their contribution to estimate the intrinsic scatter in the relation. This is the variance of the data from a perfect line. The intrinsic scatter for the best data (the WISE-SPARC sample of Duey et al. 2026) is about 0.11 dex in mass – about what we expect^$ for stellar populations. That doesn’t leave much room for other sources of scatter, so the underlying physical relation has to be very tight indeed: essentially perfect over the range 5 x 10⁵ < M_b < 5 x 10¹² M_☉.
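The bookkeeping behind "subtract off their contribution" can be sketched in a couple of lines, assuming the simplest case: independent Gaussian errors summarized by one typical uncertainty, so that variances add in quadrature. The actual analysis in the paper may well treat errors point by point; the function name and the input numbers below are hypothetical, made up for illustration only.

```python
import math


def intrinsic_scatter(observed_rms_dex, typical_error_dex):
    """Quadrature subtraction of measurement error from observed scatter.

    Assumes independent Gaussian errors described by a single typical
    uncertainty. Returns 0.0 when the errors already account for all of
    the observed scatter.
    """
    variance = observed_rms_dex ** 2 - typical_error_dex ** 2
    return math.sqrt(max(variance, 0.0))


# Hypothetical inputs, NOT values from the paper.
observed = 0.15  # total rms scatter about the line, in dex
errors = 0.10    # typical observational uncertainty, in dex
print(f"intrinsic scatter ~ {intrinsic_scatter(observed, errors):.2f} dex")
```

The point of the exercise is that whatever remains after the subtraction has to cover every physical source of scatter at once, stellar populations included.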

Scatter will also occur if our mass budget is incomplete. We can see this in the transition from the stars-only relation to the BTFR. There is a lot of scatter in the stellar mass Tully-Fisher relation around 10⁷ < M_b < 10⁹ M_☉. Galaxies in this mass range are sometimes star-dominated and sometimes gas-dominated. The gas fraction is all over the place. This shows up as scatter in the stellar mass Tully-Fisher relation. That’s not real; it is a sign that we’ve missed an important mass reservoir. This is cured when we add in the gas mass, which is dominated by atomic gas (HI to spectroscopists and astronomers). That this addition removes the scatter and restores a single power law relation strongly suggests that there are no further substantial reservoirs^** of baryonic material that we’re missing.

This logic applies to other systems as well. Bright spirals do not need much correction because their baryonic mass is dominated by stars. Their stellar mass Tully-Fisher relation is pretty much already their BTFR.

Perhaps this applies to clusters of galaxies as well? There was a huge correction from stars-only to stars plus gas. The gas in this case is the hot, ionized plasma of the intracluster medium (ICM) that belongs to the cluster itself and not any individual galaxy within it. That goes most of the way to close the gap between the stars-only cluster data and the extrapolation of the BTFR fit to individual galaxies, but not all the way. So perhaps we are still missing an important baryonic mass component? It happened before – we didn’t know about the ICM for decades after Zwicky first identified the missing mass problem in clusters – so perhaps there are still more baryons to discover there.

It could also be that the apparent offset occurs because we’ve failed to put clusters on the same V_f scale as galaxies. This is not easy to do, and we’ve spent a lot of time worrying about it. I don’t think this is what’s going on, though it would make my life a lot simpler if it were. Different indicators – dynamics vs. ICM hydrostatics vs. gravitational lensing – can give somewhat different answers, but not in a way that “fixes” the problem: I see no viable path in which the offset turns out to be a simple difference in the way the depth of the gravitational potential is measured. I would love to be wrong here, but I’m not dismissing the offset for clusters as I am for ultrafaint dwarfs (which I don’t do lightly).

Perhaps the extrapolation of the BTFR from individual galaxies to clusters is simply not appropriate. They’re very different kinds of systems, after all. To dig into that, we need some theoretical perspective – why does the observed power law happen? Should we expect different systems to share the same BTFR?

Theory is something I’ve studiously avoided in this post: the possibility that there are baryons that remain to be discovered in clusters can be inferred empirically. All the other data line up, so why not clusters? But unless and until these hypothetical additional baryons are discovered, that’s just one possibility. How likely this possibility seems to be diverges rapidly once we overlay a theoretical preference, which I will leave to future posts. (I did warn it would take more than one.)

^&This paper appears in ApJ volume 1001. The literature has grown quite a bit since I started contributing to it in volume 342. The Astrophysical Journal was founded in 1895. So I’ve been contributing to it for a little over a quarter of its temporal existence, but nearly twice the number of volumes have been published in that shorter time. It’s no wonder none of us can keep up.

^*Indeed, Tully & Fisher’s “preliminary estimate of the Hubble constant is H₀ = 80 km/s/Mpc” remains correct to this day, within the uncertainties (hard to estimate at the time, but roughly ±10 km/s/Mpc).

^{*^}There appears to be an irreducible intrinsic scatter in the linewidth: it is not a perfect proxy for rotation speed. Linewidths are observationally easier to obtain than resolved, extended rotation curves, so the numbers of galaxies in samples using linewidths can be very large without ever approaching the quality provided by resolved interferometric observations. Bigger samples are not necessarily better.

^{^}I emphasize might here because the community seems to have moved towards reporting stellar masses as if we observe these rather than the luminosities and colors/SEDs that the mass estimates are based upon. The latter are data – observed quantities – while stellar masses are a derived quantity that is inevitably model dependent. This doesn’t stop being true just because we decide to invest a lot of faith in our models.

^{*^}The Sloan Digital Sky Survey provides stellar masses based on models that are known to be wrong in the near infrared. Since SDSS itself is entirely optical, one might not notice. If one mixes SDSS data with near-IR data, one will get the wrong answer.

^%This is a classic selection effect. Brighter objects can be seen at a much greater distance than dim ones, so probe a much larger volume. Consequently, their raw numbers always dominate surveys even if their number density is low. Stars are a great example: most of the stars you can see at night are intrinsically luminous: bright stars that are rather far away. Mundane, low mass stars do not stand out even when nearby.

^#This isn’t for lack of observations of ultrafaint dwarfs, it’s the underlying assumptions.

^$No amount of information suffices to perfectly specify the stellar mass that produces an observed luminosity and SED (spectral energy distribution/set of colors), so one always expects at least some intrinsic scatter in the stellar mass-to-light ratio. I’ve seen estimates that range from 0.1 – 0.2 dex for near-IR colors. That’s as good as it can get as there is always some transient population (e.g., AGB stars) that produces an amount of light that depends on the star formation rate some time ago, not what we measure now. Optical colors are worse in the sense of having more intrinsic scatter, as they are more susceptible to the comings and goings of bright but short-lived stars whose numbers fluctuate with the stochastic star formation rate. Finding 0.11 dex intrinsic scatter is pretty much as good as it can get. (By dex we mean the scatter in log space.)

^**We noted this effect in the original BTFR paper to argue that it was unlikely that we were missing substantial amounts of molecular gas (H₂), which was a concern at the time. Flash forward, and we were right: the molecular gas mass is almost always a distant third behind stars and atomic gas in the baryonic mass budgets of individual galaxies. Nowadays, the concern is about the mass of baryons in the circumgalactic medium (CGM). That’s getting ahead of the story, which I’ll save for a future post. For now, it suffices to note that any baryonic mass in the CGM is far beyond the radius where the flat velocity is measured, so is not relevant to the sums here.

Yep, it’s a religion

I have been concerned for years that dark matter was morphing from legitimate science into a cold, dark religion. I have been reluctant to put it that way, because there are lots of scientists who work on dark matter that have not fallen entirely down that rabbit hole and who continue to make valuable contributions working in that context. But a recent experience reminded me that my concerns were not misplaced, and there are plenty of scientists who have fallen irredeemably down this rabbit hole. No matter what answer the future holds to be correct, many current scientists will have gone to their graves in denial of it.

Where is the boundary between science and religion? It is hard to assess where the borderline is. But it is easy to see when people are far over the line – so far over that it doesn’t really matter where exactly the line is. One can attend any conference on the subject to find people who unabashedly assert that dark matter exists without question. Not just that acceleration discrepancies have been amply demonstrated empirically, but that the only possible interpretation is dark matter. If asked whether this invisible mass is in the room with us now, they will enthusiastically^# answer yes! Since dark matter has not been detected in the laboratory, this assertion is an expression of faith – the hallmark of religion – not of an established scientific fact. What we have established is that there are discrepancies between what we see and what we get when we assume Newtonian gravity (or GR, if needed). What we don’t know is whether the cause of these discrepancies is some form of invisible mass (dark matter) or if the equations we employ are inadequate (modified gravity [or more generally, dynamics]).

Indeed, these days many people will assert that dark matter has already been detected, usually citing astronomical evidence that used to be considered too feeble to merit a Nobel prize. Funny how repeating a mantra long enough morphs an aspiration into accepted reality. Modern physics is not providing a strong falsification of the supposition that science is a social construct.

A prominent example of an observation of the sky that is frequently cited as absolutely requiring cold dark matter is the acoustic power spectrum of the cosmic microwave background. Quoting clayton from a few years ago:

the primary reason to believe in the phenomenon of cold dark matter is the very high precision with which we measure the CMB power spectrum, especially modes beyond the second acoustic peak. There is a stone-cold, qualitative, crystal clear prediction of CDM about the relative sizes of the second and third peaks that modified gravity profoundly and irredeemably gets wrong: it thinks the third peak should be relatively larger* than the second… whereas CDM thinks they should be about the same

I would accept that this were conclusive proof of dark matter if this were the unique prediction of dark matter: that there was no other way to do it, so all other approaches were indeed irredeemable. (Quite the strong language, eh?) The problem is that CDM is not the one unique way to fit these data. Skordis & Zlosnik showed that it is possible to write a modified gravity theory that also fits the CMB data:

*CMB power spectrum observed by Planck fit by AeST (Skordis & Zlosnik 2021).*

This does not prove the AeST theory of Skordis & Zlosnik is correct, but it does demonstrate that it is possible to write a modified gravity theory that does indeed do what it is frequently asserted to be impossible for a modified gravity theory to do. I’ve heard of a couple of other theories that can also do this (the relativistic Khronon theory of Blanchet and nonlocal MOND as discussed by Deffayet & Woodard), so clearly this success is not uniquely limited to cold dark matter, or even a particular modified gravity theory. The work of Skordis & Zlosnik (2021) was known and in the literature before clayton made the assertion above in late 2022, so either he wasn’t paying attention (likely) or is convinced that it is impossible so doesn’t even consider the possibility (also likely). The former just says we’re all too busy, but the latter is a mark of religious thinking: my god is the only god, thou shalt have no other hypotheses before^& me.

Many people are very impressed with the quality of the LCDM fit to the CMB. That is indeed very good, but there are enough free parameters that we were going to get a fit to any physically plausible power spectrum. If not, we’ve never been shy about making up new parameters. (Evolving dark energy, anyone? How about a running power spectrum? There’s a whole bag of possibilities!) What I’ve been more impressed with is the consistency of the fit to the CMB data with the many independent constraints on conventional cosmology. Or at least it was, until it wasn’t.

The Hubble tension has gotten steadily worse (in terms of statistical significance), and it really does not look like local measurements are to blame, nor is it the only tension. People seem to miss that it is the CMB-fitted value of the Hubble constant that has evolved over time to spoil the concordance that got us to believe in LCDM in the first place. But if the CMB is the cornerstone of your religion, all other data must inevitably be at fault and can be ignored: there is an entire community of cosmologists who choose to believe the best-fit Planck cosmology to the exclusion of all other data. It’s like the bad old days of the Hubble tension all over again, with the physics community choosing to believe the lower value of H₀ because it makes more sense for the aspects of cosmology that they care about while those in the astronomical community who actually measure H₀ find a persistently higher value.

A real tension in LCDM implies the need for new physics of the unknown variety. One doesn’t want to go there if it can be helped. I didn’t consider MOND until I was already concerned for the viability of dark matter. There are real problems for the paradigm that its more intense advocates simply deny, brush aside without real thought, or choose to remain ignorant of. When they are confronted with a problem, they are pretty creative about making stuff up on the spot. Anything to avoid having to confront the unspeakable – another hallmark of religion.

For example, cold dark matter is scale free. That’s foundational to the hypothesis. So the existence of an acceleration scale in the kinematic data is anathema to CDM. When I first pointed this contradiction out, there were a variety of assertions to the effect of “does too!” One example is provided by Kaplinghat & Turner, who claim to show “how Milgrom’s law comes about in the cold dark matter theory of structure formation.” That would, indeed, be ideal, and is a requirement for any theory to be successful.

Wee problem: they demonstrate no such thing. CDM is scale free, yet K&T claim that it explains Milgrom’s Law, which is predicated on the existence of an acceleration scale. Well, which is it? Is CDM scale free? Or does it explain the acceleration scale? We can’t have it both ways: their very premise is self-contradictory. It is absurd on its face.

The acceleration scale is defined by baryons, for which K&T have no model. To connect baryons with dark matter, they make a hand-waving argument about galaxies reaching a₀ at the edge of their disks. This is not even a concept of a model and does not begin to suffice as an explanation for many reasons, a prominent one being that low surface brightness galaxies have accelerations less than a₀ everywhere:

Centripetal acceleration curves color coded by galaxy surface brightness. Low surface brightness galaxies (blue colors) have low (sub-a₀) accelerations everywhere: there is no edge at which they reach a₀. (Adapted from McGaugh 2020.)

Milgrom pointed out this and many other shortcomings of their scenario, so I feel no need to elaborate further. Milgrom eviscerated their paper so thoroughly that the proper course of action would have been to retract it. Instead, they simply never acknowledge the criticism, and persist to this day in pushing it as some sort of valid scientific explanation. It is not; it does not withstand even mild critical scrutiny. But it doesn’t need to: it reassures the faithful that all is well. They hear what they want to hear without questioning its veracity. That’s another hallmark of religion.

I have refrained from saying these things in the past because I’m too nice. For example, a few years ago I started then abandoned the draft text below, which I simply cut & paste:

One of the things that attracted me to a career in science is the notion of objectivity. I grew up for a time in the bible belt, where people earnestly believed things that were obviously untrue, even to the eyes of a small child. On the occasions that I had the temerity to point out the obvious, the contradictions posed by facts never had an impact on their belief system. Rather, it inevitably earned me a warning that I was going to hell. No few of these people seemed to think it was their religious duty to send me there prematurely, or at least to make life on Earth a living hell.

Scientists eschew such behavior, but are also human, so often engage in it anyway. I’ve encountered it a lot. I get it; I went through the same denial, grief, and anger over the prospect of losing my good friend cold dark matter. The stages of grief never brought something back from the dead, but it has engendered a lot of blame-the-messenger.

Here’s an example, from a review by Mike Turner:

There is a lot of misinformation packed into this short paragraph.

The first clue is right there at the beginning, in red: the heading “False starts.” This is false framing, a classic tool of propagandists. It starts from the outset by asserting that the topic to be discussed is wrong at a level of knowledge so common it requires no justification. This is not the way one starts an objective discussion, much less a scientific one.

Turner then misconstrues what Milgrom did. He didn’t notice the scale a₀ in the data, for which there was scant evidence at the time. Rather, Milgrom made the obvious statement that the inference of dark matter relied on the assumption that dynamics, as encapsulated by the laws of inertia and gravity, is the same on the very different scales of galaxies as in the solar system where they were established, so we ought to consider if dynamics might change in some way. He quickly excluded a size dependence as a possibility. How he settled on acceleration is beyond the scope of this post, and not for me to say. Neither is it for Turner to say.

After a brief and incomplete description of what MOND is, Turner allows that “this one-parameter model fits all the rotation-curve data”. Even in making this admission, he chooses to call it a model rather than a theory. A model is something specific you build in the context of a theory, like a halo model in CDM. MOND is more than that.

Turner quickly moves on without contemplating any meaning that rotation curves might hold. Let’s pause to consider that.

First, I would not say that MOND fits all the rotation curve data. It fits most galaxies, but there are a minority of weird cases that are not well fit. The weird cases inevitably don’t make sense in terms of dark matter either, so on the whole I interpret this to be the usual price of dealing with astronomical data – some of it is just goofy. Setting such cases aside, I can and have fit the same data with all sorts of dark matter halo models. MOND requires fewer parameters, which is important, but the difference isn’t in the fitting. The difference is in predictive ability. I can use MOND to predict the dynamics of galaxies a priori, and have done so many times. I cannot use any flavor of dark matter theory to do the same, and it’s not for lack of trying.

The predictive power of MOND must be telling us something, even if it is something about the nature of dark matter or the process of galaxy formation. There are many papers written on this, some deep and profound, others absurd and banal. Turner cites none of them, nor displays any awareness that such work exists. I would venture to guess that is because acknowledging such work would imply that there is something to debate here, something he would apparently rather not admit.

That’s where I left off. It’s exhausting deciphering other people’s false assertions. Moreover, I just don’t like criticizing other people, no matter how richly they deserve it. (Turner has never refrained from criticizing me in ad hominem terms: on one occasion^$ he showed my picture to an audience and called me “the enemy.”) A large segment of the particle physics and cosmology community appears to think this way, and has succumbed to a scientific version of bible thumping in which you can assert any absurd thing so long as it falls within the framework of the holy LCDM. They really need to find something better to do.

I had hoped we were past this, but I heard a talk last week that was exactly in this mode. To paraphrase, the talk went

We’re sure dark matter exists. We have been sure about it for decades. In that time, we have been repeatedly proven wrong about what it is. Rather than re-think our paradigm in the face of these repeated failures, we double down yet again on the existence of this invisible, undetected mass, asserting aggressively^% that it must be true while eliding or misrepresenting the evidence that it is not. This enables us to make up a whole lot of exciting new possibilities for what the dark matter might be and conceive of ever more grandiose experiments to continue not to detect it. You must believe in dark matter!

This was not a science talk so much as an indoctrination session. It was as if I had stumbled into a revivalist tent where some hothead was preaching to the choir. This is the kind of talk that misled an entire generation into wasting their careers at the bottom of a mine shaft searching for WIMPs. At least WIMPs were a well-motivated hypothesis; this kind of talk could lead a new generation down an even greater variety of garden paths.

I am well aware that I might fall prey to this attitude myself. That’s why I set criteria by which I would change my mind: detect dark matter already, or at least provide a satisfactory explanation as to how MOND comes about. Neither of those criteria has been met. There are claims to do the latter, but so far these are just variations on models I tried and found to fail long ago. If I thought these could work, I would have said so. At the same time, I don’t see any dark matter advocates taking up the challenge to specify what would change their minds. When I ask them what could falsify dark matter, I get dumbfounded looks – the deer-in-the-headlights face one gets when the immediate response why would you even ask that? is checked by a distant memory that scientific theories are supposed to be falsifiable.

Personally, I found it humbling to encounter MOND in my own data. I too thought we understood the universe with dark matter. But who ordered this? Certainly not me: my own conventional, dark-matter based predictions were falsified. No one else working in the context of dark matter had got it right at the time either. Only Milgrom ordered this.

And what is this? There is a direct connection between what we see and what we get. Even in ignorance of MOND, the radial acceleration relation encodes a one-to-one relation between the distribution of baryons and the effective force. This is so direct that one can write down a single equation connecting the two:

$$g_{obs} = F(g_N/a_0)\,g_N.$$

The observed acceleration is a simple function of that predicted by Newton for the stars and gas that we see. There is no mention of unseen mass; everything is specified by what we can see is there.
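To make the equation concrete, here is a small sketch of what evaluating it looks like. The text leaves F unspecified, so both ingredients below are assumptions on my part: the form F(y) = 1/(1 − e^(−√y)), which is one function often used to fit the radial acceleration relation, and a₀ = 1.2 x 10⁻¹⁰ m s⁻². Other choices of F with the same limits would serve equally well as an illustration.

```python
import math

A0 = 1.2e-10  # m/s^2; ASSUMED value of the acceleration scale a_0


def g_observed(g_newton, a0=A0):
    """Observed acceleration from the Newtonian one: g_obs = F(g_N/a0) * g_N.

    ASSUMED form: F(y) = 1 / (1 - exp(-sqrt(y))). The text does not
    specify F; this is only one commonly used fitting function.
    """
    y = g_newton / a0
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for small x.
    return g_newton / (-math.expm1(-math.sqrt(y)))


# Illustrative Newtonian accelerations in m/s^2 (not data).
for g_n in (1e-8, 1e-10, 1e-12):
    print(f"g_N = {g_n:.0e} -> g_obs = {g_observed(g_n):.3e} m/s^2")
```

With this choice, F tends to 1 at high accelerations, so g_obs matches Newton, while at low accelerations g_obs approaches √(g_N a₀). The only inputs are the Newtonian acceleration of the visible baryons and the single scale a₀.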

I’ve sometimes heard astronomers complain about the reductionist ethos of physics, trying to cram all the complexity of the entire universe into a theory of everything. But here it is appropriate: there is a single, apparently universal force-law at work in galaxies. That’s telling us something profound. And yet if questioned about this, the physicists are the ones who will complain that galaxies are complicated, so they should be exempted from having to explain them. Galaxies should be complicated – in LCDM. But they’re observed not to be, in the sense that a single equation suffices to describe their kinematics. The problem isn’t that galaxies are inexplicably complicated, it’s that they should be but aren’t.

I am deeply disappointed that many scientists apparently lack the physical intuition to immediately recognize the import of the simple relation between what we see and what we get. It is the same sort of thing Newton noticed in the solar system: everything happens as if the gravitational force is proportional to the product of the masses and the inverse square of their separation. He didn’t understand why at the time, and was criticized for indulging in magical thinking: how can there be action at a distance? But that’s what the data were saying, and the same applies now. We might not yet understand the why, but the data look as if MOND is what’s happening in this universe.

^#The framing has morphed over the years. A recent advent is that some people have started proactively asserting that invisible mass is in the room with us now in order to avoid having to answer it as a question that makes them sound like loonies.

*He means the third peak should be smaller than the second, not larger, if by “it” he means modified gravity with the baryon density expected from big bang nucleosynthesis, which was the hypothesis that correctly predicted the first-to-second peak ratio but does indeed get the second-to-third peak ratio wrong. Funny how the CMB community was able to completely ignore the successful prediction for several years, but were then suddenly all over the latter failure. The third peak falsifies the ansatz on which that particular prediction was built, not the entire concept of modified gravity. This would be like asserting that all possible forms of dark matter are excluded because we haven’t yet detected WIMPs. It is a classic failure of objectivity, which is another hallmark of faith-based argumentation: we know His name is [insert favorite deity], not [insert any other deity].

^&Or after me. Dark matter was my first hypothesis, and I’m here to tell you that True Believers do not suffer second hypotheses or those who stray from the fold. I guess that’s why so many scientists who are MOND-curious keep it on the down low. Wise, perhaps (that’s why tenure needs to be a thing), but hardly the ideal of the open and free exchange of scientific ideas.

^$I wasn’t there, but one audience member (not someone I knew) thought it was so over the top that he told me about it, sharing a link with a video. (I did not retain that link, and doubt the hosting conference website is still active.)

^%Argument weak here. RAISE VOICE!

Very thin galaxies

The stability of spiral galaxies was a foundational motivation to invoke dark matter: a thin disk of self-gravitating stars is unstable unless embedded in a dark matter halo. Modified dynamics can also stabilize galactic disks. A related test is provided by how thin such galaxies can be.

Thin galaxies exist

Spiral galaxies seen edge-on are thin. They have a typical thickness – their short-to-long axis ratio – of q ≈ 0.2. Sometimes they’re thicker, sometimes they’re thinner, but this is often what we assume when building mass models of the stellar disk of galaxies that are not seen exactly* edge-on. One can employ more elaborate estimators, but the results are not particularly sensitive to the exact thickness so long as it isn’t the limit of either razor thin (q = 0) or a spherical cow (q = 1).

Sometimes galaxies are very thin. Behold the “superthin” galaxy UGC 7321:

*UGC 7321 as seen in optical colors by the Sloan Digital Sky Survey.*

It also looks very thin in the infrared, which is the better tracer of stellar mass:

**Fig. 1** from Matthews et al (1999): *H-band (1.6 micron) image of UGC 7321. Matthews (2000) finds a near-IR axis ratio of 14:1. That’s super thin (q = 0.07)!*

UGC 7321 is very thin, would be low surface brightness if seen face-on (Matthews estimates a central B-band surface brightness of 23.4 mag arcsec^-2), has no bulge component thickening the central region, and contains roughly as much mass in gas as stars. All of these properties dispose a disk to be fragile (to perturbations like mergers and subhalo crossings) and unstable, yet there it is. There are enough similar examples to build a flat galaxy catalog, so somehow the universe has figured out a way for galaxy disks to remain thin and dynamically cold^# for the better part of a Hubble time.

We see spiral galaxies at various inclinations to our line of sight. Some will appear face on, others edge-on, and everything in between. If we observe enough of them, we can work out what the intrinsic distribution is based on the projected version we see.

First, some definitions. A 3D object has three principal axes of lengths a, b, and c. By convention, a is the longest and c the shortest. An oblate model imagines a galaxy like a frisbee: it is perfectly round seen face-on (a = b); seen edge-on q = c/a. More generally, an object can be triaxial, with a ≠ b ≠ c. In this case, a galaxy would not appear perfectly round even when seen perfectly face-on^{^} because it is intrinsically oval (with similar axis lengths a ≈ b but not exactly equal). I expect this is fairly common among dwarf Irregular galaxies.

The observed and intrinsic distribution of disk thicknesses

Benevides et al. (2025) find that the distribution of observed axis ratios q is pretty flat. This is a consequence of most galaxies being seen at some intermediate viewing angle. One can posit an intrinsic distribution, model what one would see at a bunch of random viewing angles, and iterate to extract the true distribution in nature, which they do:

**Figure 6** from Benevides et al. (2025): Comparison between the observed (projected) $q$ distribution and the inferred intrinsic 3D axis ratios for a subsample of dwarfs in the GAMA survey with $M_{⋆} = 10^{9}$ – $10^{9.5} M_{⊙}$ . The observed shapes are shown with the solid black line and are used to derive an intrinsic $c / a$ (long-dashed) and $b / a$ (dotted) distribution when projected. Solid color lines in each panel corresponds to the $q$ values obtained from the 3D model after random projections. Note that a wide distribution of $q$ values is generated by a much narrower intrinsic $c / a$ distribution. For example, the blue shaded region in the left panel shows that an observed $5 %$ of galaxies with $q < 0.2$ requires $41 %$ of galaxies to have an intrinsic $c / a < 0.2$ for an oblate model. Similarly, for a triaxial model (right panel, red curve) $43 %$ of galaxies are required to be thinner than $c / a = 0.2$ . The additional freedom of $b \neq a$ in the triaxial model helps to obtain a better fit to the projected $q$ distribution, but the changes mostly affect large $q$ values and changes little the $c / a$ frequency derived from highly elongated objects.

That we see some thin galaxies implies that they have to be common, as most of them are not seen edge-on. For dwarf^$ galaxies of a specific mass range, which happens to include UGC 7321, Benevides et al. (2025) infer a lot^% of thin galaxies, at least 40% with q < 0.2. They also infer a little bit of triaxiality, a ≈ b.
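To get a feel for why so few thin galaxies are actually seen as thin, here is a minimal sketch of the forward projection step for the oblate case. It assumes the standard projection of an oblate spheroid, q² = cos²i + (c/a)² sin²i, and random orientations (cos i uniform between 0 and 1). The function names are made up for illustration, and this is not the fitting procedure of Benevides et al. (2025), only the projection of a single intrinsic thickness.

```python
import math
import random

def projected_axis_ratio(intrinsic_q, cos_i):
    """Apparent axis ratio of an oblate spheroid with intrinsic c/a = intrinsic_q.

    cos_i = 1 is face-on, cos_i = 0 is edge-on.
    """
    sin2_i = 1.0 - cos_i ** 2
    return math.sqrt(cos_i ** 2 + intrinsic_q ** 2 * sin2_i)

def fraction_seen_thinner_than(intrinsic_q, q_limit):
    """Analytic fraction of random viewing angles that give apparent q < q_limit."""
    if q_limit <= intrinsic_q:
        return 0.0          # an object never looks thinner than it really is
    if q_limit >= 1.0:
        return 1.0
    # Solve cos^2 i + q0^2 sin^2 i < q_limit^2 for cos i, which is uniform on [0, 1].
    return math.sqrt((q_limit ** 2 - intrinsic_q ** 2) / (1.0 - intrinsic_q ** 2))

def simulate_fraction(intrinsic_q, q_limit, n=200_000, seed=1):
    """Monte Carlo version of the same fraction, drawing cos i uniformly."""
    rng = random.Random(seed)
    hits = sum(
        projected_axis_ratio(intrinsic_q, rng.random()) < q_limit
        for _ in range(n)
    )
    return hits / n

if __name__ == "__main__":
    for q0 in (0.05, 0.10, 0.15):
        analytic = fraction_seen_thinner_than(q0, 0.2)
        sampled = simulate_fraction(q0, 0.2)
        print(f"c/a = {q0:.2f}: analytic {analytic:.3f}, Monte Carlo {sampled:.3f}")
```

For an intrinsic c/a = 0.1, only about 17% of random viewing angles give an apparent q < 0.2, so every galaxy observed to be that thin stands in for several more that are just as thin but seen at a less favorable angle.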

The existence and numbers of thin dwarfs seem to come as a surprise to many astronomers. This is perhaps driven in part by theoretical expectations for dwarf galaxies to be thick: a low surface brightness disk has little self-gravity to hold stars in a narrow plane. This expectation is so strong that Benevides et al. (2025) feel compelled to provide some observed examples, as if to say look, really:

**Figure 8** – images of real galaxies from Benevides et al. (2025): Examples of $10$ highly elongated dwarf galaxies with $q \leq 0.2$ and $M_{⋆} = 10^{7}$ – $10^{8.5} M_{⊙}$ . They resemble thin edge-on disks and can be found even among the faintest dwarfs in our sample. Legends in each panel quote the stellar mass, the shape parameter $q$ , as well as the GAMA identifier. Objects are sorted by increasing $M_{⋆}$ , left to right.

As an empiricist who has spent a career looking at low mass and low surface brightness galaxies, this does not come as a surprise to me. These galaxies look normal. That’s what the universe of late type dwarf^$ galaxies looks like.

Edge-on galaxies in LCDM simulations

Thin galaxies do not occur naturally in the hierarchical mergers of LCDM (e.g., Haslbauer et al. 2022), where one would expect a steady bombardment by merging masses to mess things up. The picture above is not what galaxy-like objects in LCDM simulations look like. Scraping through a few simulations to find the flattest galaxies, Benevides et al. (2025) find only a handful of examples:

**Figure 11** – images of simulated galaxies from Benevides et al. (2025): *Edge-on projection of examples of the flattest galaxies in the TNG50 simulation, in different bins of stellar mass.*

Note that only the four images on the left here occupy the same stellar mass range as the images of reality above. These are as close as it gets. Not terrible, but also not representative^&. Galaxies this thin are a tiny fraction of the simulated population, whereas they are quite common in reality. Here the two are compared: three different surveys (solid lines) vs. three different simulations (dashed lines).

**Figure 9** from Benevides et al. (2025): Fraction of galaxies that are derived to be intrinsically thinner than $c / a \leq 0.2$ as a function of stellar mass. Thick solid lines correspond to our observational samples while dashed lines are used to display the results of cosmological simulations. Different colors highlight the specific survey or simulation name, as quoted in the legend. In all observational surveys, the frequency of thin galaxies peaks for dwarfs with $M_{⋆} \sim 10^{9} M_{⊙}$ , almost doubling the frequency observed on the scale of MW-mass galaxies. Thin galaxies do not disappear at lower masses: we infer a significant fraction of dwarf galaxies with $M_{⋆} < 10^{9} M_{⊙}$ to have $c / a < 0.2$ . This is in stark contrast with the negligible production of thin dwarf galaxies in all numerical simulations analyzed here.

Note that the thinnest galaxies in nature are dwarfs of mass comparable to UGC 7321. Thin disks aren’t just for bright spirals like the Milky Way with log(M_*) > 10.5. They are also common^*$ for dwarfs with log(M_*) = 9 and even log(M_*) = 8, which are often gas dominated. In contrast, the simulations produce almost no galaxies that are thin at these lower masses.

The simulations simply do not look like reality. Again. And again, etc., etc., ad nauseam. It’s almost as if the old adage applies: garbage in, garbage out. Maybe it’s not the resolution or the implementation of the simulations that’s the problem. One could get all that right, but it wouldn’t matter if the starting assumption of a universe dominated by cold dark matter was the input garbage.

Galaxy thickness in Newton and MOND

Thick disks are not merely a product of simulations, they are endemic to Newtonian dynamics. As stars orbit around and around a galaxy’s center, they also oscillate up and down, bobbing in and out of the plane. How far up they get depends on how fast they’re going (the dynamical temperature of the stellar population) and how strong the restoring force to the plane of the disk is.

In the traditional picture of a thin spiral galaxy embedded in a quasi-spherical dark matter halo, the restoring force is provided by the stars in the disk. The dark matter halo is there to boost the radial force to make the rotation curve flat, and to stabilize the disk, for which it needs to be approximately spherical. The dark matter halo does not contribute much to the vertical restoring force because it adds little mass near the disk plane. In order to do that, the halo would have to be very squashed (small q) like the disk, in which case we revive the stability problem the halo was put there to solve.

This is why we expect low surface brightness disks to be thick. Their stars are spread thin, the surface mass density is low, so the restoring force to the disk should be small. Disks as thin as UGC 7321 shouldn’t be possible unless they are extremely cold^*# dynamically – a situation that is unlikely to persist in a cosmogony built by hierarchical merging. The simulations discussed above corroborate this expectation.

In MOND, there is no dark matter halo, but the modified force should boost the vertical restoring force as well as the radial force. One thus expects thinner disks in MOND than in Newton.

I pointed this out in McGaugh & de Blok (1998) along with pretty much everything else in the universe that people tell me I should consider without bothering to check if I’ve already considered. Here is the plot I published at the time:

**Figure 9** of McGaugh & de Blok (1998): Thickness q = z₀/h expected for disks of various central surface densities Σ₀. Shown along the top axis is the equivalent B-band central surface brightness μ₀ for Υ_* = 2. Parameters chosen for illustration are noted in the figure (a typical scale length h and two choices of central vertical velocity dispersion σ_z). Other plausible values give similar results. The solid lines are the Newtonian expectation and the dashed lines that of MOND. The Newtonian and MOND cases are similar at high surface densities but differ enormously at low surface densities. Newtonian disks become very thick at low surface brightness. In contrast, MOND disks can remain reasonably thin to low surface density.

There are many approximations that have to be made in constructing the figure above. I assumed disks were plane-parallel slabs of constant velocity dispersion, which they are not. But this suffices to illustrate the basic point, that disks should remain thinner^&% in MOND than in Newton as surface density decreases: as one sinks further into the MOND regime, there is relatively more restoring force to keep disks thin. To duplicate this effect in Newton, one must invent two kinds of dark matter: a dissipational kind of dark matter that forms a dark matter disk in addition to the usual dissipationless cold dark matter that makes a quasi-spherical dark matter halo.

The idea of the plot above was to illustrate the trend of expected thickness for galaxies of different central surface brightness. One can also build a model to illustrate the expected thickness as a function of radius for a pair of galaxies, one high surface brightness (so it starts in the Newtonian regime at small radii) and one of low surface brightness (in the MOND regime everywhere). I have chosen numbers^** resembling the Milky Way for the high surface brightness galaxy model, and scaled the velocity dispersion of the low surface brightness model so it has very nearly the same thickness in the Newtonian regime. In MOND, both disks remain thin as a function of radius (they flare a lot in Newton) and the lower surface brightness disk model is thinner thanks to the relatively stronger restoring force that follows from being deeper in the MOND regime.

The thickness of two model disks, one high surface brightness (solid lines) and the other low surface brightness (dashed lines), as a function of radius. The two are similar in Newton (black), but differ in MOND *(blue)*. The restoring force to the disk is stronger in MOND, so there is less flaring with increasing radius. The low surface brightness galaxy is further in the MOND regime, leading naturally to a thinner disk.

These are not realistic disk models, but they again suffice to illustrate the point: thin disks occur naturally in MOND. Low surface brightness disks should be thick in LCDM (and in Newtonian dynamics in general), but can be as thin as UGC 7321 in MOND. I didn’t aim to make q ≈ 0.1 in the model low surface brightness disk; it just came out that way for numbers chosen to be reasonable representations of the genre.

What the distribution of thicknesses is depends on the accretion and heating history of each individual disk. I don’t claim to understand that. But the mere existence of dwarf galaxies with thin disks is a natural outcome in MOND that we once again struggle to comprehend in terms of dark matter.

*Seeing a galaxy highly inclined minimizes the inclination correction to the kinematic observations [V_rot = V_obs/sin(i)] but to build a mass model we also need to know the face-on surface density profile of the stars, the correction for which depends on 1/cos(i). So as a practical matter, the competition between sin(i) and cos(i) makes it difficult to analyze galaxies at either extreme.

^#Dynamically cold means the random motions (quantified by the velocity dispersion of stars σ) are small compared to ordered rotation (V) in the disk, something like V/σ ≈ 10. As a disk heats (higher σ) it thickens, as some of that random motion goes in the vertical direction perpendicular to the disk. Mergers heat disks because they bring kinetic energy in from random directions. Even after an object is absorbed, the splash it made is preserved in the vertical distribution of the stars which, once displaced, never settle back into a thin disk. (Gas can settle through dissipation, but point masses like stars cannot.)

^Oval distortions are a major source of systematic error in galaxy inclination estimates, especially for dwarf Irregulars. It is an asymmetric error: a galaxy with a mild oval distortion can be inferred to have an inclination (i > 0) even when seen face-on (i = 0), but it can never have an inclination more face-on (i < 0) than exactly face-on. This is one of the common drivers of claims that low mass galaxies fall off the Tully-Fisher relation. (Other common problems include a failure to account for gas mass, bad distance estimates, or not measuring V_flat.)
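To put rough numbers on this, here is a small sketch using the usual conversion from axis ratio to inclination, cos²i = (q² − q₀²)/(1 − q₀²), where q₀ is the assumed intrinsic thickness. The function names and the example values (an oval with b/a = 0.9, and a 20° versus 30° inclination) are purely illustrative choices of mine, not measurements of any particular galaxy.

```python
import math

def inclination_deg(axis_ratio, intrinsic_q=0.0):
    """Inclination inferred from an observed axis ratio, assuming a round disk.

    Uses cos^2 i = (q^2 - q0^2) / (1 - q0^2); i = 0 is face-on, i = 90 is edge-on.
    """
    if axis_ratio <= intrinsic_q:
        return 90.0
    cos2_i = (axis_ratio ** 2 - intrinsic_q ** 2) / (1.0 - intrinsic_q ** 2)
    return math.degrees(math.acos(math.sqrt(min(cos2_i, 1.0))))

def deprojected_velocity(v_obs, inc_deg):
    """Rotation speed corrected for inclination: V_rot = V_obs / sin(i)."""
    return v_obs / math.sin(math.radians(inc_deg))

if __name__ == "__main__":
    # A mildly oval disk (b/a = 0.9) seen exactly face-on is mistaken for a
    # round disk tilted by this much:
    print(f"spurious inclination: {inclination_deg(0.9):.1f} deg")

    # Overestimating the inclination underestimates the rotation speed.
    v_obs = 30.0  # km/s, hypothetical line-of-sight amplitude
    v_true = deprojected_velocity(v_obs, 20.0)
    v_wrong = deprojected_velocity(v_obs, 30.0)
    print(f"V_rot at i = 20 deg: {v_true:.1f} km/s; at i = 30 deg: {v_wrong:.1f} km/s")
```

A 10% oval distortion is enough to fake an inclination of about 26°, and because the error only goes one way, the inferred rotation speeds are biased low, which is the direction needed to push a galaxy off the Tully-Fisher relation.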

^$In a field with abominable terminology, what is meant by a “dwarf” galaxy is one of the worst offenders. One of my first conference contributions thirty years ago griped about the [mis]use of this term, and matters have not improved. For this particular figure, Benevides et al. (2025) define it to mean galaxies with stellar masses in the range 9 < log(M_*) < 9.5, which seems big to me, but at least it is below the mass of a typical L* spiral, which has log(M_*) ~ 10.5. For comparison, see Fig. 6 of the review of Bullock & Boylan-Kolchin (2017), who define “bright dwarfs” to have 7 < log(M_*) < 9, and go lower from there, but not higher into the regime that we’re calling dwarf right now. So what a dwarf galaxy is depends on context.

^%Note that the intrinsic distribution peaks below q = 0.2, so arguably one should adopt as typical the mode of the distribution (q ≈ 0.17).

^&Another way in which even the thin simulated objects are not representative of reality is that they are dynamically hot, as indicated by the κ_rot parameter printed with the image. This is the fraction of kinetic energy in rotation. One of the more favorable cases with κ_rot = 0.67 corresponds to V/σ = 2.5. That happens in reality, but higher values are common. Of course, thin disks and dynamical coldness go hand in hand. Since the simulations involve a lot of mergers, the fraction of kinetic energy in rotation is naturally small. So I’m not saying the simulations are wrong in what they predict given the input physics that they assume, but I am saying that this prediction does not match reality.
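For anyone wanting to check the conversion between κ_rot and V/σ, here is a minimal sketch. It assumes an isotropic velocity dispersion, so that the random motions carry kinetic energy proportional to 3σ² and κ_rot = V²/(V² + 3σ²). That assumption is mine for the purpose of illustration; the simulations measure κ_rot directly from particle velocities, and the function names are hypothetical.

```python
import math

def kappa_rot_from_v_over_sigma(v_over_sigma):
    """Fraction of kinetic energy in rotation, assuming isotropic dispersion."""
    x2 = v_over_sigma ** 2
    return x2 / (x2 + 3.0)

def v_over_sigma_from_kappa_rot(kappa_rot):
    """Inverse of the relation above; valid for 0 <= kappa_rot < 1."""
    return math.sqrt(3.0 * kappa_rot / (1.0 - kappa_rot))

if __name__ == "__main__":
    for kappa in (0.5, 0.67, 0.9):
        print(f"kappa_rot = {kappa:.2f} -> V/sigma = {v_over_sigma_from_kappa_rot(kappa):.2f}")
    print(f"V/sigma = 10 -> kappa_rot = {kappa_rot_from_v_over_sigma(10.0):.3f}")
```

Under this assumption κ_rot = 0.67 comes out close to V/σ = 2.5, while a dynamically cold disk with V/σ ≈ 10 would need κ_rot of about 0.97.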

^*$The fraction of thin galaxies observed by DESI is slightly higher than found in the other surveys. Having looked at all these data, I am inclined to suspect the culprit is image quality: that of DESI is better. Regardless of the culprit for this small discrepancy between surveys, thin disks are much more common in reality than in the current generation of simulations.

^*#There seems to be a limit to how cold disks get, with a minimum velocity dispersion around ~7 km/s observed in face-on dwarfs when the appropriate number, according to Newton, would be more like 2 km/s, tops. I remember this number from observations in the ’80s and ’90s, along with lots of discussion then to the effect of how can it be so? but it is the new year and I’m feeling too lazy to hunt down all the citations so you get a meme instead.

^&%In an absolute sense, all other things being equal, which they’re not, disks do become thicker to lower surface brightness in both Newton and MOND. There is less restoring force for less surface mass density. It is the relative decline in restoring force and consequent thickening of the disk that is much more precipitous in Newton.

^**For the numerically curious, these models are exponential disks with surface density profiles Σ(R) = Σ₀ e^{-R/R_d}. Both models have a scale length R_d = 3 kpc. The HSB has Σ₀ = 866 M_☉ pc^-2; this is a good match to the Eilers et al. (2019) Milky Way disk; see McGaugh (2019). The LSB has Σ₀ = 100 M_☉ pc^-2, which corresponds roughly to what I consider the boundary of low surface brightness, a central B-band surface brightness of ~23 mag. arcsec^-2. For the velocity dispersion profile I also assume an exponential with scale length 2R_d (that’s what is supposed to happen). The central velocity dispersion of the HSB is 100 km/s (an educated guess that gets us in the right ballpark) and that of the LSB is 33 km/s – the mass is down by a factor of ~9 so the velocity dispersion should be lower by a factor of $\sqrt{9}$ . (I let it be inexact so the solid and dashed Newtonian lines wouldn’t exactly overlap.)

These models are crude, being single-population (there can be multiple stellar populations each with their own velocity dispersion and vertical scale height) and lacking both a bulge and gas. The velocity dispersion profile sometimes falls with a scale length twice the disk scale length as expected, sometimes not. In the Milky Way, R_d ≈ 2.5 or 3 kpc, but the velocity dispersion falls off with a scale length that is not 5 or 6 kpc but rather 21 or 25 kpc. I have also seen the velocity dispersion profile flatten out rather than continue to fall with radius. That might itself be a hint of MOND, but there are lots of different aspects of the problem to consider.
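As a rough cross-check of the numbers in this footnote, here is a minimal sketch that treats the center of each model as a self-gravitating isothermal sheet, for which the scale height is z₀ = σ_z²/(πGΣ). It also computes the surface density at which the sheet's own Newtonian field, 2πGΣ, equals the MOND acceleration scale. The value a₀ = 1.2 × 10⁻¹⁰ m s⁻² is the commonly adopted one and is an assumption here, as are the function names. This is not the calculation behind the figure, just the simplest possible version of it, evaluated at R = 0.

```python
import math

G = 4.30091e-3        # gravitational constant in pc (km/s)^2 / Msun
A0_SI = 1.2e-10       # assumed MOND acceleration scale in m/s^2
PC_IN_M = 3.0857e16   # metres per parsec

def sheet_thickness_pc(sigma_z_kms, surface_density):
    """Scale height z0 = sigma_z^2 / (pi G Sigma) of an isothermal sheet, in pc."""
    return sigma_z_kms ** 2 / (math.pi * G * surface_density)

def mond_surface_density():
    """Surface density (Msun/pc^2) where the sheet's field 2 pi G Sigma equals a0."""
    a0 = A0_SI * PC_IN_M / 1.0e6   # convert m/s^2 to (km/s)^2 / pc
    return a0 / (2.0 * math.pi * G)

if __name__ == "__main__":
    # (central surface density in Msun/pc^2, central velocity dispersion in km/s)
    models = {"HSB": (866.0, 100.0), "LSB": (100.0, 33.0)}
    threshold = mond_surface_density()
    print(f"a0 / (2 pi G) = {threshold:.0f} Msun/pc^2")
    for name, (central_density, central_dispersion) in models.items():
        z0 = sheet_thickness_pc(central_dispersion, central_density)
        regime = "Newtonian" if central_density > threshold else "MOND"
        print(f"{name}: central z0 = {z0:.0f} pc, central regime: {regime}")
```

With these inputs both models come out with Newtonian central scale heights of roughly 0.8 to 0.9 kpc, consistent with scaling the dispersion by about √9 to keep the thickness nearly the same. The threshold surface density is about 137 M_☉ pc^-2, which falls between the two central surface densities: the HSB model starts out Newtonian while the LSB model is below the threshold even at its center.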

Has dark matter been detected in the Milky Way?

If a title is posed as a question, the answer is usually

No.

There has been a little bit of noise that dark matter might have been detected near the center of the Milky Way. The chatter seems to have died down quickly, for, as usual, this claim is greatly exaggerated. Indeed, the claim isn’t even made in the actual paper so much as in the scuttlebutt^# related to it. The scientific claim that is made is that

The halo excess spectrum can be fitted by annihilation with a particle mass $m_{χ} \sim$ 0.5–0.8 TeV and cross section $⟨ σ υ ⟩ \sim$ (5–8) $\times 10^{- 25} {cm}^{3} s^{- 1}$ for the $b \bar{b}$ channel.

Totani (2025)

What the heck does that mean?

First, the “excess spectrum” refers to a portion of the gamma ray emission detected by the Fermi telescope that exceeds that from known astrophysical sources. This signal might be from a WIMP with a mass in the range of 500 – 800 GeV. That’s a bit heavier than originally anticipated (~100 GeV), but not ridiculous. The cross-section is the probability for an interaction with bottom quarks and anti-quarks. (The Higgs boson can decay into b quarks.)

Astrophysical sources at the Galactic center

There is a long-running issue with the interpretation of excess signals as dark matter. Most of the detected emission is from known astrophysical sources, hence the term “excess.” There being an excess implies that we understand all the sources. There are a lot of astrophysical sources at the Galactic center:

The center of the Milky Way as seen by the South African MeerKAT radio telescope with a close up from JWST. Image credit: NASA, ESA, CSA, STScI, SARAO, S. Crowe (UVA), J. Bally (CU), R. Fedriani (IAA-CSIC), I. Heywood (Oxford).

As you can see, the center of the Galaxy is a busy place. It is literally the busiest place in the Galaxy. Attributing any “excess” to non-baryonic dark matter is contingent on understanding all of the astrophysical sources so that they can be correctly subtracted off. Looking at the complexity of the image above, that’s a big if, which we’ll come back to later. But first, how does dark matter even come into a discussion of emission from the Galactic center?

Indirect WIMP detection

Dark matter does not emit light – not directly, anyway. But WIMP dark matter is hypothesized to interact with Standard Model particles through the weak nuclear force, which is what provides a window to detect it in the laboratory. So how does that work? Here is the notional Feynman diagram:

Conceivable Interactions between WIMPs (X) and standard model particles (q). The diagram can be read left to right to represent WIMPs scattering off of atomic nuclei, top to bottom to represent WIMPs annihilating into standard model particles, or bottom to top to represent the production of dark matter particles in high energy collisions.

The devious brilliance of this Feynman diagram is that we don’t need to know how the interaction works. There are many possibilities, but that’s a detail – that central circle is where the magic happens; what exactly that magic is can remain TBD. All that matters is that it can happen (with some probability quantified by the interaction cross-section), so all the pathways illustrated above should be possible.

Direct detection experiments look for scattering of WIMPs off of nuclei in underground detectors. They have not seen anything. In principle, WIMPs could be created in sufficiently high-energy collisions of Standard Model particles. The LHC has more than adequate energy to produce dark matter particles in this way, but no such signal has been seen^$. The potential signal we’re discussing here is an example of indirect detection. There are a number of possibilities for this, but the most obvious^{^} one follows from WIMPs being their own anti-particles, so they occasionally meet in space and annihilate into Standard Model particles.

The most obvious product of WIMP annihilations is a pair of gamma rays, hence the potential for the Fermi gamma ray telescope to detect their decay products. Here is a simulated image of the gamma ray sky resulting from dark matter annihilations:

*Simulated image from the Via Lactea II simulation (Fig. 1 of Kuhlen et al. 2008).*

The dark regions are the brightest, where the dark matter density is highest. That includes the center of the Milky Way (white circle) and also sub-halos that might contain dwarf satellite galaxies.

Since we don’t really know how the magic interaction happens, but have plenty of theoretical variations, many other things are also possible, some of which might be cosmic rays:

Fig. 3 of Topchiev et al. (2017) illustrating possible decay channels for WIMP annihilations. Gamma rays are one inevitable product, but other particles might also be produced. These would be born with energies much higher than their rest masses (~100 GeV, while electrons and positrons have masses of 0.5 MeV) so would be moving near the speed of light. In effect, dark matter could be a source of cosmic rays.

The upshot of all this is that the detection of an “excess” of unexpected but normal particles might be a sign of dark matter.

Sociology: different perspectives from different communities

A lot hinges on the confidence with which we can disentangle expected from unexpected. Once we’ve accounted for the sources we already knew about, there are always new sources to be discovered. That’s astronomy. So initially, the communal attitude was that we shouldn’t claim a signal was due to dark matter until all astrophysical signals had been thoroughly excluded. That never happened: we just kept discovering new astrophysical sources. But at some point, the communal attitude transformed into one of eager credulity. It was no longer embarrassing to make a wrong claim; instead, marginal and dubious claims were made eagerly in the hopes of claiming a Nobel prize. If it didn’t work out, oh well, just try again. And again and again and again. There is apparently no shame in claiming to see the invisible when you’re completely convinced it is there to be seen.

This switch in sociology happened in the mid to late ’00s as people calling themselves astroparticle^& physicists became numerous. These people were remarkably uninterested in astrophysics or astrophysical sources in their own right but very interested in dark matter. They were quick to claim that any and every quirk in data was a sign of dark matter. I can’t help but wonder if this behavior is inherited from the long drought in interesting particle collider results, which gradually evolved into a propensity for high energy particle phenomenologists to leap on every two-sigma blip as a sign of new physics, dumping hundreds of preprints on arXiv after each signal of marginal significance was announced. It is always a sprint to exercise the mental model-building muscles and make up some shit in the brief weeks before the signal inevitably goes away again.

Let’s review a few examples of previous indirect dark matter detection claims.

Cosmic rays from Kaluza-Klein dark matter – or not

This topic has a long and sordid history. In the late ’00s, there were numerous claims of an excess in cosmic rays – ATIC saw too many electrons for the astrophysical background, and PAMELA saw an apparent rise in the positron fraction, perhaps indicating a source with a peak energy around 620 GeV. (If the signal is from dark matter, the rest mass of the WIMP is imprinted in the energy spectrum of its decay products.) The combination of excess electrons and extra positrons seemed fishy enough* to some to point to new physics: dark matter. There were of course more sober analyses, for example:

Fig. 3 from Aharonian et al. (2009): The energy spectrum E³ dN/dE of cosmic-ray electrons measured by H.E.S.S. and balloon experiments. Also shown are calculations for a Kaluza-Klein signature in the H.E.S.S. data with a mass of 620 GeV and a flux as determined from the ATIC data (dashed-dotted line), the background model fitted to low-energy ATIC and high-energy H.E.S.S. data (dashed line) and the sum of the two contributions (solid line). The shaded regions represent the approximate systematic error as in Fig. 2.

A few things to note about this plot: first, the data are noisy – science is hard. The ATIC and H.E.S.S. data are not really consistent – one shows an excess, the other does not. The excess is over a background model that is overly simplistic – the high energy astrophysicists I knew were shouting that the apparent signal could easily be caused by a nearby pulsar^##. The advocates for a detection in the astroparticle community simply ignored this point, or if pressed, asserted that it seemed unlikely.

One problem that arose with the dark matter interpretation was that there wasn’t enough of it. Space is big and the dark matter density is low, so it is hard to get WIMPs together to annihilate. Indeed, the expected signal scales as the square of the WIMP density, so is very sensitive to just how much dark matter is lurking about. The average density in the solar neighborhood needed to explain astronomical data is around 0.3 to 0.4 GeV cm^-3; this falls short of producing the observed signal (if real) by a factor of ~500.

An ordinary scientist might have taken this setback as a sign that he^$$ was barking up the wrong tree. Not to be discouraged, the extraordinary astroparticle physicists started talking about the “boost factor.” If there is a region of enhanced dark matter density, then the gamma ray/cosmic ray signal would be boosted, potentially by a lot given the density-squared dependence. This is not quite as crazy as it sounds, as cold dark matter halos are predicted to be lumpy: there should be lots of sub-halos within each halo (and many sub-sub halos within those, right the way down). So, what are the odds that we happen to live near enough to a subhalo that could result in the required boost factor?
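To see what the density-squared dependence implies, here is a minimal sketch. The first function asks how dense the local dark matter would have to be, if it were uniformly enhanced, to supply a boost of ~500. The second is a toy two-phase model of clumpiness, with a fraction of the volume filled by lumps that are some factor denser than the smooth component, for which the boost is ⟨ρ²⟩/⟨ρ⟩². Both the toy model and the function names are my own illustrative assumptions, not the calculation anyone actually used.

```python
import math

REQUIRED_BOOST = 500.0            # shortfall factor quoted above
LOCAL_DENSITY_GEV_CM3 = (0.3, 0.4)

def uniform_density_for_boost(local_density, boost):
    """Density needed if the signal scales as density squared and the
    enhancement is spread uniformly over the emitting region."""
    return local_density * math.sqrt(boost)

def clumping_boost(volume_fraction, density_contrast):
    """Boost <rho^2> / <rho>^2 for a toy two-phase medium.

    A fraction `volume_fraction` of the volume is in lumps that are
    `density_contrast` times denser than the smooth remainder.
    """
    f, c = volume_fraction, density_contrast
    mean = (1.0 - f) + f * c
    mean_square = (1.0 - f) + f * c ** 2
    return mean_square / mean ** 2

if __name__ == "__main__":
    for rho in LOCAL_DENSITY_GEV_CM3:
        needed = uniform_density_for_boost(rho, REQUIRED_BOOST)
        print(f"local {rho} GeV/cm^3 -> need {needed:.1f} GeV/cm^3 for a boost of 500")
    for f, c in ((0.01, 100.0), (0.001, 1000.0)):
        print(f"volume fraction {f}, contrast {c}: boost = {clumping_boost(f, c):.1f}")
```

A uniform enhancement would require roughly 7 to 9 GeV cm^-3, about 22 times the locally measured density, which is why the argument has to lean on lumps instead. In the toy model, a large boost requires lumps that are both very dense and close enough to matter, which is where the merger history comes in.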

The odds are small but nonzero. I saw someone at a conference in 2009 make a completely theoretical attempt to derive those odds. He took a merger tree from some simulation and calculated the chance that we’d be near one of these lumps. Then he expanded that to include a spectrum of plausible merger trees for Milky Way-mass dark matter halos. The noisier merger histories gave higher probabilities, as halos with more recent mergers tend to be lumpier, having had a fresh injection of subhalos that haven’t had time to erode away through dynamical friction into the larger central halo.

This was all very sensible sounding, in theory – and only in theory. We don’t live in any random galaxy. We live in the Milky Way and we know quite a bit about it. One of those things is that it has had a rather quiet merger history by the standards of simulated merger trees. To be sure, there have been some mergers, like the Gaia-Enceladus Sausage. But these are few and far between compared to the expectations of the simulations our theorist was considering. Moreover, we’d know if it weren’t, because mergers tend to heat the stellar disk and puff up its thickness. The spiral disk of the Milky Way is pretty cold dynamically, which places limits on how much mass has merged and when. Indeed, there is a whole subfield dedicated to the study of the thick disk, which seems to have been puffed up in an ancient event ~8 Gyr ago. Since then it has been pretty quiet, though more subtle things can and do happen.

The speaker did not mention any of that. He had a completely theoretical depiction of the probabilities unsullied by observational evidence, and was succeeding in persuading those who wanted to believe that the small probability he came up with was nevertheless reasonable. It was a mixed audience: along with the astroparticle physicists were astronomers like myself, including one of the world’s experts on the thick disk, Rosy Wyse. However, she was too polite to call this out, so after watching the discussion devolve towards accepting the unlikely as probable, I raised my hand to comment: “We know the Milky Way’s merger history isn’t as busy as the models that give a high probability.” This was met with utter incredulity. How could astronomy teach us anything about dark matter? It’s not like the evidence is 100% astronomical in nature, or… wait, it is. But no, no waiting or self-reflection was involved. It rapidly became clear that the majority of people calling themselves astroparticle physicists were ignorant of some relevant astrophysics that any astronomy grad student would be expected to know. It just wasn’t in their training or knowledge base. Consequently, it was strange and shocking^&& for them to learn about it this way. So the discussion trended towards denial, at which point Rosy spoke up to say yes, we know this. Duh. (I paraphrase.)

The interpretation of the excess cosmic ray signal as dark matter persisted a few years, but gradually cooler heads prevailed and the pulsar interpretation became widely accepted to be more plausible – as it always had been. Indeed, claiming cosmic rays were from dark matter became almost disreputable, as it richly deserved to be. So much so that when the AMS cosmic ray experiment joined the party late, it had essentially zero impact. I didn’t hear anyone advocating for it, even in whispers at workshops. It seemed more like its Nobel laureate PI just wanted a second Nobel prize, please and thank you, and even the astroparticle community felt embarrassed for him.

This didn’t preclude the same story from playing out repeatedly.

Gamma rays from WIMPs – or not

In the lead-up to a conference on dark matter hosted at Harvard in 2014, there were claims that the Fermi telescope – the same one that is again in the news – had seen a gamma ray line around 126 GeV that was attributed to dark matter. This claim had many red flags. The mass was close to the Higgs particle mass, which was kinda weird. The signal was primarily seen on the limb of the Earth, which is exactly where you’d expect garbage noise to creep in. Most telling, the Fermi team itself was not making this claim. It came from others who were analyzing their data. I am no fan of science by big teams – they tend to become bureaucratic behemoths that create red tape for their participants and often suppress internal dissent** – but one thing they do not do is leave Nobel prizes unanalyzed in their data. The Fermi team’s silence in this matter was deafening.

In short, this first claim of gamma rays from dark matter looked to be very much on the same trajectory as that from cosmic rays. So I was somewhat surprised when I saw the draft program for the Harvard conference, as it had an entire afternoon session devoted to this topic. I wrote the organizers to politely ask if they really thought this would still be a thing by the time the conference happened. One of them was an enthusiastic proponent, so yes.

Narrator: it was not.

By the time the conference happened, the related claims had all collapsed, and all the scientists invited to speak about it talked instead about something completely different, as if it had never been a thing at all.

X-rays from sterile neutrinos – or not

Later, there was the 3.5 keV line. If one squinted really hard at X-ray data, it looked like there might sorta kinda be an unidentified line. This didn’t look particularly convincing, and there are instances when new lines have been discovered in astronomical data rather than laboratory data (e.g., helium was first recognized in the spectrum of the sun, hence the name; also nebulium, which was later recognized to be ionized oxygen), so again, one needed to consider the astrophysical possibilities.

Of course, it was much more exciting to claim it was dark matter. Never mind that it was a silly energy scale, being far too low in mass to be cold dark matter (people seem to have forgotten^*# the Lee-Weinberg limit, which requires m_X > 2 GeV); a few keV is rather less than a few GeV. No matter, we can always come up with an appropriate particle – in this case, sterile neutrinos^*$.

If you’ve read this far, you can see how this was going to pan out.

Gamma rays from WIMPs again, maybe maybe

So now we have a renewed claim that the Fermi excess is dark matter. Given the history related above, the reader may appreciate that my first reaction was Really? Are we doing this again?

This is different from the claim a decade ago. The claimed mass is different, and the signal is real, being part of the mess of emission from the Galactic center. The trick, as is so often the case, is disentangling the dark matter signal from the plausible astrophysical sources.

Indeed, the signal is not new, only this particular fit with WIMP dark matter is. There had, of course, been discussion of all this before, but it faded out when it became clear that the Fermi signal was well explained by a population of millisecond pulsars. Astrophysics was again the more obvious interpretation^*%. Or perhaps not: I suppose if you’re part of a community convinced that dark matter exists who is spending an enormous amount of time and resources looking for a signal from dark matter and whose basic knowledge of astrophysics extends little beyond “astronomical data show dark matter exists but are messy so there’s always room to play” then maybe invoking an invisible agent from an unknown dark sector seems just as plausible as an obvious astrophysical source. Hmmm… that would have sounded crazy to me even back when, like them, I was sure that dark matter had to exist and be made of WIMPs, but here we are.

Looking around in the literature, I see there is still a somewhat active series of papers on this subject. They split between no way and maybe.

For example, Manconi et al. (2025) show that the excess signal has the same distribution on the sky as the light from old stars in the Galaxy. The distribution of stars is asymmetrical thanks to the Galactic bar, which we see at an angle of around 30 degrees, so one end is nearer to us than the other, creating a classic “X/peanut” shape seen in other edge-on barred spiral galaxies. So not only is the spectrum of the signal consistent with millisecond pulsars, it has the same distribution on the sky as the stars from which millisecond pulsars are born. So no way is this dark matter: it is clearly an astrophysical signal.

Not to be dissuaded by such a completely devastating combination of observations, Muru et al. (2025) argue that sure, the signal looks like the stars, but the dark matter could have exactly the same distribution as the stars. They cite the Hestia simulations of the Local Group as an example where this happens. Looking at those, they’re not as unrealistic as many simulations, but they appear to suffer the common affliction of too much dark mass near the center. That leaves the dark matter more room to be non-spherical, and so maybe lumpy in the same way as the stars, and also provides a higher annihilation signal from the high density of dark matter. So they say maybe, calling the pulsar and dark matter interpretations “equally compelling.”

Returning to Totani’s sort-of claimed detection, he also says

This cross section is larger than the upper limits from dwarf galaxies and the canonical thermal relic value, but considering various uncertainties, especially the density profile of the MW halo, the dark matter interpretation of the 20 GeV “Fermi halo” remains feasible.

Totani (2025)

OK, so there’s a lot to break down in this one sentence.

The canonical thermal relic value is kinda central to the whole WIMP paradigm, so needing a value higher than that is a red flag reminiscent of the need for a boost factor for the cosmic ray signal. There aren’t really enough WIMPs there to do the job unless we juice their effectiveness at making gamma rays. The juice factor is an order of magnitude here: Steigman et al. (2012) give 2.2 x 10^-26 cm³s^-1 for what the thermal cross-section should be vs. the (5-8) x 10^-25 cm³s^-1 suggested by Totani (2025).
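To make the size of that juice factor concrete, here is a small arithmetic sketch using only the two cross-sections quoted above. The function name `boost_factor` is just an illustrative label, not anything from either paper.

```python
# Cross-section values quoted in the text, in cm^3 / s.
THERMAL_RELIC = 2.2e-26          # Steigman et al. (2012)
TOTANI_RANGE = (5e-25, 8e-25)    # range suggested by Totani (2025)

def boost_factor(required, thermal=THERMAL_RELIC):
    """Ratio of a required annihilation cross-section to the thermal relic value."""
    return required / thermal

low, high = (boost_factor(sigma_v) for sigma_v in TOTANI_RANGE)
print(f"required boost: {low:.0f}x to {high:.0f}x the thermal relic value")
```

The ratio comes out at roughly 23 to 36, so "an order of magnitude" is, if anything, a generous way of putting it.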

It is also worth noting that one point of Steigman’s paper is that as a well-posed hypothesis, the WIMP cross section can be calculated; it isn’t a free parameter to play with, so needing the cross-section to be larger than the upper limits from dwarf galaxies is another red flag. If this is indeed a dark matter signal from the Galactic center, then the subhalos in which dwarf satellites reside should also be visible, as in the simulated image from via Lactea above. They are not, despite having fewer messy astrophysical signals to compete with.

So “remains feasible” is doing a lot of work here. That’s the scientific way of saying “almost certainly wrong, but maybe? Because I’d really like for it to work out that way.”

The dark matter distribution in the Milky Way

One of the critical things here is the density of dark matter near the Galactic center, as the signal scales as the square of the density. Totani (2025) simply adopts the via Lactea simulation to represent the dark matter halo of the Galaxy in his calculations. This is a reasonable choice from a purely theoretical perspective, but it is not a conservative choice for the problem at hand.

What do we know empirically? The via Lactea simulation was dark matter only. There is no stellar disk, just a dark matter halo appropriate to the Milky Way. So let’s add that halo to a baryonic mass model of the Galaxy:

*The rotation curve of the via Lactea dark matter halo (red curve) combined with the Milky Way baryon distribution (light blue line). The total rotation (dark blue line) overshoots the data.*

The important part for the Galactic center signal is the region at small radius – the first kpc or two. Like most simulations, via Lactea has a cuspy central region of high dark matter density that is inconsistent with data. This overshoots the equivalent circular velocity curve from observed stellar motions. I could fix the fit above by reducing the stellar mass, but that’s not really an option in the Milky Way – we need a maximal stellar disk to explain the microlensing rate towards the center of the Galaxy. The “various uncertainties, especially the density profile of the MW halo” statement elides this inconvenient fact. Astronomical uncertainties are ever-present, but do not favor a dark matter signal here.

We can subtract the baryonic mass model from the rotation curve data to infer what the dark matter distribution needs to be. This is done in the plot below, where it is compared to the via Lactea halo:

*The empirical dark matter halo density profile of the Milky Way (blue line) compared to the via Lactea simulation (red line).*

The empirical dark matter density profile of the Milky Way does not continue to rise inwards as steeply as the simulation predicts. It shows the same proclivity for a shallower core as pretty much every other galaxy in the sky. This reduced density of dark matter in the central couple of kpc means the signal from WIMP annihilation should be much lower than calculated from the simulated distribution. Remember – the WIMP annihilation signal scales as the square of the dark matter density, so the turn-down seen at small radii in the log-log plot above is brutal. There isn’t enough dark matter there to do what it is claimed to be doing.
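To get a feel for how brutal the density-squared scaling is, here is a minimal sketch that integrates ρ² over the central couple of kpc for a cuspy NFW profile and for a cored one. Everything numerical here is an assumption for illustration: the scale radius, the core radius, and the particular cored form (an NFW with its inner 1/r softened) are hypothetical choices, not the via Lactea halo or the empirical profile plotted above. It is also a simple volume integral rather than a proper line-of-sight calculation.

```python
import math

def nfw_density(r, rho_s, r_s):
    """Cuspy NFW profile: rho ~ 1/r at small radii."""
    x = r / r_s
    return rho_s / (x * (1.0 + x) ** 2)

def cored_density(r, rho_s, r_s, r_c):
    """Hypothetical cored variant: the inner 1/r is softened to 1/(r + r_c)."""
    x = r / r_s
    return rho_s / ((x + r_c / r_s) * (1.0 + x) ** 2)

def annihilation_proxy(density, r_max, n=20000):
    """Midpoint-rule integral of rho^2 * 4 pi r^2 dr from 0 to r_max."""
    dr = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += density(r) ** 2 * 4.0 * math.pi * r * r * dr
    return total

# Illustrative (assumed) numbers, in kpc; rho_s cancels in the ratio.
R_S, R_C, R_MAX = 20.0, 2.0, 2.0

cusp = annihilation_proxy(lambda r: nfw_density(r, 1.0, R_S), R_MAX)
core = annihilation_proxy(lambda r: cored_density(r, 1.0, R_S, R_C), R_MAX)
print(f"cored / cuspy signal within {R_MAX} kpc: {core / cusp:.2f}")
```

Because the integrand is the density squared, halving the central density quarters the expected signal, so even a modest core costs a lot.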

Cry wolf

There have now been so many claims to detect dark matter that have come and gone that it is getting to be like the fable of the boy who cried wolf. A long series of unpersuasive claims does not inspire confidence that the next will be correct. Indeed, it has the opposite effect: it is going to be really hard to take future claims seriously.

It’s almost as if this invisible dark matter stuff doesn’t exist.

Note added: Jeff Grube points out in the comments that Wang & Duan (2025) have a recent paper showing that the dark matter signal discussed here also predicts an antiproton signal that is already excluded by AMS data. While I find this unsurprising, it is an excellent check. Indeed, it would have caused me to think again had the antiproton signal been there: independent corroboration from a separate experiment is how science is supposed to work.

^#It has become a pattern for advocates of dark matter to write a speculative paper for the journals that is fairly restrained in its claims, then hype it as an actual detection to the press. It’s like “Even I think this is probably wrong, but let’s make the claim on the off chance it pans out.”

^$Ironically, a detection from a particle collider would be a non-detection. The signature of dark matter produced in a collision would be an imbalance between the mass-energy that goes into the collision and that measured in detected particles coming out of it. The mass-energy converted into WIMPs would escape the detector undetected. This is analogous to how neutrinos were first identified, though Fermi was reluctant to make up an invisible, potentially undetectable particle – a conservative value system that modern particle physicists have abandoned. The 13,000 GeV collision energy of the LHC is more than adequate to make ~100 GeV WIMPs, so the failure of this detection mode is telling.

^{^}A less obvious possibility is spontaneous decay. This would happen if WIMPs are unstable and decay with a finite half-life. The shorter the half-life, the more decays, and the stronger the resulting signal. This implies some fine-tuning in the half-life – if it is much longer than a Hubble time, then it happens so seldom it is irrelevant; if it is shorter than a Hubble time, then dark matter halos evaporate and stable galaxies don’t exist.

^&Astroparticle physics, also known as particle astrophysics, is a relatively new field. It is also an oxymoron, being a branch of particle physics with only aspirational delusions of relevance to astrophysics. I say that to be rude to people who are rude to astronomers, but it is also true. Astrophysics is the physics of objects in the sky, and as such, requires all of physics. Physics is a broad field, so some aspects are more relevant than others. When I teach a survey course, it touches on gravity, electromagnetism, atomic and molecular quantum mechanics, nuclear physics, and with the discovery of exoplanets, increasingly on geophysics. Particle physics doesn’t come up. It’s just not relevant, except where it overlaps with nuclear physics. (As poorly as particle physicists think of astronomers, they seem to think even less of nuclear physicists, whom they consider to be failed particle physicists (if only they were smart enough!) and nuclear physicists hate them in return.) This new field of astroparticle physics seems to be all about dark matter as driven by early universe cosmology, with contempt for everything that happens in the 13 billion years following the production of the relic radiation seen as the microwave background. Anything later is dismissed as mere “gastrophysics” that is too complicated to understand so cannot possibly inform fundamental physics. I guess that’s true if one chooses to remain ignorant of it.

*Fishy results can also indicate something fishy with the data. I had a conversation with an instrument builder at the time who pointed out that PAMELA had chosen to fly without a particular discriminator in order to save weight; he suggested that its absence could explain the apparent upturn in positrons.

^##There is a relatively nearby pulsar that fits the bill. It has a name: Geminga. This illustrates the human tendency to see what we’re looking for. The astroparticle community was looking for dark matter, so that’s what many of them saw in the excess cosmic ray signal. High energy astrophysicists work on neutron stars, so the obvious interpretation to them was a pulsar. One I recall being particularly scornful of the dark matter interpretation when there was an obvious astrophysical source. I also remember the astroparticle people being quick to dismiss the pulsar interpretation because it seemed unlikely to them for one to be so close but really they hadn’t thought about it before: that pulsars could do this was news to them, and many preferred to believe the dark matter interpretation.

^$$All the people barking were men.

^&&This experience opened my eyes to the existence of an entire community of scientists who were working on dark matter in somewhat gratuitous ignorance of the astronomical evidence for dark matter. To them, the existence of the stuff had already been demonstrated; the interesting thing now was to find the responsible particle. But they were clearly missing many important ingredients – another example is disk stability, a foundational reason to invoke dark matter that seems to routinely come as a surprise to particle physicists. This disconnect is part of what motivated me to develop an entire semester course on dark matter, which I’ve taught every other year since 2013 and will teach again this coming semester. The first time I taught it, I worried that there wasn’t enough material for a whole semester. Now a semester isn’t enough time.

**I had a college friend (sadly now deceased) who was part of the team that discovered the Higgs. That was big business, to the extent that there were two experiments – one to claim the detection, and another on the same beam to do the confirmation. The first experiment exceeded the arbitrary 5σ threshold to claim a 5.2σ detection, but the second only reached 4.9σ. So, in all appropriateness, he asked in a meeting if they could/should really announce a detection. A Nobel prize was on the line, so the answer was straightforward: Do you want a detection or not? (His words.)

^*#Rather than forget, some choose to fiddle ways around the Lee-Weinberg limit. This has led to the sub-genre of “light dark matter” which means lightweight, not luminous. I’d say this was the worst name ever, but the same people talk about dark photons with a straight face, so irony continues to bleed out.

^*$Ironically, a sterile neutrino has also been invoked to address problems in MOND.

^*%I was amused once to see one of the more rabid advocates of dark matter signals of this type give an entire talk hyping the various possibilities only to mention pulsars at the end with a sigh, admitting that the Fermi signal looked exactly like that.

The odd primordial halo of the Milky Way

The mass distribution of dark matter halos that we infer from observations tells us where the dark matter needs to be now. This differs from the mass distribution it had to start, as it gets altered by the process of galaxy formation. It is the primordial distribution that dark matter-only simulations predict most robustly. We* reverse-engineer the collapse of the baryons that make up the visible Galaxy to infer the primordial distribution, which turns out to be… odd.

The Gaia rotation curve and the mass of the Milky Way

As we discussed a couple of years ago, Gaia DR3 data indicate a declining rotation curve for the Milky Way. This decline becomes more steep, nearly Keplerian, in the outskirts of the Milky Way (17 < R < 30 kpc). This may or may not be consistent with data further out, which gets hard to interpret as the LMC (at 50 kpc) perturbs orbits and the observed motions may not correspond to orbits in dynamical equilibrium. So how much do the data inform us about the gravitational potential?

Milky Way rotation curve (various data) including Gaia DR3 (multiple analyses). Also shown is the RAR model (blue line) that was fit to the terminal velocities from 3 < R < 8.2 kpc (gray points) and predates other data illustrated here.

I am skeptical of the Keplerian portion of this result (as discussed at length at the time) because other galaxies don’t do that. However, I am a big fan of listening to the data, and the people actually doing the work. Taken at face value, the Gaia data show a Keplerian decline with a total mass around 2 x 10¹¹ M_☉. If correct, this falsifies MOND.
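For reference, this is what a Keplerian decline looks like for the quoted total mass. It is a minimal sketch that treats all 2 x 10¹¹ M_☉ as enclosed at every radius shown, which is the point-mass idealization rather than a model of the Galaxy.

```python
import math

G = 4.30091e-6  # gravitational constant in kpc (km/s)^2 / Msun

def keplerian_velocity(mass_msun, radius_kpc):
    """Circular speed (km/s) around a point mass: V = sqrt(G M / R)."""
    return math.sqrt(G * mass_msun / radius_kpc)

M_TOTAL = 2e11  # Msun, the total mass quoted in the text

for radius in (17.0, 20.0, 25.0, 30.0):
    v = keplerian_velocity(M_TOTAL, radius)
    print(f"R = {radius:4.0f} kpc  ->  V = {v:5.1f} km/s")
```

The speed falls as R^-1/2, so between 17 and 30 kpc it drops by a factor of about 1.33, which is the kind of steep decline that flat rotation curves in other galaxies do not show.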

How does dark matter fare? There is an implicit assumption made by many in the community that any failing of MOND is an automatic win for dark matter. However, it has been my experience that observations that are problematic for MOND are also problematic for dark matter. So let’s check.

Short answer: this is really weird in terms of dark matter. How weird? For starters, most recent non-Gaia dynamical analyses suggest a total mass closer to 10¹² M_☉, a factor of five higher than the Gaia value. I’m old enough to remember when the accepted mass was 2 x 10¹² M_☉, an order of magnitude higher. Yet even this larger mass is smaller than suggested by abundance matching recipes, which give more like 4 x 10¹² M_☉. So somewhere in the range 2 – 40 x 10¹¹ M_☉.

The Milky Way mass has been adjusted so often, have we finally hit it?

The guy was all over the road. I had to swerve a number of times before I hit him.
Boston Driver’s Handbook (1982 edition)^&

If it sounds like we’re all over the map, that’s because we are. It is very hard to constrain the total mass of a dark matter halo. We can’t see it, nor tell where it ends. We infer, indirectly, that the edge is way out beyond the tracers we can see. Heck, even speaking of an “edge” is ill-defined. Theoretically, we expect it to taper off with the density of dark matter falling as ρ ~ r^-3, so there is no definitive edge. Somewhat arbitrarily,** we adopt the radius that encloses a density 200 times the average density of the universe as the “virial” radius. This is all completely notional, and it gets worse, as the process of forming a galaxy changes the initial mass distribution. What we observe today is the changed form, not the primordial initial condition for which the notional mass is defined.
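To see what this notional convention implies, here is a sketch that converts a halo mass into the radius enclosing a mean density 200 times a reference density. Two assumptions are mine, not the text's: I take the reference to be the critical density, and I adopt H0 = 70 km/s/Mpc.

```python
import math

G = 4.30091e-6     # kpc (km/s)^2 / Msun
H0 = 70.0 / 1000.0 # assumed Hubble constant, converted to km/s/kpc

def critical_density():
    """Critical density in Msun / kpc^3: rho_c = 3 H0^2 / (8 pi G)."""
    return 3.0 * H0 ** 2 / (8.0 * math.pi * G)

def r200(m200_msun, overdensity=200.0):
    """Radius (kpc) within which the mean density is `overdensity` times critical."""
    volume = m200_msun / (overdensity * critical_density())
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)

# The range of Milky Way masses mentioned in the text.
for mass in (2e11, 1e12, 2e12, 4e12):
    print(f"M200 = {mass:.0e} Msun  ->  r200 = {r200(mass):.0f} kpc")
```

Since the radius scales only as the cube root of the mass, a factor of twenty in mass moves the notional edge by less than a factor of three, and all of these radii lie far beyond the tracers we can see.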

Adiabatic compression during galaxy formation

To form a visible galaxy, baryons must dissipate and sink to the center of their parent dark matter halo. This process changes the mass distribution and alters the halo from its primordial state. In effect, the gravity of the sinking baryons drags some dark matter along^# with them.

The change to the dark matter halo is often called adiabatic compression. The actual process need not be adiabatic, but that’s how we approximate it. We’ve tested this approximation with detailed numerical simulations, and it works pretty well, at least if you do it right (there are boring debates about technique). What happens makes sense intuitively: the response of the primordial halo to the infall of baryons is to become more dense at the center. While this makes sense physically, it is problematic for LCDM as it takes an NFW halo that is already too dense at the center to be consistent with data and makes it more dense. This has been known forever, so opposing this is one thing feedback is invoked to do, which it may or may not do, depending on how it really works. Even if feedback can really turn a compressed cusp into a core, it is widely expected to be important only in low mass galaxies where the gravitational potential well isn’t too deep. It isn’t supposed to be all that important in galaxies as massive as the Milky Way, though I’m sure that can change as needed.

There are a variety of challenges to implementing an accurate compression computation, so we usually don’t bother: the standard practice is to assume a halo model and fit it to the data. That will, at best, give a description of the current dark matter halo, not what it started as, which is our closest point of comparison with theory. To give an example of the effect, here is a Milky Way model I built a decade ago:

**Figure 13** from McGaugh (2016): Milky Way rotation curve from the data of Luna et al. (2006, red points) and McClure-Griffiths & Dickey (2007, gray points) together with a bulgeless baryonic mass model (black line). The total rotation is approximately fit (blue line) with an adiabatically compressed NFW halo (solid green line) using the procedure implemented by Sellwood & McGaugh (2005). The primordial halo before compression is shown as the dashed line. The parameters of the primordial halo are a concentration c = 7 and a mass *M₂₀₀ = 6 x 10¹¹ M_☉*. *Fitting NFW to the present halo instead gives c = 14, M₂₀₀ = 4 x 10¹¹ M_☉, so the difference is appreciable and depends on the quality and radial extent of the available data.*

The change from the green dashed line to the solid green line is the difference compression makes. That’s what happens if a baryon distribution like that of the Milky Way settles in an NFW halo. The inferred mass M₂₀₀ is lower and the concentration c higher than it originally was – and it is the original version that we should compare to the expectations of LCDM.

When I built this model, I considered several choices for the bulge/bar fraction: something reasonable, something probably too large, and something definitely too small (zero). The model above is the last case of zero bulge/bar. I show it because it is the only case for which the compression procedure worked. If there is a larger central concentration of baryons – i.e., a bulge and/or a bar – then the compression is greater. Too great, in fact: I could not obtain a fit (see also Binney & Piffl and this related discussion).

The calculation of the compression requires knowledge of the primordial halo parameters, which is what one is trying to obtain. So one has to guess an initial state, run the code, check how close it came, then iterate the initial guess. This is computationally expensive, so I was just eyeballing the fit above. Pengfei has done a lot of work to implement a method that iteratively computes the compression and rigorously fits it to data. So we decided to apply it to the newer Gaia DR3 data.
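To illustrate the forward step that such a fit has to repeat for every guess, here is a minimal sketch of adiabatic compression. To be clear about what it is not: this is the simple textbook approximation of Blumenthal et al. (1986), which conserves r·M(r) for circular orbits, and not the Sellwood & McGaugh (2005) procedure or Pengfei's fitting code. The halo parameters c = 7 and M₂₀₀ = 6 x 10¹¹ M_☉ are borrowed from the figure caption above; r₂₀₀, the disk mass, the disk scale length, and the spherical treatment of the disk are all assumptions made for illustration.

```python
import math

def nfw_mass(r, m200, c, r200):
    """Total mass enclosed within r (kpc) for an NFW halo."""
    def m(x):
        return math.log(1.0 + x) - x / (1.0 + x)
    return m200 * m(r * c / r200) / m(c)

def disk_mass(r, m_disk, r_d):
    """Exponential disk mass within r, treated as spherical (a simplification)."""
    y = r / r_d
    return m_disk * (1.0 - (1.0 + y) * math.exp(-y))

def initial_radius(r_f, m200, c, r200, m_disk, r_d):
    """Solve r_i M_i(r_i) = r_f [M_b(r_f) + (1 - f_b) M_i(r_i)] for r_i."""
    f_b = m_disk / m200  # baryons assumed to start out mixed with the halo
    target = r_f * disk_mass(r_f, m_disk, r_d)

    def residual(r_i):
        return nfw_mass(r_i, m200, c, r200) * (r_i - (1.0 - f_b) * r_f) - target

    lo = (1.0 - f_b) * r_f       # residual is <= 0 here by construction
    hi = 2.0 * r_f
    while residual(hi) < 0.0:    # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):         # plain bisection
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def compressed_halo_mass(r_f, m200, c, r200, m_disk, r_d):
    """Dark mass inside r_f after compression: the shell that started at r_i."""
    f_b = m_disk / m200
    r_i = initial_radius(r_f, m200, c, r200, m_disk, r_d)
    return (1.0 - f_b) * nfw_mass(r_i, m200, c, r200)

M200, C = 6e11, 7.0          # primordial halo from the figure caption
R200 = 174.0                 # kpc; assumed (200 x critical density, H0 = 70)
M_DISK, R_D = 5e10, 2.5      # hypothetical disk mass (Msun) and scale length (kpc)

for r in (2.0, 8.0, 20.0):
    before = (1.0 - M_DISK / M200) * nfw_mass(r, M200, C, R200)
    after = compressed_halo_mass(r, M200, C, R200, M_DISK, R_D)
    print(f"r = {r:4.1f} kpc: dark mass x{after / before:.2f} after compression")
```

A fit wraps this in an outer loop: guess the primordial parameters, compress, compare the resulting rotation curve to the data, and adjust the guess. This simple approximation is known to overstate the compression somewhat, which is one of the reasons the technique debates exist.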

Fitting the Gaia rotation curve with adiabatically compressed halos

We need two inputs here: one, the rotation curve to fit, and two, the baryonic distribution of the Milky Way. The latter is hard to specify given our location within the Milky Way, so there are many different estimates. We tried a dozen.

Another challenge of doing this is deciding which rotation curve data to fit. We chose to focus on the rotation curve of Jiao et al. (2023) because they made estimates of the systematic as well as random errors. The statistics of Gaia are so good it is practically impossible to fit any equilibrium model to them. There are aspects of the data for which we have to consider non-equilibrium effects (spiral arms, the bar, “snails” from external perturbations) so the usual assumptions are at best an approximation, plus there can always be systematic errors. So the approach is to believe the data, but with the uncertainty estimate of Jiao et al. (2023) that includes systematics.

For a halo model, we started with the boilerplate LCDM NFW halo^$. This doesn’t fit the data. Indeed, all attempts to fit NFW halos fail in similar ways for all of the different baryonic mass models we tried. The quasi-Keplerian part of the Gaia rotation curve simply cannot be fit: the NFW halo inevitably requires more mass further out.

Here are a few examples of the NFW fits:

**Fig. A.3** from Li et al. (2025). Fits of Galactic circular velocities using the NFW model implementing adiabatic halo contraction using 3 baryonic models. [Another 9 appear in the paper.] Data points with errors are the rotation velocities from Jiao et al. (2023), while open triangles show the data from Eilers et al. (2019), which are not fitted. [The radius ranges from 5 to 30 kpc.] Blue, purple, green and black solid lines correspond to the contributions by the stellar disk, central bar, gas (and dust if any), and compressed dark matter halo, respectively. The total contributions are shown using red solid lines. Black dashed lines are the inferred primordial halos.

LCDM as represented by NFW suffers the same failure mode as seen in MOND (plot at top): both theories overshoot the Gaia rotation curve at R > 17 kpc. This is an example of how data that are problematic for MOND are also problematic for dark matter.

We do have more freedom in the case of dark matter. So we tried a different halo model, Einasto. (For this and many other halo models, see Pengfei’s epic compendium of dark matter halo fits.) Where NFW has two parameters, a concentration c and mass M₂₀₀, Einasto has a third parameter that modulates the shape of the density profile^%. For a very specific choice of this third parameter (α = 0.17), it looks basically the same as NFW. But if we let α be free, then we can obtain a fit. Of all the baryonic models, the RAR model+compressed Einasto fits best:

**Fig. 1** from Li et al. (2025). Example of a circular velocity fit using the McGaugh19^$$ model for baryonic mass distributions. The purple, blue, and green lines represent the contributions of the bar, disk, and gas components, respectively. The solid and dashed black lines show the current and primordial dark matter halos, respectively. The solid red line indicates the total velocity profile. The black points show the latest Gaia measurements (Jiao et al. 2023), and the gray upward triangles and squares show the terminal velocities from (McClure-Griffiths & Dickey 2007, 2016), and Portail et al. (2017), respectively. The data marked with open symbols were not fit because they do not consider the systematic uncertainties.

So it is possible to obtain a fit considering adiabatic compression. But at what price? The parameters of the best-fit primordial Einasto halo shown above are c = 5.1, M₂₀₀ = 1.2 x 10¹¹ M_☉, and α = 2.75. That’s pretty far from the α = 0.17 expected in LCDM. The mass is lower than low. The concentration is also low. There are expectation values for all these quantities in LCDM, and all of them miss the mark.
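To see how far α = 2.75 is from α = 0.17, it helps to compare logarithmic density slopes. The sketch below uses the standard Einasto form, ρ ∝ exp{-(2/α)[(r/r₋₂)^α - 1]}, whose slope is d ln ρ / d ln r = -2 (r/r₋₂)^α, and the NFW slope -(1 + 3x)/(1 + x) with x = r/r_s. The sample radii are arbitrary choices for illustration, not values from the paper.

```python
def nfw_log_slope(x):
    """d ln(rho) / d ln(r) for NFW at x = r / r_s."""
    return -(1.0 + 3.0 * x) / (1.0 + x)

def einasto_log_slope(x, alpha):
    """d ln(rho) / d ln(r) for Einasto at x = r / r_-2."""
    return -2.0 * x ** alpha

print(" r/r_s     NFW   Einasto(0.17)  Einasto(2.75)")
for x in (0.1, 0.3, 1.0, 2.0, 3.0):
    print(f"{x:6.1f}  {nfw_log_slope(x):6.2f}  "
          f"{einasto_log_slope(x, 0.17):12.2f}  {einasto_log_slope(x, 2.75):13.2f}")
```

All three profiles have slope -2 at the scale radius by construction. With α = 0.17 the Einasto slope tracks NFW closely on either side, while with α = 2.75 it is nearly flat inside (a core) and plunges outside, which is the too-flat-inside, too-steep-outside behavior at issue.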

**Fig. 2** from Li et al. (2025). Halo masses and concentrations of the primordial Galactic halos derived from the Gaia circular velocity fits using 12 baryonic models. The red and blue stars with errors represent the halos with and without adiabatic contraction, respectively. The predicted halo mass-concentration relation within 1 σ from simulations (Dutton & Macciò 2014) is shown as the declining band. The vertical band shows the expected range of the MW halo mass according to the abundance-matching relation (Moster et al. 2013). The upper and lower limits are set by the highest stellar mass model plus 1 σ and the lowest stellar mass model minus 1 σ, respectively.

The expectation for mass and concentration is shown as the bands above. If the primordial halo were anything like what it should be in LCDM, the halo parameters represented by the red stars should be where the bands intersect. They’re nowhere close. The same goes for the shape parameter. The halo should have a density profile like the blue band in the plot below; instead it is more like the red band.

**Fig. 3** from Li et al. (2025). Structure of the inferred primordial and current Galactic halos, along with predictions for the cold and warm dark matter. The density profiles are scaled so that there is no need to assume or consider the masses or concentrations for these halos. The gray band indicates the range of the current halos derived from the Gaia velocity fits using the 12 baryonic models, and the red band shows their corresponding primordial halos within 1σ. The blue band presents the simulated halos with cold dark matter only (Dutton & Macciò 2014). The purple band shows the warm dark matter halos (normalized to match the primordial Galactic halo) with a core size spanning from 4.56 kpc (WDM5 in Macciò et al. 2012) to 7.0 kpc, corresponding to a particle mass of 0.05 keV and lower.

So the primordial halo of the Milky Way is pretty odd. From the perspective of LCDM, the mass is too low and the concentration is too low. The inner profile is too flat (a core rather than a cusp) and the outer profile is too steep. This outer steepness is a large part of why the mass comes out so low; there just isn’t a lot of halo out there. The characteristic density ρ_s is at least in the right ballpark, so aside from the inner slope, the outer slope, the mass, and the concentration, LCDM is doing great.

What if we ignore the naughty bits?

It is really hard for any halo model to fit the steep decline of the Gaia rotation curve at R > 17 kpc. Doing so is what makes the halo mass so small. I’m skeptical about this part of the data, so do things improve if we don’t sweat that part?

Ignoring the data at R > 17 kpc allows the mass to be larger, consistent with other dynamical determinations if not quite with abundance matching. However, the inner parts of the rotation curve still prefer a low density core. That is, something like the warm dark matter halo depicted as the purple band above rather than NFW with its dense central cusp. Or self-interacting dark matter. Or cold dark matter with just-so feedback. Or really anything that obfuscates the need to confront the dangerous question: why does MOND perform better?

*This post is based on the recently published paper by my former student Pengfei Li, who is now faculty at Nanjing University. They have a press release about it.

^&A few months after reading this in the Boston Driver’s Handbook, this exact thing happened to me.

**This goes back to BBKS in 1986 when the bedrock assumption was that the universe had Ω_m = 1, for which the virial radius was 188 times the critical density. 200 was close enough, and stuck, even though for LCDM the virial radius is more like an overdensity close to 100, which is even further out.

^#This is one of many processes that occur in simulations, which are great for examining the statistics of simulated galaxy-like objects but completely useless for modeling individual galaxies in the real universe. There may be similar objects, but one can never say “this galaxy is represented by that simulated thing.” To model a real galaxy requires a customized approach.

^$NFW halos consistently perform worse in fitting data than any other halo model, of which there are many. They have been falsified as a viable representation of reality so many times that I can’t recall them all, and yet they remain the go-to model. I think that’s partly thanks to their simplicity – they are mathematically straightforward to implement – and to the fact that they are what simulations predict: LCDM halos should look like NFW. People, including scientists, often struggle to differentiate simulation from reality, so we keep flogging the dead horse.

^%The density profile of the NFW halo model asymptotes to power laws at both small and large radii: ρ → r^-1 as r → 0 and ρ → r^-3 as r → ∞. The third parameter of Einasto allows a much wider range of shapes.

^$$The McGaugh19 model used here is the one with a reasonable bulge/bar. This dense component can be fit in this case because we start with a halo model with a core rather than a cusp (closer to α = 1 than to the α = 0.17 of NFW/LCDM).

The baryonic sizes and masses of late type galaxies, and a bit about their angular momentum

I have always been interested in the extremes of galaxy properties, especially to low surface brightness (LSB). LSB galaxies are hard to find and observe, so they present an evergreen opportunity for discovery. They also expose theories built to explain bright galaxies to novel tests.

Fundamental properties of galaxies include their size and luminosity. The luminosity L is a proxy for stellar mass while the size R is one measure of how those stars are distributed. The surface brightness S is the luminosity spread over an area 2πR², so S = L/2πR². One may define different types of radii and corresponding surface brightnesses, but whatever the choice, only two of these three quantities are independent. At a minimum, one needs two parameters to quantitatively describe a galaxy, as galaxies of the same luminosity* can have their light spread over different areas.
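As a quick illustration of why luminosity alone is not enough, here is a sketch applying S = L/2πR² to two galaxies of equal luminosity. The luminosity and the two radii are hypothetical numbers chosen only to show the scaling.

```python
import math

def surface_brightness(luminosity_lsun, radius_kpc):
    """S = L / (2 pi R^2), returned in Lsun per square parsec."""
    radius_pc = radius_kpc * 1000.0
    return luminosity_lsun / (2.0 * math.pi * radius_pc ** 2)

L_GALAXY = 1e10                 # Lsun; hypothetical, same for both galaxies
R_COMPACT, R_DIFFUSE = 2.0, 6.0 # kpc; hypothetical sizes

s_compact = surface_brightness(L_GALAXY, R_COMPACT)
s_diffuse = surface_brightness(L_GALAXY, R_DIFFUSE)
print(f"compact: {s_compact:.0f} Lsun/pc^2, diffuse: {s_diffuse:.0f} Lsun/pc^2")
print(f"ratio: {s_compact / s_diffuse:.0f}")
```

Tripling the size at fixed luminosity lowers the surface brightness ninefold, so the same L can describe either a prominent galaxy or one that is hard to find at all.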

Being composed of tens of billions of stars, it ought to take a lot more than two parameters to describe a galaxy. A useful shorthand for galaxy appearance is provided by morphological types. I’m not a huge fan (they’re not quantitative and don’t relate simply to quantitative measures), but saying a spiral galaxy is an Sa or an Sc does provide a great shorthand for evoking their appearance.

**Fig. 9** from Buta (2011): Examples of spiral galaxy morphologies Sa, Sb, Sc, Sd, and Sm (from left to right). The corresponding Hubble stages are T = 1, 3, 5, 7, 9. As one proceeds from early (Sa) to late (Sm) types, the bulge component becomes less prominent and the winding of spiral arms less tight until the appearance becomes irregular (T ≥ 9).

If we step back from the detailed difference in the appearance of the spiral arms of Sb and Sbc and Sc galaxies, there are some interesting physical distinctions between early type spirals (Sa – Sc) and later types (Sd on through Irr). These are all late type galaxies (LTGs) that are thin, rotationally supported disks of stars and gas. I’m not going to talk about pressure supported early type galaxies (ETGs) here, just early (Sa – Sc) and late (Sd – Irr) LTGs⁺.

My colleague Jim Schombert pointed out in 2006 that LTGs segregated into two sequences in size and stellar mass if not in gas mass. So early LTGs are more compact for their mass and late LTGs more diffuse.

**Fig. 2** from Schombert (2006): Stellar and gas mass vs. optical scale length (α) in kiloparsecs. The open symbols are from the LSB dwarf catalog, crosses show disks from de Jong (1996), and asterisks show Sc galaxies from Courteau (1996). The separation of dwarfs and disks into two sequences is evident in the left panel. Sm class galaxies from de Jong are shown as filled symbols and are typically found on the dwarf sequence. Biweight fits to each sample are shown as dashed lines.

Another distinction is in the gas fraction. This correlates with surface brightness: early LTGs tend to be star-dominated while late LTGs tend to be gas-dominated.

Gas fraction as a function of effective surface brightness (stellar surface density). Red points are early type spirals (T < 5); blue points are later type (T > 6) spirals and irregular galaxies. Orange points are Sc (T = 5) spirals, which reside mostly with the early types. Green points are Scd (T = 6) galaxies, which reside mostly with the later types. There is a steady trend of increasing gas fraction with decreasing surface brightness. Early type spirals are star-dominated, high surface brightness galaxies; late types are gas-rich, low surface brightness galaxies^!.

There are early LTGs with such low gas fractions that their current star formation rate risks using up all the available gas in just a Gyr or so. This seems a short time for a galaxy that has been forming stars for the past 13 Gyr, which has led to a whole subfield obsessed with how such galaxies may be resupplied with fresh gas from the IGM to keep things going. That may happen, and I’m sure it does at some level, but I think the concern with this being a terrible timing coincidence is misplaced, as there are lots of late LTGs with ample gas. The median gas fraction is 2/3 for the late LTGs above: they have twice as much gas as stars, and they can sustain their observed star formation rates for tens of Gyr, sometimes hundreds of Gyr. There are plenty of galaxies that need no injection of fresh gas. Similarly, there are genuine ETGs that are “red and dead”: some galaxies do stop forming stars. So perhaps those with short depletion times are just weary giants near the end of the road?

That paragraph may cause an existential crisis for an entire subfield, but I didn’t come here to talk about star formation winding down. No, I wanted to highlight an update to the size-mass relation provided by student Zichen Hua. No surprise, Schombert was right. Here is the new size-mass relation for gas, stars, and baryons (considering both stars and gas together):

**Fig. 2** from Hua et al. (2025): The mass-size relations of SPARC galaxies in gas (left), stars (middle), and both together (baryons, right). Data points are color-coded by the gas fraction: red means gas poor, blue gas rich. The three panels span the same dynamic range on both axes. Two sequences are evident in the stellar and baryonic mass-size relations.

The half-mass radius R₅₀ is a distinct quantity for each component: gas alone, stars alone, or both^$ together. All the galaxies are on the same sequence if we only look at the gas: the surface density of atomic gas is similar in all of them^#. When we look at the stars, there are two clear groups: the star-dominated early LTGs (red points) and the gas-rich late LTGs (blue points). This difference in the stars persists when translated into baryons – since the stars dominate the baryonic mass budget of the early LTGs, the gas makes little difference to their baryonic size. The opposite is the case for the gas rich galaxies, and the scatter is reduced as gas is included in the baryonic size. There are some intermediate cases, but the gap between distinct groups is real, as best we can tell. Certainly it has become more clear than it was in 2006 when Schombert had only optical data (the near-IR helps for getting at stellar mass), and the two sequences are more clearly defined in baryons than in stars alone.

A related result is that of Tully & Verheijen (1997), who found a bimodality in surface brightness. Remember above, only two of luminosity, size, and surface brightness are independent. So a bimodality in surface brightness would be two parallel lines cutting diagonally across the size-stellar mass plane. That’s pretty much what we see in the two sequences.

Full disclosure: I was the referee of Tully & Verheijen (1997), and I didn’t want to believe it. I did not see such an effect in the data available to me, and they were looking at the Ursa Major cluster, which I suspected might be a special environment. However, they were the first to have near-IR data, something I did not have at the time. Moreover, they showed that the segregation into different groups was not apparent with optical data; it only emerged in the near-IR K-band. I had no data to contradict that, so while it seemed strange to me, I recommended the paper for publication. Turns out they were right^.

I do not understand why there are two sequences. Tully & Verheijen (1997) suggest that there are different modes of disk stability, so galaxies fall into one or the other. That seems reasonable in principle, but I don’t grasp how it works. I am not alone. There is an enormous literature on disk stability; it is largely focused on bars and spirals in star-dominated systems. It’s a fascinating and complex subject that people have been arguing about for decades. Rather less has been done for gas-dominated systems.

It is straightforward to simulate stellar dynamics. Not easy, mind you, but at least stars are very well approximated as point masses on the scale of galaxies. Not so the gas, for which one needs a hydro code. These are notoriously messy. One persistent result is that systems tend to become unstable when there is too much gas. And yet, nature seems to have figured it out as we see lots of gas rich galaxies. Their morphology is different, so there seems to be an interplay between surface brightness, gas content, and disk stability. Perhaps Tully & Verheijen’s supposition about stability modes is related to the gas content.

That brings us to other scaling relations. Whatever is going on to segregate galaxies in the size-mass plane is not doing it in the velocity-mass plane (the BTFR). There should be a dependence on radius or surface brightness along the BTFR. There really should be, but there is not. Another, related scaling relation is that of specific angular momentum with mass. These three are shown together here:

**Fig. 5** from Hua et al. (2025): Scaling relations of galaxy disks: the baryonic Tully-Fisher relation (left panel), the baryonic mass-size relation (middle panel), and the baryonic angular-momentum relation (right panel). The *crosses* and *circles* are early and late type spirals, respectively, color-coded by the effective baryonic surface density. The blue and gold solid lines are the best-fit lines for LSD galaxies and HSD galaxies, respectively. The dashed black line in the right panel shows the best-fit line considering all 147 galaxies together.

As with luminosity, size, and surface brightness, only two of these three plots are independent. Velocity and size specify the specific angular momentum j ~ V*R, so the right panel is essentially a convolution of the left and middle panels. There is very little scatter in the BTFR (left) but a lot in size-mass (middle), so you wind up with something intermediary in the j-M plane (right).
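The point that j-M inherits its quality from the BTFR can be illustrated with a toy Monte Carlo. This is only a sketch: the BTFR slope of 1/4 is the usual empirical form, but the size-mass slope, both scatters, and the mass range are hypothetical values chosen to mimic "tight" and "messy" relations, and zero points are dropped because they do not affect a correlation coefficient.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)   # fixed seed so the sketch is reproducible
N_GALAXIES = 5000

# Slopes and scatters in dex. Only the BTFR slope is the standard empirical
# value; everything else here is a hypothetical choice.
SLOPE_V, SCATTER_V = 0.25, 0.03   # tight velocity-mass relation
SLOPE_R, SCATTER_R = 0.30, 0.25   # messy size-mass relation

log_m = [rng.uniform(7.0, 11.5) for _ in range(N_GALAXIES)]
log_v = [SLOPE_V * m + rng.gauss(0.0, SCATTER_V) for m in log_m]
log_r = [SLOPE_R * m + rng.gauss(0.0, SCATTER_R) for m in log_m]

# j ~ V * R, so log j is just the sum of the other two.
log_j = [v + r for v, r in zip(log_v, log_r)]

r_btfr = pearson(log_m, log_v)
r_size = pearson(log_m, log_r)
r_j = pearson(log_m, log_j)

print(f"correlation with mass: V {r_btfr:.3f}, R {r_size:.3f}, j {r_j:.3f}")
```

Under these assumptions the j-M correlation lands between the other two: better than size-mass because the BTFR steepens the trend without adding much scatter, but worse than the BTFR itself.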

I hope that sounds trivial, because it is. It hardly warrants mention, in my opinion. However, my opinion on this point is not widely shared; there are a lot of people who make a lot of hay about the specific angular momentum of disk galaxies.

In principle this attention to j-M makes sense. Angular momentum is a conserved quantity, after all. Real physics, not just astronomical scaling relations. Moreover, one can quantify the angular momentum acquired by dark matter halos in simulations. The spin parameter thus defined seems to do a good job of explaining the size-mass relation, which appears to follow if angular momentum is conserved. In this picture, LSB galaxies form in halos with large initial spin, so they end up spread out, while HSB galaxies form in low spin halos. How far the baryons collapse just depends on that initial angular momentum.

This is one of those compelling ideas that nature declined to implement. First, an objection in principle: this hinges on the baryons conserving their share of the angular momentum. The angular momentum of the whole must be conserved (absent external torques), but the whole includes both baryons and dark matter. These two components are free to exchange angular momentum with each other, and there is every reason to expect they do so. In that case, the angular momentum of the baryons need not appear to be conserved: some could be acquired from or lost to the dark matter, where it becomes invisible. As baryons collapse to form a visible galaxy at the center of a dark matter halo, it is easy for them to lose angular momentum to the dark matter. That’s exactly what happens in simulations, even in the first simulations to look into this: it was an eye-opening result to me in 1993, and yet in 2025 people still pretend like baryon-only angular momentum conservation has something to do with galaxy formation. They tend to argue that it gets the size-mass relation right, so it must work out, no?

Does it though? I’ve written about this before, and the answer is not really. Models that predict about the right size-mass relation predict the wrong Tully-Fisher relation, and vice-versa. You can squeeze the toothpaste tube on one end to make it flat, but the bulge simply moves somewhere else. So I find the apparent agreement between disk sizes and angular momenta to be more illusory than compelling. Heck, even Frank van den Bosch agrees with me that you can’t get a realistic disk from the initial distribution of angular momentum j(r). Frank built his career^& contradicting me, so if we agree about something y’all should take note.

That was all before the current results. The distribution of initial spins is a continuous function that is lognormal: it has a peak and a width. Translating that^% into the size distribution predicts a single size-mass relation with finite scatter. It does not predict two distinct families for gas-poor and gas-rich disk galaxies. The new results are completely at odds with this picture.

That might not be apparent to advocates of the spin-size interpretation. If one looks at the j-M (right) panel, it seems like a pretty good correlation by the standards of extragalactic astronomy. So if you’re thinking in those terms, all may seem well, and the little kink between families is no big deal. Those are the wrong terms to think in. The correlation in j-M is good because that in the BTFR plane is great. The BTFR is the more fundamental relation; j is not fundamental, it’s just the BTFR diluted by the messier size-mass relation. That’s it.

One can work out the prediction for angular momentum in MOND. That’s the dotted line in the j-M panel above. MOND gets the angular momentum right: the observed trend follows the dotted line. It is possible for galaxies to have more or less angular momentum at a given mass, so there is some scatter, as observed. Again, that’s it.

*An assertion I frequently hear, mostly from theorists, is that mass is the only galaxy parameter that matters. This is wrong now just as it was thirty years ago. I never cease to be amazed at the extent to which a simple, compelling concept outweighs actual evidence.

⁺So there are “early” late types. I suppose the earliest of LTGs is the S0, which is also the latest of ETGs. There are only a few S0’s in the SPARC sample, so I’m just gonna lump them in with the other early LTGs. Morphology is reproducible – experts can train others who subsequently perform as well as the experts – but it’s not like all experts agree about all classifications, and S0 is the most confounding designation.

^$I recall giving a talk about LSB galaxies at UC Santa Cruz in the ’90s. In the discussion afterwards, Sandy Faber asked whether, instead of optical scale lengths, we should be talking about baryonic scale lengths. Both the audience and I were like

wut?

All that we had then were measures of the scale size of the stars in optical light, so the phrasing didn’t even compute at the time. But of course she was right, and R_50,bar above is such a measure.

^#A result I recall from my thesis is that the dynamic range in stellar surface brightness was huge while that in the gas surface density was small: a factor of 1,000 in Σ_* might correspond to a factor of 2 or maybe 3 in Σ_g.

^It happens a lot in astronomy that a seemingly unlikely result later proves to be correct. That’s why we need to be open-minded as referees. Today’s blasphemy is tomorrow’s obvious truth.

^&Career advice for grad students: find some paper of mine from 15 – 20 years ago. Update it with a pro-LCDM spin. You’ll go far.

^%There was a time when the narrow distribution of spins in simulations was alleged to explain the narrow distribution of surface brightness known as Freeman’s Law. This wasn’t right. Doing the actual math, the “narrow” spin distribution maps to a broad surface brightness distribution – not a single value, nor a bimodal distribution. Here is an example spin distribution:

*The spin distribution for galaxy and cluster mass dark matter halos from Eisenstein & Loeb (1995).*

Rather than a narrow Freeman’s Law, there should be galaxies of all different surface brightness, over a broad range. The spin distribution above maps into the dashed line below:

**Fig. 8** from McGaugh & de Blok (1998): Surface brightness distribution (data points from various sources) together with the distribution expected from the variation of spin parameters. Dotted line: Efstathiou & Jones (1979). Dashed line: Eisenstein & Loeb (1995). Theory predicts a very broad distribution with curvature inconsistent with observations. Worse, a cutoff must be inserted by hand to reconcile the high surface brightness end of the distribution.

Mapping spin to surface brightness predicts galaxies that are well above the Freeman value. Such very HSB galaxies do not exist, at least not as disks, so one had to insert a cut off by hand in dark matter models that would otherwise support such galaxies.

In contrast, an upper limit to galaxy surface brightness arises naturally in MOND. Only disks with surface density less than a₀/G are stable.
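To put a rough number on "broad" in the spin footnote above, here is a back-of-the-envelope sketch. It assumes the simplest spin-size mapping, in which disk size scales linearly with the spin parameter λ at fixed mass, so surface density goes as λ^-2. The lognormal width of 0.5 in ln λ is an assumed, typical-looking value, not one read off the Eisenstein & Loeb distribution.

```python
import math

# Assumed lognormal width of the spin distribution in ln(lambda).
SIGMA_LN_SPIN = 0.5

def delta_mu(spin_ratio):
    """Surface brightness shift in mag relative to a median-spin disk.

    Assumes R is proportional to lambda at fixed mass, so surface density
    scales as lambda^-2 and mu = -2.5 log10(Sigma) shifts by 5 log10(ratio).
    Positive values are fainter (lower surface brightness).
    """
    return 5.0 * math.log10(spin_ratio)

# A Gaussian in ln(lambda) maps to a Gaussian in mu with this width.
sigma_mu = 5.0 * SIGMA_LN_SPIN / math.log(10.0)

# Surface brightness offsets at -2 sigma and +2 sigma in spin.
mu_low_spin = delta_mu(math.exp(-2.0 * SIGMA_LN_SPIN))
mu_high_spin = delta_mu(math.exp(+2.0 * SIGMA_LN_SPIN))

print(f"sigma in surface brightness: {sigma_mu:.2f} mag")
print(f"+/-2 sigma in spin spans {mu_high_spin - mu_low_spin:.2f} mag")
```

Under these assumptions a seemingly narrow spin distribution spreads disks over several magnitudes of surface brightness, in a single smooth hump: neither a Freeman value nor two families.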

^!OK, I guess an obvious question is how surface brightness correlates with morphological type. I didn’t want to get into how the morphological T-type does or doesn’t correlate with quantitative measures, but here is this one example. Yes, there’s a correlation, but there is also a lot of meaningless scatter. LSBs tend to be late LTGs, but can be found among the early LTGs, and vice-versa for HSBs. Despite the clear trend, a galaxy with a central baryonic surface density of 1,000 M_☉ pc^-2 could be in any bin of morphology.

The central surface density of baryons as a function of morphological type. Colors have the same meaning as in the gas fraction plot. (This measure of surface density is different from Σ_50,bar used by Hua et al. above (see McGaugh 2006), but the details are irrelevant here.)

This messy correlation is par for the course for plots involving morphology, and for extragalactic astronomy in general. This is why the small scatter in the BTFR and the RAR is so amazing – that never happens!

Non-equilibrium dynamics in galaxies that appear to have lots of dark matter: ultrafaint dwarfs

This is a long post. It started focused on ultrafaint dwarfs, but can’t avoid more general issues. In order to diagnose non-equilibrium effects, we have to have some expectation for what equilibrium would be. The Tully-Fisher relation is a useful empirical touchstone for that. How the Tully-Fisher relation comes about is itself theory-dependent. These issues are intertwined, so in addition to discussing the ultrafaints, I also review some of the many predictions for Tully-Fisher, and how our theoretical expectation for it has evolved (or not) over time.

In the last post, we discussed how non-equilibrium dynamics might make a galaxy look like it had less dark matter than similar galaxies. That pendulum swings both ways: sometimes non-equilibrium effects might stir up the velocity dispersion above what it would nominally be. Some galaxies where this might be relevant are the so-called ultrafaint dwarfs (not to be confused with ultradiffuse galaxies, which are themselves often dwarfs). I’ve talked about these before, but more keep being discovered, so an update seems timely.

Galaxies and ultrafaint dwarfs

It’s a big universe, so there’s a lot of awkward terminology, and the definition of an ultrafaint dwarf is somewhat debatable. Most often I see them defined as having an absolute magnitude limit M_V > -8, which corresponds to a luminosity less than 100,000 suns. I’ve also seen attempts at something more physical, like being a “fossil” whose star formation was entirely before cosmic reionization, which ended way back at z ~ 6 so all the stars would be at least^{*&^#} 12.5 Gyr old. While such physics-based definitions are appealing, these are often tied up with theoretical projection: the UV photons that reionized the universe should have evaporated the gas in small dark matter halos, so these tiny galaxies can only be fossils from before that time. This thinking pervades much of the literature despite it being obviously wrong, as counterexamples^! exist. For example, Leo P is practically an ultrafaint dwarf by luminosity, but has ample gas (so a larger baryonic mass) and is currently forming stars.

A luminosity-based definition is good enough for us here; I don’t really care exactly where we make the cut. Note that ultrafaint is an appropriate moniker: a luminosity of 10⁵ L_☉ is tiny by galaxy standards. This is a low-grade globular cluster, and some ultrafaints are only a few hundred solar luminosities, which is barely even^# a star cluster. At this level, one has to worry about stochastic effects in stellar evolution. If there are only a handful of stars, the luminosity of the entire system changes markedly as a single star evolves up the red giant branch. Consequently, our mapping from observed quantities to stellar mass is extremely dodgy. For consistency, to compare with brighter dwarfs, I’ve adopted the same boilerplate M_*/L_V = 2 M_☉/L_☉. That makes for a fair comparison luminosity-to-luminosity, but the uncertainty in the actual stellar mass is ginormous.
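The conversion from the magnitude cut to a luminosity, and then to the boilerplate stellar mass, can be sketched as follows. The solar absolute magnitude M_V = 4.83 is a commonly adopted value that I am assuming here; the mass-to-light ratio of 2 is the one stated above.

```python
import math

M_V_SUN = 4.83          # assumed absolute V magnitude of the Sun
MASS_TO_LIGHT_V = 2.0   # M_sun / L_sun, the boilerplate value from the text

def luminosity_from_abs_mag(abs_mag_v):
    """V-band luminosity in L_sun from absolute magnitude M_V."""
    return 10.0 ** (-0.4 * (abs_mag_v - M_V_SUN))

def abs_mag_from_luminosity(luminosity_v):
    """Absolute magnitude M_V from V-band luminosity in L_sun."""
    return M_V_SUN - 2.5 * math.log10(luminosity_v)

def stellar_mass(abs_mag_v):
    """Stellar mass in M_sun for the adopted constant mass-to-light ratio."""
    return MASS_TO_LIGHT_V * luminosity_from_abs_mag(abs_mag_v)

lum_cut = luminosity_from_abs_mag(-8.0)
print(f"M_V = -8 -> L_V = {lum_cut:.3g} L_sun, M_* = {stellar_mass(-8.0):.3g} M_sun")
print(f"L_V = 300 L_sun -> M_V = {abs_mag_from_luminosity(300.0):.2f}")
```

With this solar magnitude the M_V = -8 cut lands at roughly 1.4 × 10⁵ L_☉, consistent with the round 100,000 suns quoted above. None of this addresses the stochastic sampling problem: the mass-to-light ratio is simply assumed, which is the dodgy part.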

It gets worse, as the ultrafaints that we know about so far are all very nearby satellites of the Milky Way. They are not discovered in the same way as other galaxies, where one plainly sees a galaxy on survey plates. For example, NGC 7757:

A faint galaxy in the night sky, surrounded by numerous distant star-like points. — *The spiral galaxy NGC 7757 as seen on plates of the Palomar Sky Survey.*

While bright, high surface brightness galaxies like NGC 7757 are easy to see, lower surface brightness galaxies are not. However, they can usually still be seen, if you know where to look:

A faint galaxy amidst numerous distant stars in a dark sky, illustrating the challenges of observing low surface brightness galaxies. — *UGC 1230 as seen on the Palomar Sky Survey. It’s in the middle.*

I like to use this pair as an illustration, as they’re about the same distance from us and about the same angular size on the sky – at least, once you crank up the gain for the low surface brightness UGC 1230:

Comparison of two astronomical images: the left side shows a spiral galaxy with visible structure and brightness, while the right side features a lower surface brightness galaxy, appearing more diffuse and less distinct. — Zoom in on deep CCD images of NGC 7757 (left) and UGC 1230 (right) with the contrast of the latter enhanced. The chief difference between the two is surface brightness – how spread out their stars are. They have a comparable physical diameter, they both have star forming regions that appear as knots in their spiral arms, etc. These galaxies are clearly distinct from the emptiness of the cosmic void around them, being examples of giant stellar systems that gave rise to the term “island universe.”

In contrast to objects that are obvious on the sky as independent island universes, ultrafaint dwarfs are often invisible to the eye. They are recognized as a subset of stars near each other on the sky that also share the same distance and direction of motion in a field that might otherwise be crowded with miscellaneous, unrelated stars. For example, here is Leo IV:

Wide field image of the Ultra-Faint Dwarf Galaxy Leo IV, featuring a zoomed-in view of its faint structure surrounded by numerous background stars and galaxies. — *The ultrafaint dwarf Leo IV as identified by the Sloan Digital Sky Survey and the Hubble Space Telescope.*

See it?

I don’t. I do see a number of background galaxies, including an edge-on spiral near the center of the square. Those are not the ultrafaint dwarf, which is some subset of the stars in this image. To decide which ones are potentially a part of such a dwarf, one examines the color magnitude diagram of all the stars to identify those that are consistent with being at the same distance, and assigns membership in a probabilistic way. It helps if one can also obtain radial velocities and/or proper motions for the stars to see which hang together – more or less – in phase space.

Part of the trick here is deciding what counts as hanging together. A strong argument in favor of these things residing in dark matter halos is that the velocity differences between the apparently-associated stars are too great for them to remain together for any length of time otherwise. This is essentially the same situation that confronted Zwicky in his observations of galaxies in clusters in the 1930s. Here are these objects that appear together in the sky, but they should fly apart unless bound together by some additional, unseen force. But perhaps some of these ultrafaints are not hanging together; they may be in the process of coming apart. Indeed, they may have so few stars because they are well down the path of dissolution.

Since one cannot see an ultrafaint dwarf in the same way as an island universe, I’ve heard people suggest that being bound by a dark matter halo be included in the definition of a galaxy. I see where they’re coming from, but find it unworkable. I know a galaxy when I see one. As did Hubble, as did thousands of other observers since, as can you when you look at the pictures above. It is absurd to make the definition of an object that is readily identifiable by visual inspection be contingent on the inferred presence of invisible stuff.

So are ultrafaints even galaxies? Yes and no. Some of the probabilistic identifications may be mere coincidences, not real objects. However, they can’t all be fakes, and I think that if you put them in the middle of intergalactic space, we would recognize them as galaxies – provided we could detect them at all. At present we can’t, but hopefully that situation will improve with the Rubin Observatory. In the meantime, what we have to work with are these fragmentary systems deep in the potential well of the seventy billion solar mass cosmic gorilla that is the Milky Way. We have to be cognizant that they might have gotten knocked around, as we can see in more massive systems like the Sagittarius dwarf. Of course, if they’ve gotten knocked around too much, then they shouldn’t be there at all. So how do these systems evolve under the influence of a cosmic gorilla?

Let’s start by looking at the size-mass diagram, as we did before. Ultrafaint dwarfs extend this relation to much lower mass, and also to rather small sizes – some approaching those of star clusters. They approximately follow a line of constant surface density, ~0.1 M_☉ pc^-2 (dotted line).

A graph illustrating the size-mass relationship of galaxies, plotting effective radius (Re) against stellar mass (M*). Black squares represent data points of larger galaxies, while green squares indicate ultrafaint dwarfs. The dotted line suggests a correlation between size and mass. — *The size and stellar mass of Local Group dwarfs* as discussed previously, with the addition of ultrafaint dwarfs^$ (small gray squares).

This looks weird to me. All other types of galaxies scatter all over the place in this diagram. The ultrafaints are unique in following a tight line in the size-mass plane, and one that follows a line of constant surface brightness. Every element of my observational experience screams that this is likely to be an artifact. Given how these “galaxies” are identified as the loose association of a handful of stars, it is easy to imagine that this trend might be an artifact of how we define the characteristic size of a system that is essentially invisible. It might also arise for physical reasons to do with the cosmic gorilla; i.e., it is a consequence of dynamical evolution. So maybe this correlation is real, but the warning lights that it is not are flashing red.

The Baryonic Tully-Fisher relation as a baseline

Ideally, we would measure accelerations to test theories, particularly MOND. Here, we would need to use the size to estimate the acceleration, but I straight up don’t believe these sizes are physically meaningful. The stellar mass, dodgy as it is, seems robust by comparison. So we’ll proceed as if we know that much – which we don’t, really – but let’s at least try.

With the stellar mass (there is no gas in these things), we are halfway to constructing the baryonic Tully-Fisher relation (BTFR), which is the simplest test of the dynamics that we can make with the available data. The other quantity we need is the characteristic circular speed of the gravitational potential. For rotating galaxies, that is the flat rotation speed, V_f. For pressure supported dwarfs, what is usually measured is the velocity dispersion σ. We’ve previously established that for brighter dwarfs in the Local Group, a decent approximation is V_f = 2σ, so we’ll start by assuming that this should apply to the ultrafaints as well. This allows us to plot the BTFR:

A scatter plot showing the relationship between velocity (Vf in km/s) and baryonic mass (Mb in solar masses), with data points represented by different shapes and colors for various galaxy types. — The baryonic mass and characteristic circular speeds of both rotationally supported galaxies (circles) and pressure supported dwarfs (squares). The colored points follow the same baryonic Tully-Fisher relation (BTFR), but the data for low mass ultrafaint dwarfs (gray squares) *flatten out, having nearly the same characteristic speed over several decades in mass.*

The BTFR is an empirical relation of the form V_f ~ M_b^1/4 over about six decades in mass. Somewhere around the ultrafaint scale, this no longer appears to hold, with the observed velocity flattening out to become approximately constant for these lowest mass galaxies. I’m not sure this is real, as there are many practical caveats to interpreting the observations. Measuring stellar velocities is straightforward but demanding at this level of accuracy. There are many potential systematics, pretty much all of which cause the intrinsic velocity dispersion to be overestimated. For example, observations made with multislit masks tend to return larger dispersions than observations of the same object with fibers. That’s likely because it is hard to build a mask so well that all of the stars perfectly hit the centers of the slitlets assigned to them; offsets within the slit shift the spectrum in a way that artificially adds to the apparent velocity dispersion. Fibers are less efficient in their throughput, but have the virtue of blending the input light in a way that precludes this particular systematic. Another concern is physical – some of the stars that are observed are presumably binaries, and some of the velocity will be due to motion within the binary pair and nothing to do with the gravitational potential of the larger system. This can be addressed with repeated observations to see if some velocities change, but it is hard to do that for each and every system, especially when it is way more fun to discover and explore new systems than follow up on the same one over and over and over again.

There are lots of other things that can go wrong. At some level, some of them probably do – that’s the nature of observational astronomy^&. While it seems likely that some of the velocity dispersions are systematically overestimated, it seems unlikely that all of them are. Let’s proceed as if the bulk of the data is telling us something, even if we treat individual objects with suspicion.

MOND

MOND makes a clear prediction for the BTFR of isolated galaxies: the baryonic mass goes as the fourth power of the flat rotation speed. Contrary to Newtonian expectation, this holds irrespective of surface brightness, which is what attracted my attention to the theory in the first place. So how does it do here?

A graph depicting the relationship between the flat rotation speed (Vf in km/s) and the baryonic mass (Mb in solar masses), showing data points for various galaxies, including ultrafaint dwarfs highlighted with unique markers. — *The same data as above with the addition of the line predicted by MOND (Milgrom 1983).*
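For concreteness, the MOND line for isolated galaxies is V_f⁴ = G a₀ M_b, which can be sketched numerically as below. The value a₀ = 1.2 × 10⁻¹⁰ m s⁻² is the commonly quoted one and is an assumption here, as are the unit-conversion constants; the V_f = 2σ conversion is the approximation adopted above for pressure supported dwarfs.

```python
G = 4.30091e-6           # kpc (km/s)^2 / M_sun
A0_SI = 1.2e-10          # m/s^2, commonly quoted MOND acceleration (assumed)
KM_PER_KPC = 3.0857e16   # kilometers in one kiloparsec

# Convert a0 to (km/s)^2 per kpc so it combines directly with G.
A0 = A0_SI * 1.0e-3 * KM_PER_KPC

def mond_flat_velocity(baryonic_mass):
    """Flat rotation speed in km/s for an isolated galaxy: V^4 = G a0 M_b."""
    return (G * A0 * baryonic_mass) ** 0.25

def mond_baryonic_mass(flat_velocity):
    """Baryonic mass in M_sun for a given flat rotation speed in km/s."""
    return flat_velocity ** 4 / (G * A0)

def flat_velocity_from_dispersion(sigma):
    """Approximate V_f for a pressure supported dwarf, using V_f = 2 sigma."""
    return 2.0 * sigma

for mass in (1.0e3, 1.0e5, 1.0e7, 1.0e10):
    v_flat = mond_flat_velocity(mass)
    print(f"M_b = {mass:.0e} M_sun -> V_f = {v_flat:6.1f} km/s, "
          f"sigma ~ {v_flat / 2.0:5.1f} km/s")
```

Because velocity goes only as the fourth root of mass, the predicted equilibrium dispersions for isolated systems at ultrafaint masses come out at only a few km/s or less under these assumptions, which is why a dispersion that stays flat over decades in mass stands out.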

Low surface density means low acceleration, so low surface brightness galaxies would make great tests of MOND if they were isolated. Oh, right – they already did. Repeatedly. MOND also correctly predicted the velocities of low mass, gas-rich dwarfs that were unknown when the prediction was made. These are highly nontrivial successes of the theory.

The ultrafaints we’re discussing here are not isolated, so they do not provide the clean tests that isolated galaxies provide. However, galaxies subject to external fields should have low velocities relative to the BTFR, while the ultrafaints have higher velocities. They’re on the wrong side of the relation! Taking this at face value (i.e., assuming equilibrium), MOND fails here.

Whenever MOND has a problem, it is widely seen as a success of dark matter. In my experience, this is rarely true: observations that are problematic for MOND usually don’t make sense in terms of dark matter either. For each observational test we also have to check how LCDM fares.

LCDM

How LCDM fares is often hard to judge because its predictions for the same phenomena are not always clear. Different people predict different things for the same theory. There have been lots of LCDM-based predictions made for both dwarf satellite galaxies and the Tully-Fisher relation. Too many, in fact – it is a practical impossibility to examine them all. Nevertheless, some common themes emerge if we look at enough examples.

The halo mass-velocity relation

The most basic prediction of LCDM is that the mass of a dark matter halo scales with the cube of the circular velocity of a test particle at the virial radius (conventionally taken to be the radius R₂₀₀ that encompasses an average density 200 times the critical density of the universe; if that sounds like gobbledygook to you, just read “halo” for “200”): M₂₀₀ ~ V₂₀₀³. This is a very basic prediction that everyone seems to agree on.

There is a tiny problem with testing this prediction: it refers to the dark matter halo that we cannot see. In order to test it, we have to introduce some scaling factors to relate the dark to the light. Specifically, M_b = f_d M₂₀₀ and V_f = f_v V₂₀₀, where f_d is the observed fraction of mass in baryons and f_v relates the observed flat velocity to the circular speed of our notional test particle at the virial radius. The obvious assumptions to make are that f_d is a constant (perhaps as much as but not more than the cosmic baryon fraction of 16%) and f_v is close to unity. The latter requirement stems from the need for dark matter to explain the amplitude of the flat rotation speed, but f_v could be slightly different; plausible values range over 0.9 < f_v < 1.4. Values larger than one indicate a rotation curve that declines before the virial radius is reached, which is the natural expectation for NFW halos.

Here is a worked example with f_d = 0.025 and f_v = 1:

A graph depicting the relationship between the flat rotation speed (Vf) in kilometers per second and the baryonic mass (Mb) in solar masses. The data points are shown with various markers, including gray squares, green squares, and blue circles, each representing different galaxy types, along with error bars. A solid gray line indicates a trend, while a dotted line marks a theoretical lower bound. — The same data as above with the addition of the nominal prediction of LCDM. The dotted line is the halo mass-circular velocity relation; the gray band is a simple model with f_d = 0.025 and f_v = 1 (e.g., *Mo, Mao, & White 1998)*.
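
The nominal model can be sketched numerically. With R₂₀₀ defined by a mean enclosed density of 200 times critical, and V₂₀₀² = G M₂₀₀ / R₂₀₀, one gets M₂₀₀ = V₂₀₀³ / (10 G H₀). The snippet below applies M_b = f_d M₂₀₀ and V_f = f_v V₂₀₀ on top of that. The Hubble constant of 70 km/s/Mpc is an assumption made here for illustration, and the function names are hypothetical.

```python
# Sketch of the halo mass-velocity relation plus the f_d, f_v scalings.
# H0 = 70 km/s/Mpc is an ASSUMED value; the post does not specify one.
G = 4.30091e-6   # gravitational constant, kpc (km/s)^2 / Msun
H0 = 0.070       # assumed Hubble constant, km/s/kpc (= 70 km/s/Mpc)


def halo_mass(v200_kms):
    """M200 in solar masses from V200 in km/s: M200 = V200^3 / (10 G H0)."""
    return v200_kms**3 / (10.0 * G * H0)


def lcdm_baryonic_mass(v_flat_kms, f_d=0.025, f_v=1.0):
    """M_b = f_d * M200, with V200 = V_f / f_v."""
    return f_d * halo_mass(v_flat_kms / f_v)


if __name__ == "__main__":
    for v in (30.0, 100.0, 300.0):
        print(f"V_f = {v:5.1f} km/s: M200 ~ {halo_mass(v):.2e} Msun, "
              f"M_b ~ {lcdm_baryonic_mass(v):.2e} Msun")
```

Changing f_d slides the line up and down in mass and changing f_v slides it sideways in velocity, but with both held constant the slope stays fixed at 3.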

I have illustrated the model with a fat grey line because f_d = 0.025 is an arbitrary choice^* I made to match the data. It could be more, it could be less. The detected baryon fraction can be anything up to the cosmic value, f_d < f_b = 0.16, as not all of the baryons available in a halo cool and condense into cold gas that forms visible stars. That’s fine; there’s no requirement that all of the baryons have to become readily observable, but there is also no reason to expect all halos to cool exactly the same fraction of baryons. Naively one would expect at least some variation in f_d from halo to halo, so there could and probably should be a lot of scatter: the gray line could easily be a much wider band than depicted.

In addition to the rather arbitrary value of f_d, this reasoning also predicts a Tully-Fisher relation with the wrong slope. Picking a favorable value of f_d only matches the data over a narrow range of mass. It was nevertheless embraced for many years by many people. Selection effects bias samples to bright galaxies. Consequently, the literature is rife with TF samples dominated by galaxies with M_b > 10¹⁰ M_☉ (the top right corner of the plot above); with so little dynamic range, a slope of 3 looks fine. Once you look outside that tiny box, it does not look fine.

Personally, I think a slope of 3 is an oversimplification. That is the prediction for dark matter halos; there can be effects that vary systematically with mass. An obvious one is adiabatic compression, the effect by which baryons drag some dark matter along with them as they settle to the center of their halos. This increases f_v by an amount that depends on the baryonic surface density. Surface density correlates with mass, so I would nominally expect higher velocities in brighter galaxies; this drives up the slope. There are various estimates of this effect; typically one gets a slope like 3.3, not the observed 4. Worse, it predicts an additional effect: at a given mass, galaxies of higher surface brightness should also have higher velocity. Surface brightness should be a second parameter in the Tully-Fisher relation, but this is not observed.

The easiest way to reconcile the predicted and observed slopes is to make f_d a function of mass. Since M_b = f_d M₂₀₀ and M₂₀₀ ~ V₂₀₀³, M_b ~ f_d V₂₀₀³. Adopting f_v = 1 for simplicity, M_b ~ V_f⁴ follows if f_d ~ V_f. Problem solved, QED.
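
Written out step by step, keeping f_v general rather than setting it to one, the scaling argument reads as follows (this only restates the definitions given above):

```latex
\begin{align}
M_b &= f_d\, M_{200}, \qquad V_f = f_v\, V_{200}, \qquad M_{200} \propto V_{200}^{3} \\
M_b &\propto f_d\, V_{200}^{3} = f_d\, f_v^{-3}\, V_f^{3} \\
M_b &\propto V_f^{4} \;\Longleftrightarrow\; f_d\, f_v^{-3} \propto V_f
\quad\Longrightarrow\quad f_d \propto V_f \ \ \text{for } f_v = 1
\end{align}
```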

There are [at least] two problems with this argument. One is that the scaling f_d ~ V_f must hold perfectly without introducing any scatter. This is a fine-tuning problem: we need one parameter to vary precisely with another, unrelated parameter. There is no good reason to expect this; we just have to insert the required dependence by hand. This is much worse than choosing an arbitrary value for f_d: now we’re making it a rolling fudge factor to match whatever we need it to. We can make it even more complicated by invoking some additional variation in f_v, but this just makes the fine-tuning worse as the product f_d f_v⁻³ has to vary just so. Another problem is that we’re doing all this to adjust the prediction of one theory (LCDM) to match that of a different theory (MOND). It is never a good sign when we have to do that, whether we admit it or not.

Abundance matching

The reasoning leading to a slope 3 Tully-Fisher relation assumes a one-to-one relation between baryonic and halo mass (f_d = constant). This is an eminently reasonable assumption. We spent a couple of decades trying to avoid having to break this assumption. Once we do so and make f_d a freely variable parameter, then it can become a rolling fudge factor that can be adjusted to fit anything. Everyone agrees that is Bad. However, it might be tolerable if there is an independent way of estimating this variation. Rather than make f_d just be what we need it to be as described above, we can instead estimate it with abundance matching.

Abundance matching comes from equating the observed number density of galaxies as a function of mass with the number density of dark matter halos. This process gives f_d, or at least the stellar fraction, f_*, which is close to f_d for bright galaxies. Critically, it provides a way to assign dark matter halo masses to galaxies independently of their kinematics. This replaces an arbitrary, rolling fudge factor with a predictive theory.

Abundance matching models generically introduce curvature into the prediction for the BTFR. This stems from the mismatch in the shape of the galaxy stellar mass function (a Schechter function) and the dark halo mass function (a power law on galaxy scales). This leads to a bend in relations that map between visible and dark mass.
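
A toy version of the procedure shows where the bend comes from. The sketch below matches the cumulative number density of a Schechter stellar mass function to that of a pure power-law halo mass function and reads off the halo mass assigned to each stellar mass. Every parameter value here is an illustrative assumption, not a fit to any survey or simulation, and the function names are hypothetical.

```python
import numpy as np

# Toy abundance matching. All parameter values are ILLUSTRATIVE ASSUMPTIONS.


def schechter(log_m, phi_star=3e-3, log_m_star=10.8, alpha=-1.4):
    """Stellar mass function dn/dlog10(M) in Mpc^-3 dex^-1 (Schechter form)."""
    x = 10.0 ** (log_m - log_m_star)
    return np.log(10.0) * phi_star * x ** (alpha + 1.0) * np.exp(-x)


def halo_power_law(log_m, norm=3e-3, log_m0=12.0, slope=-0.9):
    """Halo mass function dn/dlog10(M), approximated as a pure power law."""
    return norm * 10.0 ** (slope * (log_m - log_m0))


def cumulative_from_top(log_m, dn_dlogm):
    """n(>M): trapezoid integral from each grid point to the top of the grid."""
    seg = 0.5 * (dn_dlogm[1:] + dn_dlogm[:-1]) * np.diff(log_m)
    return np.concatenate([np.cumsum(seg[::-1])[::-1], [0.0]])


def abundance_match(log_mstar):
    """Return log10(M_halo) with the same cumulative abundance as log10(M_star)."""
    log_ms = np.linspace(5.0, 12.5, 2000)
    log_mh = np.linspace(8.0, 15.0, 2000)
    n_gal = cumulative_from_top(log_ms, schechter(log_ms))
    n_halo = cumulative_from_top(log_mh, halo_power_law(log_mh))
    n_query = np.interp(log_mstar, log_ms, n_gal)
    # n_halo falls with mass, so reverse it to get increasing x for np.interp;
    # the top grid point (n = 0) is dropped to avoid log10(0).
    return np.interp(np.log10(n_query),
                     np.log10(n_halo[-2::-1]), log_mh[-2::-1])


if __name__ == "__main__":
    log_mstar = np.array([7.0, 8.0, 9.0, 10.0, 11.0])
    log_mhalo = abundance_match(log_mstar)
    for ms, mh in zip(log_mstar, log_mhalo):
        print(f"log M* = {ms:4.1f} -> log Mh = {mh:5.2f}, M*/Mh = {10**(ms - mh):.4f}")
```

In this toy, the ratio of stellar to halo mass is not constant: it peaks at intermediate mass and drops toward both ends, which is the curvature that propagates into the predicted Tully-Fisher relation.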

The transition from the M ~ V³ reasoning to abundance matching occurred gradually, but became pronounced circa 2010. There are many abundance matching models; I already faced the problem of the multiplicity of LCDM predictions when I wrote a lengthy article on the BTFR in 2012. To get specific, let’s start with an example from then, the model of Trujillo-Gomez et al. (2011):

Scatter plot showing the relationship between gravitational potential flat rotation speed (Vf in km/s) and baryonic mass (Mb in solar masses). The plot features varying data points marked with blue circles, green squares, and gray squares, indicating different galaxy types or observational methods. A red curve is drawn, illustrating an empirical relationship fitting the data. — *The same data as above with the addition of the line predicted by LCDM in the model of Trujillo-Gomez et al. (2011).*

One thing Trujillo-Gomez et al. (2011) say in their abstract is “The data present a clear monotonic LV relation from ∼50 km s⁻¹ to ∼500 km s⁻¹, with a bend below ∼80 km s⁻¹”. By LV they mean luminosity-velocity, i.e., the regular Tully-Fisher relation. The bend they note is real; that’s what happens when you consider only the starlight and ignore the gas. The bend goes away if you include that gas. This was already known at the time – our original BTFR paper from 2000 has nearly a thousand citations, so it isn’t exactly obscure. Ignoring the gas is a choice that makes no sense empirically but makes a lot of sense from the perspective of LCDM simulations. By 2010, these had become reasonably good at matching the numbers of stars observed in galaxies, but the gas properties of simulated galaxies remained, hmmmmmmm, wanting. It makes sense to utilize the part that works. It makes less sense to pretend that this bend is something physically meaningful rather than an artifact of ignoring the gas. The pressure-supported dwarfs are all star dominated, so this distinction doesn’t matter here, and they follow the BTFR, not the stars-only version.

An old problem in galaxy formation theory is how to calibrate the number density of dark matter halos to that of observed galaxies. For a long time, a choice that people made was to match either the luminosity function or the kinematics. These didn’t really match up, so there was occasional discussion of the virtues and vices of the “luminosity function calibration” vs. the “Tully-Fisher calibration.” These differed by a factor of ~2. This tension remains with us. Mostly simulations have opted to adopt the luminosity function calibration, updated and rebranded as abundance matching. Again, this makes sense from the perspective of LCDM simulations, because the number density of dark matter halos is something that simulations can readily quantify while the kinematics of individual galaxies are much harder to resolve^**.

The nonlinear relation between stellar mass and halo mass obtained from abundance matching inevitably introduces curvature into the corresponding Tully-Fisher relation predicted by such models. That’s what you see in the curved line of Trujillo-Gomez et al. (2011) above. They weren’t the first to obtain such a result, and they certainly weren’t the last: this is a feature of LCDM with abundance matching, not a bug.

The line of Trujillo-Gomez et al. (2011) matches the data pretty well at intermediate masses. It diverges to higher velocities at both small and large galaxy masses. I’ve written about this tension at high masses before; it appears to be real, but let’s concentrate on low masses here. At low masses, the velocity of galaxies with M_b < 10⁸ M_☉ appears to be overestimated. But the divergence between model and reality has just begun, and it is hard to resolve small things in simulations, so this doesn’t seem too bad. Yet.

Moving ahead, there are the “Latte” simulations of Wetzel et al. (2016) that use the well-regarded FIRE code to look specifically at simulated dwarfs, both isolated and satellites – specifically satellites of Milky Way-like systems. (Milky Way. Latte. Get it? Nerd humor.) So what does that find?

A graph displaying the relationship between circular velocity (Vf in km/s) and baryonic mass (Mb in solar masses), featuring various data points distinguished by shape and color, including gray squares, green squares, orange triangles, and blue circles to represent different types of galaxies. — *The same data as above with the addition of* simulated dwarfs (orange triangles) from the Latte LCDM simulation of Wetzel et al. (2016), specifically the simulated satellites in the top panel of their Fig. 3. Note that we plot V_f = 2σ for pressure supported systems, both real and simulated.

The individual simulated dwarf satellites of Wetzel et al. (2016) follow the extrapolation of the line predicted by Trujillo-Gomez et al. (2011). To first order, it is the same result to higher resolution (i.e., smaller galaxy mass). Most of the simulated objects have velocity dispersions that are higher than observed in real galaxies. Intriguingly, there are a couple of simulated objects with M_* ~ 5 x 10⁶ M_☉ that fall nicely among the data where there are both star-dominated and gas-rich galaxies. However, these two are exceptions; the rule appears to be characteristic speeds that are higher than observed.

The lowest mass simulated satellite objects begin to approach the ultrafaint regime, but resolution continues to be an issue: they’re not really there yet. This hasn’t precluded many people from assuming that dark matter will work where MOND fails, which seems like a heck of a presumption given that MOND has been consistently more successful up until that point. Where MOND underpredicts the characteristic velocity of ultrafaints, LCDM hasn’t yet made a clear prediction, and it overpredicts velocities for objects of slightly larger mass. Ain’t no theory covering itself in glory here, but this is a good example where objects that are a problem for MOND are also a problem for dark matter, and it seems likely that non-equilibrium dynamics play a role in either case.

Comparing apples with apples

A persistent issue with comparing simulations to reality is extracting comparable measures. Where circular velocities are measured from velocity fields in rotating galaxies and estimated from measured velocity dispersions in pressure supported galaxies, the most common approach to deriving rotation curves from simulated objects is to sum up particles in spherical shells and assume V² = GM/R. These are not the same quantities. They should be proxies for one another, but equality holds only in the limit of isotropic orbits in spherical symmetry. Reality is messier than that, and simulations aren’t that simple either^%.
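
To make the simulated side of that comparison concrete, here is a minimal sketch of the spherical-shell estimate: sort particles by radius, accumulate the enclosed mass, and take V² = GM(<R)/R. The function name and array layout are assumptions made for illustration; real analysis pipelines differ in detail.

```python
import numpy as np

G = 4.30091e-6  # gravitational constant, kpc (km/s)^2 / Msun


def spherical_circular_velocity(positions_kpc, masses_msun, radii_kpc):
    """V(R) = sqrt(G M(<R) / R) in km/s, ASSUMING spherical symmetry.

    positions_kpc : (N, 3) particle positions relative to the halo center
    masses_msun   : (N,) particle masses
    radii_kpc     : radii (> 0) at which to evaluate the velocity
    """
    r = np.linalg.norm(np.asarray(positions_kpc, dtype=float), axis=1)
    order = np.argsort(r)
    r_sorted = r[order]
    m_cum = np.cumsum(np.asarray(masses_msun, dtype=float)[order])
    radii = np.asarray(radii_kpc, dtype=float)
    # number of particles with r <= R for each requested radius
    n_inside = np.searchsorted(r_sorted, radii, side="right")
    m_enclosed = np.where(n_inside > 0, m_cum[np.maximum(n_inside - 1, 0)], 0.0)
    return np.sqrt(G * m_enclosed / radii)


if __name__ == "__main__":
    # A single point mass gives a Keplerian decline: V falls as R^-1/2.
    v = spherical_circular_velocity([[0.0, 0.0, 0.0]], [1.0e10], [10.0, 40.0])
    print(v)
```

Nothing in this estimate knows about gas pressure, non-circular motions, disk geometry, or where gas is actually detectable, which is why it is only a proxy for what a velocity field or a velocity dispersion measures.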

Sales et al. (2017) make the effort to make a better comparison between what is observed given how it is observed, and what the simulations would show for that quantity. Others have made a similar effort; a common finding is that the apparent rotation speeds of simulated gas disks do not trace the gravitational potential as simply as GM/R. That’s no surprise, but most simulated rotation curves do not look like those of real galaxies^{^}, so the comparison is not straightforward. Those caveats aside, Sales et al. (2017) are doing the right thing in trying to make an apples-to-apples comparison between simulated and observed quantities. They extract from simulations a quantity V_out that is appropriate for comparison with what we observe in the outer parts of rotation curves. So here is the resulting prediction for the BTFR:

A graph plotting the baryonic mass (Mb in solar masses) against the characteristic flat rotation speed (Vf in km/s) for various galaxies, showing a curve that describes the baryonic Tully-Fisher relation. The scatter points include different types of galaxies, with green squares indicating specific categories. — *The same data as above with the addition of the line predicted by LCDM in the model of* Sales et al. (2017), specifically the formula for V_out in their Table 2 which is *their proxy for the observable rotation speed.*

That’s pretty good. It still misses at high masses (those two big blue points at the top are Andromeda and the Milky Way) and it still bends away from the data at low masses where there are both star-dominated and gas-rich galaxies. (There are a lot more examples of the latter that I haven’t used here because the plot gets overcrowded.) Despite the overshoot, the use of an observable aspect of the simulations gets closer to the data, and the prediction flattens out in the same qualitative sense. That’s good, so one might see cause for hope that this problem is simply a matter of making a fair comparison between simulations and data. We should also be careful not to over-interpret it: I’ve simply plotted the formula they give; the simulations to which they fit it surely do not resolve ultrafaint dwarfs, so really the line should stop at some appropriate mass scale.

Nevertheless, it makes sense to look more closely at what is observed vs. what is simulated. This has recently been done in greater detail by Ruan et al. (2025). They consider two simulations that implement rather different feedback; both wind up producing rotating, gas rich dwarfs that actually fall on the BTFR.

Scatter plot illustrating the baryonic Tully-Fisher relation, showing the relationship between characteristic circular velocity (Vf) and baryonic mass (Mb) for various galaxy types, including data points for ultrafaint dwarfs. — *The same data as above with the addition of* simulated dwarfs of Ruan et al. (2025), specifically from the top right panel of their Fig. 6. The orange circles are their “massives” and the red triangles the “marvels” (the distinction refers to different feedback models).

Finally some success after all these years! Looking at this, it is tempting to declare victory: problem solved. It was just a matter of doing the right simulation all along, and making an apples-to-apples comparison with the data.

That sounds too good to be true. Is it repeatable in other simulations? What works now that didn’t before?

These are high resolution simulations, but they still don’t resolve ultrafaints. We’re talking here about gas-rich dwarfs. That’s also an important topic, so let’s look more closely. What works now is in the apples-to-apples assessment: what we would measure for V_out is less than V_max (related to V₂₀₀) of the halo:

A graph displaying two panels: the top panel shows the relation between the ratio of mid-outward velocity to maximum velocity (Vout, mid / Vmax, mid) and the logarithm of baryonic mass (Mbar), with data points represented as circles and triangles. The bottom panel illustrates the relationship between the ratio of outer radius to maximum radius (Rout, mid / Rmax, mid) and the logarithm of baryonic mass, also featuring similar data points. — Two panels from Fig. 7 of *Ruan et al. (2025)* showing the ratio of the velocity we might observe relative to the characteristic circular velocity of the halo (top) and the ratio of the radii where these occur (bottom).

The treatment of cold gas in simulations has improved. In these simulations, V_out(R_out) is measured where the gas surface density falls to 1 M_☉ pc^-2, which is typical of many observations. But the true rotation curve is still rising for objects with M_b < a few x 10⁸ M_☉; it has not yet reached a value that is characteristic of the halo. So the apparent velocity is low, even if the dark matter halos are doing basically the same thing as before:

Graph showing the baryonic Tully-Fisher relation, with velocity Vf (km/s) plotted against baryonic mass Mb (solar masses). Data points include various galaxies and dwarf galaxies, with error bars indicating measurement uncertainties. A red line represents the best-fit relation. — As above, but with the addition of the true V_max *(small black dots*) of the simulated halos discussed by *Ruan et al. (2025)*, which follow the relation of *Sales et al. (2017)* (line for V_max in their Table 2).
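
The definition of V_out described above can be sketched as follows: find the outermost radius where the gas surface density is still at or above a threshold (1 M_☉ pc⁻² here), and read the rotation curve at the point where the profile crosses it. This is only an illustration of the idea, not the procedure of Ruan et al. (2025); the function name and the interpolation scheme are assumptions.

```python
import numpy as np


def v_out_at_threshold(radius_kpc, sigma_gas, v_circ, threshold=1.0):
    """Return (R_out, V_out) where sigma_gas last crosses `threshold` (Msun/pc^2).

    Illustrative only: interpolates linearly in log(sigma) between the last
    sample at or above the threshold and the next sample outward.
    """
    radius = np.asarray(radius_kpc, dtype=float)
    sigma = np.asarray(sigma_gas, dtype=float)
    v = np.asarray(v_circ, dtype=float)
    above = np.nonzero(sigma >= threshold)[0]
    if above.size == 0:
        raise ValueError("gas profile never reaches the threshold")
    i = above[-1]
    if i == radius.size - 1:
        # still above the threshold at the last sample: use the last point
        return radius[i], v[i]
    frac = (np.log(sigma[i]) - np.log(threshold)) / (
        np.log(sigma[i]) - np.log(sigma[i + 1]))
    r_out = radius[i] + frac * (radius[i + 1] - radius[i])
    v_out = v[i] + frac * (v[i + 1] - v[i])
    return r_out, v_out


if __name__ == "__main__":
    # Made-up profiles: exponential gas disk and a slowly rising rotation curve.
    r = np.linspace(0.0, 10.0, 21)
    sigma = 10.0 * np.exp(-r / 2.0)
    v = 50.0 * (1.0 - np.exp(-r / 5.0))
    r_out, v_out = v_out_at_threshold(r, sigma, v)
    print(f"R_out = {r_out:.2f} kpc, V_out = {v_out:.1f} km/s, V(last) = {v[-1]:.1f} km/s")
```

For a rotation curve that is still rising at R_out, V_out comes out below the value reached further out; lowering the threshold pushes R_out outward and V_out upward, which is the sense of the test that deeper observations provide.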

I have mixed feelings about this. On the one hand, there are many dwarf galaxies with rising rotation curves that we don’t see flatten out, so it is easy to imagine they might keep going up, and I find it plausible that this is what we would find if we looked harder. So plausible that I’ve spent a fair amount of time doing exactly this. Not all observations terminate at 1 M_☉ pc^-2, and whenever we push further out, we see the same damn thing over and over: the rotation curve flattens out and stays flat^!!. That’s been my anecdotal experience; getting beyond that systematically is the point of the MHONGOOSE survey. This was constructed to detect much lower atomic gas surface densities, and routinely detects gas at the 0.1 M_☉ pc^-2 level where Ruan et al. suggest we should see something closer to V_max. So far, we don’t.

I don’t want to sound too negative, because how we map what we predict in simulations to what we measure in observations is a serious issue. But it seems a bit of a stretch for a low-scatter power law BTFR to be the happenstance of observational sensitivity that cuts in at a convenient mass scale. So far, we see no indication of that in more sensitive observations. I’ll certainly let you know if that changes.

Survey says…

At this juncture, we’ve examined enough examples that the reader can appreciate my concern that LCDM models can predict rather different things. What does the theory really predict? We can’t really test it until we agree what it should do^!!!.

I thought it might be instructive to combine some of the models discussed above. It is.

Graph illustrating the correlation between the characteristic flat rotation speed (Vf) and baryonic mass (Mb) of galaxies. The plot features data points in different colors representing various galaxy types, with lines indicating theoretical trends and empirical relations. — Some of the LCDM predictions discussed above shown together. The dotted line to the right of the data is the halo mass-velocity relation, which is the one thing we all agree LCDM predicts but which is observationally inaccessible. The grey band is a *Mo, Mao, & White*-type model with f_d = 0.025. The red dotted line is the model of *Trujillo-Gomez et al. (2011)*; the solid red line that of *Sales et al. (2017)* for V_max.

The models run together, more or less, for high mass galaxies. Thanks to observational selection effects, these are the objects we’ve always known about and matched our theories to. In order to test a theory, one wants to force it to make predictions in new regimes it wasn’t built for. Low mass galaxies do that, as do low surface brightness galaxies, which are often but not always low mass. MOND has done well for both, down to the ultrafaints we’re discussing here. LCDM does not yet explain those, or really any of the intermediate mass dwarfs.

What really disturbs me about LCDM models is their flexibility. It’s not just that they miss, it’s that it is possible to miss the data on either side of the BTFR. The older f_d = constant models predict velocities that are too low for low mass galaxies. The more recent abundance matching models predict velocities that are too high for low mass galaxies. I have no doubt that a model can be constructed that gets it right, because there is obviously enough flexibility to do pretty much anything. Adding new parameters until we get it right is an example of epicyclic thinking, as I’ve been pointing out for thirty years. I don’t know what could be worse for an idea like dark matter that is not falsifiable.

We still haven’t come anywhere close to explaining the ultrafaints in either theory. In LCDM, we don’t even know if we should draw a curved line that catches them as if they’re in equilibrium, or start from a power-law BTFR and look for departures from that due to tidal effects. Both are possible in LCDM, both are plausible, as is some combination of both. I expect theorists will pick an option and argue about it indefinitely.

Tidal effects

The typical velocity dispersion of the ultrafaint dwarfs is too high for them to be in equilibrium in MOND. But there’s also pretty much no way these tiny things could be in equilibrium, being in the rough neighborhood dominated by our home, the cosmic gorilla. That by itself doesn’t make an explanation; we need to work out what happens to such things as they evolve dynamically under the influence of a pronounced external field. To my knowledge, this hasn’t been addressed in detail in MOND any more than in LCDM, though Brada & Milgrom addressed some of the relevant issues.

There is a difference in approach required for the two theories. In LCDM, we need to increase the resolution of simulations to see what happens to the tiniest of dark matter halos and their resident galaxies within the larger dark matter halos of giant galaxies. In MOND we have to simulate the evolution along the orbit of each unique individual. This is challenging on multiple levels, as each possible realization of a MOND theory requires its own code. Writing a simulation code for AQUAL requires a different numerical approach than QUMOND, and those are both modifications of gravity via the Poisson equation. We don’t know which might be closer to reality; heck, we don’t even know [yet] if MOND is a modification of gravity or inertia, the latter being even harder to code.

Cold dark matter is scale-free, so crudely I expect ultrafaint dwarfs in LCDM to do the same as larger dwarf satellites that have been simulated: their outer dark matter halos are gradually whittled away by tidal stripping for many Gyr. At first the stars are unaffected, but eventually so little dark matter is left that the stars start to be lost impulsively during pericenter passages. Though the dark matter is scale free, the stars and the baryonic physics that made them are not, so that’s where it gets tricky. The apparent dark-to-luminous mass ratio is huge, so one possibility is that the ultrafaints are in equilibrium despite their environment; they just made ridiculously few stars from the amount of mass available. That’s consistent with a wild extrapolation of abundance matching models, but how it comes about physically is less clear. For example, at some low mass, a galaxy would make so few stars that none are massive enough to result in a supernova, so there is no feedback, which is what is preventing too many stars from forming. Awkward. Alternately, the constant exposure to tidal perturbation might stir things up, with the velocity dispersion growing and stars getting stripped to form tidal streams, so they may have started as more massive objects. Or some combination of both, plus the evergreen possibility of things that don’t occur to me offhand.

Equilibrium for ultrafaint satellites is not an option in MOND, but tidal stirring and stripping is. As a thought experiment, let’s imagine what happens to a low mass dwarf typical of the field that falls towards the Milky Way from some large distance. Initially gas-rich, the first environmental effect that it is likely to experience is ram pressure stripping by the hot coronal gas around the Milky Way. That’s a baryonic effect that happens in either theory; it’s nothing to do with the effective law of gravity. A galaxy thus deprived of much of its mass will be out of equilibrium; its internal velocities will be typical of the original mass but the stripped mass is less. Consequently, its structure must adjust to compensate; perhaps dwarf Irregulars puff up and are transformed into dwarf Spheroidals in this way. Our notional infalling dwarf may have time to equilibrate to its new mass before being subject to strong tidal perturbation by the Milky Way, or it may not. If not, it will have characteristic internal velocities that are too high for its new mass, and reside above the BTFR. I doubt this suffices to explain [m]any of the ultrafaints, as their masses are so tiny that some stellar mass loss is also likely to have occurred.

Let’s suppose that our infalling dwarf has time to [approximately] equilibrate, or it simply formed nearby to begin with. Now it is a pressure supported system [more or less] on the BTFR. As it orbits the Milky Way, it feels an extra force from the external field. If it stays far enough out to remain in quasi-equilibrium in the EFE regime, then it will oscillate in size and velocity dispersion in phase with the strength of the external field it feels along its orbit.

If instead a satellite dips too close, it will be tidally disturbed and depart from equilibrium. The extra energy may stir it up, increasing its velocity dispersion. It doesn’t have the mass to sustain that, so stars will start to leak out. Tidal disruption will eventually happen, with the details depending on the initial mass and structure of the dwarf and on the eccentricity of its orbit, the distance of closest approach (pericenter), whether the orbit is prograde or retrograde relative to any angular momentum the dwarf may have… it’s complicated, so it is hard to generalize^##. Nevertheless, we (McGaugh & Wolf 2010) anticipated that “the deviant dwarfs [ultrafaints] should show evidence of tidal disruption while the dwarfs that adhere to the BTFR should not.” Unlike LCDM where most of the damage is done at closest approach, we anticipate for MOND that “stripping of the deviant dwarfs should be ongoing and not restricted to pericenter passage” because tides are stronger and there is no cocoon of dark matter to shelter the stars. The effect is still maximized at pericenter; it’s just not as impulsive as in some of the dark matter simulations I’ve seen.

This means that there should be streams of stars all over the sky. As indeed there are. For example:

A color-coded map of the northern sky displaying various stellar streams, indicated by labels such as 'Gaia-1*', 'Gaia-3*', and 'GD-1'. The color gradient represents velocity in kilometers per second, with colors ranging from blue for lower velocities to red for higher velocities. — *Stellar streams in the Milky Way identified using Gaia (Malhan et al. 2018).*

As a tidally influenced dwarf dissolves, the stars will leak out and form a trail. This happens in LCDM too, but there are differences in the rate, coherence, and symmetry of the resulting streams. Perhaps ultrafaint dwarfs are just the last dregs of the tidal disruption process. From this perspective, it hardly matters if they originated as external satellites or are internal star clusters: globular clusters native to the Milky Way should undergo a similar evolution.

Evolutionary tracks

Perhaps some of the ultrafaint dwarfs are the nuggets of disturbed systems that have suffered mass loss through tidal stripping. That may be the case in either LCDM or MOND, and has appealing aspects in either case – we went through all the possibilities in McGaugh & Wolf (2010). In MOND, the BTFR provides a reference point for what a stable system in equilibrium should do. That’s the starting point for the evolutionary tracks suggested here:

A graph plotting flat rotation speed (Vf) in km/s against baryonic mass (Mb) in solar masses. The data points include various galaxies represented as blue circles and green squares, with error bars indicating measurement uncertainty. A solid black line demonstrates the overall trend, while red curves suggest alternative theoretical predictions. — *BTFR with conceptual evolutionary tracks (red lines) for tidally-stirred ultrafaint dwarfs.*

Objects start in equilibrium on the BTFR. As they become subject to the external field, their velocity dispersions first decrease as they transition through the quasi-Newtonian regime. As tides kick in, stars are lost and stretched along the satellite’s orbit, so mass is lost but the apparent velocity dispersion increases as stars gradually separate and stretch out along a stream. Their relative velocities no longer represent a measure of the internal gravitational potential; rather than a cohesive dwarf satellite they’re more an association of stars in similar orbits around the Milky Way.

This is crudely what I imagine might be happening in some of the ultrafaint dwarfs that reside above the BTFR. Reality can be more complicated, and probably is. For example, objects that are not yet disrupted may oscillate around and below the BTFR before becoming completely unglued. Moreover, some individual ultrafaints probably are not real, while the data for others may suffer from systematic uncertainties. There’s a lot to sort out, and we’ve reached the point where the possibility of non-equilibrium effects cannot be ignored.

As a test of theories, the better course remains to look for new galaxies free from environmental perturbation. Ultrafaint dwarfs in the field, far from cosmic gorillas like the Milky Way, would be ideal. Hopefully many will be discovered in current and future surveys.

^!Other examples exist and continue to be discovered. More pertinent to my thinking is that the mass threshold at which reionization is supposed to suppress star formation has been a constantly moving goal post. To give an amusing anecdote, while I was junior faculty at the University of Maryland (so at least twenty years ago), Colin Norman called me up out of the blue. Colin is an expert on star formation, and had a burning question he thought I could answer. “Stacy,” he says as soon as I pick up, “what is the lowest mass star forming galaxy?” Uh, Hi, Colin. Off the cuff and totally unprepared for this inquiry, I said “um, a stellar mass of a few times 10⁷ solar masses.” Colin’s immediate response was to laugh long and loud, as if I had made the best nerd joke ever. When he regained his composure, he said “We know that can’t be true as reionization will prevent star formation in potential wells that small.” So, after this abrupt conversation, I did some fact-checking, and indeed, the number I had pulled out of my arse on the spot was basically correct, at that time. I also looked up the predictions, and of course Colin knew his business too; galaxies that small shouldn’t exist. Yet they do, and now the minimum known is two orders of magnitude lower in mass, with still no indication that a lower limit has been reached. So far, the threshold of our knowledge has been imposed by observational selection effects (low luminosity galaxies are hard to see), not by any discernible physics.

More recently, McQuinn et al. (2024) have made a study of the star formation histories of Leo P and a few similar galaxies that are near enough to see individual stars so as to work out the star formation rate over the course of cosmic history. They argue that there seems to be a pause in star formation after reionization, so a more nuanced version of the hypothesis may be that reionization did suppress star forming activity for a while, but these tiny objects were subsequently able to re-accrete cold gas and get started again. I find that appealing as a less simplistic thing that might have happened in the real universe, and not just a simple on/off switch that leaves only a fossil. However, it isn’t immediately clear to me that this more nuanced hypothesis should happen in LCDM. Once those baryons have evaporated, they’re gone, and it is far from obvious that they’ll ever come back to the weak gravity of such a small dark matter halo. It is also not clear to me that this interpretation, appealing as it is, is unique: the reconstructed star formation histories also look consistent with stochastic star formation, with fluctuations in the star formation rate being a matter of happenstance that have nothing to do with the epoch of reionization.

^#So how are ultrafaint dwarfs different from star clusters? Great question! Wish we had a great answer.

Some ultrafaints probably are star clusters rather than independent satellite galaxies. How do we tell the difference? Chiefly, the velocity dispersion: star clusters show no need for dark matter, while ultrafaint dwarfs generally appear to need a lot. This of course assumes that their measured velocity dispersions represent an equilibrium measure of their gravitational potential, which is what we’re questioning here, so the opportunity for circular reasoning is rife.

^$Rather than apply a strict luminosity cut, for convenience I’ve kept the same “not safe from tidal disruption” distinction that we’ve used before. Some of the objects in the 10⁵ – 10⁶ M_☉ range might belong more with the classical dwarfs than with the ultrafaints. This is a reminder that our nomenclature is terrible more than anything physically meaningful.

^&Astronomy is an observational science, not a laboratory science. We can only detect the photons nature sends our way. We cannot control all the potential systematics as can be done in an enclosed, finite, carefully controlled laboratory. That means there is always the potential for systematic uncertainties whose magnitude can be difficult to estimate, or sometimes to even be aware of, like how local variations impact Jeans analyses. This means we have to take our error bars with a grain of salt, often such a big grain as to make statistical tests unreliable: goodness of fit is only as meaningful as the error bars.

I say this because it seems to be the hardest thing for physicists to understand. I also see many younger astronomers turning the crank on fancy statistical machinery as if astronomical error bars can be trusted. Garbage in, garbage out.

^*This is an example of setting a parameter in a model “by hand.”

^**The transition to thinking in terms of the luminosity function rather than Tully-Fisher is so complete that the most recent, super-large, Euclid flagship simulation doesn’t even attempt to address the kinematics of individual galaxies while giving extraordinarily detailed and extensive information about their luminosity distributions. I can see why they’d do that – they want to focus on what the Euclid mission might observe – but it is also symptomatic of the growing tendency I’ve witnessed to just not talk about those pesky kinematics.

^%Halos in dark matter simulations tend to be rather triaxial, i.e., a 3D bloboid that is neither spherical like a soccer ball nor oblate like a frisbee nor prolate like an American football: each principal axis has a different length. If real halos were triaxial, it would lead to non-circular orbits in dark matter-dominated galaxies that are not observed.

The triaxiality of halos is a result from dark matter-only simulations. Personally, I suspect that the condensation of gas within a dark matter halo (presuming such things exist) during the process of galaxy formation rounds-out the inner halo, making it nearly spherical where we are able to make measurements. So I don’t see this as necessarily a failure of LCDM, but rather an example of how more elaborate simulations that include baryonic physics are sometimes warranted. Sometimes. There’s a big difference between this process, which also compresses the halo (making it more dense when it already starts out too dense), and the various forms of feedback, which may or may not further alter the structure of the halo.

^{^}There are many failure modes in simulated rotation curves, the two most common being the cusp-core problem in dwarfs and sub-maximal disks in giants. It is common for the disks of bright spiral galaxies to be nearly maximal in the sense that the observed stars suffice to explain the inner rotation curve. They may not be completely maximal in this sense, but they come close for normal stellar populations. (Our own Milky Way is a good example.) In contrast, many simulations produce bright galaxies that are absurdly sub-maximal; EAGLE and SIMBA being two examples I remember offhand.

Another common problem is that LCDM simulations often don’t produce rotation curves that are as flat as observed. This was something I also found in my early attempts at model-building with dark matter halos. It is easy to fit a flat rotation curve given the data, but it is hard to predict a priori that rotation curves should be flat.

^!!Gravitational lensing indicates that rotation curves remain flat to even larger radii. However, these observations are only sensitive to galaxies more massive than those under discussion here. So conceivably there could be another coincidence wherein flatness persists for galaxies with M_b > 10¹⁰ M_☉, but not those with M_b < 10⁹ M_☉.

^!!!Many in the community seem to agree that it will surely work out.

^##I’ve tried to estimate dissolution timescales, but find the results wanting. For plausible assumptions, one finds timescales that seem plausible (a few Gyr) but with some minor fiddling one can also find results that are no-way that’s-too-short (a few tens of millions of years), depending on the dwarf and its orbit. These are crude analytic estimates; I’m not satisfied that these numbers were particularly meaningful. Still, this is a worry with the tidal-stirring hypothesis: will perturbed objects persist long enough to be observed as they are? This is another reason we need detailed simulations tailored to each object.

^{*&^#}Note added after initial publication: While I was writing this, a nice paper appeared on exactly this issue of the star formation history of a good number of ultrafaint dwarfs. They find that 80% of the stellar mass formed 12.48 ± 0.18 Gyr ago, so 12.5 was a good guess. Formally, at the one sigma level, this is a little after reionization, but only a tiny bit, so close enough: the bulk of the stars formed long ago, like a classical globular cluster, and these ultrafaints are consistent with being fossils.

Intriguingly, there is a hint of an age difference by kinematic grouping, with things that have been in the Milky Way being the oldest, those on first infall being a little younger (but still very old), and those infalling with the Large Magellanic Cloud a tad younger still. If so, then there is more to the story than quenching by cosmic reionization.

They also show a nice collection of images so you can see more examples. The ellipses trace out the half-light radii, so you can see the proclivity for many (not all!) of these objects to be elongated, perhaps as a result of tidal perturbation:

**Figure 2** from Durbin et al. (2025): *Footprints of all HST observations (blue filled patches) overlaid on DSS2 imaging cutouts. Open black ellipses show the galaxy profiles at one half-light radius.*

Non-equilibrium dynamics in galaxies that appear to lack dark matter: ultradiffuse galaxies

Previously, we discussed non-equilibrium dynamics in tidal dwarf galaxies. These are the result of interactions between giant galaxies that are manifestly a departure from equilibrium, a circumstance that makes TDGs potentially a decisive test to distinguish between dark matter and MOND, and simultaneously precludes confident application of that test. There are other galaxies for which I suspect non-equilibrium dynamics may play a role, among them some (not all) of the so-called ultradiffuse galaxies (UDGs).

UDGs

The term UDG has been adopted for galaxies below a certain surface brightness threshold with a size (half-light radius) in excess of 1.5 kpc (van Dokkum et al. 2015). I find the stipulation about the size to be redundant, as surface brightness^* is already a measure of diffuseness. But OK, whatever, these things are really spread out. That means they should be good tests of MOND like low surface brightness galaxies before them: their low stellar surface densities mean^** that they should be in the regime of low acceleration and evince large mass discrepancies when isolated. It also makes them susceptible to the external field effect (EFE) in MOND when they are not isolated, and perhaps also to tidal disruption.

To give some context, here is a plot of the size-mass relation for Local Group dwarf spheroidals. Typically they have masses comparable to globular clusters, but much larger sizes – a few hundred parsecs instead of just a few. As with more massive galaxies, these pressure supported dwarfs are all over the place – at a given mass, some are large while others are relatively compact. All but the one most massive galaxy in this plot are in the MOND regime. For convenience, I’ll refer to the black points labelled with names as UDGs⁺.

The size (radius encompassing half of the total light) and stellar mass of Local Group dwarf spheroidals (green points selected by McGaugh et al. 2021 to be relatively safe from external perturbation) along with two more Local Group dwarfs that are subject to the EFE (Crater 2 and Antlia 2) and the two UDGs NGC 1052-DF2 and DF4. Dotted lines show loci of constant surface density. For reference, the solar neighborhood has ~40 M_☉ pc^-2; the centers of high surface brightness galaxies frequently exceed 1,000 M_☉ pc^-2.

The UDGs are big and diffuse. This makes them susceptible to the EFE and tidal effects. The lower the density of a system, the easier it is for external systems to mess with it. The ultimate example is something that gets so close to a dominant central mass that it gets tidally disrupted. That can happen conventionally; the stronger effective force of MOND increases tidal effects. Indeed, there is only a fairly narrow regime between the isolated case and tidally-induced disequilibrium where the EFE modifies the internal dynamics in a quasi-static way.

The trouble is the s-word: static. In order to test theories, we assume that the dynamical systems we observe are in equilibrium. Though often a good assumption, it doesn’t always hold. If we forget we made the assumption, we might think we’ve falsified a theory when all we’ve done is discover a system that is out of equilibrium. The universe is a very dynamic place – the whole thing is expanding, after all – so we need to be wary of static thinking.

Equilibrium MOND formulae

That said, let’s indulge in some static thinking. An isolated, pressure supported galaxy in the MOND regime will have an equilibrium velocity dispersion

$$\sigma_{\rm iso} \approx \left(\frac{4}{81}\, G M a_0\right)^{1/4}$$

where M is the mass (the stellar mass in the case of a gas-free dwarf spheroidal), G is Newton’s constant, and a₀ is Milgrom’s acceleration constant. The number 4/81 is a geometrical factor that assumes we’re observing a spherical system with isotropic orbits, neither of which is guaranteed even in the equilibrium case, and deviations from this idealized situation are noticeable. Still, this is as simple as it gets: if you know the mass, you can predict the characteristic speed at which stars move. Mass is all that matters: we don’t care about the radius as we must with Newton (v² = GM/r); the only other quantities are constants of nature.
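
As a rough numerical illustration of the isolated formula, here is a minimal Python sketch. It assumes the relation takes the form σ_iso⁴ = (4/81) G M a₀, with a₀ ≈ 1.2 × 10⁻¹⁰ m s⁻² converted to (km/s)² pc⁻¹. The function name and the unit choices are mine for illustration, not anything taken from the papers.

```python
# Sketch only: isolated MOND velocity dispersion, assuming
# sigma_iso^4 = (4/81) * G * M * a0 for a spherical, isotropic system.

G = 4.301e-3   # Newton's constant in pc (km/s)^2 / Msun
A0 = 3.70      # a0 ~ 1.2e-10 m/s^2 expressed in (km/s)^2 / pc (assumed value)


def sigma_iso(mass_msun):
    """Equilibrium velocity dispersion (km/s) of an isolated dwarf of given mass."""
    return (4.0 / 81.0 * G * A0 * mass_msun) ** 0.25


if __name__ == "__main__":
    # Illustrative masses only; no radius is needed in the isolated case.
    for mass in (3e5, 1e6, 1e7):
        print(f"M = {mass:.0e} Msun -> sigma_iso ~ {sigma_iso(mass):.1f} km/s")
```

Because σ scales as M^(1/4), it takes a factor of 16 in mass to double the velocity dispersion, and the size of the system never enters.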

But what do we mean by isolated? In MOND, it is that the internal acceleration of the system, g_in, exceeds that from external sources, g_ex: g_in ≫ g_ex. For a pressure supported dwarf, g_in ≈ 3σ²/r (so here the size of the dwarf does matter, as does the location of a star within it), while the external field from a giant host galaxy would be g_ex = V_f²/D where V_f is the flat rotation speed stipulated by the baryonic mass of the host and D is the distance from the host to the dwarf satellite. The distance is not a static quantity. As a dwarf orbits its host, D will vary by an amount that depends on the eccentricity of the orbit, and the external field will vary with it, so it is possible to have an orbit in which a dwarf satellite dips in and out of the EFE regime. Many Local Group dwarfs straddle the line g_in ≈ g_ex, and it takes time to equilibrate, so static thinking can go awry.
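
The comparison between g_in and g_ex can be sketched in a few lines of Python. This is only an illustration of the two expressions above: the dwarf, the host rotation speed, the distances, and the `margin` used to stand in for "much greater than" are all hypothetical choices of mine.

```python
# Sketch only: compare internal and external accelerations for a satellite.
# Units: velocities in km/s, lengths in pc, accelerations in (km/s)^2 / pc.

A0 = 3.70  # a0 ~ 1.2e-10 m/s^2 in (km/s)^2 / pc (assumed value)


def g_internal(sigma_kms, r_pc):
    """Internal acceleration of a pressure supported dwarf, g_in ~ 3 sigma^2 / r."""
    return 3.0 * sigma_kms**2 / r_pc


def g_external(v_flat_kms, distance_pc):
    """External field of the host at distance D, g_ex = V_f^2 / D."""
    return v_flat_kms**2 / distance_pc


def classify(g_in, g_ex, margin=3.0):
    """Label the regime; `margin` is an arbitrary stand-in for 'much greater than'."""
    if g_in > margin * g_ex:
        return "isolated"
    if g_ex > margin * g_in:
        return "efe"
    return "borderline"


if __name__ == "__main__":
    g_in = g_internal(sigma_kms=8.0, r_pc=300.0)   # hypothetical dwarf
    for d_kpc in (30, 60, 120, 240):               # hypothetical orbital distances
        g_ex = g_external(200.0, d_kpc * 1e3)      # hypothetical host V_f
        print(d_kpc, f"{g_in / A0:.3f}", f"{g_ex / A0:.3f}", classify(g_in, g_ex))
```

The point of the loop is that the same dwarf can change label as D changes along its orbit, which is exactly where the static approximation becomes questionable.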

It is possible to define a sample of Local Group dwarfs that have sufficiently high internal accelerations (but also in the MOND regime with g_ex ≪ g_in ≪ a₀) that we can pretend they are isolated, and the above equation applies. Such dwarfs should^& fall on the BTFR, which they do:

The baryonic Tully-Fisher relation (BTFR) including pressure supported dwarfs (green points) with their measured velocity dispersions matched to the flat rotation speeds of rotationally supported galaxies (blue points) via the prescription of *McGaugh et al. (2021)*. The large blue points are rotators in the Local Group (with Andromeda and the Milky Way up near the top); smaller points are spirals with direct distance measurements (Schombert et al. 2020). The Local Group dwarfs assessed to be safe from external perturbation are on the BTFR (for V_f = 2σ); Crater 2 and the UDGs near NGC 1052 are not.

In contrast, three of the four UDGs considered here do not fall on the BTFR. Should they?

Conventionally, in terms of dark matter, probably they should. There is no reason for them to deviate from whatever story we make up to explain the BTFR for everything else. That they do means we have to make up a separate story for them. I don’t want to go deeply into this here since the cold dark matter model doesn’t really explain the observed BTFR in the first place. But even accepting that it does so after invoking feedback (or whatever), does it tolerate deviants? In a broad sense, yes: since it doesn’t require the particular form of the BTFR that’s observed, it is no problem to deviate from it. In a more serious sense, no: if one comes up with a model that explains the small scatter of the BTFR, it is hard to make that same model defy said small scatter. I know, I’ve tried. Lots. One winds up with some form of special pleading in pretty much any flavor of dark matter theory on top of whatever special pleading we invoked to explain the BTFR in the first place. This is bad, but perhaps not as bad as it seems once one realizes that not everything has to be in equilibrium all the time.

In MOND, the BTFR is absolute – for isolated systems in equilibrium. In the EFE regime, galaxies can and should deviate from it even if they are in equilibrium. This always goes in the sense of having a lower characteristic velocity for a given mass, so below the line in the plot. To get above the line would require being out of equilibrium through some process that inflates velocities (if systematic errors are not to blame, which also sometimes happens).

The velocity dispersion in the EFE regime (g_in ≪ g_ex ≪ a₀) is slightly more complicated than this isolated case:

$$\sigma_{\rm efe} \approx \left(\frac{a_0}{g_{\rm ex}}\,\frac{G M}{3 r}\right)^{1/2}$$

This is just like Newton except the effective value of the gravitational constant is modified. It gets a boost^{^} by how far the system is in the MOND regime: G_eff ≈ G(a₀/g_ex). An easy way to tell which regime an object is in is to calculate both velocity dispersions σ_iso and σ_efe: the smaller one is the one that applies^#. An upshot of this is that systems in the EFE regime should deviate from the BTFR to the low velocity side. The amplitude of the deviation depends on the system and the EFE: both the size and mass matter, as does g_ex. Indeed, if an object is on an eccentric orbit, then the velocity dispersion can vary with the EFE as the distance of the satellite from its host varies, so over time the object would trace out some variable path in the BTFR plane.
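
The "compute both, take the smaller" rule is easy to sketch in Python. The forms used here are assumptions consistent with the text: σ_iso⁴ = (4/81) G M a₀ for the isolated case, and a Newtonian-like σ_efe² = G_eff M / (3r) with G_eff = G a₀ / g_ex for the EFE case. The example numbers loosely resemble Crater 2, but the half-light radius (1400 pc) and host rotation speed (180 km/s) are illustrative values of mine, not taken from this post.

```python
# Sketch only: pick the applicable MOND velocity dispersion for a dwarf.
# Assumed forms: sigma_iso^4 = (4/81) G M a0 and sigma_efe^2 = G_eff M / (3 r),
# with G_eff = G * a0 / g_ex. Units: km/s, pc, Msun.

G = 4.301e-3   # Newton's constant in pc (km/s)^2 / Msun
A0 = 3.70      # a0 ~ 1.2e-10 m/s^2 in (km/s)^2 / pc (assumed value)


def sigma_iso(mass_msun):
    """Isolated-regime velocity dispersion (km/s)."""
    return (4.0 / 81.0 * G * A0 * mass_msun) ** 0.25


def sigma_efe(mass_msun, r_pc, g_ex):
    """EFE-regime velocity dispersion (km/s): Newton with a boosted G."""
    g_eff = G * A0 / g_ex
    return (g_eff * mass_msun / (3.0 * r_pc)) ** 0.5


def sigma_mond(mass_msun, r_pc, g_ex):
    """Return (sigma, regime), taking the smaller of the two estimates."""
    iso = sigma_iso(mass_msun)
    efe = sigma_efe(mass_msun, r_pc, g_ex)
    return (iso, "isolated") if iso <= efe else (efe, "efe")


if __name__ == "__main__":
    g_ex = 180.0**2 / 120e3   # hypothetical host V_f = 180 km/s at D = 120 kpc
    print(sigma_mond(3e5, 1400.0, g_ex))   # diffuse, Crater 2-like object
    print(sigma_mond(3e5, 100.0, g_ex))    # same mass, much more compact
```

With these assumed inputs the diffuse object lands in the EFE regime with a dispersion near 2 km/s, while the compact object of the same mass stays on the isolated prediction near 4 km/s: both size and g_ex matter once the external field dominates.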

Three of the four UDGs fall off the BTFR, so that sounds mostly right, qualitatively. Is it? Yes, for Crater 2, but not really for the others. Even for Crater 2 it is only a partial answer, as non-equilibrium effects may play a role. This gets involved for Crater 2, then more so for the others, so let’s start with Crater 2.

Crater 2 – the velocity dispersion

The velocity dispersion of Crater 2 was correctly predicted a priori by the formula for σ_efe above. It is a tiny number, 2 km/s, and that’s what was subsequently observed. Crater 2 is very low mass, ~3 x 10⁵ M_☉, which is barely a globular cluster, but it is even more spread out than the typical dwarf spheroidal, having an effective surface density of only ~0.05 M_☉pc^-2. If it were isolated, MOND predicts that it would have a higher velocity dispersion – all of 4 km/s. That’s what it would take to put it on the BTFR above. The seemingly modest difference between 2 and 4 km/s makes for a clear offset. But despite its substantial current distance from the Milky Way (~ 120 kpc), Crater 2 is so low surface density that it is still subject to the external field effect, which lowers its equilibrium velocity dispersion. Unlike isolated galaxies, it should be offset from the BTFR according to MOND.
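
To see why a factor of two in velocity dispersion makes for a clear offset, one can run the numbers through the BTFR scaling M_b ∝ V_f⁴, using the V_f = 2σ matching adopted in the plot above. This is just arithmetic on the values quoted in the text, not an additional measurement:

$$\frac{M_b(\sigma = 4\ \mathrm{km\,s^{-1}})}{M_b(\sigma = 2\ \mathrm{km\,s^{-1}})} = \left(\frac{2 \times 4}{2 \times 2}\right)^{4} = 2^{4} = 16$$

So at its observed 2 km/s, Crater 2 carries roughly 16 times more mass than the BTFR would assign to that velocity, which is more than an order of magnitude.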

LCDM struggles to explain the low mass end of the BTFR because it predicts a halo mass-circular speed relation M_halo ~ V_halo³ that differs from the observed M_b ~ V_f⁴. A couple of decades ago, it looked like massive galaxies might be consistent with the lower power-law, but that anticipates higher velocities for small systems. The low velocity dispersion of Crater 2 is thus doubly weird in LCDM. Its internal velocities are too small not just once – the BTFR is already lower than was expected – but twice, being below even that.

An object with a large radial extent like Crater 2 probes far out into its notional dark matter halo, making the nominal prediction^$ of LCDM around 17 km/s, albeit with a huge expected scatter. Even if we can explain the low mass end of the BTFR and its unnaturally low scatter in LCDM, we now have to explain this exception to it – an exception that is natural in MOND, but is on the wrong side of the probability distribution for LCDM. That’s one of the troubles with tuning LCDM to mimic MOND: if you succeed in explaining the first thing, you still fail to anticipate the other. There is no EFE^% in LCDM, no reason to anticipate that σ_efe applies rather than σ_iso, and no reason to expect via feedback that this distinction has anything to do with the dynamical accelerations g_in and g_ex.

But wait – this is a post about non-equilibrium dynamics. That can happen in LCDM too. Indeed, one expects that satellite galaxies suffer tidal effects in the field of their giant host. The primary effect is that the dark matter subhalos in which dwarf satellites reside are stripped from the outside in. Their dark matter becomes part of the large halo of the host. But the stars are well-cocooned in the inner cusp of the NFW halo which is more robust than the outskirts of the subhalo, so the observable velocity dispersion barely evolves until most of the dark mass has been stripped away. Eventually, the stars too get stripped, forming tidal streams. Most of the damage occurs during pericenter passage when satellites are closest to their host. What’s left is no longer in equilibrium, with the details depending on the initial conditions of the dwarf on infall, the orbit, the number of pericenter passages, etc., etc.

What does not come out of this process is Crater 2 – at least not naturally. It has stars very far out – these should get stripped outright if the subhalo has been eviscerated to the point where its velocity dispersion is only 2 km/s. This tidal limitation has been noted by Errani et al.: “the large size of kinematically cold ‘feeble giant’ satellites like Crater 2 or Antlia 2 cannot be explained as due to tidal effects alone in the Lambda Cold Dark Matter scenario.” To save LCDM, we need something extra, some additional special pleading on top of non-equilibrium tidal effects, which is why I previously referred to Crater 2 as the Bullet Cluster of LCDM: an observation so problematic that it amounts to a falsification.

Crater 2 – the orbit

We held a workshop on dwarf galaxies on CWRU’s campus in 2017 where issues pertaining to both dark matter and MOND were discussed. The case of Crater 2 was one of the things discussed, and it was included in the list of further tests for both theories (see above links). Basically the expectation in LCDM is that most subhalo orbits are radial (highly eccentric), so that is likely to be the case for Crater 2. In contrast, the ultradiffuse blob that is Crater 2 would not survive a close passage by the Milky Way given the strong tidal force exerted by MOND, so the expectation was for a more tangential (quasi-circular) orbit that keeps it at a safe distance.

Subsequently, it became possible to constrain orbits with Gaia data. The exact orbit depends on the gravitational potential of the Milky Way, which isn’t perfectly known. However, several plausible choices of the global potential give an eccentricity around 0.6. That’s not exactly radial, but it’s pretty far from circular, placing the pericenter around 30 kpc. That’s much closer than its current distance, and well into the regime where it should be tidally disrupted in MOND. No way it survives such a close passage!
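
For a sense of scale, the orbital numbers can be turned around with a short Python sketch. It assumes the common convention e = (r_apo − r_peri) / (r_apo + r_peri) for the eccentricity of a non-Keplerian orbit, and a flat rotation curve for the host so that g_ex = V_f²/D; the function names are hypothetical.

```python
# Sketch only: orbital geometry implied by a pericenter and an eccentricity,
# using the convention e = (r_apo - r_peri) / (r_apo + r_peri).


def apocenter(r_peri_kpc, ecc):
    """Apocenter distance (kpc) for a given pericenter and eccentricity."""
    return r_peri_kpc * (1.0 + ecc) / (1.0 - ecc)


def external_field_ratio(r_peri_kpc, ecc):
    """g_ex(pericenter) / g_ex(apocenter) for g_ex = V_f^2 / D with constant V_f."""
    return apocenter(r_peri_kpc, ecc) / r_peri_kpc


if __name__ == "__main__":
    r_peri, ecc = 30.0, 0.6   # values quoted in the text
    print(f"apocenter ~ {apocenter(r_peri, ecc):.0f} kpc")
    print(f"g_ex(peri) / g_ex(apo) ~ {external_field_ratio(r_peri, ecc):.1f}")
```

With a pericenter of 30 kpc and an eccentricity of 0.6, the apocenter comes out at about 120 kpc, close to Crater 2’s current distance, and the external field at pericenter is about four times stronger than it is there.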

So which is it? MOND predicted the correct velocity dispersion, which LCDM struggles to explain. Yet the orbit is reasonable in LCDM, but incompatible with MOND.

Simulations of dwarf satellites

It occurs to me that we might be falling victim to static thinking somewhere. We talked about the impact of tides on dark matter halos a bit above. What should we expect in MOND?

The first numerical simulations of dwarf galaxies orbiting a giant host were conducted by Brada & Milgrom (2000). Their work is specific to the Aquadratic Lagrangian (AQUAL) theory proposed by Bekenstein & Milgrom (1984). This was the first demonstration that it was possible to write a version of MOND that conserved momentum and energy. Since then, a number of different approaches have been demonstrated. These can be subtly different, so it is challenging to know which (if any) is correct. Sorting that out is well beyond the scope of this post, so let’s stick to what we can learn from Brada & Milgrom.

Brada & Milgrom followed the evolution of low surface density dwarfs of a range of masses as they orbited a giant host galaxy. One thing they found was that the behavior of the numerical model could deviate from the analytic expectation of quasi-equilibrium enshrined in the equations above. For an eccentric orbit, the external field varies with distance from the host. If there is enough time to respond to this, the change can be adiabatic (reversible), and the static approximation may be close enough. However, as the external field varies more rapidly and/or the dwarf is more fragile, the numerical solution departs from the simple analytic approximation. For example:

**Fig. 2** of Brada & Milgrom (2000): showing the numerically calculated (dotted line) variation of radius (left) and characteristic velocity (right) for a dwarf on a mildly eccentric orbit (peri- and apocenter of roughly 60 and 90 kpc, respectively, for a Milky Way-like host). Also shown is the variation in the EFE as the dwarf’s distance from the host varies (solid line). Dwarfs go through a breathing mode of increasing/decreasing size and decreasing/increasing velocity dispersion in phase with the orbit. If this process is adiabatic, it tracks the solid line and the static EFE approximation holds. This is not always the case in the simulation, so applying our usual assumption of dynamical equilibrium will result in an error stipulated by the difference between the dotted and solid lines. The amplitude of this error depends on the size, mass, and orbital history of each and every dwarf satellite.

As long as the behavior is adiabatic, the dwarf can be stable indefinitely even as it goes through periodic expansion and contraction in phase with the orbit. Departure from adiabaticity means that every passage will be different. Some damage will be done on the first passage, more on the second, and so on. As a consequence, reality will depart from our simple analytic expectations.

I was aware of this when I made the prediction for the velocity dispersion of Crater 2, and hedged appropriately. Indeed, I worried that Crater 2 should already be out of equilibrium. Nevertheless, I took solace in two things: first, the orbital timescale is long, over a Gyr, so departures from the equilibrium prediction might not have had time to make a dramatic difference. Second, this expectation is consistent with the slow evolution of the characteristic velocity for the most Crater 2-like, m=1 model of Brada & Milgrom (bottom track in the right panel below):

**Fig. 4** of Brada & Milgrom (2000): The variation of the size and characteristic velocity of dwarf models of different mass. The more massive models approximate the adiabatic limit, which gradually breaks down for the lowest mass models. In this example, the m = 1 and 2 models explode, with the scale size growing gradually without recovering.

What about the size? That is not constant except for the most massive (m=16) model. The m=3 and 4 models recover, albeit not adiabatically. The m=4 model almost returns to its original size, but the m=3 model has puffed up after one orbit. The m=1 and 2 models explode.

One can see this by eye. The continuous growth in radii of the lower mass models is obvious. If one looks closely, one can also see the expansion then contraction of the heavier models.

**Fig. 5** of Brada & Milgrom (2000): AQUAL numerical simulations of dwarf satellites orbiting a more massive host galaxy. The parameter m describes the mass and effective surface density of the satellite; all the satellites are in the MOND regime and subject to the external field of the host galaxy, which exceeds their internal accelerations. In dimensionless simulation units, m = 5 x 10^-5, which for a satellite of the Milky Way corresponds roughly to a stellar mass of 3 x 10⁶ *M_☉*. For real dwarf satellite galaxies, the scale size is also relevant, but the sequence of m above suffices to illustrate the increasingly severe effects of the external field as m decreases.

The current size of Crater 2 is unusual. It is very extended for its mass. If the current version of Crater 2 has a close passage with the Milky Way, it won’t survive. But we know it already had a close passage, so it should be expanding now as a result. (I did discuss the potential for non-equilibrium effects.) Knowing now that there was a pericenter passage in the (not exactly recent) past, we need to imagine running back the clock on the simulations. It would have been smaller in the past, so maybe it started with a normal size, and now appears so large because of its pericenter passage. The dynamics predict something like that; it is static thinking to assume it was always thus.

The dotted line shows a possible evolutionary track for Crater 2 as it expands after pericenter passage. Its initial condition would have been amongst the other dwarf spheroidals. It could also have lost some mass in the process, so any of the green low-mass dwarfs might be similar to the progenitor.

This is a good example of a phenomenon I’ve encountered repeatedly with MOND. It predicts something right, but seems to get something else wrong. If we’re already sure it is wrong, we stop there and never think further. But when one bothers to follow through on what the theory really predicts, more often than not the apparently problematic observation is in fact what we should have expected in the first place.

DF2 and DF4

DF2 and DF4 are two UDGs in the vicinity of the giant galaxy NGC 1052. They have very similar properties, and are practically identical in terms of having the same size and mass within the errors. They are similar to Crater 2 in that they are larger than other galaxies of the same mass.

When it was first discovered, NGC 1052-DF2 was portrayed as a falsification of MOND. On closer examination, had I known about it, I could have used MOND to correctly predict its velocity dispersion, just like the dwarfs of Andromeda. This seemed like yet another case where the initial interpretation contrary to MOND melted away to actually be a confirmation. At this point, I’ve seen literally hundreds^{^^} of cases like that. Indeed, this particular incident made me realize that there would always be new cases like that, so I decided to stop spending my time addressing every single case.

Since then, DF2 has been the target of many intensive observing campaigns. Apparently it is easier to get lots of telescope time to observe a single object that might have the capacity to falsify MOND than it is to get a more modest amount to study everything else in the universe. That speaks volumes about community priorities and the biases that inform them. At any rate, there is now lots more data on this one object. In some sense there is too much – there has been an active debate in the literature over the best distance determination (which affects the mass) and the most accurate velocity dispersion. Some of these combinations are fine with MOND, but others are not. Let’s consider the worst case scenario.

In the worst case scenario, both DF2 and DF4 are too far from NGC 1052 for its current EFE to have much impact, and they have relatively low velocity dispersions for their luminosity, around 8 km/s, so they fall below the BTFR. Worse for MOND is that this is about what one expects from Newton for the stars alone. Consequently, these galaxies are sometimes referred to as being “dark matter free.” That’s a problem for MOND, which predicts a larger velocity dispersion for systems in equilibrium.

Perhaps we are falling prey to static thinking, and these objects are not in equilibrium. While their proximity to neighboring galaxies and the EFE to which they are presently exposed depends on the distance, which is disputed, it is clear that they live in a rough neighborhood with lots of more massive galaxies that could have bullied them in a close passage at some point in the past. Looking at Fig. 4 of Brada & Milgrom above, I see that galaxies whacked out of equilibrium not only expand in radius, potentially explaining the unusually large sizes of these UDGs, but they also experience a period during which their velocity dispersion is below the equilibrium value. The amplitude of the dip in these simulations is about right to explain the appearance of being dark-matter-free.

It is thus conceivable that DF2 and DF4 (the two are nearly identical in the relevant respects) suffered some sort of interaction that perturbed them into their current state. Their apparent absence of a mass discrepancy and the apparent falsification of MOND that follows therefrom might simply be a chimera of static thinking.

Make no mistake: this is a form of special pleading. The period of depressed velocity dispersion does not last indefinitely, so we have to catch them at a somewhat special time. How special depends on the nature of the interaction and its timescale. This can be long in intergalactic space (Gyrs), so it may not be crazy special, but we don’t really know how special. To say more, we would have to do detailed simulations to map out the large parameter space of possibilities for these objects.

I’d be embarrassed for MOND to have to make this kind of special pleading if we didn’t also have to do it for LCDM. A dwarf galaxy being dark matter free in LCDM shouldn’t happen. Galaxies form in dark matter halos; it is very hard to get rid of the dark matter while keeping the galaxy. The most obvious way to do it, in rare cases, is through tidal disruption, though one can come up with other possibilities. These amount to the same sort of special pleading we’re contemplating on behalf of MOND.

Recently, Tang et al. (2024) argued that DF2 and DF4 are “part of a large linear substructure of dwarf galaxies that could have been formed from a high-velocity head-on encounter of two gas-rich galaxies” which might have stripped the dark matter while leaving the galactic material. That sounds… unlikely. Whether it is more or less unlikely than what it would take to preserve MOND is hard to judge. It appears that we have to indulge in some sort of special pleading no matter what: it simply isn’t natural for galaxies to lack dark matter in a universe made of dark matter, just as it is unnatural for low acceleration systems to not manifest a mass discrepancy in MOND. There is no world model in which these objects make sense.

Tang et al. (2024) also consider a number of other possibilities, which they conveniently tabulate:

There are many variations on awkward hypotheses for how these particular UDGs came to be in LCDM. They’re all forms of special pleading. Even putting on my dark matter hat, most sound like crazy talk to me. (Stellar feedback? Really? Is there anything it cannot do?) It feels like special pleading on top of special pleading; it’s special pleading all the way down. All we have left to debate is which form of special pleading seems less unlikely than the others.

I don’t find this debate particularly engaging. Something weird happened here. What that might be is certainly of interest, but I don’t see how we can hope to extract from it a definitive test of world models.

Antlia 2

The last of the UDGs in the first plot above is Antlia 2, which I now regret including – not because it isn’t interesting, but because this post is getting exhausting. Certainly to write, perhaps to read.

Antlia 2 is on the BTFR, which is ordinarily normal. In this case it is weird in MOND, as the EFE should put it off the BTFR. The observed velocity dispersion is 6 km/s, but the static EFE formula predicts it should only be 3 km/s. This case should be like Crater 2.
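To make the static predictions concrete, here is a minimal sketch of the two estimators being compared. It assumes the isolated and EFE formulas take the forms usually quoted from McGaugh & Milgrom (2013), uses the low-acceleration approximation G_eff ≈ G a₀/g_ex, and applies the rule that the lower of the two dispersions is the formally correct one. The function names and the round-number inputs are hypothetical, chosen only for illustration; they are not a fit to Antlia 2 or any other real object.

```python
import math

# Constants in galaxy-friendly units (kpc, km/s, solar masses).
G = 4.30091e-6   # Newton's constant in kpc (km/s)^2 / Msun
A0 = 3702.8      # a0 = 1.2e-10 m/s^2 expressed in (km/s)^2 / kpc


def sigma_iso(mass):
    """Isolated deep-MOND velocity dispersion (km/s) for a baryonic mass in Msun."""
    return (4.0 / 81.0 * A0 * G * mass) ** 0.25


def sigma_efe(mass, r_half, g_ex):
    """Quasi-Newtonian dispersion (km/s) when the external field dominates.

    r_half is the 3D half-light radius in kpc and g_ex is the external
    field in (km/s)^2 / kpc. Assumes G_eff = G * a0 / g_ex, which is only
    an approximation valid for small accelerations.
    """
    g_eff = G * A0 / g_ex
    return math.sqrt(g_eff * mass / (3.0 * r_half))


def predicted_sigma(mass, r_half, g_ex):
    """Static prediction: use whichever estimator reports the lower dispersion."""
    return min(sigma_iso(mass), sigma_efe(mass, r_half, g_ex))


# Hypothetical inputs: a 3e5 Msun dwarf with a 1.4 kpc half-light radius
# sitting in an external field of 270 (km/s)^2 / kpc.
print(sigma_iso(3e5), sigma_efe(3e5, 1.4, 270.0), predicted_sigma(3e5, 1.4, 270.0))
```

With these illustrative inputs the isolated estimate is roughly 4 km/s while the EFE estimate is roughly 2 km/s, so the EFE value is the one to use. Both are equilibrium estimates; neither says anything about an object that is being tidally stirred.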

First, I’d like to point out that, as an observer, it is amazing to me that we can seriously discuss the difference between 3 and 6 km/s. These are tiny numbers by the standard of the field. The more strident advocates of cold dark matter used to routinely assume that our rotation curve observations suffered much larger systematic errors than that in order to (often blithely) assert that everything was OK with cuspy halos so who are you going to believe, our big, beautiful simulations or those lying data?

I’m not like that, so I do take the difference seriously. My next question, whenever MOND is a bit off like this, is what does LCDM predict?

I’ll wait.

Well, no, I won’t, because I’ve been waiting for thirty years, and the answer, when there is one, keeps changing. The nominal answer, as best I can tell, is ~20 km/s. As with Crater 2, the large scale size of this dwarf means it should sample a large portion of its dark matter halo, so the expected characteristic speed is much higher than 6 km/s. So while the static MOND prediction may be somewhat off here, the static LCDM expectation fares even worse.

This happens a lot. Whenever I come across a case that doesn’t make sense in MOND, it usually doesn’t make sense in dark matter either.

In this case, the failure of the static-case prediction is apparently caused by tidal perturbation. Like Crater 2, Antlia 2 may have a large half-light radius because it is expanding in the way seen in the simulations of Brada & Milgrom. But it appears to be a bit further down that path, with member stars stretched out along the orbital path. They start to trace a small portion of a much deeper gravitational potential, so the apparent velocity dispersion goes up in excess of the static prediction.

**Fig. 9** from Ji et al. (2021) showing tidal features in Antlia 2 considering the effects of the Milky Way alone (left panel) and of the Milky Way and the Large Magellanic Cloud together (central panel) along with the position-velocity diagram from individual stars (right panel). The object is clearly not the isotropic, spherical cow presumed by the static equation for the velocity dispersion. Indeed, it is elongated as would be expected from tidal effects, with individual member stars apparently leaking out.

This is essentially what I inferred must be happening in the ultrafaint dwarfs of the Milky Way. There is no way that these tiny objects deep in the potential well of the Milky Way escape tidal perturbation^%% in MOND. They may be stripped of their stars and their velocity dispersions may get tidally stirred up. Indeed, Antlia 2 looks very much like the MOND prediction for the formation of tidal streams from such dwarfs made by McGaugh & Wolf (2010). Unlike dark matter models in which stars are first protected, then lost in pulses during pericenter passages, the stronger tides of MOND combined with the absence of a protective dark matter cocoon means that stars leak out gradually all along the orbit of the dwarf. The rate is faster when the external field is stronger at pericenter passage, but the mass loss is more continuous. This is a good way to make long stellar streams, which are ubiquitous in the stellar halo of the Milky Way.

So… so what?

It appears that aspects of the observations of the UDGs discussed here that seem problematic for MOND may not be as bad for the theory as they at first seem. Indeed, it appears that the noted problems may instead be a consequence of the static assumptions we usually adopt to do the analysis. The universe is a dynamic place, so we know this assumption does not always hold. One has to judge each case individually to assess whether this is reasonable or not.

In the cases of Crater 2 and Antlia 2, yes, the stranger aspects of the observations fit well with non-equilibrium effects. Indeed, the unusually large half-light radii of these low mass dwarfs may well be a result of expansion after tidal perturbation. That this might happen was specifically anticipated for Crater 2, and Antlia 2 fits the bill described by McGaugh & Wolf (2010) as anticipated by the simulations of Brada & Milgrom (2000) even though it was unknown at the time.

In the cases of DF2 and DF4, it is less clear what is going on. I’m not sure which data to believe, and I want to refrain from cherry-picking, so I’ve discussed the worst-case scenario above. But the data don’t make a heck of a lot of sense in any world view; the many hypotheses made in the dark matter context seem just as contrived and unlikely as a tidally-induced, temporary dip in the velocity dispersion that might happen in MOND. I don’t find any of these scenarios to be satisfactory.

This is a long post, and we have only discussed four galaxies. We should bear in mind that the vast majority of galaxies do as predicted by MOND; a few discrepant cases are always to be expected in astronomy. That MOND works at all is a problem for the dark matter paradigm: that it would do so was not anticipated by any flavor of dark matter theory, and there remains no satisfactory explanation of why MOND appears to happen in a universe made of dark matter. These four galaxies are interesting cases, but they may be an example of missing the forest for the trees.

^*As it happens, the surface brightness threshold adopted in the definition of UDGs is exactly the same as I suggested for VLSBGs (very low surface brightness galaxies: McGaugh 1996), once the filter conversions have been made. At the time, this was the threshold of our knowledge, and I and other early pioneers of LSB galaxies were struggling to convince the community that such things might exist. Up until that time, the balance of opinion was that they did not, so it is gratifying to see that they do.

^**This expectation is specific to MOND; it doesn’t necessarily hold in dark matter where the acceleration in the central regions of diffuse galaxies can be dominated by the cusp of the dark matter halo. These were predicted to exceed what is observed, hence the cusp-core problem.

⁺Measuring by surface brightness, Crater 2 and Antlia 2 are two orders of magnitude more diffuse than the prototypical ultradiffuse galaxies DF2 and DF4. Crater 2 is not quite large enough to count as a UDG by the adopted size definition, but Antlia 2 is. So does that make it super-ultra diffuse? Would it even be astronomy without terrible nomenclature?

^&I didn’t want to use a MOND-specific criterion in McGaugh et al. (2021) because I was making a more general point, so the green points are overly conservative from the perspective of the MOND isolation criterion: there are more dwarfs for which this works. Indeed, we had great success in predicting velocity dispersions in exactly this fashion in McGaugh & Milgrom (2013a, 2013b). And XXVIII was a case not included above that we highlighted as a great test of MOND, being low mass (~4×10⁵ M_☉) but still qualifying as isolated, and its dispersion came in (6.6^+2.9_-2.1 km/s in one measurement, 4.9 ± 1.6 km/s in another) as predicted a priori (4.3^+0.8_-0.7 km/s). Hopefully the Rubin Observatory will discover many more similar objects that are truly isolated; these will be great additional tests, though one wonders how much more piling-on needs to be done.

^{^}This is an approximation that is reasonable for the small accelerations involved. More generally we have G_eff = G/μ(|g_ex+g_in|/a₀) where μ is the MOND interpolation function and one takes the vector sum of all relevant accelerations.

^#This follows because the boost from MOND is limited by how far into the low acceleration regime an object is. If the EFE is important, the boost will be less than in the isolated case. As we said in 2013, “the case that reports the lower velocity dispersion is always the formally correct one.” I mention it again here because apparently people are good at scraping equations from papers without reading the associated instructions, so one gets statements like “the theory does not specify precisely when the EFE formula should replace the isolated MOND prediction.” Yes it does. We told you precisely when the EFE formula should replace the isolated formula. It is when it reports the lower velocity dispersion. We also noted this as the reason for not giving σ_efe in the tables in cases it didn’t apply, so there were multiple flags. It took half a dozen coauthors to not read that. I’d hate to see how their Ikea furniture turned out.

^$As often happens with LCDM, there are many nominal predictions. One common theme is that “Despite spanning four decades in luminosity, dSphs appear to inhabit halos of comparable peak circular velocity.” So nominally, one would expect a faint galaxy like Crater 2 to have a similar velocity dispersion to a much brighter one like Fornax, and the luminosity would have practically no power to predict the velocity dispersion, contrary to what we observe in the BTFR.

^%There is the 2-halo term – once you get far enough from the center of a dark matter halo (the 1-halo term), there are other halos out there. These provide additional unseen mass, so can boost the velocity. The EFE in MOND has the opposite effect, and occurs for completely different physical reasons, so they’re not at all the same.

^{^^}For arbitrary reasons of human psychology, the threshold many physicists set for “always happens” is around 100 times. That is, if a phenomenon is repeated 100 times, it is widely presumed to be a general rule. That was the threshold Vera Rubin hit when convincing the community that flat rotation curves were the general rule, not just some peculiar cases. That threshold has also been hit and exceeded by detailed MOND fits to rotation curves, and it seems to be widely accepted that this is the general rule even if many people deny the obvious implications. By now, it is also the case for apparent exceptions to MOND ceasing to be exceptions as the data improve. Unfortunately, people tend to stop listening at what they want to hear (in this case, “falsifies MOND”) and fail to pay attention to further developments.

^%%It is conceivable that the ultrafaint dwarfs might elude tidal disruption in dark matter models if they reside in sufficiently dense dark matter halos. This seems unlikely given the obvious tidal effects on much more massive systems like the Sagittarius dwarf and the Magellanic Clouds, but it could in principle happen. Indeed, if one calculates the mass density from the observed velocity dispersion, one infers that they do reside in dense dark matter halos. In order to do this calculation, we are obliged to assume that the objects are in equilibrium. This is, of course, a form of static thinking: the possibility of tidal stirring that enhances the velocity dispersion above the equilibrium value is excluded by assumption. The assumption of equilibrium is so basic that it is easy to unwittingly engage in circular reasoning. I know, as I did exactly that myself to begin with.

Non-equilibrium dynamics in galaxies that appear to lack dark matter: tidal dwarf galaxies

There are a number of galaxies that have been reported to lack dark matter. This is weird in a universe made of dark matter. It is also weird in MOND, which (if true) is what causes the inference of dark matter. So how can this happen?

In most cases, it doesn’t. These claims not only don’t make sense in either context, they are simply wrong. I don’t want to sound too harsh, as I’ve come close to making the same mistake myself. The root cause of this mistake is often a form of static thinking in dynamic situations: the assumption that the here and now is always a representative test. The basic assumption we have to make to interpret observed velocities in terms of mass is that systems are in (or close to) gravitational equilibrium so that the kinetic energy is a measure of the gravitational potential. In most places, this is a good assumption, so we tend to forget we even made it.

However, no assumption is ever perfect. For example, Gaia has revealed a wealth of subtle non-equilibrium effects in the Milky Way. These are not so large as to invalidate the basic inference of the mass discrepancy, but neither can they be entirely ignored. Even maintaining the assumption of a symmetric but non-smooth mass profile in equilibrium complicates the analysis.

Since the apparent absence of dark matter is unexpected in either theory, one needs to question the assumptions whenever this inference is made. There is one situation in which it is expected, so let’s consider that special case:

Tidal dwarf galaxies

Most dwarf galaxies are primordial – they are the way they are because they formed that way. However, it is conceivable that some dwarfs may form in the tidal debris of collisions between large galaxies. These are tidal dwarf galaxies (TDGs). Here are some examples of interacting systems containing candidate TDGs:

**Fig. 1** from Lelli et al. (2015): *images of interacting systems with TDG candidates noted in yellow.*

I say candidate TDGs because it is hard to be sure a particular object is indeed tidal in origin. A good argument can be made that TDGs require such special conditions to form that perhaps they should not be able to form at all. As debris in tidal arms is being flung about in the (~ 200 km/s) potential well of a larger system, it is rather challenging for material to condense into a knot with a much smaller potential well (< 50 km/s). It can perhaps happen if the material in the tidal stream is both lumpy (to provide a seed to condense on) and sufficiently comoving (i.e., the tidal shear of the larger system isn’t too great), so maybe it happens on rare occasions. One way to distinguish TDGs from primordial dwarfs is metallicity: typical primordial dwarfs have low metallicity while TDGs have the higher metallicity of the giant system that is the source of the parent material.

A clean test of hypotheses

TDGs provide an interesting test of dark matter and MOND. In the vast majority of dark matter models, dark matter halos are dynamically hot, quasi-spherical systems with the particles that compose the dark matter (whatever it is) on eccentric, randomly oriented orbits that sum to a big, messy blob. Arguably it has to be this way in order to stabilize the disks of spiral galaxies. In contrast, the material that composes the tidal tails in which TDGs form originates in the baryonic material of the dynamically cold spiral disks where orbits are nearly circular in the same direction in the same thin plane. The phase space – the combination of position x,y,z and momentum v_x,v_y,v_z – of disk and halo couldn’t be more different. This means that when two big galaxies collide or have a close interaction, everything gets whacked and the two components go their separate ways. Starting in orderly disks, the stars and gas make long, coherent tidal tails. The dark matter does not. The expectation from these basic phase space considerations is consistent with detailed numerical simulations.

We now have a situation in which the dark matter has been neatly segregated from the luminous matter. Consequently, if TDGs are able to form, they must do it only* with baryonic mass. The ironic prediction of a universe dominated by dark matter is that TDGs should be devoid of dark matter.

In contrast, one cannot “turn off” the force law in MOND. MOND can boost the formation of TDGs in the first place, but if said TDGs wind up in the low acceleration regime, they must evince a mass discrepancy. So the ironic prediction here is that, in ignorance of MOND, MOND means that we would infer that TDGs do have dark matter.

Got that? Dark matter predicts TDGs with no dark matter. MOND predicts TDGs that look like they do have dark matter. That’s not confusing at all.

Clean in principle, messy in practice

Tests of these predictions have a colorful history. Bournaud et al. (2007) did a lovely job of combining simulations with observations of the Seashell system (NGC 5291 above) and came to a striking conclusion: the rotation curves of TDGs exceeded those expected for the baryons alone:

**Fig. 2** from Bournaud et al. (2007) *showing the rotation curves for the three TDGs identified in the image above.*

This was a strange, intermediary result. TDGs had more dark matter than the practically zero expected in LCDM, but less than comparable primordial dwarfs as expected in MOND. That didn’t make sense in either theory. They concluded that there must be a component of some other kind of dark matter that was not the traditional dark halo, but rather part of the spiral disk to begin with, perhaps unseen baryons in the form of very cold molecular gas.

Gentile et al. (2007) reexamined the situation, and concluded that the inclinations could be better constrained. When this was done, the result was more consistent with the prediction of MOND and the baryonic Tully-Fisher relation (BTFR; see their Fig. 2).

**Fig. 1** from Gentile et al. (2007): Rotation curve data (full circles) of the 3 tidal dwarf galaxies (Bournaud et al. 2007). The lower (red) curves are the Newtonian contribution V_bar of the baryons (and its uncertainty, indicated as dotted lines). The upper (black) curves are the MOND prediction and its uncertainty (dotted lines). The top panels have as an implicit assumption (following Bournaud et al.) an inclination angle of 45 degrees. In the middle panels the inclination is a free parameter, and the bottom panels show the fits made with the first estimate for the external field effect (EFE).

Clearly there was room for improvement, both in data quality and quantity. We decided to have a go at it ourselves, ultimately leading to Lelli et al. (2015), which is the source of the pretty image above. We reanalyzed the Seashell system, along with some new TDG candidates.

Making sense of these data is not easy. TDG candidates are embedded in tidal features. It is hard to know where the dwarf ends and the tidal stream begins, or even to be sure there is a clear distinction. Here is an example of the northern knot in the Seashell system:

**Fig. 5** from Lelli et al. (2015): *Top panels*: optical image (*left*), total H I map (*middle*), and H I velocity field (*right*). The dashed ellipse corresponds to the disc model described in Sect. 5.1. The cross and dashed line illustrate the kinematical centre and major axis, respectively. In the bottom-left corner, we show the linear scale (optical image) and the H I beam (total H I map and velocity field) as given in Table 6. In the total H I map, contours are at ~4.5, 9, 13.5, 18, and 22.5 M_⊙ pc^-2. *Bottom panels*: position-velocity diagrams obtained from the observed cube (*left*), model cube (*middle*), and residual cube (*right*) along the major and minor axes. Solid contours range from 2σ to 8σ in steps of 1σ. Dashed contours range from −2σ to −4σ in steps of −1σ. The horizontal and vertical lines correspond to the systemic velocity and dynamical centre, respectively.

Both the distribution of gas and the velocities along the tidal tail often blend smoothly across TDG candidates, making it hard to be sure they have formed a separate system. In the case above, I can see what we think is the velocity field of the TDG alone (contained by the ellipse in the upper right panel), but is that really an independent system that has completely decoupled from the tidal material from which it formed? Definite maybe!

Federico Lelli did amazing work to sort through these difficult-to-interpret data. At the end of the day, he found that there was no need for dark matter in any of these TDG candidates. The amplitude of the apparent circular speed was consistent with the enclosed mass of baryons.

**Figs. 11 and 13** from Lelli et al. (2015): the enclosed dynamical-to-baryonic mass ratio (left) and baryonic Tully-Fisher relation (right). TDGs (red points) are consistent with a mass ratio of unity: the observed baryons suffice; no dark matter is inferred. Contrary to Gentile et al., this manifests as a clear offset from the BTFR followed by normal galaxies.
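The mass ratio in the left panel can be illustrated with a minimal sketch. It assumes the simplest spherical estimate of the enclosed dynamical mass, M_dyn = V²R/G, which is not necessarily the exact geometry adopted by Lelli et al. (2015). The function names and the input numbers are hypothetical, chosen only to show what a ratio near unity looks like.

```python
G = 4.30091e-6  # Newton's constant in kpc (km/s)^2 / Msun


def dynamical_mass(v_circ, radius):
    """Enclosed dynamical mass (Msun) for a circular speed in km/s at a radius in kpc.

    Assumes a test particle on a circular orbit in equilibrium (spherical
    approximation); this is exactly the assumption being questioned for TDGs.
    """
    return v_circ ** 2 * radius / G


def mass_ratio(v_circ, radius, m_bar):
    """Ratio of dynamical to baryonic mass; unity means the baryons suffice."""
    return dynamical_mass(v_circ, radius) / m_bar


# Hypothetical TDG-like numbers: 20 km/s at 5 kpc with 4.5e8 Msun of baryons.
print(dynamical_mass(20.0, 5.0), mass_ratio(20.0, 5.0, 4.5e8))
```

A ratio near one only means "no dark matter" if the measured speed really is an equilibrium orbital speed.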

Taken at face value, this absence of dark matter is a win for a universe made of dark matter and a falsification of MOND.

So we were prepared to say that, and did, but as Federico checked the numbers, it occurred to him to check the timescales. Mergers like this happen over the course of a few hundred million years, maybe a billion. The interactions we observe are ongoing; just how far into the process are they? Have the TDGs had time to settle down into dynamical equilibrium? That is the necessary assumption built into the mass ratio plotted above: the dynamical mass assumes the measured speed is that of a test particle in an equilibrium orbit. But these systems are manifestly not in equilibrium, at least on large scales. Maybe the TDGs have had time to settle down?

We can ask how long it takes to make an orbit at the observed speed, which is low by the standards of such systems (hence their offset from Tully-Fisher). To quote from the conclusions of the paper,

These [TDG] discs, however, have orbital times ranging from ~1 to ~3 Gyr, which are significantly longer than the TDG formation timescales (≲1 Gyr). This raises the question as to whether TDGs have had enough time to reach dynamical equilibrium.
Lelli et al. (2015)

So no, not really. We can’t be sure the velocities are measuring the local potential well as we want them to do. A particle should have had time to go around and around a few times to settle down in a new equilibrium configuration; here they’ve made 1/3, maybe 1/2 of one orbit. Things have not had time to settle down, so there’s not really a good reason to expect that the dynamical mass calculation is reliable.
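The timescale check is simple arithmetic, sketched here under the assumption of a circular orbit with period T = 2πR/V. The radius, speed, and age below are hypothetical round numbers rather than measurements from the paper, and the function names are mine.

```python
import math

# Unit conversion: 1 kpc / (1 km/s) is about 0.9778 Gyr.
KPC_PER_KMS_IN_GYR = 0.9778


def orbital_time_gyr(radius, v_circ):
    """Time (Gyr) to complete one circular orbit at radius (kpc) and speed (km/s)."""
    return 2.0 * math.pi * radius / v_circ * KPC_PER_KMS_IN_GYR


def orbits_completed(age_gyr, radius, v_circ):
    """Number of orbits completed since formation, assuming a constant speed."""
    return age_gyr / orbital_time_gyr(radius, v_circ)


# Hypothetical TDG-like numbers: 20 km/s at 5 kpc, formed 0.5 Gyr ago.
print(orbital_time_gyr(5.0, 20.0), orbits_completed(0.5, 5.0, 20.0))
```

These inputs give an orbital time of about 1.5 Gyr and about a third of an orbit completed, which is the regime described above: too little time for the material to settle into equilibrium.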

It would help to study older TDGs, as these would presumably have had time to settle down. We know of a few candidates, but as systems age, it becomes harder to gauge how likely they are to be legitimate TDGs. When you see a knot in a tidal arm, the odds seem good. If there has been time for the tidal stream to dissipate, it becomes less clear. So if such a thing turns out to need dark matter, is that because it is a TDG doing as MOND predicted, or just a primordial dwarf we mistakenly guessed was a TDG?

We gave one of these previously unexplored TDG candidates to a grad student. After much hard work combining observations from both radio and optical telescopes, she has demonstrated that it isn’t a TDG at all, in either paradigm. The metallicity is low, just as it should be for a primordial dwarf. Apparently it just happens to be projected along a tidal tail where it looks like a decent candidate TDG.

This further illustrates the trials and tribulations we encounter in trying to understand our vast universe.

*One expects cold dark matter halos to have subhalos, so it seems wise to suspect that perhaps TDGs condense onto these. Phase space says otherwise. It is not sufficient for tidal debris to intersect the location of a subhalo; the material must also “dock” in velocity space. Since tidal arms are being flung out at the speed that is characteristic of the giant system, the potential wells of the subhalos are barely speed bumps. They might perturb streams, but the probability of them being the seeds onto which TDGs condense is small: the phase space just doesn’t match up for the same reasons the baryonic and dark components get segregated in the first place. TDGs are one galaxy formation scenario the baryons have to pull off unassisted.

A Blog About the Science and Sociology of Cosmology and Dark Matter