Very thin galaxies

The stability of spiral galaxies was a foundational motivation to invoke dark matter: a thin disk of self-gravitating stars is unstable unless embedded in a dark matter halo. Modified dynamics can also stabilize galactic disks. A related test is provided by how thin such galaxies can be.

Thin galaxies exist

Spiral galaxies seen edge-on are thin. They have a typical thickness – their short-to-long axis ratio – of q ≈ 0.2. Sometimes they’re thicker, sometimes they’re thinner, but this is often what we assume when building mass models of the stellar disk of galaxies that are not seen exactly* edge-on. One can employ more elaborate estimators, but the results are not particularly sensitive to the exact thickness so long as it isn’t the limit of either razor thin (q = 0) or a spherical cow (q = 1).

Sometimes galaxies are very thin. Behold the “superthin” galaxy UGC 7321:

UGC 7321 as seen in optical colors by the Sloan Digital Sky Survey.

It also looks very thin in the infrared, which is the better tracer of stellar mass:

Fig. 1 from Matthews et al. (1999): H-band (1.6 micron) image of UGC 7321. Matthews (2000) finds a near-IR axis ratio of 14:1. That’s super thin (q = 0.07)!

UGC 7321 is very thin, would be low surface brightness if seen face-on (Matthews estimates a central B-band surface brightness of 23.4 mag arcsec⁻²), has no bulge component thickening the central region, and contains roughly as much mass in gas as stars. All of these properties dispose a disk to be fragile (to perturbations like mergers and subhalo crossings) and unstable, yet there it is. There are enough similar examples to build a flat galaxy catalog, so somehow the universe has figured out a way for galaxy disks to remain thin and dynamically cold# for the better part of a Hubble time.

We see spiral galaxies at various inclinations to our line of sight. Some will appear face on, others edge-on, and everything in between. If we observe enough of them, we can work out what the intrinsic distribution is based on the projected version we see.

First, some definitions. A 3D object has three principal axes of lengths a, b, and c. By convention, a is the longest and c the shortest. An oblate model imagines a galaxy like a frisbee: it is perfectly round seen face-on (a = b); seen edge-on q = c/a. More generally, an object can be triaxial, with a ≠ b ≠ c. In this case, a galaxy would not appear perfectly round even when seen perfectly face-on^ because it is intrinsically oval (with similar axis lengths a ≈ b but not exactly equal). I expect this is fairly common among dwarf Irregular galaxies.

The observed and intrinsic distribution of disk thicknesses

Benevides et al. (2025) find that the distribution of observed axis ratios q is pretty flat. This is a consequence of most galaxies being seen at some intermediate viewing angle. One can posit an intrinsic distribution, model what one would see at a bunch of random viewing angles, and iterate to extract the true distribution in nature, which they do:

Figure 6 from Benevides et al. (2025): Comparison between the observed (projected) q distribution and the inferred intrinsic 3D axis ratios for a subsample of dwarfs in the GAMA survey with M* = 10^9–10^9.5 M☉. The observed shapes are shown with the solid black line and are used to derive an intrinsic c/a (long-dashed) and b/a (dotted) distribution when projected. Solid color lines in each panel correspond to the q values obtained from the 3D model after random projections. Note that a wide distribution of q values is generated by a much narrower intrinsic c/a distribution. For example, the blue shaded region in the left panel shows that an observed 5% of galaxies with q < 0.2 requires 41% of galaxies to have an intrinsic c/a < 0.2 for an oblate model. Similarly, for a triaxial model (right panel, red curve), 43% of galaxies are required to be thinner than c/a = 0.2. The additional freedom of b/a in the triaxial model helps to obtain a better fit to the projected q distribution, but the changes mostly affect large q values and change little the c/a frequency derived from highly elongated objects.

That we see some thin galaxies implies that they have to be common, as most of them are not seen edge-on. For dwarf$ galaxies of a specific mass range, which happens to include UGC 7321, Benevides et al. (2025) infer a lot% of thin galaxies, at least 40% with intrinsic c/a < 0.2. They also infer a little bit of triaxiality, a ≈ b.
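
The forward modeling here is simple enough to sketch. For an oblate spheroid with intrinsic short-to-long axis ratio q_int seen at inclination i, the projected axis ratio obeys q_obs² = cos²i + q_int² sin²i. Below is a minimal Monte Carlo of that projection – my toy version for illustration, not the Benevides et al. machinery, which iterates over a model for the whole intrinsic distribution:

```python
import numpy as np

def project_oblate(q_int, n=100_000, seed=42):
    """Projected axis ratio of an oblate spheroid with intrinsic
    short-to-long axis ratio q_int, viewed at random orientations:
    q_obs^2 = cos^2(i) + q_int^2 * sin^2(i)."""
    rng = np.random.default_rng(seed)
    cos_i = rng.uniform(0.0, 1.0, n)   # isotropic viewing angles
    return np.sqrt(cos_i**2 + q_int**2 * (1.0 - cos_i**2))

# A single intrinsic thickness of c/a = 0.15 (a toy value) ...
q_obs = project_oblate(0.15)
# ... spreads into a broad distribution of projected q:
print(f"seen with q < 0.2: {np.mean(q_obs < 0.2):.0%}")
print(f"seen with q > 0.5: {np.mean(q_obs > 0.5):.0%}")
```

Only a minority of intrinsically thin disks present as thin on the sky, which is the sense in which a modest number of observed superthins implies that many more exist.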

The existence and numbers of thin dwarfs seem to come as a surprise to many astronomers. This is perhaps driven in part by theoretical expectations for dwarf galaxies to be thick: a low surface brightness disk has little self-gravity to hold stars in a narrow plane. This expectation is so strong that Benevides et al. (2025) feel compelled to provide some observed examples, as if to say look, really:

Figure 8 – images of real galaxies from Benevides et al. (2025): Examples of 10 highly elongated dwarf galaxies with q ≤ 0.2 and M* = 10^7–10^8.5 M☉. They resemble thin edge-on disks and can be found even among the faintest dwarfs in our sample. Legends in each panel quote the stellar mass, the shape parameter q, as well as the GAMA identifier. Objects are sorted by increasing M*, left to right.

As an empiricist who has spent a career looking at low mass and low surface brightness galaxies, this does not come as a surprise to me. These galaxies look normal. That’s what the universe of late type dwarf$ galaxies looks like.

Edge-on galaxies in LCDM simulations

Thin galaxies do not occur naturally in the hierarchical mergers of LCDM (e.g., Haslbauer et al. 2022), where one would expect a steady bombardment by merging masses to mess things up. The picture above is not what galaxy-like objects in LCDM simulations look like. Scraping through a few simulations to find the flattest galaxies, Benevides et al. (2025) find only a handful of examples:

Figure 11 – images of simulated galaxies from Benevides et al. (2025): Edge-on projection of examples of the flattest galaxies in the TNG50 simulation, in different bins of stellar mass.

Note that only the four images on the left here occupy the same stellar mass range as the images of reality above. These are as close as it gets. Not terrible, but also not representative&. Galaxies this thin are a tiny fraction of the simulated population, whereas they are quite common in reality. Here the two are compared: three different surveys (solid lines) vs. three different simulations (dashed lines).

Figure 9 from Benevides et al. (2025): Fraction of galaxies that are derived to be intrinsically thinner than c/a = 0.2 as a function of stellar mass. Thick solid lines correspond to our observational samples while dashed lines are used to display the results of cosmological simulations. Different colors highlight the specific survey or simulation name, as quoted in the legend. In all observational surveys, the frequency of thin galaxies peaks for dwarfs with M* ≈ 10^9 M☉, almost doubling the frequency observed on the scale of MW-mass galaxies. Thin galaxies do not disappear at lower masses: we infer a significant fraction of dwarf galaxies with M* < 10^9 M☉ to have c/a < 0.2. This is in stark contrast with the negligible production of thin dwarf galaxies in all numerical simulations analyzed here.

Note that the thinnest galaxies in nature are dwarfs of mass comparable to UGC 7321. Thin disks aren’t just for bright spirals like the Milky Way with log(M*) > 10.5. They are also common*$ for dwarfs with log(M*) = 9 and even log(M*) = 8, which are often gas dominated. In contrast, the simulations produce almost no galaxies that are thin at these lower masses.

The simulations simply do not look like reality. Again. And again, etc., etc., ad nauseam. It’s almost as if the old adage applies: garbage in, garbage out. Maybe it’s not the resolution or the implementation of the simulations that’s the problem. One could get all that right, but it wouldn’t matter if the starting assumption of a universe dominated by cold dark matter was the input garbage.

Galaxy thickness in Newton and MOND

Thick disks are not merely a product of simulations; they are endemic to Newtonian dynamics. As stars orbit around and around a galaxy’s center, they also oscillate up and down, bobbing in and out of the plane. How far up they get depends on how fast they’re going (the dynamical temperature of the stellar population) and how strong the restoring force to the plane of the disk is.

In the traditional picture of a thin spiral galaxy embedded in a quasi-spherical dark matter halo, the restoring force is provided by the stars in the disk. The dark matter halo is there to boost the radial force to make the rotation curve flat, and to stabilize the disk, for which it needs to be approximately spherical. The dark matter halo does not contribute much to the vertical restoring force because it adds little mass near the disk plane. In order to do that, the halo would have to be very squashed (small q) like the disk, in which case we revive the stability problem the halo was put there to solve.

This is why we expect low surface brightness disks to be thick. Their stars are spread thin, the surface mass density is low, so the restoring force to the disk should be small. Disks as thin as UGC 7321 shouldn’t be possible unless they are extremely cold*# dynamically – a situation that is unlikely to persist in a cosmogony built by hierarchical merging. The simulations discussed above corroborate this expectation.

In MOND, there is no dark matter halo, but the modified force should boost the vertical restoring force as well as the radial force. One thus expects thinner disks in MOND than in Newton.

I pointed this out in McGaugh & de Blok (1998), along with pretty much everything else in the universe that people tell me I should consider, without bothering to check whether I already have. Here is the plot I published at the time:

Figure 9 of McGaugh & de Blok (1998): Thickness q = z0/h expected for disks of various central surface densities σ0. Shown along the top axis is the equivalent B-band central surface brightness μ0 for ϒ* = 2. Parameters chosen for illustration are noted in the figure (a typical scale length h and two choices of central vertical velocity dispersion ςz). Other plausible values give similar results. The solid lines are the Newtonian expectation and the dashed lines that of MOND. The Newtonian and MOND cases are similar at high surface densities but differ enormously at low surface densities. Newtonian disks become very thick at low surface brightness. In contrast, MOND disks can remain reasonably thin to low surface density.

There are many approximations that have to be made in constructing the figure above. I assumed disks were plane-parallel slabs of constant velocity dispersion, which they are not. But this suffices to illustrate the basic point, that disks should remain thinner&% in MOND than in Newton as surface density decreases: as one sinks further into the MOND regime, there is relatively more restoring force to keep disks thin. To duplicate this effect in Newton, one must invent two kinds of dark matter: a dissipational kind that settles into a dark disk in addition to the usual dissipationless cold dark matter that makes a quasi-spherical halo.
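
To make the scaling explicit, here is a minimal slab estimate in the same spirit – my reconstruction for illustration, not the calculation behind the published figure. The Newtonian isothermal slab has scale height z0 = ςz²/(πGΣ0); I crudely boost the vertical restoring force by the “simple” MOND interpolation factor ν(gN/a0), which shrinks z0 by the same factor. The choices ςz = 20 km/s and h = 3 kpc are illustrative:

```python
import numpy as np

G  = 4.301e-6   # kpc (km/s)^2 / Msun
a0 = 3700.0     # Milgrom's a0 = 1.2e-10 m/s^2, in (km/s)^2 / kpc

def nu_simple(y):
    # "simple" MOND interpolation function: g = nu(gN/a0) * gN
    return 0.5 + np.sqrt(0.25 + 1.0 / y)

def thickness(Sigma0, s_z=20.0, h=3.0):
    """Slab thickness q = z0/h for central surface density Sigma0
    (Msun/pc^2), vertical dispersion s_z (km/s), scale length h (kpc)."""
    Sigma = Sigma0 * 1e6                        # Msun/kpc^2
    z0_newton = s_z**2 / (np.pi * G * Sigma)    # isothermal slab, kpc
    gN = 2.0 * np.pi * G * Sigma                # Newtonian field of the slab
    z0_mond = z0_newton / nu_simple(gN / a0)    # boosted restoring force
    return z0_newton / h, z0_mond / h

for Sigma0 in [1000.0, 100.0, 10.0]:
    qN, qM = thickness(Sigma0)
    print(f"Sigma0 = {Sigma0:6.0f} Msun/pc^2: q_Newton = {qN:.3f}, q_MOND = {qM:.3f}")
```

The two estimates agree at high surface density and diverge strongly at low surface density, which is the trend in the figure above.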

The idea of the plot above was to illustrate the trend of expected thickness for galaxies of different central surface brightness. One can also build a model to illustrate the expected thickness as a function of radius for a pair of galaxies, one high surface brightness (so it starts in the Newtonian regime at small radii) and one of low surface brightness (in the MOND regime everywhere). I have chosen numbers** resembling the Milky Way for the high surface brightness galaxy model, and scaled the velocity dispersion of the low surface brightness model so it has very nearly the same thickness in the Newtonian regime. In MOND, both disks remain thin as a function of radius (they flare a lot in Newton) and the lower surface brightness disk model is thinner thanks to the relatively stronger restoring force that follows from being deeper in the MOND regime.

The thickness of two model disks, one high surface brightness (solid lines) and the other low surface brightness (dashed lines), as a function of radius. The two are similar in Newton (black), but differ in MOND (blue). The restoring force to the disk is stronger in MOND, so there is less flaring with increasing radius. The low surface brightness galaxy is further in the MOND regime, leading naturally to a thinner disk.

These are not realistic disk models, but they again suffice to illustrate the point: thin disks occur naturally in MOND. Low surface brightness disks should be thick in LCDM (and in Newtonian dynamics in general), but can be as thin as UGC 7321 in MOND. I didn’t aim to make q ≈ 0.1 in the model low surface brightness disk; it just came out that way for numbers chosen to be reasonable representations of the genre.

What the distribution of thicknesses is depends on the accretion and heating history of each individual disk. I don’t claim to understand that. But the mere existence of dwarf galaxies with thin disks is a natural outcome in MOND that we once again struggle to comprehend in terms of dark matter.


*Seeing a galaxy highly inclined minimizes the inclination correction to the kinematic observations [Vrot = Vobs/sin(i)] but to build a mass model we also need to know the face-on surface density profile of the stars, the correction for which depends on 1/cos(i). So as a practical matter, the competition between sin(i) and cos(i) makes it difficult to analyze galaxies at either extreme.

#Dynamically cold means the random motions (quantified by the velocity dispersion of stars σ) are small compared to ordered rotation (V) in the disk, something like V/σ ≈ 10. As a disk heats (higher σ) it thickens, as some of that random motion goes in the vertical direction perpendicular to the disk. Mergers heat disks because they bring kinetic energy in from random directions. Even after an object is absorbed, the splash it made is preserved in the vertical distribution of the stars which, once displaced, never settle back into a thin disk. (Gas can settle through dissipation, but point masses like stars cannot.)

^Oval distortions are a major source of systematic error in galaxy inclination estimates, especially for dwarf Irregulars. It is an asymmetric error: a galaxy with a mild oval distortion can be inferred to have an inclination (i > 0) even when seen face-on (i = 0), but it can never have an inclination more face-on (i < 0) than exactly face-on. This is one of the common drivers of claims that low mass galaxies fall off the Tully-Fisher relation. (Other common problems include a failure to account for gas mass, bad distance estimates, or not measuring Vflat.)

$In a field with abominable terminology, what is meant by a “dwarf” galaxy is one of the worst offenders. One of my first conference contributions thirty years ago griped about the [mis]use of this term, and matters have not improved. For this particular figure, Benevides et al. (2025) define it to mean galaxies with stellar masses in the range 9 < log(M*) < 9.5, which seems big to me, but at least it is below the mass of a typical L* spiral, which has log(M*) ~ 10.5. For comparison, see Fig. 6 of the review of Bullock & Boylan-Kolchin (2017), who define “bright dwarfs” to have 7 < log(M*) < 9, and go lower from there, but not higher into the regime that we’re calling dwarf right now. So what a dwarf galaxy is depends on context.

%Note that the intrinsic distribution peaks below q = 0.2, so arguably one should perhaps adopt as typical the mode of the distribution (q ≈ 0.17).

&Another way in which even the thin simulated objects are not representative of reality is that they are dynamically hot, as indicated by the κrot parameter printed with the image. This is the fraction of kinetic energy in rotation. One of the more favorable cases with κrot = 0.67 corresponds to V/σ = 2.5. That happens in reality, but higher values are common. Of course, thin disks and dynamical coldness go hand in hand. Since the simulations involve a lot of mergers, the fraction of kinetic energy in rotation is naturally small. So I’m not saying the simulations are wrong in what they predict given the input physics that they assume, but I am saying that this prediction does not match reality.

*$The fraction of thin galaxies observed by DESI is slightly higher than found in the other surveys. Having looked at all these data, I am inclined to suspect the culprit is image quality: that of DESI is better. Regardless of the culprit for this small discrepancy between surveys, thin disks are much more common in reality than in the current generation of simulations.

*#There seems to be a limit to how cold disks get, with a minimum velocity dispersion around ~7 km/s observed in face-on dwarfs when the appropriate number, according to Newton, would be more like 2 km/s, tops. I remember this number from observations in the ’80s and ’90s, along with lots of discussion then to the effect of how can it be so? but it is the new year and I’m feeling too lazy to hunt down all the citations so you get a meme instead.


&%In an absolute sense, all other things being equal, which they’re not, disks do become thicker to lower surface brightness in both Newton and MOND. There is less restoring force for less surface mass density. It is the relative decline in restoring force and consequent thickening of the disk that is much more precipitous in Newton.

**For the numerically curious, these models are exponential disks with surface density profiles Σ(R) = Σ0 exp(−R/Rd). Both models have a scale length Rd = 3 kpc. The HSB has Σ0 = 866 M☉ pc⁻²; this is a good match to the Eilers et al. (2019) Milky Way disk; see McGaugh (2019). The LSB has Σ0 = 100 M☉ pc⁻², which corresponds roughly to what I consider the boundary of low surface brightness, a central B-band surface brightness of ~23 mag arcsec⁻². For the velocity dispersion profile I also assume an exponential with scale length 2Rd (that’s what’s supposed to happen). The central velocity dispersion of the HSB is 100 km/s (an educated guess that gets us in the right ballpark) and that of the LSB is 33 km/s – the mass is down by a factor of ~9 so the velocity dispersion should be lower by a factor of √9 = 3. (I let it be inexact so the solid and dashed Newtonian lines wouldn’t exactly overlap.) A rough numerical sketch of these models follows below.

These models are crude, being single-population (there can be multiple stellar populations each with their own velocity dispersion and vertical scale height) and lacking both a bulge and gas. The velocity dispersion profile sometimes falls with a scale length twice the disk scale length as expected, sometimes not. In the Milky Way, Rd ≈ 2.5 or 3 kpc, but the velocity dispersion falls off with a scale length that is not 5 or 6 kpc but rather 21 or 25 kpc. I have also seen the velocity dispersion profile flatten out rather than continue to fall with radius. That might itself be a hint of MOND, but there are lots of different aspects of the problem to consider.
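
Here is that sketch: the central thickness of the two models above under the same crude slab approximation, with the MOND boost via the “simple” interpolation function. This is my reconstruction for illustration, not the code behind the figure, which evaluates the thickness as a function of radius:

```python
import numpy as np

G  = 4.301e-6   # kpc (km/s)^2 / Msun
a0 = 3700.0     # 1.2e-10 m/s^2 in (km/s)^2 / kpc

def nu_simple(y):
    # "simple" MOND interpolation function
    return 0.5 + np.sqrt(0.25 + 1.0 / y)

def q_central(Sigma0, s_z, Rd=3.0):
    """Central thickness q = z0/Rd from the isothermal-slab estimate,
    in Newton and (crudely) in MOND."""
    Sigma = Sigma0 * 1e6                       # Msun/pc^2 -> Msun/kpc^2
    z0_N = s_z**2 / (np.pi * G * Sigma)        # kpc
    z0_M = z0_N / nu_simple(2.0 * np.pi * G * Sigma / a0)
    return z0_N / Rd, z0_M / Rd

for name, Sigma0, s_z in [("HSB", 866.0, 100.0), ("LSB", 100.0, 33.0)]:
    qN, qM = q_central(Sigma0, s_z)
    print(f"{name}: q_Newton = {qN:.2f}, q_MOND = {qM:.2f}")
```

The two models come out with nearly the same Newtonian thickness, as designed, while the low surface brightness model is noticeably thinner in MOND – the same ballpark as the figure, though not the exact published curves.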

Non-equilibrium dynamics in galaxies that appear to have lots of dark matter: ultrafaint dwarfs

This is a long post. It started focused on ultrafaint dwarfs, but can’t avoid more general issues. In order to diagnose non-equilibrium effects, we have to have some expectation for what equilibrium would be. The Tully-Fisher relation is a useful empirical touchstone for that. How the Tully-Fisher relation comes about is itself theory-dependent. These issues are intertwined, so in addition to discussing the ultrafaints, I also review some of the many predictions for Tully-Fisher, and how our theoretical expectation for it has evolved (or not) over time.

In the last post, we discussed how non-equilibrium dynamics might make a galaxy look like it had less dark matter than similar galaxies. That pendulum swings both ways: sometimes non-equilibrium effects might stir up the velocity dispersion above what it would nominally be. Some galaxies where this might be relevant are the so-called ultrafaint dwarfs (not to be confused with ultradiffuse galaxies, which are themselves often dwarfs). I’ve talked about these before, but more keep being discovered, so an update seems timely.

Galaxies and ultrafaint dwarfs

It’s a big universe, so there’s a lot of awkward terminology, and the definition of an ultrafaint dwarf is somewhat debatable. Most often I see them defined as having an absolute magnitude limit MV > -8, which corresponds to a luminosity less than 100,000 suns. I’ve also seen attempts at something more physical, like being a “fossil” whose star formation was entirely before cosmic reionization, which ended way back at z ~ 6 so all the stars would be at least*&^# 12.5 Gyr old. While such physics-based definitions are appealing, these are often tied up with theoretical projection: the UV photons that reionized the universe should have evaporated the gas in small dark matter halos, so these tiny galaxies can only be fossils from before that time. This thinking pervades much of the literature despite it being obviously wrong, as counterexamples! exist. For example, Leo P is practically an ultrafaint dwarf by luminosity, but has ample gas (so a larger baryonic mass) and is currently forming stars.

A luminosity-based definition is good enough for us here; I don’t really care exactly where we make the cut. Note that ultrafaint is an appropriate moniker: a luminosity of 10^5 L☉ is tiny by galaxy standards. This is a low-grade globular cluster, and some ultrafaints are only a few hundred solar luminosities, which is barely even# a star cluster. At this level, one has to worry about stochastic effects in stellar evolution. If there are only a handful of stars, the luminosity of the entire system changes markedly as a single star evolves up the red giant branch. Consequently, our mapping from observed quantities to stellar mass is extremely dodgy. For consistency, to compare with brighter dwarfs, I’ve adopted the same boilerplate M*/LV = 2 M☉/L☉. That makes for a fair comparison luminosity-to-luminosity, but the uncertainty in the actual stellar mass is ginormous.

It gets worse, as the ultrafaints that we know about so far are all very nearby satellites of the Milky Way. They are not discovered in the same way as other galaxies, where one plainly sees a galaxy on survey plates. For example, NGC 7757:

The spiral galaxy NGC 7757 as seen on plates of the Palomar Sky Survey.

While bright, high surface brightness galaxies like NGC 7757 are easy to see, lower surface brightness galaxies are not. However, they can usually still be seen, if you know where to look:

UGC 1230 as seen on the Palomar Sky Survey. It’s in the middle.

I like to use this pair as an illustration, as they’re about the same distance from us and about the same angular size on the sky – at least, once you crank up the gain for the low surface brightness UGC 1230:

Zoom in on deep CCD images of NGC 7757 (left) and UGC 1230 (right) with the contrast of the latter enhanced. The chief difference between the two is surface brightness – how spread out their stars are. They have a comparable physical diameter, they both have star forming regions that appear as knots in their spiral arms, etc. These galaxies are clearly distinct from the emptiness of the cosmic void around them, being examples of giant stellar systems that gave rise to the term “island universe.”

In contrast to objects that are obvious on the sky as independent island universes, ultrafaint dwarfs are often invisible to the eye. They are recognized as a subset of stars near each other on the sky that also share the same distance and direction of motion in a field that might otherwise be crowded with miscellaneous, unrelated stars. For example, here is Leo IV:

The ultrafaint dwarf Leo IV as identified by the Sloan Digital Sky Survey and the Hubble Space Telescope.

See it?

I don’t. I do see a number of background galaxies, including an edge-on spiral near the center of the square. Those are not the ultrafaint dwarf, which is some subset of the stars in this image. To decide which ones are potentially a part of such a dwarf, one examines the color magnitude diagram of all the stars to identify those that are consistent with being at the same distance, and assigns membership in a probabilistic way. It helps if one can also obtain radial velocities and/or proper motions for the stars to see which hang together – more or less – in phase space.

Part of the trick here is deciding what counts as hanging together. A strong argument in favor of these things residing in dark matter halos is that the velocity differences between the apparently-associated stars are too great for them to remain together for any length of time otherwise. This is essentially the same situation that confronted Zwicky in his observations of galaxies in clusters in the 1930s. Here are these objects that appear together in the sky, but they should fly apart unless bound together by some additional, unseen force. But perhaps some of these ultrafaints are not hanging together; they may be in the process of coming apart. Indeed, they may have so few stars because they are well down the path of dissolution.

Since one cannot see an ultrafaint dwarf in the same way as an island universe, I’ve heard people suggest that being bound by a dark matter halo be included in the definition of a galaxy. I see where they’re coming from, but find it unworkable. I know a galaxy when I see one. As did Hubble, as did thousands of other observers since, as can you when you look at the pictures above. It is absurd to make the definition of an object that is readily identifiable by visual inspection be contingent on the inferred presence of invisible stuff.

So are ultrafaints even galaxies? Yes and no. Some of the probabilistic identifications may be mere coincidences, not real objects. However, they can’t all be fakes, and I think that if you put them in the middle of intergalactic space, we would recognize them as galaxies – provided we could detect them at all. At present we can’t, but hopefully that situation will improve with the Rubin Observatory. In the meantime, what we have to work with are these fragmentary systems deep in the potential well of the seventy billion solar mass cosmic gorilla that is the Milky Way. We have to be cognizant that they might have gotten knocked around, as we can see in more massive systems like the Sagittarius dwarf. Of course, if they’ve gotten knocked around too much, then they shouldn’t be there at all. So how do these systems evolve under the influence of a cosmic gorilla?

Let’s start by looking at the size-mass diagram, as we did before. Ultrafaint dwarfs extend this relation to much lower mass, and also to rather small sizes – some approaching those of star clusters. They approximately follow a line of constant surface density, ~0.1 M☉ pc⁻² (dotted line).

The size and stellar mass of Local Group dwarfs as discussed previously, with the addition of ultrafaint dwarfs$ (small gray squares).

This looks weird to me. All other types of galaxies scatter all over the place in this diagram. The ultrafaints are unique in following a tight line in the size-mass plane, and one that follows a line of constant surface brightness. Every element of my observational experience screams that this is likely to be an artifact. Given how these “galaxies” are identified as the loose association of a handful of stars, it is easy to imagine that this trend might be an artifact of how we define the characteristic size of a system that is essentially invisible. It might also arise for physical reasons to do with the cosmic gorilla; i.e., it is a consequence of dynamical evolution. So maybe this correlation is real, but the warning lights that it is not are flashing red.

The Baryonic Tully-Fisher relation as a baseline

Ideally, we would measure accelerations to test theories, particularly MOND. Here, we would need to use the size to estimate the acceleration, but I straight up don’t believe these sizes are physically meaningful. The stellar mass, dodgy as it is, seems robust by comparison. So we’ll proceed as if we know that much – which we don’t, really – but let’s at least try.

With the stellar mass (there is no gas in these things), we are halfway to constructing the baryonic Tully-Fisher relation (BTFR), which is the simplest test of the dynamics that we can make with the available data. The other quantity we need is the characteristic circular speed of the gravitational potential. For rotating galaxies, that is the flat rotation speed, Vf. For pressure supported dwarfs, what is usually measured is the velocity dispersion σ. We’ve previously established that for brighter dwarfs in the Local Group, a decent approximation is Vf = 2σ, so we’ll start by assuming that this should apply to the ultrafaints as well. This allows us to plot the BTFR:

The baryonic mass and characteristic circular speeds of both rotationally supported galaxies (circles) and pressure supported dwarfs (squares). The colored points follow the same baryonic Tully-Fisher relation (BTFR), but the data for low mass ultrafaint dwarfs (gray squares) flattens out, having nearly the same characteristic speed over several decades in mass.

The BTFR is an empirical relation of the form Vf ~ Mb^(1/4) over about six decades in mass. Somewhere around the ultrafaint scale, this no longer appears to hold, with the observed velocity flattening out to become approximately constant for these lowest mass galaxies. I’m not sure this is real, as there are many practical caveats to interpreting the observations. Measuring stellar velocities is straightforward but demanding at this level of accuracy. There are many potential systematics, pretty much all of which cause the intrinsic velocity dispersion to be overestimated. For example, observations made with multislit masks tend to return larger dispersions than observations of the same object with fibers. That’s likely because it is hard to build a mask so well that all of the stars perfectly hit the centers of the slitlets assigned to them; offsets within the slit shift the spectrum in a way that artificially adds to the apparent velocity dispersion. Fibers are less efficient in their throughput, but have the virtue of blending the input light in a way that precludes this particular systematic. Another concern is physical – some of the stars that are observed are presumably binaries, and some of the velocity will be due to motion within the binary pair and nothing to do with the gravitational potential of the larger system. This can be addressed with repeated observations to see if some velocities change, but it is hard to do that for each and every system, especially when it is way more fun to discover and explore new systems than follow up on the same one over and over and over again.

There are lots of other things that can go wrong. At some level, some of them probably do – that’s the nature of observational astronomy&. While it seems likely that some of the velocity dispersions are systematically overestimated, it seems unlikely that all of them are. Let’s proceed as if the bulk of the data is telling us something, even if we treat individual objects with suspicion.

MOND

MOND makes a clear prediction for the BTFR of isolated galaxies: the baryonic mass goes as the fourth power of the flat rotation speed. Contrary to Newtonian expectation, this holds irrespective of surface brightness, which is what attracted my attention to the theory in the first place. So how does it do here?

The same data as above with the addition of the line predicted by MOND (Milgrom 1983).
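
For concreteness, that line is just Mb = Vf⁴/(Ga0) with Milgrom’s constant a0 = 1.2 × 10⁻¹⁰ m/s². A minimal sketch (the unit conversions are mine):

```python
G  = 4.301e-6   # kpc (km/s)^2 / Msun
a0 = 3700.0     # 1.2e-10 m/s^2 in (km/s)^2 / kpc

def mond_btfr_mass(vf):
    """MOND prediction for isolated galaxies: Mb = Vf^4 / (G a0)."""
    return vf**4 / (G * a0)

for vf in [20, 50, 100, 200]:
    print(f"Vf = {vf:3d} km/s -> Mb = {mond_btfr_mass(vf):.1e} Msun")
```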

Low surface density means low acceleration, so low surface brightness galaxies would make great tests of MOND if they were isolated. Oh, right – they already did. Repeatedly. MOND also correctly predicted the velocities of low mass, gas-rich dwarfs that were unknown when the prediction was made. These are highly nontrivial successes of the theory.

The ultrafaints we’re discussing here are not isolated, so they do not provide the clean tests that isolated galaxies provide. However, galaxies subject to external fields should have low velocities relative to the BTFR, while the ultrafaints have higher velocities. They’re on the wrong side of the relation! Taking this at face value (i.e., assuming equilibrium), MOND fails here.

Whenever MOND has a problem, it is widely seen as a success of dark matter. In my experience, this is rarely true: observations that are problematic for MOND usually don’t make sense in terms of dark matter either. For each observational test we also have to check how LCDM fares.

LCDM

How LCDM fares is often hard to judge because its predictions for the same phenomena are not always clear. Different people predict different things for the same theory. There have been lots of LCDM-based predictions made for both dwarf satellite galaxies and the Tully-Fisher relation. Too many, in fact – it is a practical impossibility to examine them all. Nevertheless, some common themes emerge if we look at enough examples.

The halo mass-velocity relation

The most basic prediction of LCDM is that the mass of a dark matter halo scales with the cube of the circular velocity of a test particle at the virial radius (conventionally taken to be the radius R200 that encompasses an average density 200 times the critical density of the universe. If that sounds like gobbledygook to you, just read “halo” for “200”): M200 ~ V200³. This is a very basic prediction that everyone seems to agree to.

There is a tiny problem with testing this prediction: it refers to the dark matter halo that we cannot see. In order to test it, we have to introduce some scaling factors to relate the dark to the light. Specifically, Mb = fd M200 and Vf = fv V200, where fd is the observed fraction of mass in baryons and fv relates the observed flat velocity to the circular speed of our notional test particle at the virial radius. The obvious assumptions to make are that fd is a constant (perhaps as much as but not more than the cosmic baryon fraction of 16%) and fv is close to unity. The latter requirement stems from the need for dark matter to explain the amplitude of the flat rotation speed, but fv could be slightly different; plausible values span 0.9 < fv < 1.4. Values larger than one indicate a rotation curve that declines before the virial radius is reached, which is the natural expectation for NFW halos.
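
In code, these scalings amount to a few lines. This is a sketch of the nominal relation, assuming H0 = 70 km/s/Mpc; the mapping M200 = V200³/(10 G H0) follows from the definition of R200 together with V200² = G M200/R200:

```python
G  = 4.301e-6   # kpc (km/s)^2 / Msun
H0 = 0.07       # 70 km/s/Mpc expressed in km/s/kpc

def m200(v200):
    """M200 = V200^3 / (10 G H0), from M200 = (4/3) pi R200^3 * 200 rho_crit
    combined with V200^2 = G M200 / R200 and rho_crit = 3 H0^2 / (8 pi G)."""
    return v200**3 / (10.0 * G * H0)

def btfr_line(vf, fd=0.025, fv=1.0):
    """Observable proxies: Mb = fd * M200, with Vf = fv * V200."""
    return fd * m200(vf / fv)

for vf in [20, 50, 100, 200]:
    print(f"Vf = {vf:3d} km/s -> Mb = {btfr_line(vf):.1e} Msun  (slope-3 line)")
```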

Here is a worked example with fd = 0.025 and fv = 1:

The same data as above with the addition of the nominal prediction of LCDM. The dotted line is the halo mass-circular velocity relation; the gray band is a simple model with fd = 0.025 and fv = 1 (e.g., Mo, Mao, & White 1998).

I have illustrated the model with a fat grey line because fd = 0.025 is an arbitrary choice* I made to match the data. It could be more, it could be less. The detected baryon fraction can be anything up to or less than the cosmic value, fd < fb = 0.16, as not all of the baryons available in a halo cool and condense into cold gas that forms visible stars. That’s fine; there’s no requirement that all of the baryons have to become readily observable, but there is also no reason to expect all halos to cool exactly the same fraction of baryons. Naively one would expect at least some variation in fd from halo to halo, so there could and probably should be a lot of scatter: the gray line could easily be a much wider band than depicted.

In addition to the rather arbitrary value of fd, this reasoning also predicts a Tully-Fisher relation with the wrong slope. Picking a favorable value of fd only matches the data over a narrow range of mass. It was nevertheless embraced for many years by many people. Selection effects bias samples to bright galaxies. Consequently, the literature is rife with TF samples dominated by galaxies with Mb > 10^10 M☉ (the top right corner of the plot above); with so little dynamic range, a slope of 3 looks fine. Once you look outside that tiny box, it does not look fine.

Personally, I think a slope of 3 is an oversimplification. That is the prediction for dark matter halos; there can be effects that vary systematically with mass. An obvious one is adiabatic compression, the effect by which baryons drag some dark matter along with them as they settle to the center of their halos. This increases fv by an amount that depends on the baryonic surface density. Surface density correlates with mass, so I would nominally expect higher velocities in brighter galaxies; this drives up the slope. There are various estimates of this effect; typically one gets a slope like 3.3, not the observed 4. Worse, it predicts an additional effect: at a given mass, galaxies of higher surface brightness should also have higher velocity. Surface brightness should be a second parameter in the Tully-Fisher relation, but this is not observed.

The easiest way to reconcile the predicted and observed slopes is to make fd a function of mass. Since Mb = fd M200 and M200 ~ V200³, Mb ~ fd V200³. Adopting fv = 1 for simplicity, Mb ~ Vf⁴ follows if fd ~ Vf. Problem solved, QED.

There are [at least] two problems with this argument. One is that the scaling fd ~ Vf must hold perfectly without introducing any scatter. This is a fine-tuning problem: we need one parameter to vary precisely with another, unrelated parameter. There is no good reason to expect this; we just have to insert the required dependence by hand. This is much worse than choosing an arbitrary value for fd: now we’re making it a rolling fudge factor to match whatever we need it to. We can make it even more complicated by invoking some additional variation in fv, but this just makes the fine-tuning worse as the product fd fv⁻³ has to vary just so. Another problem is that we’re doing all this to adjust the prediction of one theory (LCDM) to match that of a different theory (MOND). It is never a good sign when we have to do that, whether we admit it or not.

Abundance matching

The reasoning leading to a slope 3 Tully-Fisher relation assumes a one-to-one relation between baryonic and halo mass (fd = constant). This is an eminently reasonable assumption. We spent a couple of decades trying to avoid having to break this assumption. Once we do so and make fd a freely variable parameter, then it can become a rolling fudge factor that can be adjusted to fit anything. Everyone agrees that is Bad. However, it might be tolerable if there is an independent way of estimating this variation. Rather than make fd just be what we need it to be as described above, we can instead estimate it with abundance matching.

Abundance matching comes from equating the observed number density of galaxies as a function of mass with the number density of dark matter halos. This process gives fd, or at least the stellar fraction, f*, which is close to fd for bright galaxies. Critically, it provides a way to assign dark matter halo masses to galaxies independently of their kinematics. This replaces an arbitrary, rolling fudge factor with a predictive theory.

Abundance matching models generically introduce curvature into the prediction for the BTFR. This stems from the mismatch in the shape of the galaxy stellar mass function (a Schechter function) and the dark halo mass function (a power law on galaxy scales). This leads to a bend in relations that map between visible and dark mass.
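
To see where the curvature comes from, here is a toy version of the matching step. Every number in it is an illustrative placeholder – a made-up Schechter function and a pure power-law halo mass function, not any published calibration. The point is only that equating cumulative number densities forces fd = M*/M200 to vary with mass, which bends the predicted Tully-Fisher relation:

```python
import numpy as np

G, H0 = 4.301e-6, 0.07        # kpc (km/s)^2/Msun ; km/s/kpc

# Toy galaxy stellar mass function: Schechter form (illustrative numbers).
Mknee, alpha, phistar = 10**10.7, -1.3, 5e-3

def n_gal_above(mstar):
    """Cumulative number density of galaxies more massive than mstar."""
    lg = np.linspace(np.log10(mstar), 13.0, 3000)
    x = 10.0**lg / Mknee
    dn_dlg = np.log(10.0) * phistar * x**(alpha + 1) * np.exp(-x)
    return np.trapz(dn_dlg, lg)

# Toy halo mass function: pure power law, n(>Mh) = A * Mh^-0.9
# (slope and normalization are illustrative placeholders).
A = 3e8
def mh_at_density(n):
    return (A / n)**(1.0 / 0.9)   # invert the cumulative halo abundance

# Abundance match: n_gal(>M*) = n_halo(>Mh); then attach a velocity.
for lgM in [8, 9, 10, 11]:
    mstar = 10.0**lgM
    mh = mh_at_density(n_gal_above(mstar))
    v200 = (10.0 * G * H0 * mh)**(1.0 / 3.0)
    print(f"M* = 1e{lgM:2d}: fd = {mstar/mh:.4f}, V200 = {v200:5.1f} km/s")
```

With these toy numbers, fd rises and falls with mass rather than staying constant, and the implied velocities swing above a slope-4 relation at both the low and high mass ends – qualitatively the behavior of the curved lines that follow.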

The transition from the M ~ V³ reasoning to abundance matching occurred gradually, but became pronounced circa 2010. There are many abundance matching models; I already faced the problem of the multiplicity of LCDM predictions when I wrote a lengthy article on the BTFR in 2012. To get specific, let’s start with an example from then, the model of Trujillo-Gomez et al. (2011):

The same data as above with the addition of the line predicted by LCDM in the model of Trujillo-Gomez et al. (2011).

One thing Trujillo-Gomez et al. (2011) say in their abstract is “The data present a clear monotonic LV relation from ∼50 km s⁻¹ to ∼500 km s⁻¹, with a bend below ∼80 km s⁻¹“. By LV they mean luminosity-velocity, i.e., the regular Tully-Fisher relation. The bend they note is real; that’s what happens when you consider only the starlight and ignore the gas. The bend goes away if you include that gas. This was already known at the time – our original BTFR paper from 2000 has nearly a thousand citations, so it isn’t exactly obscure. Ignoring the gas is a choice that makes no sense empirically but makes a lot of sense from the perspective of LCDM simulations. By 2010, these had become reasonably good at matching the numbers of stars observed in galaxies, but the gas properties of simulated galaxies remained, hmmmmmmm, wanting. It makes sense to utilize the part that works. It makes less sense to pretend that this bend is something physically meaningful rather than an artifact of ignoring the gas. The pressure-supported dwarfs are all star dominated, so this distinction doesn’t matter here, and they follow the BTFR, not the stars-only version.

An old problem in galaxy formation theory is how to calibrate the number density of dark matter halos to that of observed galaxies. For a long time, a choice that people made was to match either the luminosity function or the kinematics. These didn’t really match up, so there was occasional discussion of the virtues and vices of the “luminosity function calibration” vs. the “Tully-Fisher calibration.” These differed by a factor of ~2. This tension remains with us. Mostly simulations have opted to adopt the luminosity function calibration, updated and rebranded as abundance matching. Again, this makes sense from the perspective of LCDM simulations, because the number density of dark matter halos is something that simulations can readily quantify while the kinematics of individual galaxies are much harder to resolve**.

The nonlinear relation between stellar mass and halo mass obtained from abundance matching inevitably introduces curvature into the corresponding Tully-Fisher relation predicted by such models. That’s what you see in the curved line of Trujillo-Gomez et al. (2011) above. They weren’t the first to obtain such a result, and they certainly weren’t the last: this is a feature of LCDM with abundance matching, not a bug.

The line of Trujillo-Gomez et al. (2011) matches the data pretty well at intermediate masses. It diverges to higher velocities at both small and large galaxy masses. I’ve written about this tension at high masses before; it appears to be real, but let’s concentrate on low masses here. At low masses, the velocity of galaxies with Mb < 10^8 M☉ appears to be overestimated. But the divergence between model and reality has just begun, and it is hard to resolve small things in simulations, so this doesn’t seem too bad. Yet.

Moving ahead, there are the “Latte” simulations of Wetzel et al. (2016) that use the well-regarded FIRE code to look specifically at simulated dwarfs, both isolated and satellites – specifically satellites of Milky Way-like systems. (Milky Way. Latte. Get it? Nerd humor.) So what does that find?

The same data as above with the addition of simulated dwarfs (orange triangles) from the Latte LCDM simulation of Wetzel et al. (2016), specifically the simulated satellites in the top panel of their Fig. 3. Note that we plot Vf = 2σ for pressure supported systems, both real and simulated.

The individual simulated dwarf satellites of Wetzel et al. (2016) follow the extrapolation of the line predicted by Trujillo-Gomez et al. (2011). To first order, it is the same result to higher resolution (i.e., smaller galaxy mass). Most of the simulated objects have velocity dispersions that are higher than observed in real galaxies. Intriguingly, there are a couple of simulated objects with M* ~ 5 × 10^6 M☉ that fall nicely among the data where there are both star-dominated and gas-rich galaxies. However, these two are exceptions; the rule appears to be characteristic speeds that are higher than observed.

The lowest mass simulated satellite objects begin to approach the ultrafaint regime, but resolution continues to be an issue: they’re not really there yet. This hasn’t precluded many people from assuming that dark matter will work where MOND fails, which seems like a heck of a presumption given that MOND has been consistently more successful up until that point. Where MOND underpredicts the characteristic velocity of ultrafaints, LCDM hasn’t yet made a clear prediction, and it overpredicts velocities for objects of slightly larger mass. Ain’t no theory covering itself in glory here, but this is a good example where objects that are a problem for MOND are also a problem for dark matter, and it seems likely that non-equilibrium dynamics play a role in either case.

Comparing apples with apples

A persistent issue with comparing simulations to reality is extracting comparable measures. Whereas circular velocities are measured from velocity fields in rotating galaxies and estimated from measured velocity dispersions in pressure supported galaxies, the most common approach to deriving rotation curves from simulated objects is to sum up particles in spherical shells and assume V² = GM/R. These are not the same quantities. They should be proxies for one another, but equality holds only in the limit of isotropic orbits in spherical symmetry. Reality is messier than that, and simulations aren’t that simple either%.
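
For concreteness, the simulation-side measure is typically something like the following sketch – a generic reconstruction, not any particular group’s pipeline:

```python
import numpy as np

G = 4.301e-6  # kpc (km/s)^2 / Msun

def vcirc_spherical(r_part, m_part, r_eval):
    """The usual simulation-side estimate: sort particles by radius,
    accumulate the enclosed mass, and set V^2 = G M(<R) / R. This is
    exact only for spherical symmetry with isotropic orbits; for a
    flattened disk it somewhat underestimates the true circular speed."""
    order = np.argsort(r_part)
    r_sorted = r_part[order]
    m_enc = np.cumsum(m_part[order])
    idx = np.searchsorted(r_sorted, r_eval, side="right") - 1
    return np.sqrt(G * m_enc[idx] / r_eval)

# Toy usage: 10^4 equal-mass particles with density ~ 1/r inside 50 kpc.
rng = np.random.default_rng(1)
r = 50.0 * rng.power(2.0, 10_000)
m = np.full(r.size, 1e7)                     # Msun per particle
print(vcirc_spherical(r, m, np.array([5.0, 10.0, 20.0, 40.0])))
```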

Sales et al. (2017) make the effort to compare what is observed, given how it is observed, with what the simulations would show for that quantity. Others have made a similar effort; a common finding is that the apparent rotation speeds of simulated gas disks do not trace the gravitational potential as simply as GM/R. That’s no surprise, but most simulated rotation curves do not look like those of real galaxies^, so the comparison is not straightforward. Those caveats aside, Sales et al. (2017) are doing the right thing in trying to make an apples-to-apples comparison between simulated and observed quantities. They extract from simulations a quantity Vout that is appropriate for comparison with what we observe in the outer parts of rotation curves. So here is the resulting prediction for the BTFR:

The same data as above with the addition of the line predicted by LCDM in the model of Sales et al. (2017), specifically the formula for Vout in their Table 2 which is their proxy for the observable rotation speed.

That’s pretty good. It still misses at high masses (those two big blue points at the top are Andromeda and the Milky Way) and it still bends away from the data at low masses where there are both star-dominated and gas-rich galaxies. (There are a lot more examples of the latter that I haven’t used here because the plot gets overcrowded.) Despite the overshoot, the use of an observable aspect of the simulations gets closer to the data, and the prediction flattens out in the same qualitative sense. That’s good, so one might see cause for hope that this problem is simply a matter of making a fair comparison between simulations and data. We should also be careful not to over-interpret it: I’ve simply plotted the formula they give; the simulations to which they fit it surely do not resolve ultrafaint dwarfs, so really the line should stop at some appropriate mass scale.

Nevertheless, it makes sense to look more closely at what is observed vs. what is simulated. This has recently been done in greater detail by Ruan et al. (2025). They consider two simulations that implement rather different feedback; both wind up producing rotating, gas rich dwarfs that actually fall on the BTFR.

The same data as above with the addition of simulated dwarfs of Ruan et al. (2025), specifically from the top right panel of their Fig. 6. The orange circles are their “massives” and the red triangles the “marvels” (the distinction refers to different feedback models).

Finally some success after all these years! Looking at this, it is tempting to declare victory: problem solved. It was just a matter of doing the right simulation all along, and making an apples-to-apples comparison with the data.

That sounds too good to be true. Is it repeatable in other simulations? What works now that didn’t before?

These are high resolution simulations, but they still don’t resolve ultrafaints. We’re talking here about gas-rich dwarfs. That’s also an important topic, so let’s look more closely. What works now is in the apples-to-apples assessment: what we would measure for Vout is less than Vmax (related to V200) of the halo:

Two panels from Fig. 7 of Ruan et al. (2025) showing the ratio of the velocity we might observe relative to the characteristic circular velocity of the halo (top) and the ratio of the radii where these occur (bottom).

The treatment of cold gas in simulations has improved. In these simulations, Vout(Rout) is measured where the gas surface density falls to 1 M☉ pc⁻², which is typical of many observations. But the true rotation curve is still rising for objects with Mb < a few × 10^8 M☉; it has not yet reached a value that is characteristic of the halo. So the apparent velocity is low, even if the dark matter halos are doing basically the same thing as before:

As above, but with the addition of the true Vmax (small black dots) of the simulated halos discussed by Ruan et al. (2025), which follow the relation of Sales et al. (2017) (line for Vmax in their Table 2).
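
The geometry of the argument is easy to see in a toy model; every number here is an illustrative placeholder (a pseudo-isothermal halo rotation curve and an exponential gas disk, not the Ruan et al. models). The radius where the gas drops below the sensitivity limit lands where the halo curve is still rising, so the measured Vout falls short of Vmax – and pushing the limit deeper should, in this picture, push Vout up toward Vmax:

```python
import numpy as np

def v_halo(R, vmax=60.0, rc=4.0):
    """Toy rising rotation curve (pseudo-isothermal sphere); km/s, kpc."""
    return vmax * np.sqrt(1.0 - (rc / R) * np.arctan(R / rc))

def r_out(sigma0=10.0, r_gas=1.5, sigma_limit=1.0):
    """Radius where an exponential gas disk Sigma0 * exp(-R/Rg) falls
    to the sensitivity limit sigma_limit (all in Msun/pc^2, kpc)."""
    return r_gas * np.log(sigma0 / sigma_limit)

for lim in [1.0, 0.1]:
    R = r_out(sigma_limit=lim)
    print(f"limit = {lim:3.1f} Msun/pc^2: Rout = {R:4.1f} kpc, "
          f"Vout = {v_halo(R):4.1f} km/s  (Vmax = 60)")
```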

I have mixed feelings about this. On the one hand, there are many dwarf galaxies with rising rotation curves that we don’t see flatten out, so it is easy to imagine they might keep going up, and I find it plausible that this is what we would find if we looked harder. So plausible that I’ve spent a fair amount of time doing exactly this. Not all observations terminate at 1 M☉ pc⁻², and whenever we push further out, we see the same damn thing over and over: the rotation curve flattens out and stays flat!!. That’s been my anecdotal experience; getting beyond that systematically is the point of the MHONGOOSE survey. This was constructed to detect much lower atomic gas surface densities, and routinely detects gas at the 0.1 M☉ pc⁻² level where Ruan et al. suggest we should see something closer to Vmax. So far, we don’t.

I don’t want to sound too negative, because how we map what we predict in simulations to what we measure in observations is a serious issue. But it seems a bit of a stretch for a low-scatter power law BTFR to be the happenstance of observational sensitivity that cuts in at a convenient mass scale. So far, we see no indication of that in more sensitive observations. I’ll certainly let you know if that changes.

Survey says…

At this juncture, we’ve examined enough examples that the reader can appreciate my concern that LCDM models can predict rather different things. What does the theory really predict? We can’t really test it until we agree what it should do!!!.

I thought it might be instructive to combine some of the models discussed above. It is.

Some of the LCDM predictions discussed above shown together. The dotted line to the right of the data is the halo mass-velocity relation, which is the one thing we all agree LCDM predicts but which is observationally inaccessible. The grey band is a Mo, Mao, & White-type model with fd = 0.025. The red dotted line is the model of Trujillo-Gomez et al. (2011); the solid red line that of Sales et al. (2017) for Vmax.

The models run together, more or less, for high mass galaxies. Thanks to observational selection effects, these are the objects we’ve always known about and matched our theories to. In order to test a theory, one wants to force it to make predictions in new regimes it wasn’t built for. Low mass galaxies do that, as do low surface brightness galaxies, which are often but not always low mass. MOND has done well for both, down to the ultrafaints we’re discussing here. LCDM does not yet explain those, or really any of the intermediate mass dwarfs.

What really disturbs me about LCDM models is their flexibility. It’s not just that they miss, it’s that it is possible to miss the data on either side of the BTFR. The older fd = constant models predict velocities that are too low for low mass galaxies. The more recent abundance matching models predict velocities that are too high for low mass galaxies. I have no doubt that a model can be constructed that gets it right, because there is obviously enough flexibility to do pretty much anything. Adding new parameters until we get it right is an example of epicyclic thinking, as I’ve been pointing out for thirty years. I don’t know what could be worse for an idea like dark matter that is not falsifiable.

We still haven’t come anywhere close to explaining the ultrafaints in either theory. In LCDM, we don’t even know if we should draw a curved line that catches them as if they’re in equilibrium, or start from a power-law BTFR and look for departures from that due to tidal effects. Both are possible in LCDM, both are plausible, as is some combination of both. I expect theorists will pick an option and argue about it indefinitely.

Tidal effects

The typical velocity dispersion of the ultrafaint dwarfs is too high for them to be in equilibrium in MOND. But there’s also pretty much no way these tiny things could be in equilibrium, being in the rough neighborhood dominated by our home, the cosmic gorilla. That by itself doesn’t make an explanation; we need to work out what happens to such things as they evolve dynamically under the influence of a pronounced external field. To my knowledge, this hasn’t been addressed in detail in MOND any more than in LCDM, though Brada & Milgrom addressed some of the relevant issues.

There is a difference in approach required for the two theories. In LCDM, we need to increase the resolution of simulations to see what happens to the tiniest of dark matter halos and their resident galaxies within the larger dark matter halos of giant galaxies. In MOND we have to simulate the evolution along the orbit of each unique individual. This is challenging on multiple levels, as each possible realization of a MOND theory requires its own code. Writing a simulation code for AQUAL requires a different numerical approach than QUMOND, and those are both modifications of gravity via the Poisson equation. We don’t know which might be closer to reality; heck, we don’t even know [yet] if MOND is a modification of gravity or inertia, the latter being even harder to code.

Cold dark matter is scale-free, so crudely I expect ultrafaint dwarfs in LCDM to behave like the larger dwarf satellites that have been simulated: their outer dark matter halos are gradually whittled away by tidal stripping for many Gyr. At first the stars are unaffected, but eventually so little dark matter is left that the stars start to be lost impulsively during pericenter passages. Though the dark matter is scale-free, the stars and the baryonic physics that made them are not, so that’s where it gets tricky. The apparent dark-to-luminous mass ratio is huge, so one possibility is that the ultrafaints are in equilibrium despite their environment; they just made ridiculously few stars from the amount of mass available. That’s consistent with a wild extrapolation of abundance matching models, but how it comes about physically is less clear. For example, at some low mass, a galaxy would make so few stars that none are massive enough to result in a supernova, so there is no feedback, which is the mechanism usually invoked to prevent too many stars from forming. Awkward. Alternately, the constant exposure to tidal perturbation might stir things up, with the velocity dispersion growing and stars getting stripped to form tidal streams, so they may have started as more massive objects. Or some combination of both, plus the evergreen possibility of things that don’t occur to me offhand.

Equilibrium for ultrafaint satellites is not an option in MOND, but tidal stirring and stripping is. As a thought experiment, let’s imagine what happens to a low mass dwarf typical of the field that falls towards the Milky Way from some large distance. Initially gas-rich, the first environmental effect that it is likely to experience is ram pressure stripping by the hot coronal gas around the Milky Way. That’s a baryonic effect that happens in either theory; it has nothing to do with the effective law of gravity. A galaxy thus deprived of much of its mass will be out of equilibrium; its internal velocities will remain typical of the original mass even though much of that mass is now gone. Consequently, its structure must adjust to compensate; perhaps dwarf Irregulars puff up and are transformed into dwarf Spheroidals in this way. Our notional infalling dwarf may have time to equilibrate to its new mass before being subject to strong tidal perturbation by the Milky Way, or it may not. If not, it will have characteristic internal velocities that are too high for its new mass, and reside above the BTFR. I doubt this suffices to explain [m]any of the ultrafaints, as their masses are so tiny that some stellar mass loss is also likely to have occurred.

Let’s suppose that our infalling dwarf has time to [approximately] equilibrate, or it simply formed nearby to begin with. Now it is a pressure supported system [more or less] on the BTFR. As it orbits the Milky Way, it feels an extra force from the external field. If it stays far enough out to remain in quasi-equilibrium in the EFE regime, then it will oscillate in size and velocity dispersion in phase with the strength of the external field it feels along its orbit.

If instead a satellite dips too close, it will be tidally disturbed and depart from equilibrium. The extra energy may stir it up, increasing its velocity dispersion. It doesn’t have the mass to sustain that, so stars will start to leak out. Tidal disruption will eventually happen, with the details depending on the initial mass and structure of the dwarf and on the eccentricity of its orbit, the distance of closest approach (pericenter), whether the orbit is prograde or retrograde relative to any angular momentum the dwarf may have… it’s complicated, so it is hard to generalize##. Nevertheless, we (McGaugh & Wolf 2010) anticipated that “the deviant dwarfs [ultrafaints] should show evidence of tidal disruption while the dwarfs that adhere to the BTFR should not.” Unlike LCDM where most of the damage is done at closest approach, we anticipate for MOND that “stripping of the deviant dwarfs should be ongoing and not restricted to pericenter passage” because tides are stronger and there is no cocoon of dark matter to shelter the stars. The effect is still maximized at pericenter, it’s just not as impulsive as in some of the dark matter simulations I’ve seen.

This means that there should be streams of stars all over the sky. As indeed there are. For example:

Stellar streams in the Milky Way identified using Gaia (Malhan et al. 2018).

As a tidally influenced dwarf dissolves, the stars will leak out and form a trail. This happens in LCDM too, but there are differences in the rate, coherence, and symmetry of the resulting streams. Perhaps ultrafaint dwarfs are just the last dregs of the tidal disruption process. From this perspective, it hardly matters if they originated as external satellites or as internal star clusters: globular clusters native to the Milky Way should undergo a similar evolution.

Evolutionary tracks

Perhaps some of the ultrafaint dwarfs are the nuggets of disturbed systems that have suffered mass loss through tidal stripping. That may be the case in either LCDM or MOND, and has appealing aspects in either case – we went through all the possibilities in McGaugh & Wolf (2010). In MOND, the BTFR provides a reference point for what a stable system in equilibrium should do. That’s the starting point for the evolutionary tracks suggested here:

BTFR with conceptual evolutionary tracks (red lines) for tidally-stirred ultrafaint dwarfs.

Objects start in equilibrium on the BTFR. As they become subject to the external field, their velocity dispersions first decrease as they transition through the quasi-Newtonian regime. As tides kick in, stars are lost and stretched along the satellite’s orbit: mass goes down, but the apparent velocity dispersion goes up as the stars gradually separate into a stream. Their relative velocities no longer represent a measure of the internal gravitational potential; rather than a cohesive dwarf satellite, they’re more an association of stars on similar orbits around the Milky Way.

This is crudely what I imagine might be happening in some of the ultrafaint dwarfs that reside above the BTFR. Reality can be more complicated, and probably is. For example, objects that are not yet disrupted may oscillate around and below the BTFR before becoming completely unglued. Moreover, some individual ultrafaints probably are not real, while the data for others may suffer from systematic uncertainties. There’s a lot to sort out, and we’ve reached the point where the possibility of non-equilibrium effects cannot be ignored.

As a test of theories, the better course remains to look for new galaxies free from environmental perturbation. Ultrafaint dwarfs in the field, far from cosmic gorillas like the Milky Way, would be ideal. Hopefully many will be discovered in current and future surveys.


!Other examples exist and continue to be discovered. More pertinent to my thinking is that the mass threshold at which reionization is supposed to suppress star formation has been a constantly moving goal post. To give an amusing anecdote, while I was junior faculty at the University of Maryland (so at least twenty years ago), Colin Norman called me up out of the blue. Colin is an expert on star formation, and had a burning question he thought I could answer. “Stacy,” he says as soon as I pick up, “what is the lowest mass star forming galaxy?” Uh, Hi, Colin. Off the cuff and totally unprepared for this inquiry, I said “um, a stellar mass of a few times 10⁷ solar masses.” Colin’s immediate response was to laugh long and loud, as if I had made the best nerd joke ever. When he regained his composure, he said “We know that can’t be true as reionization will prevent star formation in potential wells that small.” So, after this abrupt conversation, I did some fact-checking, and indeed, the number I had pulled out of my arse on the spot was basically correct, at that time. I also looked up the predictions, and of course Colin knew his business too; galaxies that small shouldn’t exist. Yet they do, and now the minimum known is two orders of magnitude lower in mass, with still no indication that a lower limit has been reached. So far, the threshold of our knowledge has been imposed by observational selection effects (low luminosity galaxies are hard to see), not by any discernible physics.

More recently, McQuinn et al. (2024) have made a study of the star formation histories of Leo P and a few similar galaxies that are near enough to see individual stars so as to work out the star formation rate over the course of cosmic history. They argue that there seems to be a pause in star formation after reionization, so a more nuanced version of the hypothesis may be that reionization did suppress star forming activity for a while, but these tiny objects were subsequently able to re-accrete cold gas and get started again. I find that appealing as a less simplistic thing that might have happened in the real universe, and not just a simple on/off switch that leaves only a fossil. However, it isn’t immediately clear to me that this more nuanced hypothesis should happen in LCDM. Once those baryons have evaporated, they’re gone, and it is far from obvious that they’ll ever come back to the weak gravity of such a small dark matter halo. It is also not clear to me that this interpretation, appealing as it is, is unique: the reconstructed star formation histories also look consistent with stochastic star formation, with fluctuations in the star formation rate being a matter of happenstance that have nothing to do with the epoch of reionization.

#So how are ultrafaint dwarfs different from star clusters? Great question! Wish we had a great answer.

Some ultrafaints probably are star clusters rather than independent satellite galaxies. How do we tell the difference? Chiefly, the velocity dispersion: star clusters show no need for dark matter, while ultrafaint dwarfs generally appear to need a lot. This of course assumes that their measured velocity dispersions represent an equilibrium measure of their gravitational potential, which is what we’re questioning here, so the opportunity for circular reasoning is rife.

$Rather than apply a strict luminosity cut, for convenience I’ve kept the same “not safe from tidal disruption” distinction that we’ve used before. Some of the objects in the 10⁵–10⁶ M☉ range might belong more with the classical dwarfs than with the ultrafaints. This is more a reminder that our nomenclature is terrible than anything physically meaningful.

&Astronomy is an observational science, not a laboratory science. We can only detect the photons nature sends our way. We cannot control all the potential systematics as can be done in an enclosed, finite, carefully controlled laboratory. That means there is always the potential for systematic uncertainties whose magnitude can be difficult to estimate, or sometimes to even be aware of, like how local variations impact Jeans analyses. This means we have to take our error bars with a grain of salt, often such a big grain as to make statistical tests unreliable: goodness of fit is only as meaningful as the error bars.

I say this because it seems to be the hardest thing for physicists to understand. I also see many younger astronomers turning the crank on fancy statistical machinery as if astronomical error bars can be trusted. Garbage in, garbage out.

*This is an example of setting a parameter in a model “by hand.”

**The transition to thinking in terms of the luminosity function rather than Tully-Fisher is so complete that the most recent, super-large, Euclid flagship simulation doesn’t even attempt to address the kinematics of individual galaxies while giving extraordinarily detailed and extensive information about their luminosity distributions. I can see why they’d do that – they want to focus on what the Euclid mission might observe – but it is also symptomatic of the growing tendency I’ve witnessed to just not talk about those pesky kinematics.

%Halos in dark matter simulations tend to be rather triaxial, i.e., a 3D bloboid that is neither spherical like a soccer ball nor oblate like a frisbee nor prolate like an American football: each principal axis has a different length. If real halos were triaxial, it would lead to non-circular orbits in dark matter-dominated galaxies that are not observed.

The triaxiality of halos is a result from dark matter-only simulations. Personally, I suspect that the condensation of gas within a dark matter halo (presuming such things exist) during the process of galaxy formation rounds out the inner halo, making it nearly spherical where we are able to make measurements. So I don’t see this as necessarily a failure of LCDM, but rather an example of how more elaborate simulations that include baryonic physics are sometimes warranted. Sometimes. There’s a big difference between this process, which also compresses the halo (making it more dense when it already starts out too dense), and the various forms of feedback, which may or may not further alter the structure of the halo.

^There are many failure modes in simulated rotation curves, the two most common being the cusp-core problem in dwarfs and sub-maximal disks in giants. It is common for the disks of bright spiral galaxies to be nearly maximal in the sense that the observed stars suffice to explain the inner rotation curve. They may not be completely maximal in this sense, but they come close for normal stellar populations. (Our own Milky Way is a good example.) In contrast, many simulations produce bright galaxies that are absurdly sub-maximal; EAGLE and SIMBA being two examples I remember offhand.

Another common problem is that LCDM simulations often don’t produce rotation curves that are as flat as observed. This was something I also found in my early attempts at model-building with dark matter halos. It is easy to fit a flat rotation curve given the data, but it is hard to predict a priori that rotation curves should be flat.

!!Gravitational lensing indicates that rotation curves remain flat to even larger radii. However, these observations are only sensitive to galaxies more massive than those under discussion here. So conceivably there could be another coincidence wherein flatness persists for galaxies with Mb > 10¹⁰ M☉, but not those with Mb < 10⁹ M☉.

!!!Many in the community seem to agree that it will surely work out.

##I’ve tried to estimate dissolution timescales, but find the results wanting. For reasonable assumptions, one finds timescales that seem plausible (a few Gyr), but with some minor fiddling one can also find results that are no-way-that’s-too-short (a few tens of millions of years), depending on the dwarf and its orbit. These are crude analytic estimates; I’m not satisfied that the numbers are particularly meaningful. Still, this is a worry with the tidal-stirring hypothesis: will perturbed objects persist long enough to be observed as they are? This is another reason we need detailed simulations tailored to each object.


*&^#Note added after initial publication: While I was writing this, a nice paper appeared on exactly this issue of the star formation history of a good number of ultrafaint dwarfs. They find that 80% of the stellar mass formed 12.48 ± 0.18 Gyr ago, so 12.5 was a good guess. Formally, at the one sigma level, this is a little after reionization, but only a tiny bit, so close enough: the bulk of the stars formed long ago, like a classical globular cluster, and these ultrafaints are consistent with being fossils.

Intriguingly, there is a hint of an age difference by kinematic grouping, with things that have been in the Milky Way being the oldest, those on first infall being a little younger (but still very old), and those infalling with the Large Magellanic Cloud a tad younger still. If so, then there is more to the story than quenching by cosmic reionization.

They also show a nice collection of images so you can see more examples. The ellipses trace out the half-light radii, so one can see the proclivity for many (not all!) of these objects to be elongated, perhaps as a result of tidal perturbation:

Figure 2 from Durbin et al. (2025): Footprints of all HST observations (blue filled patches) overlaid on DSS2 imaging cutouts. Open black ellipses show the galaxy profiles at one half-light radius.

Non-equilibrium dynamics in galaxies that appear to lack dark matter: ultradiffuse galaxies

Non-equilibrium dynamics in galaxies that appear to lack dark matter: ultradiffuse galaxies

Previously, we discussed non-equilibrium dynamics in tidal dwarf galaxies. These are the result of interactions between giant galaxies that are manifestly a departure from equilibrium, a circumstance that makes TDGs potentially a decisive test to distinguish between dark matter and MOND, and simultaneously precludes confident application of that test. There are other galaxies for which I suspect non-equilibrium dynamics may play a role, among them some (not all) of the so-called ultradiffuse galaxies (UDGs).

UDGs

The term UDG has been adopted for galaxies below a certain surface brightness threshold with a size (half-light radius) in excess of 1.5 kpc (van Dokkum et al. 2015). I find the stipulation about the size to be redundant, as surface brightness* is already a measure of diffuseness. But OK, whatever, these things are really spread out. That means they should be good tests of MOND like low surface brightness galaxies before them: their low stellar surface densities mean** that they should be in the regime of low acceleration and evince large mass discrepancies when isolated. It also makes them susceptible to the external field effect (EFE) in MOND when they are not isolated, and perhaps also to tidal disruption.

To give some context, here is a plot of the size-mass relation for Local Group dwarf spheroidals. Typically they have masses comparable to globular clusters, but much larger sizes – a few hundred parsecs instead of just a few. As with more massive galaxies, these pressure supported dwarfs are all over the place – at a given mass, some are large while others are relatively compact. All but the one most massive galaxy in this plot are in the MOND regime. For convenience, I’ll refer to the black points labelled with names as UDGs+.

The size (radius encompassing half of the total light) and stellar mass of Local Group dwarf spheroidals (green points selected by McGaugh et al. 2021 to be relatively safe from external perturbation) along with two more Local Group dwarfs that are subject to the EFE (Crater 2 and Antlia 2) and the two UDGs NGC 1052-DF2 and DF4. Dotted lines show loci of constant surface density. For reference, the solar neighborhood has ~40 M☉ pc⁻²; the centers of high surface brightness galaxies frequently exceed 1,000 M☉ pc⁻².
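To make those dotted loci concrete, here is a minimal sketch of the quantity they trace: the mean stellar surface density within the half-light radius. The Crater 2 numbers in the example are round values for illustration, not measurements.

```python
import math

def sigma_eff(mass_msun, r_half_pc):
    """Mean stellar surface density within the half-light radius,
    Sigma = (M/2) / (pi * r_half^2), in Msun / pc^2."""
    return 0.5 * mass_msun / (math.pi * r_half_pc**2)

# A Crater 2-like object (~3e5 Msun, r_half ~ 1 kpc) lands around
# 0.05 Msun/pc^2, the most diffuse locus in the plot:
print(sigma_eff(3e5, 1000.0))   # ~0.048
```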

The UDGs are big and diffuse. This makes them susceptible to the EFE and tidal effects. The lower the density of a system, the easier it is for external systems to mess with it. The ultimate example is when something gets so close to a dominant central mass that it is tidally disrupted. That can happen conventionally; the stronger effective force of MOND increases tidal effects. Indeed, there is only a fairly narrow regime between the isolated case and tidally-induced disequilibrium where the EFE modifies the internal dynamics in a quasi-static way.

The trouble is the s-word: static. In order to test theories, we assume that the dynamical systems we observe are in equilibrium. Though often a good assumption, it doesn’t always hold. If we forget we made the assumption, we might think we’ve falsified a theory when all we’ve done is discover a system that is out of equilibrium. The universe is a very dynamic place – the whole thing is expanding, after all – so we need to be wary of static thinking.

Equilibrium MOND formulae

That said, let’s indulge in some static thinking. An isolated, pressure supported galaxy in the MOND regime will have an equilibrium velocity dispersion

σiso⁴ = (4/81) G M a0

where M is the mass (the stellar mass in the case of a gas-free dwarf spheroidal), G is Newton’s constant, and a0 is Milgrom’s acceleration constant. The number 4/81 is a geometrical factor that assumes we’re observing a spherical system with isotropic orbits, neither of which is guaranteed even in the equilibrium case, and deviations from this idealized situation are noticeable. Still, this is as simple as it gets: if you know the mass, you can predict the characteristic speed at which stars move. Mass is all that matters: we don’t care about the radius as we must with Newton (v² = GM/r); the only other quantities are constants of nature.
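Since unit conversions are where this sort of estimate usually goes wrong, here is a minimal numerical sketch of the isolated formula. The constants are standard values; everything else follows directly from the equation above.

```python
# Isolated MOND velocity dispersion: sigma^4 = (4/81) G M a0.
G  = 4.301e-6              # Newton's constant, kpc (km/s)^2 / Msun
A0 = 1.2e-10 / 3.241e-14   # Milgrom's a0 converted to (km/s)^2 / kpc

def sigma_iso(mass_msun):
    """Equilibrium dispersion (km/s) of an isolated, spherical,
    isotropic system deep in the MOND regime."""
    return (4.0 / 81.0 * G * mass_msun * A0) ** 0.25

# Example: a 3e5 Msun dwarf spheroidal gives sigma ~ 4 km/s.
print(sigma_iso(3e5))   # ~3.9
```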

But what do we mean by isolated? In MOND, it is that the internal acceleration of the system, gin, exceeds that from external sources, gex: gin ≫ gex. For a pressure supported dwarf, gin ≈ 3σ²/r (so here the size of the dwarf does matter, as does the location of a star within it), while the external field from a giant host galaxy would be gex = Vf²/D where Vf is the flat rotation speed stipulated by the baryonic mass of the host and D is the distance from the host to the dwarf satellite. The distance is not a static quantity. As a dwarf orbits its host, D will vary by an amount that depends on the eccentricity of the orbit, and the external field will vary with it, so it is possible to have an orbit in which a dwarf satellite dips in and out of the EFE regime. Many Local Group dwarfs straddle the line gin ≈ gex, and it takes time to equilibrate, so static thinking can go awry.
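A sketch of that comparison, with round illustrative numbers rather than measurements:

```python
def g_internal(sigma_kms, r_kpc):
    """Characteristic internal acceleration gin ~ 3 sigma^2 / r,
    in (km/s)^2 / kpc."""
    return 3.0 * sigma_kms**2 / r_kpc

def g_external(vflat_kms, d_kpc):
    """External field gex = Vf^2 / D from a host with flat rotation
    speed Vf at distance D, in (km/s)^2 / kpc."""
    return vflat_kms**2 / d_kpc

# A dwarf with sigma = 3 km/s and r = 0.3 kpc, 120 kpc from a host
# with Vf = 200 km/s. Both accelerations are far below
# a0 ~ 3700 (km/s)^2/kpc, so the system is in the MOND regime:
g_in = g_internal(3.0, 0.3)      # ~90
g_ex = g_external(200.0, 120.0)  # ~333
print("EFE regime" if g_ex > g_in else "effectively isolated")
```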

It is possible to define a sample of Local Group dwarfs that have sufficiently high internal accelerations (but also in the MOND regime with gex ≪ gin ≪ a0) that we can pretend they are isolated, and the above equation applies. Such dwarfs should& fall on the BTFR, which they do:

The baryonic Tully-Fisher relation (BTFR) including pressure supported dwarfs (green points) with their measured velocity dispersions matched to the flat rotation speeds of rotationally supported galaxies (blue points) via the prescription of McGaugh et al. (2021). The large blue points are rotators in the Local Group (with Andromeda and the Milky Way up near the top); smaller points are spirals with direct distance measurements (Schombert et al. 2020). The Local Group dwarfs assessed to be safe from external perturbation are on the BTFR (for Vf = 2σ); Crater 2 and the UDGs near NGC 1052 are not.

In contrast, three of the four UDGs considered here do not fall on the BTFR. Should they?

Conventionally, in terms of dark matter, probably they should. There is no reason for them to deviate from whatever story we make up to explain the BTFR for everything else. That they do means we have to make up a separate story for them. I don’t want to go deeply into this here since the cold dark matter model doesn’t really explain the observed BTFR in the first place. But even accepting that it does so after invoking feedback (or whatever), does it tolerate deviants? In a broad sense, yes: since it doesn’t require the particular form of the BTFR that’s observed, it is no problem to deviate from it. In a more serious sense, no: if one comes up with a model that explains the small scatter of the BTFR, it is hard to make that same model defy said small scatter. I know, I’ve tried. Lots. One winds up with some form of special pleading in pretty much any flavor of dark matter theory on top of whatever special pleading we invoked to explain the BTFR in the first place. This is bad, but perhaps not as bad as it seems once one realizes that not everything has to be in equilibrium all the time.

In MOND, the BTFR is absolute – for isolated systems in equilibrium. In the EFE regime, galaxies can and should deviate from it even if they are in equilibrium. This always goes in the sense of having a lower characteristic velocity for a given mass, so below the line in the plot. To get above the line would require being out of equilibrium through some process that inflates velocities (if systematic errors are not to blame, which also sometimes happens).

The velocity dispersion in the EFE regime (gin ≪ gex ≪ a0) is slightly more complicated than this isolated case:

σefe² ≈ Geff M / (3r)

(the factor of 3 traces back to the gin ≈ 3σ²/r estimator above). This is just like Newton except the effective value of the gravitational constant is modified. It gets a boost^ by how far the system is in the MOND regime: Geff ≈ G(a0/gex). An easy way to tell which regime an object is in is to calculate both velocity dispersions σiso and σefe: the smaller one is the one that applies#. An upshot of this is that systems in the EFE regime should deviate from the BTFR to the low velocity side. The amplitude of the deviation depends on the system and the EFE: both the size and mass matter, as does gex. Indeed, if an object is on an eccentric orbit, then the velocity dispersion can vary with the EFE as the distance of the satellite from its host varies, so over time the object would trace out some variable path in the BTFR plane.
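Putting the two regimes together, here is a minimal sketch of the “smaller one applies” rule. The σefe form assumes the quasi-Newtonian limit with Geff ≈ G(a0/gex) and the same factor-of-3 geometry as above; as in the isolated case, the geometric factors are idealizations.

```python
G  = 4.301e-6              # kpc (km/s)^2 / Msun
A0 = 1.2e-10 / 3.241e-14   # (km/s)^2 / kpc

def sigma_iso(m_msun):
    """Isolated deep-MOND dispersion, km/s."""
    return (4.0 / 81.0 * G * m_msun * A0) ** 0.25

def sigma_efe(m_msun, r_kpc, g_ex):
    """Quasi-Newtonian EFE dispersion with Geff = G * (a0/gex), km/s."""
    g_eff = G * (A0 / g_ex)
    return (g_eff * m_msun / (3.0 * r_kpc)) ** 0.5

def sigma_mond(m_msun, r_kpc, g_ex):
    """The applicable equilibrium prediction is whichever of the
    isolated and EFE dispersions is smaller."""
    return min(sigma_iso(m_msun), sigma_efe(m_msun, r_kpc, g_ex))
```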

Three of the four UDGs fall off the BTFR, so that sounds mostly right, qualitatively. Is it? Yes, for Crater 2, but not really for the others. Even for Crater 2 it is only a partial answer, as non-equilibrium effects may play a role. This gets involved for Crater 2, then more so for the others, so let’s start with Crater 2.

Crater 2 – the velocity dispersion

The velocity dispersion of Crater 2 was correctly predicted a priori by the formula for σefe above. It is a tiny number, 2 km/s, and that’s what was subsequently observed. Crater 2 is very low mass, ~3 × 10⁵ M☉, which is barely a globular cluster, but it is even more spread out than the typical dwarf spheroidal, having an effective surface density of only ~0.05 M☉ pc⁻². If it were isolated, MOND predicts that it would have a higher velocity dispersion – all of 4 km/s. That’s what it would take to put it on the BTFR above. The seemingly modest difference between 2 and 4 km/s makes for a clear offset. But despite its substantial current distance from the Milky Way (~120 kpc), Crater 2 is so low surface density that it is still subject to the external field effect, which lowers its equilibrium velocity dispersion. Unlike isolated galaxies, it should be offset from the BTFR according to MOND.
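As a round-number check, plugging Crater 2-like values into the sketch above (M ≈ 3 × 10⁵ M☉, a half-light radius of roughly 1 kpc, and the external field of a Vf ≈ 200 km/s Milky Way at D ≈ 120 kpc, all approximate values for illustration) reproduces both numbers:

```python
G, A0 = 4.301e-6, 1.2e-10 / 3.241e-14   # same units as above
M, r, g_ex = 3e5, 1.0, 200.0**2 / 120.0

s_iso = (4.0/81.0 * G * M * A0) ** 0.25        # ~3.9 km/s if isolated
s_efe = (G * (A0/g_ex) * M / (3.0*r)) ** 0.5   # ~2.2 km/s with the EFE
print(round(s_iso, 1), round(s_efe, 1))        # the smaller one applies
```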

LCDM struggles to explain the low mass end of the BTFR because it predicts a halo mass-circular speed relation Mhalo ∝ Vhalo³ that differs from the observed Mb ∝ Vf⁴. A couple of decades ago, it looked like massive galaxies might be consistent with the shallower power law, but that anticipates higher velocities for small systems. The low velocity dispersion of Crater 2 is thus doubly weird in LCDM. Its internal velocities are too small not just once – the BTFR is already lower than was expected – but twice, being below even that.

An object with a large radial extent like Crater 2 probes far out into its notional dark matter halo, making the nominal prediction$ of LCDM around 17 km/s, albeit with a huge expected scatter. Even if we can explain the low mass end of the BTFR and its unnaturally low scatter in LCDM, we now have to explain this exception to it – an exception that is natural in MOND, but is on the wrong side of the probability distribution for LCDM. That’s one of the troubles with tuning LCDM to mimic MOND: if you succeed in explaining the first thing, you still fail to anticipate the other. There is no EFE% in LCDM, no reason to anticipate that σefe applies rather than σiso, and no reason to expect via feedback that this distinction has anything to do with the dynamical accelerations gin and gex.

But wait – this is a post about non-equilibrium dynamics. That can happen in LCDM too. Indeed, one expects that satellite galaxies suffer tidal effects in the field of their giant host. The primary effect is that the dark matter subhalos in which dwarf satellites reside are stripped from the outside in. Their dark matter becomes part of the large halo of the host. But the stars are well-cocooned in the inner cusp of the NFW halo which is more robust than the outskirts of the subhalo, so the observable velocity dispersion barely evolves until most of the dark mass has been stripped away. Eventually, the stars too get stripped, forming tidal streams. Most of the damage occurs during pericenter passage when satellites are closest to their host. What’s left is no longer in equilibrium, with the details depending on the initial conditions of the dwarf on infall, the orbit, the number of pericenter passages, etc., etc.

What does not come out of this process is Crater 2 – at least not naturally. It has stars very far out – these should get stripped outright if the subhalo has been eviscerated to the point where its velocity dispersion is only 2 km/s. This tidal limitation has been noted by Errani et al.: “the large size of kinematically cold ‘feeble giant’ satellites like Crater 2 or Antlia 2 cannot be explained as due to tidal effects alone in the Lambda Cold Dark Matter scenario.” To save LCDM, we need something extra, some additional special pleading on top of non-equilibrium tidal effects, which is why I previously referred to Crater 2 as the Bullet Cluster of LCDM: an observation so problematic that it amounts to a falsification.

Crater 2 – the orbit

We held a workshop on dwarf galaxies on CWRU’s campus in 2017 where issues pertaining to both dark matter and MOND were discussed. Crater 2 was one of the cases considered, and it was included in the list of further tests for both theories (see above links). Basically, the expectation in LCDM is that most subhalo orbits are radial (highly eccentric), so that is likely to be the case for Crater 2. In contrast, the ultradiffuse blob that is Crater 2 would not survive a close passage by the Milky Way given the strong tidal force exerted by MOND, so the expectation was for a more tangential (quasi-circular) orbit that keeps it at a safe distance.

Subsequently, it became possible to constrain orbits with Gaia data. The exact orbit depends on the gravitational potential of the Milky Way, which isn’t perfectly known. However, several plausible choices of the global potential give an eccentricity around 0.6. That’s not exactly radial, but it’s pretty far from circular, placing the pericenter around 30 kpc. That’s much closer than its current distance, and well into the regime where it should be tidally disrupted in MOND. No way it survives such a close passage!

So which is it? MOND predicted the correct velocity dispersion, which LCDM struggles to explain. Yet the orbit is reasonable in LCDM, but incompatible with MOND.

Simulations of dwarf satellites

It occurs to me that we might be falling victim to static thinking somewhere. We talked about the impact of tides on dark matter halos a bit above. What should we expect in MOND?

The first numerical simulations of dwarf galaxies orbiting a giant host were conducted by Brada & Milgrom (2000). Their work is specific to the Aquadratic Lagrangian (AQUAL) theory proposed by Bekenstein & Milgrom (1984). This was the first demonstration that it was possible to write a version of MOND that conserved momentum and energy. Since then, a number of different approaches have been demonstrated. These can be subtly different, so it is challenging to know which (if any) is correct. Sorting that out is well beyond the scope of this post, so let’s stick to what we can learn from Brada & Milgrom.

Brada & Milgrom followed the evolution of low surface density dwarfs of a range of masses as they orbited a giant host galaxy. One thing they found was that the behavior of the numerical model could deviate from the analytic expectation of quasi-equilibrium enshrined in the equations above. For an eccentric orbit, the external field varies with distance from the host. If there is enough time to respond to this, the change can be adiabatic (reversible), and the static approximation may be close enough. However, as the external field varies more rapidly and/or the dwarf is more fragile, the numerical solution departs from the simple analytic approximation. For example:

Fig. 2 of Brada & Milgrom (2000): showing the numerically calculated (dotted line) variation of radius (left) and characteristic velocity (right) for a dwarf on a mildly eccentric orbit (peri- and apocenter of roughly 60 and 90 kpc, respectively, for a Milky Way-like host). Also shown is the variation in the EFE as the dwarf’s distance from the host varies (solid line). Dwarfs go through a breathing mode of increasing/decreasing size and decreasing/increasing velocity dispersion in phase with the orbit. If this process is adiabatic, it tracks the solid line and the static EFE approximation holds. This is not always the case in the simulation, so applying our usual assumption of dynamical equilibrium will result in an error stipulated by the difference between the dotted and solid lines. The amplitude of this error depends on the size, mass, and orbital history of each and every dwarf satellite.

As long as the behavior is adiabatic, the dwarf can be stable indefinitely even as it goes through periodic expansion and contraction in phase with the orbit. Departure from adiabaticity means that every passage will be different. Some damage will be done on the first passage, more on the second, and so on. As a consequence, reality will depart from our simple analytic expectations.
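Whether the response is adiabatic comes down to a comparison of timescales: the time the dwarf needs to respond internally (roughly a crossing time, r/σ) versus the time over which the external field changes (roughly the orbital period). A toy sketch with round, Crater 2-ish numbers, which are illustrative assumptions rather than measurements:

```python
KPC_KM = 3.086e16   # kilometers per kiloparsec
GYR_S  = 3.156e16   # seconds per Gyr

def crossing_time_gyr(r_kpc, sigma_kms):
    """Internal response (crossing) time, r / sigma."""
    return r_kpc * KPC_KM / sigma_kms / GYR_S

def orbital_period_gyr(d_kpc, v_kms):
    """Orbital period for a roughly circular orbit, 2 pi D / V."""
    return 2.0 * 3.14159 * d_kpc * KPC_KM / v_kms / GYR_S

print(crossing_time_gyr(1.0, 2.0))      # ~0.5 Gyr to adjust internally
print(orbital_period_gyr(100.0, 200.0)) # ~3 Gyr per orbit
```

The two are not wildly separated, so the adiabatic approximation is marginal: slow enough that damage accumulates gradually, but not so slow that the dwarf always keeps up.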

I was aware of this when I made the prediction for the velocity dispersion of Crater 2, and hedged appropriately. Indeed, I worried that Crater 2 should already be out of equilibrium. Nevertheless, I took solace in two things: first, the orbital timescale is long, over a Gyr, so departures from the equilibrium prediction might not have had time to make a dramatic difference. Second, this expectation is consistent with the slow evolution of the characteristic velocity for the most Crater 2-like, m=1 model of Brada & Milgrom (bottom track in the right panel below):

Fig. 4 of Brada & Milgrom (2000): The variation of the size and characteristic velocity of dwarf models of different mass. The more massive models approximate the adiabatic limit, which gradually breaks down for the lowest mass models. In this example, the m = 1 and 2 models explode, with the scale size growing gradually without recovering.

What about the size? That is not constant except for the most massive (m=16) model. The m=3 and 4 models recover, albeit not adiabatically. The m=4 model almost returns to its original size, but the m=3 model has puffed up after one orbit. The m=1 and 2 models explode.

One can see this by eye. The continuous growth in radii of the lower mass models is obvious. If one looks closely, one can also see the expansion then contraction of the heavier models.

Fig. 5 of Brada & Milgrom (2000): AQUAL numerical simulations of dwarf satellites orbiting a more massive host galaxy. The parameter m describes the mass and effective surface density of the satellite; all the satellites are in the MOND regime and subject to the external field of the host galaxy, which exceeds their internal accelerations. In dimensionless simulation units, m = 5 × 10⁻⁵, which for a satellite of the Milky Way corresponds roughly to a stellar mass of 3 × 10⁶ M☉. For real dwarf satellite galaxies, the scale size is also relevant, but the sequence of m above suffices to illustrate the increasingly severe effects of the external field as m decreases.

The current size of Crater 2 is unusual. It is very extended for its mass. If the current version of Crater 2 has a close passage with the Milky Way, it won’t survive. But we know it already had a close passage, so it should be expanding now as a result. (I did discuss the potential for non-equilibrium effects.) Knowing now that there was a pericenter passage in the (not exactly recent) past, we need to imagine running back the clock on the simulations. It would have been smaller in the past, so maybe it started with a normal size, and now appears so large because of its pericenter passage. The dynamics predict something like that; it is static thinking to assume it was always thus.

The dotted line shows a possible evolutionary track for Crater 2 as it expands after pericenter passage. Its initial condition would have been amongst the other dwarf spheroidals. It could also have lost some mass in the process, so any of the green low-mass dwarfs might be similar to the progenitor.

This is a good example of a phenomenon I’ve encountered repeatedly with MOND. It predicts something right, but seems to get something else wrong. If we’re already sure it is wrong, we stop there and never think further. But when one bothers to follow through on what the theory really predicts, more often than not the apparently problematic observation is in fact what we should have expected in the first place.

DF2 and DF4

DF2 and DF4 are two UDGs in the vicinity of the giant galaxy NGC 1052. They are practically identical, having the same size and mass within the errors. They are similar to Crater 2 in that they are larger than other galaxies of the same mass.

When it was first discovered, NGC 1052-DF2 was portrayed as a falsification of MOND. On closer examination, had I known about it, I could have used MOND to correctly predict its velocity dispersion, just like the dwarfs of Andromeda. This seemed like yet another case where the initial interpretation contrary to MOND melted away to actually be a confirmation. At this point, I’ve seen literally hundreds^^ of cases like that. Indeed, this particular incident made me realize that there would always be new cases like that, so I decided to stop spending my time addressing every single one.

Since then, DF2 has been the target of many intensive observing campaigns. Apparently it is easier to get lots of telescope time to observe a single object that might have the capacity to falsify MOND than it is to get a more modest amount to study everything else in the universe. That speaks volumes about community priorities and the biases that inform them. At any rate, there is now lots more data on this one object. In some sense there is too much – there has been an active debate in the literature over the best distance determination (which affects the mass) and the most accurate velocity dispersion. Some of these combinations are fine with MOND, but others are not. Let’s consider the worst case scenario.

In the worst case scenario, both DF2 and DF4 are too far from NGC 1052 for its current EFE to have much impact, and they have relatively low velocity dispersions for their luminosity, around 8 km/s, so they fall below the BTFR. Worse for MOND is that this is about what one expects from Newton for the stars alone. Consequently, these galaxies are sometimes referred to as being “dark matter free.” That’s a problem for MOND, which predicts a larger velocity dispersion for systems in equilibrium.

Perhaps we are falling prey to static thinking, and these objects are not in equilibrium. While their proximity to neighboring galaxies and the EFE to which they are presently exposed depends on the distance, which is disputed, it is clear that they live in a rough neighborhood with lots of more massive galaxies that could have bullied them in a close passage at some point in the past. Looking at Fig. 4 of Brada & Milgrom above, I see that galaxies whacked out of equilibrium not only expand in radius, potentially explaining the unusually large sizes of these UDGs, but they also experience a period during which their velocity dispersion is below the equilibrium value. The amplitude of the dip in these simulations is about right to explain the appearance of being dark-matter-free.

It is thus conceivable that DF2 and DF4 (the two are nearly identical in the relevant respects) suffered some sort of interaction that perturbed them into their current state. Their apparent absence of a mass discrepancy and the apparent falsification of MOND that follows therefrom might simply be a chimera of static thinking.

Make no mistake: this is a form of special pleading. The period of depressed velocity dispersion does not last indefinitely, so we have to catch them at a somewhat special time. How special depends on the nature of the interaction and its timescale. This can be long in intergalactic space (Gyrs), so it may not be crazy special, but we don’t really know how special. To say more, we would have to do detailed simulations to map out the large parameter space of possibilities for these objects.

I’d be embarrassed for MOND to have to make this kind of special pleading if we didn’t also have to do it for LCDM. A dwarf galaxy being dark matter free in LCDM shouldn’t happen. Galaxies form in dark matter halos; it is very hard to get rid of the dark matter while keeping the galaxy. The most obvious way to do it, in rare cases, is through tidal disruption, though one can come up with other possibilities. These amount to the same sort of special pleading we’re contemplating on behalf of MOND.

Recently, Tang et al. (2024) argue that DF2 and DF4 are “part of a large linear substructure of dwarf galaxies that could have been formed from a high-velocity head-on encounter of two gas-rich galaxies” which might have stripped the dark matter while leaving the galactic material. That sounds… unlikely. Whether it is more or less unlikely than what it would take to preserve MOND is hard to judge. It appears that we have to indulge in some sort of special pleading no matter what: it simply isn’t natural for galaxies to lack dark matter in a universe made of dark matter, just as it is unnatural for low acceleration systems to not manifest a mass discrepancy in MOND. There is no world model in which these objects make sense.

Tang et al. (2024) also consider a number of other possibilities, which they conveniently tabulate:

Table 3 from Tang et al. (2024).

There are many variations on awkward hypotheses for how these particular UDGs came to be in LCDM. They’re all forms of special pleading. Even putting on my dark matter hat, most sound like crazy talk to me. (Stellar feedback? Really? Is there anything it cannot do?) It feels like special pleading on top of special pleading; it’s special pleading all the way down. All we have left to debate is which form of special pleading seems less unlikely than the others.

I don’t find this debate particularly engaging. Something weird happened here. What that might be is certainly of interest, but I don’t see how we can hope to extract from it a definitive test of world models.

Antlia 2

The last of the UDGs in the first plot above is Antlia 2, which I now regret including – not because it isn’t interesting, but because this post is getting exhausting. Certainly to write, perhaps to read.

Antlia 2 is on the BTFR, which would ordinarily be unremarkable. In this case it is weird in MOND, as the EFE should put it off the BTFR. The observed velocity dispersion is 6 km/s, but the static EFE formula predicts it should only be 3 km/s. This case should be like Crater 2.

First, I’d like to point out that, as an observer, it is amazing to me that we can seriously discuss the difference between 3 and 6 km/s. These are tiny numbers by the standard of the field. The more strident advocates of cold dark matter used to routinely assume that our rotation curve observations suffered much larger systematic errors than that in order to (often blithely) assert that everything was OK with cuspy halos: who are you going to believe, our big, beautiful simulations or those lying data?

I’m not like that, so I do take the difference seriously. My next question, whenever MOND is a bit off like this, is what does LCDM predict?

I’ll wait.

Well, no, I won’t, because I’ve been waiting for thirty years, and the answer, when there is one, keeps changing. The nominal answer, as best I can tell, is ~20 km/s. As with Crater 2, the large scale size of this dwarf means it should sample a large portion of its dark matter halo, so the expected characteristic speed is much higher than 6 km/s. So while the static MOND prediction may be somewhat off here, the static LCDM expectation fares even worse.

This happens a lot. Whenever I come across a case that doesn’t make sense in MOND, it usually doesn’t make sense in dark matter either.

In this case, the failure of the static-case prediction is apparently caused by tidal perturbation. Like Crater 2, Antlia 2 may have a large half-light radius because it is expanding in the way seen in the simulations of Brada & Milgrom. But it appears to be a bit further down that path, with member stars stretched out along the orbital path. They start to trace a small portion of a much deeper gravitational potential, so the apparent velocity dispersion goes up in excess of the static prediction.

Fig. 9 from Ji et al. (2021) showing tidal features in Antlia 2 considering the effects of the Milky Way alone (left panel) and of the Milky Way and the Large Magellanic Cloud together (central panel) along with the position-velocity diagram from individual stars (right panel). The object is clearly not the isotropic, spherical cow presumed by the static equation for the velocity dispersion. Indeed, it is elongated as would be expected from tidal effects, with individual member stars apparently leaking out.

This is essentially what I inferred must be happening in the ultrafaint dwarfs of the Milky Way. There is no way that these tiny objects deep in the potential well of the Milky Way escape tidal perturbation%% in MOND. They may be stripped of their stars, and their velocity dispersions may get tidally stirred up. Indeed, Antlia 2 looks very much like the MOND prediction for the formation of tidal streams from such dwarfs made by McGaugh & Wolf (2010). Unlike dark matter models in which stars are first protected, then lost in pulses during pericenter passages, the stronger tides of MOND combined with the absence of a protective dark matter cocoon mean that stars leak out gradually all along the orbit of the dwarf. The rate is faster when the external field is stronger at pericenter passage, but the mass loss is more continuous. This is a good way to make long stellar streams, which are ubiquitous in the stellar halo of the Milky Way.

So… so what?

It appears that aspects of the observations of the UDGs discussed here that seem problematic for MOND may not be as bad for the theory as they at first seem. Indeed, it appears that the noted problems may instead be a consequence of the static assumptions we usually adopt to do the analysis. The universe is a dynamic place, so we know this assumption does not always hold. One has to judge each case individually to assess whether this is reasonable or not.

In the cases of Crater 2 and Antlia 2, yes, the stranger aspects of the observations fit well with non-equilibrium effects. Indeed, the unusually large half-light radii of these low mass dwarfs may well be a result of expansion after tidal perturbation. That this might happen was specifically anticipated for Crater 2, and Antlia 2 fits the bill described by McGaugh & Wolf (2010) as anticipated by the simulations of Brada & Milgrom (2000) even though it was unknown at the time.

In the cases of DF2 and DF4, it is less clear what is going on. I’m not sure which data to believe, and I want to refrain from cherry-picking, so I’ve discussed the worst-case scenario above. But the data don’t make a heck of a lot of sense in any world view; the many hypotheses made in the dark matter context seem just as contrived and unlikely as a tidally-induced, temporary dip in the velocity dispersion that might happen in MOND. I don’t find any of these scenarios to be satisfactory.

This is a long post, and we have only discussed four galaxies. We should bear in mind that the vast majority of galaxies do as predicted by MOND; a few discrepant cases are always to be expected in astronomy. That MOND works at all is a problem for the dark matter paradigm: that it would do so was not anticipated by any flavor of dark matter theory, and there remains no satisfactory explanation of why MOND appears to happen in a universe made of dark matter. These four galaxies are interesting cases, but they may be an example of missing the forest for the trees.


*As it happens, the surface brightness threshold adopted in the definition of UDGs is exactly the same as I suggested for VLSBGs (very low surface brightness galaxies: McGaugh 1996), once the filter conversions have been made. At the time, this was the threshold of our knowledge, and I and other early pioneers of LSB galaxies were struggling to convince the community that such things might exist. Up until that time, the balance of opinion was that they did not, so it is gratifying to see that they do.

**This expectation is specific to MOND; it doesn’t necessarily hold in dark matter where the acceleration in the central regions of diffuse galaxies can be dominated by the cusp of the dark matter halo. These were predicted to exceed what is observed, hence the cusp-core problem.

+Measuring by surface brightness, Crater 2 and Antlia 2 are two orders of magnitude more diffuse than the prototypical ultradiffuse galaxies DF2 and DF4. Crater 2 is not quite large enough to count as a UDG by the adopted size definition, but Antlia 2 is. So does that make it super-ultra diffuse? Would it even be astronomy without terrible nomenclature?

&I didn’t want to use a MOND-specific criterion in McGaugh et al. (2021) because I was making a more general point, so the green points are overly conservative from the perspective of the MOND isolation criterion: there are more dwarfs for which this works. Indeed, we had great success in predicting velocity dispersions in exactly this fashion in McGaugh & Milgrom (2013a, 2013b). Andromeda XXVIII was a case not included above that we highlighted as a great test of MOND, being low mass (~4 × 10⁵ M☉) but still qualifying as isolated, and its dispersion came in (6.6 +2.9/−2.1 km/s in one measurement, 4.9 ± 1.6 km/s in another) as predicted a priori (4.3 +0.8/−0.7 km/s). Hopefully the Rubin Observatory will discover many more similar objects that are truly isolated; these will be great additional tests, though one wonders how much more piling-on needs to be done.

^This is an approximation that is reasonable for the small accelerations involved. More generally we have Geff = G/μ(|gex+gin|/a0) where μ is the MOND interpolation function and one takes the vector sum of all relevant accelerations.
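For what it’s worth, here is a sketch of that more general boost. The “simple” interpolation function μ(x) = x/(1+x) is my assumption for illustration (the footnote doesn’t pick one), and I reduce the vector sum to a scalar by adding the accelerations in quadrature, which assumes they are roughly perpendicular:

```python
def mu_simple(x):
    """The "simple" MOND interpolation function (an assumed choice)."""
    return x / (1.0 + x)

def geff_over_g(g_ex, g_in, a0):
    """Boost factor Geff/G = 1/mu(|g_ex + g_in|/a0), with the vector
    sum approximated by quadrature (i.e., perpendicular fields)."""
    g_tot = (g_ex**2 + g_in**2) ** 0.5
    return 1.0 / mu_simple(g_tot / a0)

# For this mu, Geff/G = a0/g_tot + 1 exactly, which recovers the
# Geff ~ G * (a0/gex) approximation when the external field dominates:
print(geff_over_g(300.0, 30.0, 3700.0))   # ~13
```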

#This follows because the boost from MOND is limited by how far into the low acceleration regime an object is. If the EFE is important, the boost will be less than in the isolated case. As we said in 2013, “the case that reports the lower velocity dispersion is always the formally correct one.” I mention it again here because apparently people are good at scraping equations from papers without reading the associated instructions, so one gets statements like “the theory does not specify precisely when the EFE formula should replace the isolated MOND prediction.” Yes it does. We told you precisely when the EFE formula should replace the isolated formula. It is when it reports the lower velocity dispersion. We also noted this as the reason for not giving σefe in the tables in cases where it didn’t apply, so there were multiple flags. It took half a dozen coauthors to not read that. I’d hate to see how their Ikea furniture turned out.

$As often happens with LCDM, there are many nominal predictions. One common theme is that “Despite spanning four decades in luminosity, dSphs appear to inhabit halos of comparable peak circular velocity.” So nominally, one would expect a faint galaxy like Crater 2 to have a similar velocity dispersion to a much brighter one like Fornax, and the luminosity would have practically no power to predict the velocity dispersion, contrary to what we observe in the BTFR.

%There is the 2-halo term – once you get far enough from the center of a dark matter halo (the 1-halo term), there are other halos out there. These provide additional unseen mass, so can boost the velocity. The EFE in MOND has the opposite effect, and occurs for completely different physical reasons, so they’re not at all the same.

^^For arbitrary reasons of human psychology, the threshold many physicists set for “always happens” is around 100 times. That is, if a phenomenon is repeated 100 times, it is widely presumed to be a general rule. That was the threshold Vera Rubin hit when convincing the community that flat rotation curves were the general rule, not just some peculiar cases. That threshold has also been hit and exceeded by detailed MOND fits to rotation curves, and it seems to be widely accepted that this is the general rule even if many people deny the obvious implications. By now, it is also the case for apparent exceptions to MOND ceasing to be exceptions as the data improve. Unfortunately, people tend to stop listening at what they want to hear (in this case, “falsifies MOND”) and fail to pay attention to further developments.

%%It is conceivable that the ultrafaint dwarfs might elude tidal disruption in dark matter models if they reside in sufficiently dense dark matter halos. This seems unlikely given the obvious tidal effects on much more massive systems like the Sagittarius dwarf and the Magellanic Clouds, but it could in principle happen. Indeed, if one calculates the mass density from the observed velocity dispersion, one infers that they do reside in dense dark matter halos. In order to do this calculation, we are obliged to assume that the objects are in equilibrium. This is, of course, a form of static thinking: the possibility of tidal stirring that enhances the velocity dispersion above the equilibrium value is excluded by assumption. The assumption of equilibrium is so basic that it is easy to unwittingly engage in circular reasoning. I know, as I did exactly that myself to begin with.

Non-equilibrium dynamics in galaxies that appear to lack dark matter: tidal dwarf galaxies

Non-equilibrium dynamics in galaxies that appear to lack dark matter: tidal dwarf galaxies

There are a number of galaxies that have been reported to lack dark matter. This is weird in a universe made of dark matter. It is also weird in MOND, which (if true) is what causes the inference of dark matter. So how can this happen?

In most cases, it doesn’t. These claims not only don’t make sense in either context, they are simply wrong. I don’t want to sound too harsh, as I’ve come close to making the same mistake myself. The root cause of this mistake is often a form of static thinking in dynamic situations: the assumption that the here and now is always a representative test. The basic assumption we have to make to interpret observed velocities in terms of mass is that systems are in (or close to) gravitational equilibrium so that the kinetic energy is a measure of the gravitational potential. In most places, this is a good assumption, so we tend to forget we even made it.

However, no assumption is ever perfect. For example, Gaia has revealed a wealth of subtle non-equilibrium effects in the Milky Way. These are not so large as to invalidate the basic inference of the mass discrepancy, but neither can they be entirely ignored. Even maintaining the assumption of a symmetric but non-smooth mass profile in equilibrium complicates the analysis.

Since the apparent absence of dark matter is unexpected in either theory, one needs to question the assumptions whenever this inference is made. There is one situation in which it is expected, so let’s consider that special case:

Tidal dwarf galaxies

Most dwarf galaxies are primordial – they are the way they are because they formed that way. However, it is conceivable that some dwarfs may form in the tidal debris of collisions between large galaxies. These are tidal dwarf galaxies (TDGs). Here are some examples of interacting systems containing candidate TDGs:

Fig. 1 from Lelli et al. (2015): images of interacting systems with TDG candidates noted in yellow.

I say candidate TDGs because it is hard to be sure a particular object is indeed tidal in origin. A good argument can be made that TDGs require such special conditions to form that perhaps they should not be able to form at all. As debris in tidal arms is being flung about in the (~ 200 km/s) potential well of a larger system, it is rather challenging for material to condense into a knot with a much smaller potential well (< 50 km/s). It can perhaps happen if the material in the tidal stream is both lumpy (to provide a seed to condense on) and sufficiently comoving (i.e., the tidal shear of the larger system isn’t too great), so maybe it happens on rare occasions. One way to distinguish TDGs from primordial dwarfs is metallicity: typical primordial dwarfs have low metallicity while TDGs have the higher metallicity of the giant system that is the source of the parent material.

A clean test of hypotheses

TDGs provide an interesting test of dark matter and MOND. In the vast majority of dark matter models, dark matter halos are dynamically hot, quasi-spherical systems with the particles that compose the dark matter (whatever it is) on eccentric, randomly oriented orbits that sum to a big, messy blob. Arguably it has to be this way in order to stabilize the disks of spiral galaxies. In contrast, the material that composes the tidal tails in which TDGs form originates in the baryonic material of the dynamically cold spiral disks where orbits are nearly circular in the same direction in the same thin plane. The phase space – the combination of position x,y,z and momentum vx,vy,vz – of disk and halo couldn’t be more different. This means that when two big galaxies collide or have a close interaction, everything gets whacked and the two components go their separate ways. Starting in orderly disks, the stars and gas make long, coherent tidal tails. The dark matter does not. The expectation from these basic phase space considerations is consistent with detailed numerical simulations.

We now have a situation in which the dark matter has been neatly segregated from the luminous matter. Consequently, if TDGs are able to form, they must do it only* with baryonic mass. The ironic prediction of a universe dominated by dark matter is that TDGs should be devoid of dark matter.

In contrast, one cannot “turn off” the force law in MOND. MOND can boost the formation of TDGs in the first place, but if said TDGs wind up in the low acceleration regime, they must evince a mass discrepancy. So the ironic prediction here is that, if we interpret the data in ignorance of MOND, we would infer that TDGs do have dark matter.

Got that? Dark matter predicts TDGs with no dark matter. MOND predicts TDGs that look like they do have dark matter. That’s not confusing at all.

Clean in principle, messy in practice

Tests of these predictions have a colorful history. Bournaud et al. (2007) did a lovely job of combining simulations with observations of the Seashell system (NGC 5291 above) and came to a striking conclusion: the rotation curves of TDGs exceeded that expected for the baryons alone:

Fig. 2 from Bournaud et al. (2007) showing the rotation curves for the three TDGs identified in the image above.

This was a strange, intermediary result. TDGs had more dark matter than the practically zero expected in LCDM, but less than comparable primordial dwarfs as expected in MOND. That didn’t make sense in either theory. They concluded that there must be a component of some other kind of dark matter that was not the traditional dark halo, but rather part of the spiral disk to begin with, perhaps unseen baryons in the form of very cold molecular gas.

Gentile et al. (2007) reexamined the situation, and concluded that the inclinations could be better constrained. When this was done, the result was more consistent with the prediction of MOND and the baryonic Tully-Fisher relation (BTFR; see their Fig. 2).

Fig. 1 from Gentile et al. (2007): Rotation curve data (full circles) of the 3 tidal dwarf galaxies (Bournaud et al. 2007). The lower (red) curves are the Newtonian contribution Vbar of the baryons (and its uncertainty, indicated as dotted lines). The upper (black) curves are the MOND prediction and its uncertainty (dotted lines). The top panels have as an implicit assumption (following Bournaud et al.) an inclination angle of 45 degrees. In the middle panels the inclination is a free parameter, and the bottom panels show the fits made with the first estimate for the external field effect (EFE).

Clearly there was room for improvement, both in data quality and quantity. We decided to have a go at it ourselves, ultimately leading to Lelli et al. (2015), which is the source of the pretty image above. We reanalyzed the Seashell system, along with some new TDG candidates.

Making sense of these data is not easy. TDG candidates are embedded in tidal features. It is hard to know where the dwarf ends and the tidal stream begins, or even to be sure there is a clear distinction. Here is an example of the northern knot in the Seashell system:

Fig. 5 from Lelli et al. (2015): Top panels: optical image (left), total H I map (middle), and H I velocity field (right). The dashed ellipse corresponds to the disc model described in Sect. 5.1. The cross and dashed line illustrate the kinematical centre and major axis, respectively. In the bottom-left corner, we show the linear scale (optical image) and the H I beam (total H I map and velocity field) as given in Table 6. In the total H I map, contours are at ~4.5, 9, 13.5, 18, and 22.5 M☉ pc⁻². Bottom panels: position-velocity diagrams obtained from the observed cube (left), model cube (middle), and residual cube (right) along the major and minor axes. Solid contours range from 2σ to 8σ in steps of 1σ. Dashed contours range from −2σ to −4σ in steps of −1σ. The horizontal and vertical lines correspond to the systemic velocity and dynamical centre, respectively.

Both the distribution of gas and the velocities along the tidal tail often blend smoothly across TDG candidates, making it hard to be sure they have formed a separate system. In the case above, I can see what we think is the velocity field of the TDG alone (contained by the ellipse in the upper right panel), but is that really an independent system that has completely decoupled from the tidal material from which it formed? Definite maybe!

Federico Lelli did amazing work to sort through these difficult-to-interpret data. At the end of the day, he found that there was no need for dark matter in any of these TDG candidates. The amplitude of the apparent circular speed was consistent with the enclosed mass of baryons.

Figs. 11 and 13 from Lelli et al. (2015): the enclosed dynamical-to-baryonic mass ratio (left) and baryonic Tully-Fisher relation (right). TDGs (red points) are consistent with a mass ratio of unity: the observed baryons suffice; no dark matter is inferred. Contrary to Gentile et al., this manifests as a clear offset from the BTFR followed by normal galaxies.

Taken at face value, this absence of dark matter is a win for a universe made of dark matter and a falsification of MOND.

So we were prepared to say that, and did, but as Federico checked the numbers, it occurred to him to check the timescales. Mergers like this happen over the course of a few hundred million years, maybe a billion. The interactions we observe are ongoing; just how far into the process are they? Have the TDGs had time to settle down into dynamical equilibrium? That is the necessary assumption built into the mass ratio plotted above: the dynamical mass assumes the measured speed is that of a test particle in an equilibrium orbit. But these systems are manifestly not in equilibrium, at least on large scales. Maybe the TDGs have had time to settle down?

We can ask how long it takes to make an orbit at the observed speed, which is low by the standards of such systems (hence their offset from Tully-Fisher). To quote from the conclusions of the paper,

These [TDG] discs, however, have orbital times ranging from ~1 to ~3 Gyr, which are significantly longer than the TDG formation timescales (≲1 Gyr). This raises the question as to whether TDGs have had enough time to reach dynamical equilibrium.

Lelli et al. (2015)

So no, not really. We can’t be sure the velocities are measuring the local potential well as we want them to do. A particle should have had time to go around and around a few times to settle down in a new equilibrium configuration; here they’ve made 1/3, maybe 1/2, of one orbit. Things have not had time to settle down, so there’s not really a good reason to expect that the dynamical mass calculation is reliable.
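For anyone who wants to check the arithmetic, here is the orbital time in a few lines of Python; the radius and speed are illustrative values typical of these systems, not measurements of any particular TDG:

```python
# One circular orbit takes T = 2*pi*R/V. Illustrative TDG-like numbers:
import numpy as np
kpc_km, Gyr_s = 3.086e16, 3.156e16     # kpc in km, Gyr in seconds
R, V = 3.0, 15.0                       # radius (kpc) and speed (km/s), assumed
T = 2 * np.pi * R * kpc_km / (V * Gyr_s)
print(f"orbital time ~ {T:.1f} Gyr")   # ~1.2 Gyr, longer than the time since
                                       # the interaction began
```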

It would help to study older TDGs, as these would presumably have had time to settle down. We know of a few candidates, but as systems age, it becomes harder to gauge how likely they are to be legitimate TDGs. When you see a knot in a tidal arm, the odds seem good. If there has been time for the tidal stream to dissipate, it becomes less clear. So if such a thing turns out to need dark matter, is that because it is a TDG doing as MOND predicted, or just a primordial dwarf we mistakenly guessed was a TDG?

We gave one of these previously unexplored TDG candidates to a grad student. After much hard work combining observations from both radio and optical telescopes, she has demonstrated that it isn’t a TDG at all, in either paradigm. The metallicity is low, just as it should be for a primordial dwarf. Apparently it just happens to be projected along a tidal tail where it looks like a decent candidate TDG.

This further illustrates the trials and tribulations we encounter in trying to understand our vast universe.


*One expects cold dark matter halos to have subhalos, so it seems wise to suspect that perhaps TDGs condense onto these. Phase space says otherwise. It is not sufficient for tidal debris to intersect the location of a subhalo; the material must also “dock” in velocity space. Since tidal arms are being flung out at the speed that is characteristic of the giant system, the potential wells of the subhalos are barely speed bumps. They might perturb streams, but the probability of them being the seeds onto which TDGs condense is small: the phase space just doesn’t match up for the same reasons the baryonic and dark components get segregated in the first place. TDGs are one galaxy formation scenario the baryons have to pull off unassisted.

The minimum acceleration in intergalactic space

The minimum acceleration in intergalactic space

A strange and interesting aspect of MOND is the External Field Effect (EFE). If physics is strictly local, it doesn’t matter what happens outside of an experimental apparatus, only inside it. Examples of gravitational experiments include an Eötvös-style apparatus in a laboratory or a dwarf galaxy in space: in each case, test masses/stars respond to each other’s gravity.

The MOND force depends on the acceleration from all sources; it is not strictly local. Consequently, the results of a gravitational experiment depend on the environment in which it happens. An Eötvös experiment sitting in a laboratory on the surface of the Earth feels the one gee of acceleration due to the Earth and remains firmly in the Newtonian regime no matter how small an inter-particle acceleration experimenters achieve within the apparatus. This is the way in which MOND breaks the strong equivalence principle (but not the weak or Einstein equivalence principle).

A dwarf galaxy in the depths of intergalactic space behaves differently from an otherwise identical dwarf that is the satellite of a giant galaxy. In the isolated case, only the dwarf’s internal acceleration matters. For the typical low surface brightness dwarf galaxy, the internal acceleration gin due to self-gravity is deep in the MOND regime (gin < a0). In contrast, a dwarf satellite with the same internal acceleration gin is also subject to an external orbital acceleration gex around the host that may be comparable to or even greater than its internal acceleration. Both of those accelerations matter, so the isolated case (gin < a0) is deeper in the MOND regime and will evince a larger acceleration discrepancy than when the same dwarf is proximate to a giant galaxy and in the EFE regime (gin < gin+gex < a0)*. This effect is observed in the dwarfs of the Local Group.

The same effect holds everywhere in the universe. There should be a minimum acceleration due to the net effect of everything: galaxies, clusters, filaments in the intergalactic medium (IGM); anything and everything adds up to a nonzero acceleration everywhere. I first attempted to estimate this myself in McGaugh & de Blok (1998), obtaining ~0.026 Å s⁻², which is about 2% of a0 (1.2 Å s⁻²). This is a tiny fraction of a tiny number, but it is practically+ never zero: it’s as low as you can go, an effective minimum acceleration experienced even in the darkest depths of intergalactic space.

One can do better nowadays. The community has invested a lot in galaxy surveys; one can use those to construct a map of the acceleration the observed baryonic mass predicts in MOND. We did this in Chae et al. (2021) using a number of surveys. This gets us more than just a mean number as I guestimated in 1998, but also a measure of its variation.

Here is a map of the expected Newtonian acceleration across the sky for different ranges of distance from us. Blue is low acceleration; yellow higher. Glossing over some minor technical details, the corresponding MONDian acceleration is basically the square root: gM ≈ (a0 gN)^1/2, so 2% of a0 corresponds to log(eN) = -3.4 in the following plots, where eN is the Newtonian environmental acceleration: what Newton would predict for the visible galaxies alone.
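As a quick sanity check on that mapping, here is the arithmetic (everything in units of a0):

```python
# Deep-MOND mapping: gM ~ sqrt(a0 * gN), so eN = gN/a0 = (gM/a0)^2.
import numpy as np
gM_over_a0 = 0.02        # a MOND acceleration of 2% of a0
eN = gM_over_a0**2       # its Newtonian equivalent, in units of a0
print(np.log10(eN))      # -3.4, the value quoted for the maps below
```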

Figure 4 from Chae et al. (2021): All-sky distributions of the environmental field acceleration gNe,env from 2M++ galaxies and MCXC clusters in Mollweide projection and equatorial coordinates averaged across various distance ranges. The locations of SPARC galaxies with independent estimates of gNe from RC fits are shown as points with color reflecting stronger (red) or weaker (blue) EFE and with the opacity of each point increasing with its accuracy.

The EFE imposes itself on all objects, even giant galaxies; this effect is what we were trying to estimate in Chae et al. (2021) – hence the dots in the above maps. Each of those dots is a galaxy for which we had made an estimate of the EFE from its effect on the rotation curve. This is a subtle effect that is incredibly hard to constrain, but there is a signal when all galaxies are considered statistically in aggregate. It does look like the EFE is at work, but we can’t yet judge whether its variation from place to place matches the predicted map. Still, we obtained values for the acceleration in intergalactic space that are in the same realm as my crude early estimate.

Here’s another way to look at it. The acceleration is plotted as a function of distance, with the various colors corresponding to different directions on the sky. So where above we’re looking at maps of the sky in different distance bins, here we’re looking as a function of distance but relying on the color bar to indicate different directions. There is a fair amount of variation: some places have more structure and others less with a corresponding variation in the acceleration field.

Figure 5 from Chae et al. (2021): Variation of eN,env with distance for the galaxies in the NSA and Karachentsev catalogs. Individual galaxies are color-coded by right ascension (RA). The black lines show the mean trend (solid) and standard deviation (dashed) in bins of distance. This plot assumes the “max clustering” model for the missing baryons (see Figure 6, below).

Different catalogs have been used to map the structure here, and the answer comes out pretty much the same, except for one little (big) detail: how clustered are the baryons? The locations of the galaxies have been well mapped, so we can turn that into a map of their gravitational field. But we also know that galaxies are not the majority of the baryons. So where are the rest? Are they clustered like galaxies, or spread uniformly through intergalactic space?

When we did this, we knew a lot of baryons were in the IGM, but it really wasn’t clear how clustered they might be. So we took two limiting cases by assuming (1) all the baryons were as clustered as the galaxies or (2) not clustered at all, just a uniform background. This makes a difference since a uniform background, being uniform, doesn’t contribute. There’s as much force from this direction as that, and it cancels itself out, leading to a lower overall amplitude for the environmental acceleration field.

Figure 6 from Chae et al. (2021): Variation of eN,env with distance for the SPARC galaxies within the NSA footprint. The “max clustering” model (blue) assumes that missing baryons are effectively coincident with observed structures, while the “no clustering” model (orange) distributes them uniformly in space. See Section 3.2.1 for details.

That’s where the new result reported last time comes in. We now know that the missing baryons are in the IGM. Indeed, the overall split is about 1/4 clustered, 3/4 not. So something closer to (2), the “no clustering” limit above. That places the minimum acceleration in intergalactic space around log(eN) = -3.5, which is very close to the 2% of a0 that I estimated last century.

The diffuse IGM is presumably not perfectly uniform. There are large scale filaments and walls around giant voids. This structure will contribute variations in the local minimum acceleration, as visualized in this MOND structure formation simulation by Llinares (2008):

Figures 7 & 9 of Llinares (2008): The simulated density field (left) and the modulus of the MONDian force |∇ΦM| (right) at z = 0, normalized to a0. For values above 1 the particles are in the Newtonian regime whereas values below 1 indicate the MOND regime.

Surveys for fast radio bursts are very sensitive to variations in the free electron density along the line of sight. Consequently, they can be used to map out structure in the IGM. The trick is that we need to cover lots of the sky with them – the denser the tracers, the better. That means discovering lots of them all over the sky, a task the DSA-110 was built to do.

I sure hope NSF continues to fund it.


*I write gin < gin+gex for simplicity, but strictly speaking the acceleration is a vector quantity so it is possible for the orientation of gin and gex to oppose one another so that their vector sum cancels out. This doesn’t happen often, but in periodic orbits it will always happen at some moment, with further interesting consequences. The more basic point is that the amplitude of the discrepancy scales with the ratio a0/g: the lower the acceleration g, the bigger the discrepancy from Newton – or, equivalently, the more dark matter we appear to need. The discrepancy of the isolated case a0/gin is larger than the discrepancy of the non-isolated case a0/(gin+gex) just because gin+gex > gin.

+A test of MOND using Lyman-alpha clouds was proposed by Aguirre et al (2001). These tiny puffs of intergalactic gas have very low internal accelerations, so should evince much larger discrepancies than observed. Or at least that was their initial interpretation, until I pointed out that the EFE from large scale structures would be the dominant effect. They argued it was still a problem, albeit a much smaller one than initially estimated. I don’t think it is a problem at all, because the amplitude of the EFE is so uncertain. Indeed, they made an estimate of the EFE at the relevant redshifts that depended on the rate of structure formation being conventional, which it is not in MOND. Lyman-alpha clouds are entirely consistent with MOND when one takes into account the more rapid growth of structure.

Dark Matter or Modified Gravity? A virtual panel discussion

Dark Matter or Modified Gravity? A virtual panel discussion

This is a quick post to announce that on Monday, April 7 there will be a virtual panel discussion about dark matter and MOND involving Scott Dodelson and myself. It will be moderated by Orin Harris at Northeastern Illinois University starting at 3pm US Central time*. I asked Orin if I should advertise it more widely, and he said yes – apparently their Zoom setup has a capacity for a thousand attendees.

See their website for further details. If you wish to attend, you need to register in advance.


*That’s 4PM EDT to me, which is when I’m usually ready for a nap.

Things I don’t understand in modified dynamics (it’s cosmology)

Things I don’t understand in modified dynamics (it’s cosmology)

I’ve been busy, and a bit exhausted, since the long series of posts on structure formation in the early universe. The thing I like about MOND is that it helps me understand – and successfully predict – the dynamics of galaxies. Specific galaxies that are real objects: one can observe this particular galaxy and predict that it should have this rotation speed or velocity dispersion. In contrast, LCDM simulations can only make statistical statements about populations of galaxy-like numerical abstractions; they can never be equated to real-universe objects. Worse, they obfuscate rather than illuminate. In MOND, the observed centripetal acceleration follows directly from that predicted by the observed distribution of stars and gas. In simulations, this fundamental observation is left unaddressed, and we are left grasping at straws trying to comprehend how the observed kinematics follow from an invisible, massive dark matter halo that starts with the NFW form but somehow gets redistributed just so by inadequately modeled feedback processes.

Simply put, I do not understand galaxy dynamics in terms of dark matter, and not for want of trying. There are plenty of people who claim to do so, but they appear to be fooling themselves. Nevertheless, what I don’t like about MOND is the same thing that they don’t like about MOND which is that I don’t understand the basics of cosmology with it.

Specifically, what I don’t understand about cosmology in modified dynamics is the expansion history and the geometry. That’s a lot, but not everything. The early universe is fine: the expanding universe went through an early hot phase that bequeathed us the relic radiation field and the abundances of the light elements through big bang nucleosynthesis. There’s nothing about MOND that contradicts that, and arguably MOND is in better agreement with BBN than LCDM, there being no tension with the lithium abundance – this tension was not present in the 1990s, and was only imposed by the need to fit the amplitude of the second peak in the CMB.

But we’re still missing some basics that are well understood in the standard cosmology, and which are in good agreement with many (if not all) of the observations that lead us to LCDM. So I understand the reluctance to admit that maybe we don’t know as much about the universe as we think we do. Indeed, it provokes strong emotional reactions.

Screenshot from Dr. Strangelove paraphrasing Major Kong (original quote at top).

So, what might the expansion history be in MOND? I don’t know. There are some obvious things to consider, but I don’t find them satisfactory.

The Age of the Universe

Before I address the expansion history, I want to highlight some observations that pertain to the age of the universe. These provide some context that informs my thinking on the subject, and why I think LCDM hits pretty close to the mark in some important respects, like the time-redshift relation. That’s not to say I think we need to slavishly obey every detail of the LCDM expansion history when constructing other theories, but it does get some things right that need to be respected in any such effort.

One big thing I think we should respect are constraints on the age of the universe. The universe can’t be younger than the objects in it. It could of course be older, but it doesn’t appear to be much older, as there are multiple, independent lines of evidence that all point to pretty much the same age.

Expansion Age: The first basic is that if the universe is expanding, it has a finite age. You can imagine running the expansion in reverse, looking back in time to when the universe was progressively smaller, until you reach an incomprehensibly dense initial phase. A very long time, to be sure, but not infinite.

To put an exact number on the age of the universe, we need to know its detailed expansion history. That is something LCDM provides that MOND does not pretend to do. Setting aside theory, a good ball park age is the Hubble time, which is the inverse of the Hubble constant. This is how long it takes for a linearly expanding, “coasting” universe to get where it is today. For the measured H0 = 73 km/s/Mpc, the Hubble time is 13.4 Gyr. Keep that number in mind for later. This expansion age is the metric against which to compare the ages of measured objects, as discussed below.

Globular Clusters: The most famous of the age constraints is provided by the ancient stars in globular clusters. One of the great accomplishments of 20th century astrophysics is a masterful understanding of the physics of stars as giant nuclear fusion reactors. This allows us to understand how stars of different mass and composition evolve. That, in turn, allows us to put an age on the stars in clusters. Globulars are the oldest of clusters, with a mean age of 13.5 Gyr (Valcin et al. 2021). Other estimates are similar, though I note that the age determinations depend on the distance scale, so keeping them rigorously separate from Hubble constant determinations has historically been a challenge. The covariance of age and distance renders the meaning of error bars rather suspect, but to give a flavor, the globular cluster M92 is estimated to have an age of 13.80±0.75 Gyr (Jiaqi et al. 2023).

Though globular clusters are the most famous in this regard, there are other constraints on the age of the contents of the universe.

White dwarfs: White dwarfs are the remnants of dead stars that were never massive enough to have exploded as supernovae. The over/under line for that is about 8 solar masses; the oldest white dwarfs will be the remnants of the first stars that formed just below this threshold. Such stars don’t take long to evolve, around 100 Myr. That’s small compared to the age of the universe, so the first white dwarfs have just been cooling off ever since their progenitors burned out.

As the remnants of the incredibly hot cores of former stars, white dwarfs start off hot but cool quickly by radiating into space. The timescale to cool off can be crudely estimated from first principles just from the Stefan-Boltzmann law. As with so many situations in astrophysics, some detailed radiative transfer calculations are necessary to get the answer right in detail. But the ballpark of the back-of-the-envelope answer is not much different from the detailed calculation, giving some confidence in the procedure: we have a good idea of how long it takes white dwarfs to cool.
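To illustrate the kind of back-of-the-envelope estimate meant here, the sketch below divides the thermal energy stored in the ions of a carbon white dwarf by its luminosity; the mass, core temperature, and luminosity are assumed round numbers, not fits to any particular object.

```python
# Crude white dwarf cooling time: ion thermal energy / luminosity (cgs units).
k_B, m_u = 1.38e-16, 1.66e-24        # Boltzmann constant, atomic mass unit
M_sun, L_sun = 1.99e33, 3.83e33      # solar mass (g) and luminosity (erg/s)
M, A = 0.6 * M_sun, 12.0             # a 0.6 Msun carbon white dwarf (assumed)
T_core = 5e6                         # core temperature of an old, cool WD (assumed)
L = 1e-4 * L_sun                     # faint end of the luminosity function (assumed)
E_thermal = 1.5 * (M / (A * m_u)) * k_B * T_core   # (3/2) N k T for the ions
t_cool = E_thermal / L / 3.156e16    # seconds -> Gyr
print(f"crude cooling time ~ {t_cool:.0f} Gyr")    # ~5 Gyr: the right ballpark
```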

Since white dwarfs are not generating new energy but simply radiating into space, their luminosity fades over time as their surface temperature declines. This predicts that there will be a sharp drop in the numbers of white dwarfs corresponding to the oldest such objects: there simply hasn’t been enough time to cool further. The observational challenge then becomes finding the faint edge of the luminosity function for these intrinsically faint sources.

Despite the obvious challenges, people have done it, and after great effort, have found the expected edge. Translating that into an age, we get 12.5+1.4/-3.5 Gyr (Munn et al. 2017). This seems to hold up well now that we have Gaia data, which finds J1312-4728 to be the oldest known white dwarf at 12.41±0.22 Gyr (Torres et al. 2021). To get to the age of the universe, one does have to account for the time it takes to make a white dwarf in the first place, which is of order a Gyr or less, depending on the progenitor and when it formed in the early universe. This is pretty consistent with the ages of globular clusters, but comes from different physics: radiative cooling is the dominant effect rather than the hydrogen fusion budget of main sequence stars.

Radiochronometers: Some elements decay radioactively, so measuring their isotopic abundances provides a clock. Carbon-14 is a famous example: with a half-life of 5,730 years, its decay provides a great way to date the remains of prehistoric camp sites and bones. That’s great over some tens of thousands of years, but we need something with a half-life of order the age of the universe to constrain that. One such isotope is 232Thorium, with a half life of 14.05 Gyr.

Making this measurement requires that we first find stars that are both ancient and metal poor but with detectable Thorium and Europium (the latter providing a stable reference). Then one has to obtain a high quality spectrum with which to do an abundance analysis. This is all hard work, but there are some examples known.

Sneden’s star, CS 22892-052, fits the bill. Long story short, the measured Th/Eu ratio gives an age of 12.8±3 Gyr (Sneden et al. 2003). A similar result of ~13 Gyr (Frebel & Kratz 2009) is obtained from 238U (this “stable” isotope of uranium has a half-life of 4.5 Gyr, as opposed to the kind that can be provoked into exploding, 235U, which has a half-life of 700 Myr). While the search for the first stars and the secrets they may reveal is ongoing, the ages for individual stars estimated from radioactive decay are consistent with the ages of the oldest globular clusters indicated by stellar evolution.
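The underlying arithmetic is simple exponential decay. The sketch below reproduces an age in the quoted range using an assumed r-process production ratio, which is the dominant systematic; neither input number is taken from Sneden et al.

```python
# Radiochronometer sketch: age from the decay of 232Th relative to stable Eu.
import numpy as np
t_half = 14.05                  # Gyr, half-life of 232Th
lam = np.log(2.0) / t_half      # decay constant
ratio_produced = 0.45           # (Th/Eu) from r-process models (assumed)
ratio_observed = 0.24           # (Th/Eu) measured in the star (illustrative)
age = np.log(ratio_produced / ratio_observed) / lam
print(f"age ~ {age:.1f} Gyr")   # ~12.7 Gyr for these illustrative inputs
```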

Interstellar dust grains: The age of the solar system (4.56 Gyr) is well known from the analysis of isotopic abundances in meteorites. In addition to tracing the oldest material in the solar system, sometimes it is possible to identify dust grains of interstellar origin. One can do the same sort of analysis, and do the sum: how long did it take the star that made those elements to evolve, return them to the interstellar medium, get mixed in with the solar nebula, and lurk about in space until plunging to the ground as a meteorite that gets picked up by some scientifically-inclined human. This exercise has been done by Nittler et al. (2008), who estimate a total age of 13.7±1.3 Gyr.

Taken in sum, all these different age indicators point to a similar, consistent age between 13 and 14 billion years. It might be 12, but not lower, nor is there reason to think it would be much higher: 15 is right out. I say that flippantly because I couldn’t resist the Monty Python reference, but the point is serious: you could in principle have a much older universe, but then why are all the oldest things pretty much the same age? Why would the universe sit around doing nothing for billions of years then suddenly decide to make lots of stars all at once? The more obvious interpretation is that the age of the universe is indeed in the ballpark of 13.something Gyr.

Expansion history

The expansion history in the standard FLRW universe is governed by the Friedmann equation, which we can write* as

H²(z) = H0² [Ωm(1+z)³ + Ωk(1+z)² + ΩΛ]

where z is the redshift, H(z) is the Hubble parameter, H0 is its current value, and the various Ω are the mass-energy density of stuff relative to the critical density: the mass density Ωm, the geometry Ωk, and the cosmological constant ΩΛ. I’ve neglected radiation for clarity. One can make up other stuff X and add a term for it as ΩX which will have an associated (1+z) term that depends on the equation of state of X. For our purposes, both normal matter and non-baryonic cold dark matter (CDM) share the same equation of state (cold meaning non-relativistic motions, meaning rest-mass density but negligible pressure), so both contribute to the mass density Ωm = Ωb + ΩCDM.

Note that since H(z=0)=H0, the various Ω’s have to sum to unity. Thus a cosmology is geometrically flat with the curvature term Ωk = 0 if Ωm + ΩΛ = 1. Vanilla LCDM has Ωm = 0.3 and ΩΛ = 0.7. As a community, we’ve become very sure of this, but that the Friedmann equation is sufficient to describe the expansion history of the universe is an assumption based on (1) General Relativity providing a complete description, and (2) the cosmological principle (homogeneity and isotropy) holds. These seem like incredibly reasonable assumptions, but let’s bear in mind that we only know directly about 5% of the sum of Ω’s, the baryons. ΩCDM = 0.25 and ΩΛ = 0.7 are effectively fudge factors we need to make things work out given the stated assumptions. LCDM is viable if and only if cold dark matter actually exists.

Gravity is an attractive force, so the mass term Ωm acts to retard the expansion. Early on, we expected this to be the dominant term due to the (1+z)3 dependence. In the long-presumed+ absence of a cosmological constant, cosmology was the search for two numbers: once H0 and Ωm are specified, the entire expansion history is known. Such a universe can only decelerate, so only the region below the straight line in the graph below is accessible; an expansion history like the red one representing LCDM should be impossible. That lots of different data seemed to want this is what led us kicking and screaming to rehabilitate the cosmological constant, which acts as a form of anti-gravity to accelerate an expansion that ought to be decelerating.

The expansion factor maps how the universe has grown over time; it corresponds to 1/(1+z) in redshift so that z → ∞ as t → 0. The “coasting” limit of an empty universe (H0 = 73, Ωm = ΩΛ = 0) that expands linearly is shown as the straight line. The red line is the expansion history of vanilla LCDM (H0 = 70, Ωm = 0.3, ΩΛ = 0.7).

The over/under between acceleration/deceleration of the cosmic expansion rate is the coasting universe. This is the conceptually useful limit of a completely empty universe with Ωm = ΩΛ = 0. It expands at a steady rate that neither accelerates nor decelerates. The Hubble time is exactly equal to the age of such a universe, i.e., 13.4 Gyr for H0 = 73.

LCDM has a more complicated expansion history. The mass density dominates early on, so there is an early phase of deceleration – the red curve bends to the right. At late times, the cosmological constant begins to dominate, reversing the deceleration and transforming it into an acceleration. The inflection point when it switches from decelerating to accelerating is not too far in the past, which is a curious coincidence given that the entire future of such a universe will be spent accelerating towards the exponential expansion of the de Sitter limit. Why do we live anywhen close to this special time?

Lots of ink has been spilled on this subject, and the answer seems to boil down to the anthropic principle. I find this lame and won’t entertain it further. I do, however, want to point out a related strange coincidence: the current age of vanilla LCDM (13.5 Gyr) is the same as that of a coasting universe with the locally measured Hubble constant (13.4 Gyr). Why should these very different models be so close in age? LCDM decelerates, then accelerates; there’s only one moment in the expansion history of LCDM when the age is equal to the Hubble time, and we happen to be living just then.

This coincidence problem holds for any viable set of LCDM parameters, as they all have nearly the same age. Planck LCDM has an age of 13.7 Gyr, still basically the same as the Hubble time for the locally measured Hubble constant. The lower Planck Hubble value is balanced by a larger amount of early-time deceleration. The universe reaches its current point after 13.something Gyr in all of these models. That’s in good agreement with the ages of the oldest observed stars, which is encouraging, but it does nothing to help us resolve the Hubble tension, much less constrain alternative cosmologies.
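For those who want to play with the numbers, here is a minimal sketch that integrates the Friedmann equation written above (radiation neglected, as in the text); the parameter sets approximately reproduce the ages quoted in this section.

```python
# Age of an FLRW universe from t(z) = integral over dz' / [(1+z') H(z')].
import numpy as np
from scipy.integrate import quad

def age_gyr(H0, Om, OL, z=0.0):
    """Age at redshift z in Gyr, for H0 in km/s/Mpc."""
    Ok = 1.0 - Om - OL   # curvature term from the closure condition
    E = lambda x: np.sqrt(Om*(1+x)**3 + Ok*(1+x)**2 + OL)
    t, _ = quad(lambda x: 1.0 / ((1+x) * E(x)), z, np.inf)
    return 977.8 / H0 * t   # 977.8/H0 Gyr is the Hubble time

print(age_gyr(73, 0.0, 0.0))        # coasting: 13.4 Gyr, the Hubble time
print(age_gyr(70, 0.3, 0.7))        # vanilla LCDM: ~13.5 Gyr
print(age_gyr(67.4, 0.315, 0.685))  # Planck-ish LCDM: ~13.8 Gyr
```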

Cosmic expansion in MOND

There is no equivalent to the Friedmann equation in MOND. This is not satisfactory. As an extension of Newtonian theory, MOND doesn’t claim to encompass cosmic phenomena$ – hence the search for a deeper underlying theory. Lacking this, what can we try?

Felten (1984) tried to derive an equivalent to the Friedmann equation using the same trick that can be used with Newtonian theory to recover the expansion dynamics in the absence of a cosmological constant. This did not work. The result was unsatisfactory& for application to the whole universe because the presence of a0 in the equations makes the result scale-dependent. So how big the universe is matters in a way that it does not in the standard cosmology; there’s no way to generalize it to describe the whole enchilada.

In retrospect, what Felten had really obtained was a solution for the evolution of a top-hat over-density: the dynamics of a spherical region embedded in an expanding universe. This result is the basis for the successful prediction of early structure formation in MOND. But once again it only tells us about the dynamics of an object within the universe, not the universe itself.

In the absence of a complete theory, one makes an ansatz to proceed. If there is a grander theory that encompasses both General Relativity and MOND, then it must approach both in the appropriate limit, so an obvious ansatz to make is that the entire universe obeys the conventional Friedmann equation while the dynamics of smaller regions in the low acceleration regime obey MOND. Both Bob Sanders and I independently adopted this approach, and explicitly showed that it was consistent with the constraints that were known at the time. The first obvious guess for the mass density of such a cosmology is Ωm = Ωb = 0.04. (This was the high end of BBN estimates at the time, so back then we also considered lower values.) The expansion history of this low density, baryon-only universe is shown as the blue line below:

As above, but with the addition of a low density, baryon-dominated, no-CDM universe (H0 = 73, Ωm = Ωb = 0.04, ΩΛ = 0; blue line).

As before, there is not much to choose between these models in terms of age. The small but non-zero mass density does cause some early deceleration before the model approaches the coasting limit, so the current age is a bit lower: 12.6 Gyr. This is on the small side, but not problematically so, or even particularly concerning given the history of the subject. (I’m old enough to remember when we were pretty sure that globular clusters were 18 Gyr old.)

The time-redshift relation for the no-CDM, baryon-only universe is somewhat different from that of LCDM. If we adopt it, then we find that MOND-driven structure forms at somewhat higher redshift than with the LCDM time-redshift relation. The benchmark time of 500 Myr for L* galaxy formation is reached at z = 15 rather than z = 9.5 as in LCDM. This isn’t a huge difference, but it does mean that an L* galaxy could in principle appear even earlier than so far seen. I’ve stuck with LCDM as the more conservative estimate of the time-redshift relation, but the plain fact is we don’t really know what the universe is doing at those early times, or if the ansatz we’ve made holds well enough to do this. Surely it must fail at some point, and it seems likely that we’re past that point.
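Using the age_gyr sketch from the previous section, this benchmark is easy to check:

```python
# The 500 Myr benchmark in the two time-redshift relations, using the
# age_gyr function sketched above:
print(age_gyr(70, 0.3, 0.7, z=9.5))    # ~0.5 Gyr in vanilla LCDM
print(age_gyr(73, 0.04, 0.0, z=15.0))  # ~0.5 Gyr in the no-CDM model
```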

There is a bigger problem with the no-CDM model above. Even if it is close to the right expansion history, it has a very large negative curvature. The geometry is nowhere close to the flat Robertson-Walker metric indicated by the angular diameter distance to the surface of last scattering (the CMB).

Geometry

Much of cosmology is obsessed with geometry, so I will not attempt to do the subject justice. Each set of FLRW parameters has a specific geometry that comes hand in hand with its expansion history. The most sensitive probe we have of the geometry is the CMB. The a priori prediction of LCDM was that its flat geometry required the first acoustic peak to have a maximum near one degree on the sky. That’s exactly what we observe.

Fig. 45 from Famaey & McGaugh (2012): The acoustic power spectrum of the cosmic microwave background as observed by WMAP [229] together with the a priori predictions of ΛCDM (red line) and no-CDM (blue line) as they existed in 1999 [265] prior to observation of the acoustic peaks. ΛCDM correctly predicted the position of the first peak (the geometry is very nearly flat) but over-predicted the amplitude of both the second and third peak. The most favorable a priori case is shown; other plausible ΛCDM parameters [468] predicted an even larger second peak. The most important parameter adjustment necessary to obtain an a posteriori fit is an increase in the baryon density Ωb, above what had previously been expected from BBN. In contrast, the no-CDM model ansatz made as a proxy for MOND successfully predicted the correct amplitude ratio of the first to second peak with no parameter adjustment [268, 269]. The no-CDM model was subsequently shown to under-predict the amplitude of the third peak [442], so no model can explain these data without post-hoc adjustment.

In contrast, no-CDM made the correct prediction for the first-to-second peak amplitude ratio, but it is entirely ambivalent about the geometry. FLRW cosmology and MOND dynamics care about incommensurate things in the CMB data. That said, the naive prediction of the baryon-only model outlined above is that the first peak should occur around where the third peak is observed. That is obviously wrong.

Since the geometry is not a fundamental prediction of MOND, the position of the first peak is easily fit by invoking the same fudge factor used to fit it conventionally: the cosmological constant. We need a larger ΩΛ = 0.96, but so what? This parameter merely encodes our ignorance: we make no pretense to understand it, let alone vesting deep meaning in it. It is one of the things that a deeper theory must explain, and can be considered as a clue in its development.

So instead of a baryon-only universe, our FLRW proxy becomes a Lambda-baryon universe. That fits the geometry, and for an optical depth to the surface of last scattering of τ = 0.17, matches the amplitude of the CMB power spectrum and correctly predicts the cosmic dawn signal that EDGES claimed to detect. Sounds good, right? Well, not entirely. It doesn’t fit the CMB data at ℓ > 600, but I only expected to get so far with the no-CDM ansatz, so it doesn’t bother me that you need a better underlying theory to fit the entire CMB. Worse, to my mind, is that the Lambda-baryon proxy universe is much, much older than everything in it: 22 Gyr instead of 13.something.

As above, but now with the addition of a low density, Lambda-dominated universe (H0 = 73, Ωm = Ωb = 0.04, ΩΛ = 0.96; dashed line).

This just don’t seem right. Or even close to right. Like, not even pointing in a direction that might lead to something that had a hope of being right.

Moreover, we have a weird tension between the baryon-only proxy and the Lambda-baryon proxy cosmology. The baryon-only proxy has a plausible expansion history but an unacceptable geometry. The Lambda-baryon proxy has a plausible geometry but an implausible expansion history. Technically, yes, it is OK for the universe to be much older than all of its contents, but it doesn’t make much sense. Why would the universe do nothing for 8 or 9 Gyr, then burst into a sudden frenzy of activity? It’s as if Genesis read “for the first 6 Gyr, God was a complete slacker and did nothing. In the seventh Gyr, he tried to pull an all-nighter only to discover it took a long time to build cosmic structure. Then He said ‘Screw it’ and fudged Creation with MOND.”

In the beginning the Universe was created.
This has made a lot of people very angry and been widely regarded as a bad move.

Douglas Adams, The Restaurant at the End of the Universe

So we can have a plausible geometry or we can have a plausible expansion history with a proxy FLRW model, but not both. That’s unpleasant, but not tragic: we know this approach has to fail somehow. But I had hoped for FLRW to be a more coherent first approximation to the underlying theory, whatever it may be. If there is such a theory, then both General Relativity and MOND are its limits in their respective regimes. As such, FLRW ought to be a good approximation to the underlying entity up to some point. That we have to invoke both non-baryonic dark matter and a cosmological constant is a hint that we’ve crossed that point. But I would have hoped that we crossed it in a more coherent fashion. Instead, we seem to get a little of this for the expansion history and a little of that for the geometry.

I really don’t know what the solution is here, or even if there is one. At least I’m not fooling myself into presuming it must work out.


*There are other ways to write the Friedmann equation, but this is a useful form here. For the mathematically keen, the Hubble parameter is the time derivative of the expansion factor normalized by the expansion factor, which in terms of redshift is

H(z) = −(dz/dt)/(1+z).

This quantity evolves, leading us to expect evolution in Milgrom’s constant if we associate it with the numerical coincidence

2π a0 = cH0

If the Hubble parameter evolves, as it appears to do, it would seem to follow that a0 should evolve too, with a0(z) ~ H(z) – otherwise the coincidence is just that: a coincidence that applies only now. There is, at present, no persuasive evidence that a0 evolves with redshift.

A similar order-of-magnitude association can be made with the cosmological constant,

2π a0 = c² Λ^1/2

so conceivably the MOND acceleration scale appears as the result of vacuum effects. It is a matter of judgement whether these numerical coincidences are mere coincidences or profound clues towards a deeper theory. That the proportionality constant is very nearly 2π is certainly intriguing, but the constancy of any of these parameters (including Newton’s G) depends on how they emerge from the deeper theory.
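For the numerically curious, here is a quick check of both coincidences; the value of ΩΛ used to convert to Λ is an assumption, and the agreement is at the order-of-magnitude level intended here.

```python
# Order-of-magnitude check of the numerical coincidences quoted above.
import numpy as np
c = 2.998e8                     # m/s
H0 = 73 * 1e3 / 3.086e22        # 73 km/s/Mpc in 1/s
a0 = 1.2e-10                    # m/s^2
print(c * H0 / (2 * np.pi))     # ~1.1e-10 m/s^2, very nearly a0
Lam = 3 * 0.7 * (H0 / c)**2     # Lambda = 3*OmegaLambda*H0^2/c^2 (OL = 0.7 assumed)
print(c**2 * np.sqrt(Lam) / (2 * np.pi))   # ~1.6e-10 m/s^2, the same ballpark
```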


+In January 2019, I was attending a workshop at Princeton when I had a chance encounter with Jim Peebles. He was not attending the workshop, but happened to be walking across campus at the same time I was. We got to talking, and he affirmed my recollection of just how incredibly unpopular the cosmological constant used to be. Unprompted, he went on to make the analogy of how similar that seemed to how unpopular MOND is now.

Peebles was awarded a long-overdue Nobel Prize later that year.


$This is one of the things that makes it tricky to compare LCDM and MOND. MOND is a theory of dynamics in the limit of low acceleration. It makes no pretense to be a cosmological theory. LCDM starts as a cosmological theory, but it also makes predictions about the dynamics of systems within it (or at least the dark matter halos in which visible galaxies are presumed to form). So if one starts by putting on a cosmology hat, there is nothing to talk about: LCDM is the only game in town. But from the perspective of dynamics, it’s the other way around, with LCDM repeatedly failing to satisfactorily explain, much less anticipate, phenomena that MOND predicted correctly in advance.


&An intriguing thing about Felten’s MOND universe is that it eventually recollapses irrespective of the mass density. There is no critical value of Ωm, hence no coincidence problem. MOND is strong enough to eventually reverse the expansion of the universe, it just takes a very long time to do so, depending on the density.

I’m surprised this aspect of the issue was overlooked. The coincidence problem (then mostly called the flatness problem) obsessed people at the time, so much so that its solution by Cosmic Inflation led to its widespread acceptance. That only works if Ωm = 1; LCDM makes the coincidence worse. I guess the timing was off, as Inflation had already captured the community’s imagination by that time, likely making it hard to recognize that MOND was a more natural solution. We’d already accepted the craziness that was Inflation and dark matter; MOND craziness was a bridge too far.

I guess. I’m not quite that old; I was still an undergraduate at the time. I did hear about Inflation then, in glowing terms, but not a thing about MOND.

Kinematics suggest large masses for high redshift galaxies

Kinematics suggest large masses for high redshift galaxies

This is what I hope will be the final installment in a series of posts describing the results published in McGaugh et al. (2024). I started by discussing the timescale for galaxy formation in LCDM and MOND which leads to different and distinct predictions. I then discussed the observations that constrain the growth of stellar mass over cosmic time and the related observation of stellar populations that are mature for the age of the universe. I then put on an LCDM hat to try to figure out ways to wriggle out of the obvious conclusion that galaxies grew too massive too fast. Exploring all the arguments that will be made is the hardest part, not because they are difficult to anticipate, but because there are so many* options to consider. This leads to many pages of minutiae that no one ever seems to read+, so one of the options I’ve discussed (e.g., super-efficient star formation) will likely emerge as the standard picture even if it comes pre-debunked.

The emphasis so far has been on the evolution of the stellar masses of galaxies because that is observationally most accessible. That gives us the opportunity to wriggle, because what we really want to measure to test LCDM is the growth of [dark] mass. This is well-predicted but invisible, so we can always play games to relate light to mass.

Mass assembly in LCDM from the IllustrisTNG50 simulation. The dark matter mass assembles hierarchically in the merger tree depicted at left; the size of the circles illustrates the dark matter halo mass. The corresponding stellar mass of the largest progenitor is shown at right as the red band. This does not keep pace with the apparent assembly of stellar mass (data points), but what is the underlying mass really doing?

Galaxy Kinematics

What we really want to know is the underlying mass. It is reasonable to expect that the light traces this mass, but is there another way to assess it? Yes: kinematics. The orbital speeds of objects in galaxies trace the total potential, including the dark matter. So, how massive were early galaxies? How does that evolve with redshift?

The rotation curve of NGC 6946 traced by stars at small radii and gas farther out. This is a typical flat rotation curve (data points) that exceeds what can be explained by the observed baryonic mass (red line deduced from the stars and gas pictured at right), leading to the inference of dark matter.

The rotation curve for NGC 6946 shows a number of well-established characteristics for nearby galaxies, including the dominance of baryons at small radii in high surface brightness galaxies and the famous flat outer portion of the rotation curve. Even when stars contribute as much mass as allowed by the inner rotation curve (“maximum disk“), there is a need for something extra further out (i.e., dark matter or MOND). In the case of dark matter, the amplitude of flat rotation is typically interpreted as being indicative& of halo mass.

So far, the rotation curves of high redshift galaxies look very much like those of low redshift galaxies. There are some fast rotators at high redshift as well. Here is an example observed by Neeleman et al. (2020), who measure a flat rotation speed of 272 km/s for DLA0817g at z = 4.26. That’s more massive than either the Milky Way (~200 km/s) or Andromeda (~230 km/s), if not quite as big as local heavyweight champion UGC 2885 (300 km/s). DLA0817g looks to be a disk galaxy that formed early and is sedately rotating only 1.4 Gyr after the Big Bang. It is already massive at this time: not at all the little nuggets we expect from the CDM merger tree above.

Fig. 1 from Neeleman et al. (2020): the velocity field (left) and position-velocity diagram (right) of DLA0817g. The velocity field looks like that of a rotating disk with the raw position-velocity diagram shows motions of ~200 km/s on either side of the center. When corrected for inclination, the flat rotation speed is 272 km/s, corresponding to a massive galaxy near the top of the Tully-Fisher relation.
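To put rough masses on these speeds, one can use the local calibration of the baryonic Tully-Fisher relation, Mb ≈ 47 V^4 with V in km/s and Mb in solar masses (McGaugh 2012). This is an order-of-magnitude translation, not a substitute for the published mass models.

```python
# Rough baryonic masses implied by the BTFR, Mb ~ 47 * Vf^4 (local calibration).
for name, Vf in [("Milky Way", 200), ("Andromeda", 230),
                 ("DLA0817g", 272), ("UGC 2885", 300)]:
    print(f"{name}: Vf = {Vf} km/s -> Mb ~ {47 * Vf**4:.1e} Msun")
```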

This is anecdotal, of course, but there are a good number of similar cases that are already known. For example, the kinematics of ALESS 073.1 at z ≈ 5 indicate the presence of a massive stellar bulge as well as a rapidly rotating disk (Lelli et al. 2021). A similar case has been observed at z ≈ 6 (Tripodi et al. 2023). These kinematic observations indicate the presence of mature, massive disk galaxies well before they were expected to be in place (Pillepich et al. 2019; Wardlow 2021). The high rotation speeds observed in early disk galaxies sometimes exceed 250 (Neeleman et al. 2020) or even 300 km s⁻¹ (Nestor Shachar et al. 2023; Wang et al. 2024), comparable to the most massive local spirals (Noordermeer et al. 2007; Di Teodoro et al. 2021, 2023). That such rapidly rotating galaxies exist at high redshift indicates that there is a lot of mass present, not just light. We can’t just tweak the mass-to-light ratio of the stars to explain the photometry and also explain the kinematics.

In a seminal galaxy formation paper, Mo, Mao, & White (1998) predicted that “present-day disks were assembled recently (at z ≤ 1).” Today, we see that spiral galaxies are ubiquitous in JWST images up to z ∼ 6 (Ferreira et al. 2022, 2023; Kuhn et al. 2024). The early appearance of massive, dynamically cold (Di Teodoro et al. 2016; Lelli et al. 2018, 2023; Rizzo et al. 2023) disks in the first few billion years after the Big Bang contradicts the natural prediction of ΛCDM. Early disks are expected to be small and dynamically hot (Dekel & Burkert 2014; Zolotov et al. 2015; Krumholz et al. 2018; Pillepich et al. 2019), but they are observed to be massive and dynamically cold. (Hot or cold in this context means a high or low amplitude of the velocity dispersion relative to the rotation speed; the modern Milky Way is cold with σ ~ 20 km/s and Vc ~ 200 km/s.) Understanding the stability and longevity of dynamically cold spiral disks is foundational to the problem.

Kinematic Scaling Relations

Beyond anecdotal cases, we can check on kinematic scaling relations like Tully–Fisher. These are expected to emerge late and evolve significantly with redshift in LCDM (e.g., Glowacki et al. 2021). In MOND, the normalization of the baryonic Tully–Fisher relation is set by a0, so is immutable for all time if a0 is constant. Let’s see what the data say:

Figure 9 from McGaugh et al. (2024): The baryonic Tully–Fisher (left) and dark matter fraction–surface brightness (right) relations. Local galaxy data (circles) are from Lelli et al. (2019; left) and Lelli et al. (2016; right). Higher-redshift data (squares) are from Nestor Shachar et al. (2023) in bins with equal numbers of galaxies color coded by redshift: 0.6 < z < 1.22 (blue), 1.22 < z < 2.14 (green), and 2.14 < z < 2.53 (red). Open squares with error bars illustrate the typical uncertainties. The relations known at low redshift also appear at higher redshift with no clear indication of evolution over a lookback time up to 11 Gyr.

Not much to see: the data from Nestor Shachar et al. (2023) show no clear indication of evolution. The same can be said for the dark matter fraction–surface brightness relation. (Glad to see that being plotted after I pointed it out.) The local relations are coincident with those at higher redshift within any sober assessment of the uncertainties. Exactly what is measured, and how, matters at this level, and I’m not going to attempt to disentangle all that here. Neither am I about to attempt to assess the consistency (or lack thereof) with either LCDM or MOND; the data simply aren’t good enough for that yet. It is also not clear to me that everyone agrees on what LCDM predicts.

What I can do is check empirically how much evolution there is within the 100-galaxy data set of Nestor Shachar et al. (2023). To do that, I fit a line to their data (the left panel above) and measure the residuals: for a given rotation speed, how far is each galaxy from the expected mass? To compare this with the stellar masses discussed previously, I normalize those residuals to the same M* = 9 × 10^10 M☉.
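In code, the bookkeeping amounts to something like the following sketch (load_sample() is a hypothetical placeholder for reading their table; this is the spirit of the exercise, not the actual script):

```python
import numpy as np

# Hypothetical loader standing in for the Nestor Shachar et al. (2023) table:
# log10 rotation speed [km/s], log10 baryonic mass [Msun], and redshift.
logV, logM, z = load_sample()

# Fit a line to the baryonic Tully-Fisher data: log M = slope * log V + intercept.
slope, intercept = np.polyfit(logV, logM, 1)

# Residuals: how far is each galaxy from the mass expected at its rotation speed?
resid = logM - (slope * logV + intercept)

# Re-normalize to a fiducial mass of 9e10 Msun so the points can be placed
# on the same stellar mass-redshift diagram as before.
logM_fiducial = np.log10(9e10) + resid

# No evolution means logM_fiducial scatters about a constant with redshift;
# a fitted trend with z should have a slope consistent with zero.
drift = np.polyfit(z, logM_fiducial, 1)[0]
```

If there is no evolution, the data will scatter around a constant value as a function of redshift: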

This figure reproduces the stellar mass-redshift data for L* galaxies (black points) and the monolithic (purple line) and LCDM (red and green lines) models discussed previously. The blue squares illustrate deviations of the data of Nestor Shachar et al. (2023) from the baryonic Tully-Fisher relation (dashed line, normalized to the same mass as the monolithic model). There is no indication of evolution in the baryonic Tully-Fisher relation, which was apparently established within the first few billion years after the Big Bang (z = 2.5 corresponds to a cosmic age of about 2.6 Gyr). The data are consistent with a monolithic galaxy formation model in which all the mass had been assembled into a single object early on.

The data scatter around a constant value as a function of redshift: there is no perceptible evolution.

The kinematic data for rotating galaxies tell much the same story as the photometric data for galaxies in clusters. They are both consistent with a monolithic model that gathered together the bulk of the baryonic mass early on, and evolved as an island universe for most of the history of the cosmos. There is no hint of the decline in mass with redshift predicted by the LCDM simulations. Moreover, the kinematics trace mass, not just light. So while I am careful to consider the options for LCDM, I don’t know how we’re gonna get out of this one.

Empirically, it is an important observation that there is no apparent evolution in the baryonic Tully-Fisher relation out to z ~ 2.5. That’s a lookback time of ~11 Gyr, so most of cosmic history. That means that whatever physics sets the relation did so early. If the physics is MOND, this absence of evolution implies that a0 is constant. There is some wiggle room in that given all the uncertainties, but this already excludes the picture in which a0 evolves with the expansion rate through the coincidence a0 ~ cH0. That much evolution would be readily perceptible if H(z) evolves as it appears to do. In contrast, the coincidence a0 ~ c²Λ^1/2 remains interesting since the cosmological constant is constant. Perhaps this is just a coincidence, or perhaps it is a hint that the anomalous acceleration of the expansion of the universe is somehow connected with the anomalous acceleration in galaxy dynamics.
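For the record, the arithmetic behind those coincidences is easy to check; a quick sketch with round numbers (H0 ~ 70 km/s/Mpc and ΩΛ ~ 0.7 are assumed here, not fit):

```python
# Order-of-magnitude check: a0 against c*H0 and c^2 * sqrt(Lambda).
c = 2.998e8                    # speed of light [m/s]
H0 = 70 * 1e3 / 3.086e22       # 70 km/s/Mpc expressed in 1/s
a0 = 1.2e-10                   # Milgrom's constant [m/s^2]
Lam = 3 * 0.7 * H0**2 / c**2   # Lambda = 3*Omega_L*H0^2/c^2 [1/m^2]

print(c * H0)                  # ~6.8e-10 m/s^2
print(c**2 * Lam**0.5)         # ~1.0e-9  m/s^2
```

Both numbers agree with a0 to within a factor of order 2π. The point in the text is that only the second stays fixed as the universe expands: cH0 declines with cosmic time, while Λ does not.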

Though I see no clear evidence for evolution in Tully-Fisher to date, it remains early days. For example, a very recent paper by Amvrosiadis et al. (2025) does show a hint of evolution in the sense of an offset in the normalization of the baryonic Tully–Fisher relation. This isn’t very significant, differing by less than 2σ; and again we find ourselves in a situation where we need to take a hard look at all the assumptions and population modeling and velocity measurements just to see if we’re talking about the same quantities before we even begin to assess consistency or the lack thereof. Nevertheless, it is an intriguing result. There is also another interesting anecdotal case: one of their highest redshift objects, ALESS 071.1 at z = 3.7, is also the most massive in the sample, with an estimated stellar mass of 2 × 10^12 M☉. That is a crazy large number, comparable to or maybe larger than the entire dark matter halo of the Milky Way. It falls off the top of any of the graphs of stellar mass we discussed before. If correct, this one galaxy is an enormous problem for LCDM regardless of any other consideration. It is of course possible that this case will turn out to be wrong for some reason, so it remains early days for kinematics at high redshift.

Cluster Kinematics

It is even earlier days for cluster kinematics. First we have to find them, which was the focus of Jay Franck’s thesis. Once identified, we have to estimate their masses with the available data, which may or may not be up to the task. And of course we have to figure out what theory predicts.

LCDM makes a clear prediction for the growth of cluster mass. This works out OK at low redshift, in the sense that the cluster X-ray mass function is in good agreement with LCDM. Where the theory struggles is in the proclivity for the most massive clusters to appear sooner in cosmic history than anticipated. Like individual galaxies, they appear too big too soon. This trend persisted in Jay’s analysis, which identified candidate protoclusters at higher redshifts than expected. It also measured velocity dispersions that were consistently higher than those found in simulations. That is, when Jay applied the search algorithm he used on the data to mock data from the Millennium simulation, the structures identified there had velocity dispersions on average a factor of two lower than seen in the data. That’s a big difference in terms of mass.

Figure 11 from McGaugh et al. (2024): Measured velocity dispersions of protocluster candidates (Franck & McGaugh 2016a, 2016b) as a function of redshift. Point size grows with the assessed probability that the identified overdensities correspond to a real structure: all objects are shown as small points, candidates with P > 50% are shown as light blue midsize points, and the large dark blue points meet this criterion and additionally have at least 10 spectroscopically confirmed members. The MOND mass for an equilibrium system in the low-acceleration regime is noted at right; these are comparable to cluster masses at low redshift.

At this juncture, there is no way to know if the protocluster candidates Jay identified are or will become bound structures. We made some probability estimates that can be summed up as “some are probably real, but some probably are not.” The relative probability is illustrated by the size of the points in the plot above; the big blue points are the most likely to be real clusters, having at least ten galaxies at the same place on the sky at the same redshift, all with spectroscopically measured redshifts. Here the spectra are critical; photometric redshifts typically are not accurate enough to indicate that galaxies that happen to be near each other on the sky are also that close in redshift space.

The net upshot is that there are at least some good candidate clusters at high redshift, and these have higher velocity dispersions than expected in LCDM. I did the exercise of working out what the equivalent mass in MOND would be, and it is about the same as what we find for clusters at low redshift. This estimate assumes dynamical equilibrium, which is very far from guaranteed. But the time at which these structures appear is consistent with the timescale for cluster formation in MOND (a couple Gyr; z ~ 3), so maybe? Certainly there shouldn’t be lots of massive clusters in LCDM at z ~ 3.
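For those who want the formula, the exercise uses the deep-MOND relation for an isolated, isotropic system in equilibrium (both assumptions, as noted above); a minimal sketch:

```python
# Deep-MOND virial relation for an isolated isotropic system in equilibrium:
# sigma_los^2 = (2/9) * sqrt(G * M * a0), so M = (81/4) * sigma_los^4 / (G * a0).
G = 6.674e-11        # gravitational constant [m^3 kg^-1 s^-2]
a0 = 1.2e-10         # Milgrom's constant [m/s^2]
Msun = 1.989e30      # solar mass [kg]

def mond_mass(sigma_kms):
    """Equilibrium deep-MOND mass [Msun] from a line-of-sight dispersion [km/s]."""
    sigma = sigma_kms * 1e3
    return (81.0 / 4.0) * sigma**4 / (G * a0) / Msun

print(f"{mond_mass(1000):.1e}")   # ~1.3e15 Msun for sigma ~ 1000 km/s
```

A dispersion of ~1000 km/s maps to ~10^15 M☉, which is indeed comparable to rich clusters at low redshift.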

Kinematic Takeaways

While it remains early days for kinematic observations at high redshift, so far these data do nothing to contradict the obvious interpretation of the photometric data. There are mature, dynamically cold, fast rotating spiral galaxies in the early universe that were predicted not to be there by LCDM. Moreover, kinematics trace mass, not just light, so all the wriggling we might try to explain the latter doesn’t help with the former. The most obvious interpretation of the kinematic data to date is the same as that for the photometric data: galaxies formed early and grew massive quickly, as predicted a priori by MOND.


*The papers I write that cover both theories always seem to wind up lopsided in favor of LCDM in terms of the bulk of their content. That happens because it takes many pages to discuss all the ins and outs. In contrast, MOND just gets it right the first time, so that section is short: there’s not much more to say than “Yep, that’s what it predicted.”

+I’ve not yet heard any criticisms of our paper directly. The criticisms that I’ve heard second or third hand so far almost all fall in the category of things we explicitly discussed. That’s a pretty clear tell that the person leveling the critique hasn’t bothered to read it. I don’t expect everyone to agree with our take on this or that, but a competent critic would at least evince awareness that we had addressed their concern, even if not to their satisfaction. We rarely seem to reach that level: it is much easier to libel and slander than to engage with the issues.

The one complaint I’ve heard so far that doesn’t fall in the category of things-we-already-discussed is that we didn’t do hydrodynamic simulations of star formation in molecular gas. That is a red herring. To predict the growth of stellar mass, all we need is a prescription for assembling mass and converting baryons into stars; this is essentially a bookkeeping exercise that can be done analytically. If this were a serious concern, it should be noted that most cosmological hydro-simulations also fail to meet this standard: they don’t resolve star formation, so they typically adopt some semi-empirical (i.e., data-informed) bookkeeping prescription for this “subgrid physics.”

Though I have not myself attempted to numerically simulate galaxy formation in MOND, Sanders (2008) did. More recently, Eappen et al. (2022) have done so, including molecular gas and feedback$ and everything. They find a star formation history compatible with the analytic models we discuss in our paper.

$Related detail: Eappen et al. find that different feedback schemes make little difference to the end result. The deus ex machina invoked to solve all problems in LCDM is largely irrelevant in MOND. There’s a good physical reason for this: gravity in MOND is sourced by what you see; how it came to have its observed distribution is irrelevant. If 90% of the baryons are swept entirely out of the galaxy by some intense galactic wind, then they’re gone BYE BYE and don’t matter any more. In contrast, that is one of the scenarios sometimes invoked to form cores in dark matter halos that are initially cuspy: the departure of all those baryons perturbs the orbits of the dark matter particles and rearranges the structure of the halo. While that might work to alter halo structure, how it results in MOND-like phenomenology has never been satisfactorily explained. Mostly that is not seen as even necessary; converting cusp to core is close enough!


&Though we typically associate the observed outer velocity with halo mass, an important caveat is that the radius also matters: M ~ RV², and most data for high redshift galaxies do not extend very far out in radius. Nevertheless, it takes a lot of mass to make rotation speeds of order 200 km/s within a few kpc, so it hardly matters if this is or is not representative of the dark matter halo: if it is all stars, then the kinematics directly corroborate the interpretation of the photometric data that the stellar mass is large. If it is representative of the dark matter halo, then we expect the halo radius to scale with the halo velocity (R200 ~ V200), so M200 ~ V200³ and again it appears that there is too much mass in place too early.
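To put a number on that, here is the arithmetic with illustrative values (V = 200 km/s within R = 5 kpc; neither number is tied to a specific galaxy):

```python
# Newtonian enclosed-mass estimate M ~ R * V^2 / G for a high-z rotator.
G = 6.674e-11        # gravitational constant [m^3 kg^-1 s^-2]
kpc = 3.086e19       # one kiloparsec [m]
Msun = 1.989e30      # solar mass [kg]

V = 200e3            # rotation speed [m/s]
R = 5 * kpc          # outermost measured radius [m]
M = R * V**2 / G / Msun
print(f"{M:.1e}")    # ~4.6e10 Msun enclosed within 5 kpc
```

That is nearly a Milky Way’s worth of mass already within a few kpc, whatever fraction of the halo it represents.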

Old galaxies in the early universe


Continuing our discussion of galaxy formation and evolution in the age of JWST, we saw previously that there appears to be a population of galaxies that grew rapidly in the early universe, attaining stellar masses like those expected in a traditional monolithic model for a giant elliptical galaxy rather than a conventional hierarchical model that builds up gradually through many mergers. The formation of galaxies at incredibly high redshift, z > 10, implies the existence of a descendant population at intermediate redshift, 3 < z < 4, at which point they should have mature stellar populations. These galaxies should not only be massive, they should also have the spectral characteristics of old stellar populations – old, at least, for how old the universe itself is at this point.

Theoretical predictions from Fig. 1 of McGaugh et al (2024) combined with the data of Fig. 4. The data follow the track of a monolithic model that forms early as a single galaxy rather than that of the largest progenitor of the hierarchical build-up expected in LCDM.

The data follow the track of stellar mass growth for an early-forming monolithic model. Do the ages of stars also look like that?

Here is a recent JWST spectrum published by de Graaff et al. (2024). This appeared too recently for us to have cited in our paper, but it is a great example of what we’re talking about: an incredibly gorgeous spectrum of a galaxy at z = 4.9, when the universe was 1.2 Gyr old.

Fig. 1 from de Graaff et al. (2024): JWST/NIRSpec PRISM spectrum (black line) of the massive quiescent galaxy RUBIES-EGS-QG-1 at a redshift of z = 4.8976.

It is challenging to refrain from nerding out at great length over many of the details on display here. First, it is an incredible technical achievement. I’ve seen worse spectra of local galaxies. JWST was built to obtain images and spectra of galaxies so distant they approach the horizon of the observable universe. Its cameras are sensitive to the infrared part of the spectrum in order to capture familiar optical features that have been redshifted by a huge factor (compare the upper and lower x-axes). The telescope itself was launched into space well beyond the obscuring atmosphere of the earth, pointed precisely at a tiny, faint flicker of light in a vast, empty universe, captured photons that had been traveling for billions of years, and transmitted the data to Earth. That this is possible, and works, is an amazing feat of science, engineering, and societal commitment (it wasn’t exactly cheap).

In the raw 2D spectrum (at top) I can see by eye the basic features in the extracted, 1D spectrum (bottom). This is a useful and convincing reality check to an experienced observer, even if at first glance it looks like a bug splat smeared by a windshield wiper. The essential result is apparent to the eye; the subsequent analysis simply fills in the precise numbers.

Looking from right to left, the spectrum runs from red to blue. It ramps up then crashes down around an observed wavelength of 2.3 microns. This is the 4000 Å break in the rest frame, a prominent feature of aging stellar populations. The amount of blue-to-red ramp-up and the subsequent depth of drop is a powerful diagnostic of stellar age.
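As a quick check that this identification makes sense, the break redshifts from its rest wavelength by a factor of (1 + z):

```latex
\lambda_{\mathrm{obs}} = (1+z)\,\lambda_{\mathrm{rest}}
= (1 + 4.898) \times 4000\,\text{\AA}
\approx 2.36\,\mu\mathrm{m}
```

Right where the spectrum crashes down.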

In addition to the 4000 Å break, a number of prominent spectral lines are apparent. In particular, the Balmer absorption lines Hβ, Hγ, and Hδ are clear and deep. These are produced by A stars, which dominate the light of a stellar population after a few hundred million years. There’s the answer right there: the universe is only 1.2 Gyr old at this point, and the stars dominating the light aren’t much younger.

There are also some emission lines. These can be the sign of ongoing star formation or an active galactic nucleus powered by a supermassive black hole. The authors attribute these to the latter, inferring that the star formation happened fast and furious early on, then basically stopped. That’s important to the rest of the spectrum; A stars only dominate for a while, and their lines are not so prominent if a population keeps making new stars. So this galaxy made a lot of stars, made them fast, then basically stopped. That is exactly the classical picture of a monolithic giant elliptical.

Here is the star formation history that de Graaff et al. (2024) infer:

Fig. 2 from de Graaff et al. (2024): the star formation rate (top) and accumulated stellar mass (bottom) as a function of cosmic time (only the first 1.2 Gyr are shown). Results for stellar populations of two metallicities are shown (purple or blue lines). This affects the timing of the onset of star formation, but once going, an enormous mass of stars forms fast, in ~200 Myr.

There are all sorts of caveats about population modeling, but it is very hard to avoid the basic conclusion that lots of stars were assembled with incredible speed. A stellar mass a bit in excess of that of the Milky Way appears in the time it takes for the sun to orbit once. That number need not be exactly right to see that this is not the gradual, linear, hierarchical assembly predicted by LCDM. The typical galaxy in LCDM is predicted to take ~7 Gyr to assemble half its stellar mass, not 0.1 Gyr. It’s as if the entire mass collapsed rapidly and experienced an intense burst of star formation during violent relaxation (Lynden-Bell 1967).
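The implied average star formation rate is easy to estimate; taking a Milky Way-ish stellar mass of ~8 × 10^10 M☉ (an illustrative round number) built in ~200 Myr:

```latex
\langle \mathrm{SFR} \rangle \sim \frac{M_*}{\Delta t}
\approx \frac{8\times10^{10}\,M_\odot}{2\times10^{8}\,\mathrm{yr}}
\approx 400\,M_\odot\,\mathrm{yr}^{-1}
```

For comparison, the present-day Milky Way forms stars at a sedate rate of one or two solar masses per year.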

Collapse of shells within shells to form a massive galaxy rapidly in MOND (Sanders 2008). Note that the inner shells (inset), where most of the stars will be, collapse even more rapidly than the overall monolith (dotted line).

Where MOND provides a natural explanation for this observation, the fiducial population model of de Graaff et al. violates the LCDM baryon limit: there are more stars than there are baryons to make them from. It should be impossible to veer into the orange region above as the inferred star formation history does. The obvious solution is to adopt a higher metallicity (the blue model) even if that is a worse fit to the spectrum. Indeed, I find it hard to believe that so many stars could be made in such a small region of space without drastically increasing their metallicity, so there are surely things still to be worked out. But before we engage in too much excuse-making for the standard model, note that the orange region represents a double impossibility. First, the star formation efficiency is 100%. Second, this is for an exceptionally rare, massive dark matter halo. The chances of spotting such an object in the area so far surveyed by JWST are small. So we not only need to convert all the baryons into stars, we also need to luck into seeing it happen in a halo so massive that it probably shouldn’t be there. And in the strictest reading, there still aren’t enough baryons. Does that look right to you?

Do these colors look right to you? Getting the color right is what stellar population modeling is all about.
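Before moving on, it is worth spelling out the baryon limit the fiducial model violates: in LCDM, a halo of mass Mh contains at most the cosmic baryon fraction fb ≈ 0.16 of its mass in baryons, so M* ≤ fb·Mh even at 100% efficiency. With illustrative numbers (not the paper’s exact fit values):

```python
# LCDM baryon budget: stellar mass cannot exceed f_b times the halo mass.
f_b = 0.16               # cosmic baryon fraction
Mstar = 8e10             # inferred stellar mass [Msun], illustrative round number
Mh_min = Mstar / f_b     # ~5e11 Msun: the *minimum* halo at 100% efficiency
# Realistic peak efficiencies (~20%) push the required halo up by another
# factor of ~5, into territory far too rare to expect in the area surveyed.
print(f"{Mh_min:.1e}")
```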

OK, so I got carried away nerding out about this one object. There are other examples. Indeed, there are enough now to call them a population of old and massive quiescent galaxies at 3 < z < 4. These have the properties expected for the descendants of massive galaxies that form at z > 10.

Nanayakkara et al. (2024) model spectra for a dozen such galaxies. The spectra provide an estimate of the stellar mass at the redshift of observation. They also imply a star formation history from which we can estimate the age/redshift at which the galaxy had formed half of those stars, and when it quenched (stopped forming stars, or in practice here, when the 90% mark had been reached). There are, of course, large uncertainties in the modeling, but it is again hard to avoid the conclusion that lots of stars were formed early.

Figure 7 from McGaugh et al. (2024): The stellar masses of quiescent galaxies from Nanayakkara et al. (2024). The inferred growth of stellar mass is shown for several cases, marking the time when half the stars were present (small green circles) to the quenching time when 90% of the stars were present (midsize orange circles) to the epoch of observation (large red circles). Illustrative star formation histories are shown as dotted lines with the time of formation ti and the quenching timescale τ noted in Gyr. We omit the remaining lines for clarity, as many cross. There is a wide distribution of formation times from very early (ti = 0.2 Gyr) to relatively late (>1 Gyr), but all of the galaxies in this sample are inferred to build their stellar mass rapidly and quench early (τ < 0.5 Gyr).

The dotted lines above are models I constructed in the spirit of classic monolithic collapse. The particular details aren’t important, but the inferred timescales are. To put galaxies in this part of the stellar mass-redshift plane, they have to start forming early (typically in the first billion years), form stars at a prolific rate, then quench rapidly (typically with e-folding timescales < 1 Gyr). I wouldn’t say any of these numbers are particularly well-measured, but they are indicative.
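For concreteness, here is a minimal version of those dotted-line models: no stars before ti, then star formation declining with e-folding time τ. Parameter values are illustrative, matching the style of the figure rather than any one fit:

```python
import numpy as np

def stellar_mass(t, t_i=0.3, tau=0.4, M_final=9e10):
    """Cumulative stellar mass [Msun] at cosmic time t [Gyr] for a monolithic
    model: nothing before t_i, then SFR proportional to exp(-(t - t_i)/tau)."""
    t = np.asarray(t, dtype=float)
    return M_final * (1.0 - np.exp(-np.clip(t - t_i, 0.0, None) / tau))

# The epochs marked in the figure: half-mass (50%) and quenching (90%) times.
t = np.linspace(0.0, 13.8, 2000)
M = stellar_mass(t)
t50 = t[np.searchsorted(M, 0.5 * M[-1])]   # about t_i + tau*ln(2),  ~0.6 Gyr here
t90 = t[np.searchsorted(M, 0.9 * M[-1])]   # about t_i + tau*ln(10), ~1.2 Gyr here
```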

What is missing from this plot is the LCDM prediction. That’s not because I omitted it, it’s because the prediction for typical L* galaxies doesn’t fall within the plot limits. LCDM does not predict that typical galaxies should become this massive this early. I emphasize typical because there is always scatter, and some galaxies will grow ahead of the typical rate.

Not only are the observed galaxies massive, they have mature stellar populations that are pretty much done forming stars. This will sound normal to anyone who has studied the stellar populations of giant elliptical galaxies. But what does LCDM predict?

I searched through the Illustris TNG50 and TNG300 simulations for objects at redshift 3 that had stellar masses in the same range as the galaxies observed by Nanayakkara et al. (2024). The choice of z = 3 is constrained by the simulation output, which comes in increments of the expansion factor. To compare to real galaxies at 3 < z < 4 one can either look at the snapshot at z = 4 or the one at z = 3. I chose z = 3 to be conservative; this gives the simulation the maximum amount of time to produce quenched, massive galaxies.

These simulations do indeed produce some objects of the appropriate stellar mass. These are rare, as they are early adopters: galaxies that got big quicker than is typical. However, they are not quenched as observed: the simulated objects are still on the star forming main sequence (the correlation between star formation rate and stellar mass). The distribution of simulated objects does not appear to encompass that of real galaxies.
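For anyone who wants to repeat the exercise, the selection is straightforward with the public TNG group catalogs; a sketch using the illustris_python package (the basePath, the snapshot number for z = 3, the mass window, and the sSFR cut are my assumptions here, not necessarily the exact choices in the paper):

```python
import illustris_python as il

basePath = "./TNG50-1/output"          # local download of the group catalogs
snap = 25                              # snapshot nearest z = 3 (check the snapshot tables)
sub = il.groupcat.loadSubhalos(basePath, snap,
                               fields=["SubhaloMassType", "SubhaloSFR"])

h = 0.6774                                        # TNG Hubble parameter
mstar = sub["SubhaloMassType"][:, 4] * 1e10 / h   # particle type 4 = stars, in Msun
sel = (mstar > 3e10) & (mstar < 3e11)             # illustrative mass window

ssfr = sub["SubhaloSFR"][sel] / mstar[sel]        # specific SFR [1/yr]
quenched = ssfr < 1e-10                           # a common quenching criterion
print(quenched.sum(), "quenched of", sel.sum(), "selected")
```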

Figure 8 from McGaugh et al. (2024): The stellar masses and star formation rates of galaxies from Nanayakkara et al. (2024; red symbols). Downward-pointing triangles are upper limits; some of these fall well below the edge of the plot and so are illustrated as the line of points along the bottom. Also shown are objects selected from the TNG50 (Pillepich et al. 2019; filled squares) and TNG300 (Pillepich et al. 2018; open squares) simulations at z = 3 to cover the same range of stellar mass. Unlike the observed galaxies, simulated objects with stellar masses comparable to real galaxies are mostly forming stars at a rapid pace. In the higher-resolution TNG50, none have quenched as observed.

If we want to hedge, we can note that TNG300 has a few objects that are kinda in the right ballpark. That’s a bit misleading, as the data are mostly upper limits. Moreover, these are the rare objects among a set of objects selected to be rare: it isn’t a resounding success if we have to scrape the bottom of the simulated barrel after cherry-picking which barrel. Worse, these few semi-quenched simulated objects are not present in TNG50. TNG50 is the higher resolution simulation, so presumably provides a better handle on the star formation in individual objects. It is conceivable that TNG300 “wins” by virtue of its larger volume, but that’s just saying we have more space in which to discover very rare entities. The prediction is that massive, quenched galaxies should be exceedingly rare, but in the real universe they seem mundane.

That said, I don’t think this problem is fundamental. Hierarchical assembly is still ongoing at this epoch, bringing with it merger-induced star formation. There’s an easy fix for that: change the star formation prescription. Instead of “wet” mergers with gas that can turn into stars, we just need to form all the stars already early on so that the subsequent mergers are “dry” – at least, for those mergers that build this particular population. One winds up needing a new and different mode of star formation. In addition to what we observe locally, there needs to be a separate mode of super-efficient star formation that somehow turns all of the available baryons into stars as soon as possible. That’s basically what I advocate as the least unreasonable possibility for LCDM in our paper. This is a necessary but not sufficient condition; these early stellar nuggets also need to assemble speedy quick to make really big galaxies. While it is straightforward to mess with the star formation prescription in models (if not in nature), the merger trees dictating the assembly history are less flexible.

Putting all the data together in a single figure, we can get a sense for the evolutionary trajectory of the growth of stellar mass in galaxies across cosmic time. This figure extends from the earliest galaxies so far known at z ~ 14, when the universe was just a few hundred million years old (of order one orbital time in a mature galaxy), to the present over thirteen billion years later. In addition to data discussed previously, it also shows recent data with spectroscopic redshifts from JWST. This is important, as the sense of the figure doesn’t change if we throw away all the photometric redshifts; it just gets a little sparse around z ~ 8.

Figure 10 from McGaugh et al. (2024): The data from Figures 4 and 6 shown together using the same symbols. Additional JWST data with spectroscopic redshifts are shown from Xiao et al. (2023; green triangles) and Carnall et al. (2024). The data of Carnall et al. (2024) distinguish between star-forming galaxies (small blue circles) and quiescent galaxies (red squares); the latter are in good agreement with the typical stellar mass determined from Schechter fits in clusters (large circles). The dashed red lines show the median growth predicted by the Illustris ΛCDM simulation (Rodriguez-Gomez et al. 2016) for model galaxies that reach final stellar masses of M* = 10^10, 10^11, and 10^12 M☉. The solid lines show monolithic models with a final stellar mass of 9 × 10^10 M☉ and ti = τ = 0.3, 0.4, and 0.5 Gyr, as might be appropriate for giant elliptical galaxies. The dotted line shows a model appropriate to a monolithic spiral galaxy with ti = 0.5 Gyr and τ = 13.5 Gyr.

The solid lines are monolithic models we built to represent classical giant elliptical galaxies that form early and quench rapidly. These capture nicely the upper envelope of the data. They form most of their stars at z > 4, producing appropriately old populations at lower redshifts. The individual galaxy data merge smoothly into those for typical galaxies in clusters.

The LCDM prediction as represented by the Illustris suite of simulations is shown as the dashed red lines for objects of several final masses. These are nearly linear in the log(M*)–z plane. Objects that end up with a typical L* elliptical galaxy mass at z = 0 deviate from the data almost immediately at z > 1. They disappear at z > 6 as the largest progenitors become tiny.

What can we do to fix this? Massive galaxies get a head start, as it were, by being massive at all epochs. But the shape of the evolutionary trajectory remains wrong. The top red line (for a final stellar mass of 10^12 M☉) corresponds to a typical galaxy at z ~ 2, but it continues to grow to be atypical locally. The data don’t do that. Even with this boost, the largest progenitor is still predicted to be too small at z > 3, where there are now many examples of massive, quiescent galaxies – known both from JWST observations and from Jay Franck’s thesis before it. Again, the distribution of the data does not look like the predictions of LCDM.

One can abandon Illustris as the exemplar of LCDM, but it doesn’t really help. Other models show similar things, differing only in minor details. That’s because the issue is the mass assembly history they all share, not the details of the star formation. The challenge now is to tweak models to make them look more monolithic; i.e., change those red dashed lines into the solid black lines. One will need super-efficient star formation, if it is even possible. I’ll leave discussion of this and other obvious fudges to a future post.

Finally, note that there are a bunch of galaxies with JWST spectroscopic redshifts at 3 < z < 4 that are not exceptionally high mass (the small blue points). These are expected in any paradigm. They can be galaxies that are intrinsically low mass and won’t grow much further, or galaxies that may still grow a lot, just with a longer fuse on their star formation timescale. Such objects are ubiquitous in the local universe as spiral and irregular galaxies. Their location in the diagram above is consistent with the LCDM predictions, but is also readily explained by monolithic models with long star formation timescales. The dotted line shows a monolithic model that forms early (ti = 0.5 Gyr) but converts gas into stars gradually (τ = 13.5 Gyr rather than < 1 Gyr). This is a boilerplate model for a spiral that has been around for as long as the short-τ model for giant ellipticals. So while these lower mass galaxies exist, their location in the M*-z plane doesn’t really add much to this discussion as yet. It is the massive galaxies that form early and become quiescent rapidly that most challenge LCDM.