Local baryons in simulations and reality

Our cosmology du jour, LCDM, suffers a local missing baryon problem: we don’t detect all of the baryons we expected to find associated with the dark matter halos of individual galaxies. Should we?

Empirically, yes. The stars and cold (atomic and molecular) gas appear to be all that there is to see in late type galaxies. Having additional large reservoirs of baryons results in a fine-tuning problem: the amount of extra stuff must vary precisely with galaxy mass so as not to impact the remarkably tight mass-rotation speed relation.

In theory, no. There are lots of places to stash extra baryons in phases where they are hard to detect. Warm/hot ionized gas in the circum-galactic medium (CGM) is one obvious place to harbor a hard-to-detect phase of baryonic material that might add up to a lot of mass. Indeed, in many galaxy formation simulations, lots of mass winds up in the CGM. So what should we expect in LCDM?

That depends on who you ask. I went down a deep rabbit hole about this, covering both the many different types of modern hydrodynamical simulations and how our expectations have varied historically. It is a mess. But there are consistent threads: we do expect that galaxies might harbor extensive CGM (for reasons that vary) or that the missing baryons might not be in galaxies at all, having been ejected to the intergalactic medium (IGM) or prevented from accreting in the first place.

Rather than attempt a systematic survey of simulations, I’ll focus here on one example, EAGLE. The reason for this choice is that Mitchell & Schaye (2022) address exactly this subject. Here is their Fig. 1, which shows the baryon content of various components as a function of dark halo mass in the top panel. The bottom panel tracks where the heavy elements are, which is interesting, but I won’t address that here.

**Figure 1** from Mitchell & Schaye (2022): The median total baryonic mass (top panel) and metal mass (bottom panel) associated with the haloes of central galaxies at z = 0, normalized by the available baryon mass and plotted as a function of halo mass (M₂₀₀). Line colours indicate the mass in different components, including the CGM (cyan), ISM (green), stars (black), gas that has been ejected beyond R₂₀₀ (red), and gas that we estimate has been prevented from being accreted due to feedback effects (blue). Grey lines show the total mass, adding together each of these components. Solid lines show masses associated with the central subhalo, whereas dashed lines also include the masses associated with satellite subhaloes. For 10¹¹ < *M₂₀₀* < 10¹³ M_☉, most of the baryons that have ever been accreted on to haloes have since been ejected and reside outside R₂₀₀ by z = 0. Preventative feedback is important for *M₂₀₀* < *10¹² M_☉*. About half of the metals produced by stellar evolution are then ejected beyond R₂₀₀, apart from in very massive haloes.

There’s a lot going on here! For reference, our Milky Way resides in a ~10¹² M_☉ halo. For M₂₀₀ > 10¹³ M_☉, most objects are groups and clusters where the distinction between centrals and satellites starts to matter. It’s complicated enough without that, so I’ll stick to individual galaxies. On the lower mass end, there’s a huge compression of the large range in stellar mass exhibited by dwarf galaxies into a relatively narrow range of halo mass, so the lower limit of M₂₀₀ = 10¹⁰ M_☉ captures many but not all dwarf galaxies.

That’s just the x-axis. The y-axis of the top panel shows the fraction of baryons in each component relative to the amount available in each halo (the product of the cosmic baryon fraction and the halo mass). The colored lines denote five different baryonic mass components. We readily observe two: the stars (black line) and cold gas (green line, denoted ISM in the figure legend). Additional components include gas in the CGM (cyan line) and gas that has been ejected to the IGM (red line) or was never accreted in the first place (blue line).
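As a minimal sketch of that normalization: the snippet below divides each component mass by the baryons available to the halo. The cosmic baryon fraction of 0.16 is a round value I am assuming for illustration, and the component masses are hypothetical numbers, not values read off the EAGLE figure.

```python
# Fraction of a halo's available baryons found in each component.
# F_B is an assumed round value for the cosmic baryon fraction.
F_B = 0.16


def available_baryons(m200):
    """Baryonic mass (Msun) a halo of mass m200 would hold at the cosmic fraction."""
    return F_B * m200


def component_fractions(m200, components):
    """Map each component mass (Msun) to its share of the available baryons."""
    m_avail = available_baryons(m200)
    return {name: mass / m_avail for name, mass in components.items()}


# Hypothetical component masses for a 1e12 Msun halo, for illustration only.
example = component_fractions(1e12, {"stars": 5e10, "ISM": 1e10})
for name, frac in example.items():
    print(f"{name}: {frac:.3f} of available baryons")
```

Whatever is left after summing the observed components is what has to be hidden in the CGM, ejected, or never accreted.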

From this perspective, it seems hopeless to account for all the baryons on a halo-by-halo basis. Large galaxies (10¹¹ < M₂₀₀ < 10¹³ M_☉) eject most of their baryons (the red line exceeds all others). Lower mass galaxies also eject a lot of baryons, but most of them never get accreted in the first place (the blue line). There are a couple of reasons why accretion might be precluded. One is cosmic reionization: heating of the gas in the early universe by the first UV sources makes the gas too hot to stick to low mass halos: its thermal velocity exceeds their escape speed*. Another is feedback, in which the stars that do form in a galaxy return enough energy to the surrounding gas to heat it up enough to prevent it from accreting. The latter process apparently dominates in the EAGLE simulations, but which effect really dominates is a topic that simulators love to debate.

The largest reservoir of baryons that sticks to its dark matter halos is the CGM. This exceeds both stars and cold (ISM) gas for all halo masses. The CGM mass fraction increases with mass, which appears to be the opposite of what we need empirically, but really we need to sum up all three of the non-observed components to compare with the unseen baryonic mass that we infer:

**Figure 7** from McGaugh et al. (2026): *The ratio of missing-to-observed baryonic mass as a function of baryonic mass.*

That seems unlikely to add up, but it isn’t really possible to check. We don’t measure the missing component (by definition); we only infer its existence from the cosmic baryon fraction. One could laboriously check each simulation to see if the various missing components that should not be detected add up in the right way to explain the data, but one could always wave away any inconsistency by tweaking how many baryons get lost to the IGM. Between ejection to the IGM, prevention of accretion in the first place, and a quasi-undetectable CGM, the prospects for rigorously testing simulations are limited. However, each of these is a distinct effect, and they occur in combination. This exacerbates the fine-tuning problem: not only does the unaccounted-for mass have to vary just so, these different mechanisms must somehow conspire to make it so. It does not inspire confidence that this will work out when one realizes that these different mechanisms behave differently in different simulations.

We are not able to directly test the fraction of baryons that are prevented from accreting or that are ejected to the IGM. We have only the vaguest of constraints on the CGM restricted to massive galaxies. But we do measure the stellar and ISM gas mass, so we can compare the EAGLE simulation above to the data:

**Figure 6** from McGaugh et al. (2026): The stellar mass fraction (left panel) and gas mass fraction (right panel) as a function of mass M₂₀₀ with the equivalent V₂₀₀ on the top axis. Blue points are star-dominated spirals, green are gas rich dwarf irregulars, and yellow points are Local Group rotators. The lines show the expectation for central subhalos in the EAGLE simulations (Mitchell & Schaye 2022) with the width of the gray bands representing the range of the velocity fudge factor f_v = V_f/V₂₀₀ from *f_v* = 1 (bottom edge) to 1.4 (top edge). The dotted line in the right panel denotes the limit where gas is precluded from accreting onto halos in the EAGLE simulations (the blue line in the EAGLE figure).
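For orientation, the mapping between M₂₀₀ and V₂₀₀ on the top axis, and the effect of the fudge factor f_v, can be sketched in a few lines. This assumes that M₂₀₀ is defined at 200 times the critical density and that H₀ = 70 km/s/Mpc; both are my assumptions for illustration rather than values taken from either paper, and the function names are hypothetical.

```python
# Convert between halo mass M200 and circular speed V200, assuming the halo
# is defined by a mean density of 200 times the critical density.
# That definition gives M200 = V200^3 / (10 G H0).
G = 4.30091e-6  # gravitational constant in kpc (km/s)^2 / Msun
H0 = 0.07       # assumed Hubble constant in km/s/kpc (70 km/s/Mpc)


def m200_from_v200(v200):
    """Halo mass in Msun for a circular speed v200 in km/s."""
    return v200**3 / (10.0 * G * H0)


def v200_from_m200(m200):
    """Circular speed in km/s for a halo mass m200 in Msun."""
    return (10.0 * G * H0 * m200) ** (1.0 / 3.0)


def m200_from_vflat(v_flat, f_v):
    """Halo mass implied by an observed flat speed, given f_v = V_f / V200."""
    return m200_from_v200(v_flat / f_v)


# The two edges of the gray band, for a hypothetical V_f = 200 km/s galaxy.
for f_v in (1.0, 1.4):
    print(f"f_v = {f_v}: M200 = {m200_from_vflat(200.0, f_v):.2e} Msun")
```

Because mass scales as the cube of velocity, going from f_v = 1 to f_v = 1.4 changes the inferred halo mass by a factor of 1.4³ ≈ 2.7.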

To the eye habituated to astronomical accuracy, the stellar mass fraction in the left panel works out pretty well. The gray band representing the simulations does more or less the same thing as the data. However, this is one of the occasions on which we can fool ourselves with log-log plots. The bands are offset from the data by a factor that is not modest. The width of the bands already accounts for the plausible variation in the velocity fudge factor. One can of course consider implausible values of f_v, but the shape is also a problem. If we make an adjustment to match intermediate mass galaxies, the difference from high mass galaxies gets worse. One could make further tweaks, but this is a hopeless game as the shape problem stems from the curvature that is inevitable in abundance matching relations and the lack thereof in Tully-Fisher.

The gas content of EAGLE simulated galaxy-like objects does not compare well to the observed ISM in real galaxies (right panel). Gas is historically the hardest part to do in large hydrodynamical cosmological simulations, so I’ve cut simulators a lot of slack, only occasionally pointing out that this doesn’t work out. But it really doesn’t work out, so if they want me to cut them slack then they should refrain^% from asserting that everything works out. It has become a tiresome, decades-long refrain that has never panned out.

The problem for cold ISM gas in massive galaxies in EAGLE is that there isn’t enough of it. The problem in intermediate mass galaxies is that there really isn’t enough of it. The typical value is off by an order of magnitude at M₂₀₀ ~ 10¹¹ M_☉. The problem in low mass galaxies is that it isn’t there at all. The typical EAGLE object with M₂₀₀ < 10^10.5 M_☉ has no cold gas at all. Such objects should not exist, apparently. But gas rich, low mass galaxies are boilerplate examples of observational reality, so it is a substantial problem for a simulation if such things are predicted to be rare^$.

There are many other LCDM simulations on the market. At most one of them can be correct. EAGLE is a reasonable example for illustrating what galaxy formation should plausibly do. Though not perfect, it is a reasonable representative of the LCDM brand. In this context, it makes sense to me that there would be all these various baryonic components and reservoirs. But reality doesn’t look like that. We add up the stars and cold gas and we’re done; anything extra involves fine-tuning. Maybe there should be more stuff associated with galaxies, but the fine-tuning problem this entails augurs otherwise.

*I started to say a lot more about this here, but decided it was too deep a rabbit hole, so instead refer to a note about a conversation I had with Colin Norman on what the reionization scale should be.

^%The sociology in the simulation community seems to be to assert complete success in explaining everything at all times until the next batch of simulations completes running, then point out all the improvements. Everything is explained all the time, only more so as time goes on.

^$There is one caveat of comparing apples and oranges. The galaxies for which we have gas data are generally blue, late type (mostly dwarf irregular) galaxies. So we should make this comparison to similar objects in the simulation, but this distinction was not made by Mitchell & Schaye (2022). Persisting in my habit of giving the LCDM paradigm every benefit of the doubt, one can imagine that there is an as-yet undiscovered population of very low surface brightness galaxies that are red and gas poor pervading the universe, and that EAGLE is predicting these things are out there waiting to be discovered. The gas fractions are low because there are a lot of gas poor galaxies that we haven’t discovered yet. Having spent much of my career seeking low surface brightness galaxies, I’ve never been disappointed that there are more of them out there. I have, however, routinely been disappointed that there are not enough of them to solve huge numerical discrepancies like this.

Extended Tully-Fisher relations

Previously I had alluded to some of the major projects I’ve been working on. One has come to fruition and can be found on the arXiv and in the Astrophysical Journal^&. It has taken many years to assemble the data in this paper, during which time the models purporting to explain some of it have evolved considerably while consistently failing to address the real problems they raise. There is a lot to explore, so it will take more than one post.

Here I start with the empirical basis: the stellar mass and baryonic Tully-Fisher relations. The Tully-Fisher relation was originally discovered as a relation between luminosity and linewidth in rotationally supported galaxies – spirals and irregulars. It immediately proved useful as an extragalactic distance indicator. As such, it was instrumental in breaking the impasse in the Hubble constant^* debate (back when it was 50 vs. 100, not 67 vs. 73), and it remains useful in this role.

Physically, the obvious interpretation was that luminosity is a proxy for stellar mass and linewidth^{*^} is a proxy for rotation speed. This is correct. Of the various rotation speeds one can define and measure, the one that works best, in terms of minimizing the scatter in the relation, is the flat rotation speed measured in the outer parts of extended rotation curves. See Stark et al. (2009) and Trachternach et al. (2009) for further examples. The scatter is basically a function of data quality.

On the mass axis, converting measured flux to luminosity to mass is a bit dicier, as we need to know the distance for the first step and the stellar mass-to-light ratio for the second. There is inevitably some intrinsic scatter in the mass-to-light ratio of a stellar population. While I don’t doubt that luminosity is a proxy for stellar mass, improving on it is hard to do: there are many instances in which simply assuming a straight mapping of light to mass can be as effective as applying fancier population models. We might^{^} finally be getting past that, so it is worth discussing a bit.

The procedure to convert starlight into stellar mass involves the construction of stellar population models that use the color(s) or spectral energy distribution of a galaxy to infer the types of stars that make the light. This is a long-argued subject; suffice it to say there are a number of points where it can go wrong. The most obvious is the initial mass function (IMF): the spectrum of masses with which stars are born. Most of the light we see from galaxies is produced by their higher mass stars, which are disproportionately bright (there is a steep scaling of stellar luminosity with mass). But most of the mass is locked up in low mass stars that contribute little to the total luminosity. So we are, in effect, using the light of the few to represent the mass of the many. That would go badly wrong if we didn’t know the relative mix, i.e., the shape of the IMF. This has been the subject of much research, and over many decades has been narrowed down pretty well. While I hope that this is almost settled, the specter of the IMF lurks as a menace to all stellar mass determinations.

There is a lot else we need to know to build a stellar population model. This includes such essentials as the spectra of individual stars of each and every type and stellar evolution as a function of mass and composition including exotic phases like the asymptotic giant branch. There are a lot of places where this can go badly wrong, and sometimes^{^%} does. So I wouldn’t say we know how to do this perfectly, but we have become pretty good at it.

Converting light to mass suffices to plot the stellar mass Tully-Fisher relation. That accounts for most of the baryonic mass of high mass spirals, but it ignores the mass of the interstellar gas. This can be appreciable in lower mass systems. Indeed, the standard issue dwarf galaxy in the field is more gas than stars:

**Figure 1** from McGaugh et al. (2019): The gas and stellar masses of rotating galaxies. Blue points are galaxies in the SPARC database (Lelli et al. 2016b) and the gas rich galaxies discussed by McGaugh (2012). The location of the Milky Way is noted in red (McGaugh 2016): it is a typical bright spiral. Grey points are the sample of Bradford et al. (2015). The line is the line of equality where M_* = M_g.

With measurements of mass and rotation speed, we can construct the Tully-Fisher relation:

**Figure 4** from McGaugh et al. (2019): The stellar mass (left) and baryonic Tully-Fisher relation (right). Data from Lelli et al. (2016b) and McGaugh (2012) are shown as blue points if both axes are measured with at least 20% accuracy; less accurate data are shown in grey. The latter include cases for which the rotation curve does not extend far enough to measure V_f, in which case the last measured point is used. These cases are systematically offset to lower velocity. Inclination uncertainties and distance errors also contribute to the scatter. The better the data, the tighter the relation. *The location of the Milky Way is noted in red (you are here).*

The stellar mass Tully-Fisher relation is a good correlation by the standards of extragalactic astronomy. The majority of studies in the literature are restricted to massive^% galaxies, mostly those with M_* > 10¹⁰ M_☉ where stars dominate the baryonic mass budget so the omission of gas is not obvious. As we look to lower masses, the relation bends and the scatter increases. That this happens right where gas starts to become important to the mass budget suggests that we’re missing an important component, and voila – a nice, continuous relation that is linear in log space is restored when we plot the baryonic mass M_b = M_* + M_g. Indeed, the data are consistent with a simple power law

$$M_b = A \, V_f^4$$

with A = 50 M_☉ km⁻⁴ s⁴. The intercept A has consistently been measured within 10% of this value over the past couple of decades. That this is an integer power law so that the intercept has real physical units is intriguing. That doesn’t happen in most astronomical scaling laws, which are usually more happenstance, like the mass-luminosity relation for main sequence stars.
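To get a feel for the numbers, here is a minimal sketch of the power law with the normalization quoted above. The function names and the sample velocities are mine, chosen only for illustration.

```python
# Baryonic Tully-Fisher relation, M_b = A * V_f^4, with the quoted normalization.
A = 50.0  # Msun km^-4 s^4


def baryonic_mass(v_flat):
    """Baryonic mass in Msun for a flat rotation speed v_flat in km/s."""
    return A * v_flat**4


def flat_velocity(m_b):
    """Flat rotation speed in km/s for a baryonic mass m_b in Msun."""
    return (m_b / A) ** 0.25


# Illustrative speeds spanning dwarfs to bright spirals.
for v in (50.0, 100.0, 200.0):
    print(f"V_f = {v:5.0f} km/s -> M_b = {baryonic_mass(v):.1e} Msun")
```

The steep fourth power means that doubling V_f implies sixteen times the baryonic mass.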

Why limit ourselves to rotationally supported galaxies? Let’s plot every known type of gravitationally bound extragalactic object, from the smallest ultrafaint dwarfs to the largest clusters of galaxies. Note that I’ve flipped the axes to accommodate the huge dynamic range in baryonic mass, roughly twelve (12) orders of magnitude. This is like having gnats at one end of the scale and blue whales at the other. On that scale, a person is a regular galaxy like the Milky Way.

**Figure 3** from McGaugh et al. (2026): Extended Tully-Fisher relations plotting the flat-equivalent circular velocity of extragalactic systems as a function of stellar mass (top panel) and baryonic mass (bottom panel). Data for rotationally supported galaxies are depicted by circles; squares represent pressure supported systems. The blue circles are galaxies with directly measured distances, V_f from rotation curves, and stellar masses from WISE photometry from Duey et al. (2026, in preparation). Green circles are gas-rich galaxies (M_g > M_*; Stark et al. 2009; Trachternach et al. 2009; Bernstein-Cooper et al. 2014; McNichols et al. 2016; Iorio et al. 2017; Namumba et al. 2025; Xu et al. 2025) not already in Duey et al. (2026). Yellow points are Local Group galaxies, both spirals and dwarfs (McGaugh et al. 2021); gray squares are ultrafaint dwarfs (Lelli et al. 2017). Lensing results for early- and late-type galaxies (Mistele et al. 2024a) are shown as pink squares and magenta circles, respectively. Red squares are clusters of galaxies (Mistele et al. 2025), and purple squares are groups of galaxies (McGaugh et al. 2026). The orange line is the BTFR fit only to rotating galaxies over a more limited range (about three orders of magnitude in baryonic mass, from M_b ~ 4 x 10⁸ to 4 x 10¹¹ M_☉) by McGaugh (2005).

One improvement from twenty years ago, aside from the greater number of objects and the increase in dynamic range, is the accuracy of the mass measurements. I tried a number of prescriptions for the stellar mass-to-light ratio in McGaugh (2005), which resulted in a range of possible slopes. Now we just use the stellar mass from precise population models (Duey et al. 2026) and recover my best estimate from back then. The room to dodge the obvious conclusion about the slope of the relation by complaining about the choice of stellar mass estimator – a popular course of action back then – is gone. Another technical issue we’ve spent a lot of effort working on is how to put all these very different systems on the same scale of V_f. I won’t elaborate on this here: if you’re interested in that level of detail, you can go read the paper and references therein. If we got this wrong, it would add to the scatter in the relation, and/or create offsets between different types of data.

Both of the extended Tully-Fisher relations, that in stellar mass (top panel) and that in baryonic mass (bottom panel, the extended BTFR), are good correlations. That in baryonic mass is clearly better in the sense that it is tighter over a larger dynamic range. From small dwarf galaxies (M_b ~ 5 x 10⁵ M_☉) to groups of galaxies (5 x 10¹² M_☉), the data are consistent with a single power law (M_b ~ V_f⁴) for all systems with remarkably little scatter. Outside this range, the data for both the lowest and the highest mass systems deviate from a straight line towards higher flat velocity at a given mass. I don’t put much credence in the smallest systems as I think there is little chance that their measured velocity dispersions are representative of their equilibrium gravitational potential. For all practical purposes, our knowledge runs out as we hit the regime of ultrafaint^# dwarfs. The deviations of the most massive systems, clusters of galaxies, are more difficult to dismiss.

Restricting our attention for the moment to the range where a single power law suffices to describe the data, we note that there is not much scatter in the BTFR. Some of it is from random uncertainties; these dominate most studies and lead to a lot more scatter than seen here: these data are very good. We can account for the known observational errors and subtract off their contribution to estimate the intrinsic scatter in the relation. This is the variance of the data from a perfect line. The intrinsic scatter for the best data (the WISE-SPARC sample of Duey et al. 2026) is about 0.11 dex in mass – about what we expect^$ for stellar populations. That doesn’t leave much room for other sources of scatter, so the underlying physical relation has to be very tight indeed: essentially perfect over the range 5 x 10⁵ < M_b < 5 x 10¹² M_☉.
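The subtraction step can be sketched as follows, assuming the measurement errors are independent of the intrinsic scatter so that the two add in quadrature. The input numbers are made up for illustration; they are not the error budget of the paper.

```python
import math


def intrinsic_scatter(observed_rms, error_rms):
    """Scatter (dex) left after removing measurement error in quadrature."""
    excess = observed_rms**2 - error_rms**2
    # A non-positive excess means the errors already account for all the scatter.
    return math.sqrt(excess) if excess > 0.0 else 0.0


# Hypothetical inputs: total observed scatter and typical measurement error, in dex.
print(f"intrinsic scatter = {intrinsic_scatter(0.20, 0.12):.2f} dex")
```

This simple version uses one typical error for the whole sample; a more careful treatment would weight each galaxy by its own uncertainty.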

Scatter will also occur if our mass budget is incomplete. We can see this in the transition from the stars-only relation to the BTFR. There is a lot of scatter in the stellar mass Tully-Fisher relation around 10⁷ < M_b < 10⁹ M_☉. Galaxies in this mass range are sometimes star-dominated and sometimes gas-dominated. The gas fraction is all over the place. This shows up as scatter in the stellar mass Tully-Fisher relation. That’s not real; it is a sign that we’ve missed an important mass reservoir. This is cured when we add in the gas mass, which is dominated by atomic gas (HI to spectroscopists and astronomers). That this addition removes the scatter and restores a single power law relation strongly suggests that there are no further substantial reservoirs^** of baryonic material that we’re missing.

This logic applies to other systems as well. Bright spirals do not need much correction because their baryonic mass is dominated by stars. Their stellar mass Tully-Fisher relation is pretty much already their BTFR.

Perhaps this applies to clusters of galaxies as well? There was a huge correction from stars-only to stars plus gas. The gas in this case is the hot, ionized plasma of the intracluster medium (ICM) that belongs to the cluster itself and not any individual galaxy within it. That goes most of the way to close the gap between the stars-only cluster data and the extrapolation of the BTFR fit to individual galaxies, but not all the way. So perhaps we are still missing an important baryonic mass component? It happened before – we didn’t know about the ICM for decades after Zwicky first identified the missing mass problem in clusters – so perhaps there are still more baryons to discover there.

It could also be that the apparent offset occurs because we’ve failed to put clusters on the same V_f scale as galaxies. This is not easy to do, and we’ve spent a lot of time worrying about it. I don’t think this is what’s going on, though it would make my life a lot simpler if it were. Different indicators – dynamics vs. ICM hydrostatics vs. gravitational lensing – can give somewhat different answers, but not in a way that “fixes” the problem: I see no viable path in which the offset turns out to be a simple difference in the way the depth of the gravitational potential is measured. I would love to be wrong here, but I’m not dismissing the offset for clusters as I am for ultrafaint dwarfs (which I don’t do lightly).

Perhaps the extrapolation of the BTFR from individual galaxies to clusters is simply not appropriate. They’re very different kinds of systems, after all. To dig into that, we need some theoretical perspective – why does the observed power law happen? Should we expect different systems to share the same BTFR?

Theory is something I’ve studiously avoided in this post: the possibility that there are baryons that remain to be discovered in clusters can be inferred empirically. All the other data line up, so why not clusters? But unless and until these hypothetical additional baryons are discovered, that’s just one possibility. How likely this possibility seems to be diverges rapidly once we overlay a theoretical preference, which I will leave to future posts. (I did warn it would take more than one.)

^&This paper appears in ApJ volume 1001. The literature has grown quite a bit since I started contributing to it in volume 342. The Astrophysical Journal was founded in 1895. So I’ve been contributing to it for a little over a quarter of its temporal existence, but nearly twice the number of volumes have been published in that shorter time. It’s no wonder none of us can keep up.

^*Indeed, Tully & Fisher’s “preliminary estimate of the Hubble constant is H₀ = 80 km/s/Mpc” remains correct to this day, within the uncertainties (hard to estimate at the time, but roughly ±10 km/s/Mpc).

^{*^}There appears to be an irreducible intrinsic scatter in the linewidth: it is not a perfect proxy for rotation speed. Linewidths are observationally easier to obtain than resolved, extended rotation curves, so the numbers of galaxies in samples using linewidths can be very large without ever approaching the quality provided by resolved interferometric observations. Bigger samples are not necessarily better.

^{^}I emphasize might here because the community seems to have moved towards reporting stellar masses as if we observe these rather than the luminosities and colors/SEDs that the mass estimates are based upon. The latter are data – observed quantities – while stellar masses are a derived quantity that is inevitably model dependent. This doesn’t stop being true just because we decide to invest a lot of faith in our models.

^{^%}The Sloan Digital Sky Survey provides stellar masses based on models that are known to be wrong in the near infrared. Since SDSS itself is entirely optical, one might not notice. If one mixes SDSS data with near-IR data, one will get the wrong answer.

^%This is a classic selection effect. Brighter objects can be seen at a much greater distance than dim ones, so probe a much larger volume. Consequently, their raw numbers always dominate surveys even if their number density is low. Stars are a great example: most of the stars you can see at night are intrinsically luminous: bright stars that are rather far away. Mundane, low mass stars do not stand out even when nearby.

^#This isn’t for lack of observations of ultrafaint dwarfs; the problem is the underlying assumptions.

^$No amount of information suffices to perfectly specify the stellar mass that produces an observed luminosity and SED (spectral energy distribution/set of colors), so one always expects at least some intrinsic scatter in the stellar mass-to-light ratio. I’ve seen estimates that range from 0.1 – 0.2 dex for near-IR colors. That’s as good as it can get as there is always some transient population (e.g., AGB stars) that produces an amount of light that depends on the star formation rate some time ago, not what we measure now. Optical colors are worse in the sense of having more intrinsic scatter, as they are more susceptible to the comings and goings of bright but short-lived stars whose numbers fluctuate with the stochastic star formation rate. Finding 0.11 dex intrinsic scatter is pretty much as good as it can get. (By dex we mean the scatter in log space.)

^**We noted this effect in the original BTFR paper to argue that it was unlikely that we were missing substantial amounts of molecular gas (H₂), which was a concern at the time. Flash forward, and we were right: the molecular gas mass is almost always a distant third behind stars and atomic gas in the baryonic mass budgets of individual galaxies. Nowadays, the concern is about the mass of baryons in the circumgalactic medium (CGM). That’s getting ahead of the story, which I’ll save for a future post. For now, it suffices to note that any baryonic mass in the CGM is far beyond the radius where the flat velocity is measured, so is not relevant to the sums here.

Very thin galaxies

The stability of spiral galaxies was a foundational motivation to invoke dark matter: a thin disk of self-gravitating stars is unstable unless embedded in a dark matter halo. Modified dynamics can also stabilize galactic disks. A related test is provided by how thin such galaxies can be.

Thin galaxies exist

Spiral galaxies seen edge-on are thin. They have a typical thickness – their short-to-long axis ratio – of q ≈ 0.2. Sometimes they’re thicker, sometimes they’re thinner, but this is often what we assume when building mass models of the stellar disk of galaxies that are not seen exactly* edge-on. One can employ more elaborate estimators, but the results are not particularly sensitive to the exact thickness so long as it isn’t the limit of either razor thin (q = 0) or a spherical cow (q = 1).
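To illustrate how weakly the result depends on the assumed thickness, here is a sketch using the classic oblate-spheroid relation between observed axis ratio and inclination, cos²i = (q² − q₀²)/(1 − q₀²). That formula is a standard textbook choice I am supplying here, not something specified in this post, and the function name is hypothetical.

```python
import math


def inclination_deg(q_obs, q0=0.2):
    """Inclination in degrees (90 = edge-on) of an oblate disk.

    q_obs is the observed axis ratio; q0 is the assumed intrinsic thickness.
    """
    if q_obs <= q0:
        # Already as thin as the assumed intrinsic shape: treat as edge-on.
        return 90.0
    cos2 = (q_obs**2 - q0**2) / (1.0 - q0**2)
    return math.degrees(math.acos(math.sqrt(cos2)))


# Same observed axis ratio, three different assumed intrinsic thicknesses.
for q0 in (0.1, 0.2, 0.3):
    print(f"q0 = {q0}: i = {inclination_deg(0.5, q0):.1f} deg")
```

For an observed axis ratio of 0.5, varying q₀ from 0.1 to 0.3 shifts the inferred inclination by only a few degrees.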

Sometimes galaxies are very thin. Behold the “superthin” galaxy UGC 7321:

*UGC 7321 as seen in optical colors by the Sloan Digital Sky Survey.*

It also looks very thin in the infrared, which is the better tracer of stellar mass:

**Fig. 1** from Matthews et al. (1999): *H-band (1.6 micron) image of UGC 7321. Matthews (2000) finds a near-IR axis ratio of 14:1. That’s super thin (q = 0.07)!*

UGC 7321 is very thin, would be low surface brightness if seen face-on (Matthews estimates a central B-band surface brightness of 23.4 mag arcsec^-2), has no bulge component thickening the central region, and contains roughly as much mass in gas as stars. All of these properties dispose a disk to be fragile (to perturbations like mergers and subhalo crossings) and unstable, yet there it is. There are enough similar examples to build a flat galaxy catalog, so somehow the universe has figured out a way for galaxy disks to remain thin and dynamically cold^# for the better part of a Hubble time.

We see spiral galaxies at various inclinations to our line of sight. Some will appear face on, others edge-on, and everything in between. If we observe enough of them, we can work out what the intrinsic distribution is based on the projected version we see.

First, some definitions. A 3D object has three principal axes of lengths a, b, and c. By convention, a is the longest and c the shortest. An oblate model imagines a galaxy like a frisbee: it is perfectly round seen face-on (a = b); seen edge-on q = c/a. More generally, an object can be triaxial, with a ≠ b ≠ c. In this case, a galaxy would not appear perfectly round even when seen perfectly face-on^{^} because it is intrinsically oval (with similar axis lengths a ≈ b but not exactly equal). I expect this is fairly common among dwarf irregular galaxies.

The observed and intrinsic distribution of disk thicknesses

Benevides et al. (2025) find that the distribution of observed axis ratios q is pretty flat. This is a consequence of most galaxies being seen at some intermediate viewing angle. One can posit an intrinsic distribution, model what one would see at a bunch of random viewing angles, and iterate to extract the true distribution in nature, which they do:

**Figure 6** from Benevides et al. (2025): Comparison between the observed (projected) $q$ distribution and the inferred intrinsic 3D axis ratios for a subsample of dwarfs in the GAMA survey with $M_{⋆} = 10^{9}$ – $10^{9.5} M_{⊙}$ . The observed shapes are shown with the solid black line and are used to derive an intrinsic $c / a$ (long-dashed) and $b / a$ (dotted) distribution when projected. Solid color lines in each panel correspond to the $q$ values obtained from the 3D model after random projections. Note that a wide distribution of $q$ values is generated by a much narrower intrinsic $c / a$ distribution. For example, the blue shaded region in the left panel shows that an observed $5\%$ of galaxies with $q < 0.2$ requires $41\%$ of galaxies to have an intrinsic $c / a < 0.2$ for an oblate model. Similarly, for a triaxial model (right panel, red curve) $43\%$ of galaxies are required to be thinner than $c / a = 0.2$ . The additional freedom of $b \neq a$ in the triaxial model helps to obtain a better fit to the projected $q$ distribution, but the changes mostly affect large $q$ values and change little the $c / a$ frequency derived from highly elongated objects.

That we see some thin galaxies implies that they have to be common, as most of them are not seen edge-on. For dwarf^$ galaxies of a specific mass range, which happens to include UGC 7321, Benevides et al. (2025) infer a lot^% of thin galaxies, at least 40% with q < 0.2. They also infer a little bit of triaxiality, a ≈ b.
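To see why a few observed slivers imply many intrinsically thin disks, here is a minimal sketch for the idealized oblate case. It assumes randomly oriented, perfectly oblate spheroids, for which the apparent axis ratio at inclination i (i = 0 is face-on) is q_obs² = q₀² + (1 − q₀²) cos² i. The function names are mine for illustration; the modelling in Benevides et al. (2025) is more elaborate and includes triaxiality.

```python
import math
import random


def apparent_axis_ratio(q0, cos_i):
    """Projected axis ratio of an oblate spheroid of intrinsic thickness q0 = c/a.

    cos_i is the cosine of the inclination (1 = face-on, 0 = edge-on).
    """
    return math.sqrt(q0**2 + (1.0 - q0**2) * cos_i**2)


def fraction_seen_thinner(q0, q_cut):
    """Fraction of random viewing angles at which the object shows q_obs < q_cut.

    For random orientations cos(i) is uniform on [0, 1], so the fraction is
    the largest cos(i) that still gives q_obs < q_cut.
    """
    if q_cut <= q0:
        return 0.0  # a disk never looks thinner than it really is
    if q_cut >= 1.0:
        return 1.0
    return math.sqrt((q_cut**2 - q0**2) / (1.0 - q0**2))


def monte_carlo_fraction(q0, q_cut, n=100_000, seed=1):
    """The same quantity estimated by drawing random viewing angles."""
    rng = random.Random(seed)
    hits = sum(apparent_axis_ratio(q0, rng.random()) < q_cut for _ in range(n))
    return hits / n


if __name__ == "__main__":
    for q0 in (0.05, 0.10, 0.15):
        print(q0, round(fraction_seen_thinner(q0, 0.2), 3))
```

In this toy model a disk with intrinsic c/a = 0.1 appears thinner than q = 0.2 from fewer than one in five random viewing angles, so every observed sliver stands in for several thin disks seen at less favorable inclinations.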

The existence and numbers of thin dwarfs seem to come as a surprise to many astronomers. This is perhaps driven in part by theoretical expectations for dwarf galaxies to be thick: a low surface brightness disk has little self-gravity to hold stars in a narrow plane. This expectation is so strong that Benevides et al. (2025) feel compelled to provide some observed examples, as if to say look, really:

**Figure 8** – images of real galaxies from Benevides et al. (2025): Examples of $10$ highly elongated dwarf galaxies with $q \leq 0.2$ and $M_{⋆} = 10^{7}$ – $10^{8.5} M_{⊙}$ . They resemble thin edge-on disks and can be found even among the faintest dwarfs in our sample. Legends in each panel quote the stellar mass, the shape parameter $q$ , as well as the GAMA identifier. Objects are sorted by increasing $M_{⋆}$ , left to right.

As an empiricist who has spent a career looking at low mass and low surface brightness galaxies, this does not come as a surprise to me. These galaxies look normal. That’s what the universe of late type dwarf^$ galaxies looks like.

Edge-on galaxies in LCDM simulations

Thin galaxies do not occur naturally in the hierarchical mergers of LCDM (e.g., Haslbauer et al. 2022), where one would expect a steady bombardment by merging masses to mess things up. The picture above is not what galaxy-like objects in LCDM simulations look like. Scraping through a few simulations to find the flattest galaxies, Benevides et al. (2025) find only a handful of examples:

**Figure 11** – images of simulated galaxies from Benevides et al. (2025): *Edge-on projection of examples of the flattest galaxies in the TNG50 simulation, in different bins of stellar mass.*

Note that only the four images on the left here occupy the same stellar mass range as the images of reality above. These are as close as it gets. Not terrible, but also not representative^&. The fraction of galaxies this thin is a tiny fraction of the simulated population whereas they are quite common in reality. Here the two are compared: three different surveys (solid lines) vs. three different simulations (dashed lines).

**Figure 9** from Benevides et al. (2025): Fraction of galaxies that are derived to be intrinsically thinner than $c / a \leq 0.2$ as a function of stellar mass. Thick solid lines correspond to our observational samples while dashed lines are used to display the results of cosmological simulations. Different colors highlight the specific survey or simulation name, as quoted in the legend. In all observational surveys, the frequency of thin galaxies peaks for dwarfs with $M_{⋆} \sim 10^{9} M_{⊙}$ , almost doubling the frequency observed on the scale of MW-mass galaxies. Thin galaxies do not disappear at lower masses: we infer a significant fraction of dwarf galaxies with $M_{⋆} < 10^{9} M_{⊙}$ to have $c / a < 0.2$ . This is in stark contrast with the negligible production of thin dwarf galaxies in all numerical simulations analyzed here.

Note that the thinnest galaxies in nature are dwarfs of mass comparable to UGC 7321. Thin disks aren’t just for bright spirals like the Milky Way with log(M_*) > 10.5. They are also common^*$ for dwarfs with log(M_*) = 9 and even log(M_*) = 8, which are often gas dominated. In contrast, the simulations produce almost no galaxies that are thin at these lower masses.

The simulations simply do not look like reality. Again. And again, etc., etc., ad nauseam. It’s almost as if the old adage applies: garbage in, garbage out. Maybe it’s not the resolution or the implementation of the simulations that’s the problem. One could get all that right, but it wouldn’t matter if the starting assumption of a universe dominated by cold dark matter was the input garbage.

Galaxy thickness in Newton and MOND

Thick disks are not merely a product of simulations, they are endemic to Newtonian dynamics. As stars orbit around and around a galaxy’s center, they also oscillate up and down, bobbing in and out of the plane. How far up they get depends on how fast they’re going (the dynamical temperature of the stellar population) and how strong the restoring force to the plane of the disk is.

In the traditional picture of a thin spiral galaxy embedded in a quasi-spherical dark matter halo, the restoring force is provided by the stars in the disk. The dark matter halo is there to boost the radial force to make the rotation curve flat, and to stabilize the disk, for which it needs to be approximately spherical. The dark matter halo does not contribute much to the vertical restoring force because it adds little mass near the disk plane. In order to do that, the halo would have to be very squashed (small q) like the disk, in which case we revive the stability problem the halo was put there to solve.

This is why we expect low surface brightness disks to be thick. Their stars are spread thin, the surface mass density is low, so the restoring force to the disk should be small. Disks as thin as UGC 7321 shouldn’t be possible unless they are extremely cold^*# dynamically – a situation that is unlikely to persist in a cosmogony built by hierarchical merging. The simulations discussed above corroborate this expectation.

In MOND, there is no dark matter halo, but the modified force should boost the vertical restoring force as well as the radial force. One thus expects thinner disks in MOND than in Newton.

I pointed this out in McGaugh & de Blok (1998) along with pretty much everything else in the universe that people tell me I should consider without bothering to check if I’ve already considered. Here is the plot I published at the time:

**Figure 9** of McGaugh & de Blok (1998): Thickness q = z₀/h expected for disks of various central surface densities Σ₀. Shown along the top axis is the equivalent B-band central surface brightness μ₀ for Υ_* = 2. Parameters chosen for illustration are noted in the figure (a typical scale length h and two choices of central vertical velocity dispersion σ_z). Other plausible values give similar results. The solid lines are the Newtonian expectation and the dashed lines that of MOND. The Newtonian and MOND cases are similar at high surface densities but differ enormously at low surface densities. Newtonian disks become very thick at low surface brightness. In contrast, MOND disks can remain reasonably thin to low surface density.

There are many approximations that have to be made in constructing the figure above. I assumed disks were plane-parallel slabs of constant velocity dispersion, which they are not. But this suffices to illustrate the basic point, that disks should remain thinner^&% in MOND than in Newton as surface density decreases: as one sinks further into the MOND regime, there is relatively more restoring force to keep disks thin. To duplicate this effect in Newton, one must invent two kinds of dark matter: a dissipational kind of dark matter that forms a dark matter disk in addition to the usual dissipationless cold dark matter that makes a quasi-spherical dark matter halo.

The idea of the plot above was to illustrate the trend of expected thickness for galaxies of different central surface brightness. One can also build a model to illustrate the expected thickness as a function of radius for a pair of galaxies, one high surface brightness (so it starts in the Newtonian regime at small radii) and one of low surface brightness (in the MOND regime everywhere). I have chosen numbers^** resembling the Milky Way for the high surface brightness galaxy model, and scaled the velocity dispersion of the low surface brightness model so it has very nearly the same thickness in the Newtonian regime. In MOND, both disks remain thin as a function of radius (they flare a lot in Newton) and the lower surface brightness disk model is thinner thanks to the relatively stronger restoring force that follows from being deeper in the MOND regime.

The thickness of two model disks, one high surface brightness (solid lines) and the other low surface brightness (dashed lines), as a function of radius. The two are similar in Newton (black), but differ in MOND *(blue)*. The restoring force to the disk is stronger in MOND, so there is less flaring with increasing radius. The low surface brightness galaxy is further in the MOND regime, leading naturally to a thinner disk.

These are not realistic disk models, but they again suffice to illustrate the point: thin disks occur naturally in MOND. Low surface brightness disks should be thick in LCDM (and in Newtonian dynamics in general), but can be as thin as UGC 7321 in MOND. I didn’t aim to make q ≈ 0.1 in the model low surface brightness disk; it just came out that way for numbers chosen to be reasonable representations of the genre.

What the distribution of thicknesses is depends on the accretion and heating history of each individual disk. I don’t claim to understand that. But the mere existence of dwarf galaxies with thin disks is a natural outcome in MOND that we once again struggle to comprehend in terms of dark matter.

*Seeing a galaxy highly inclined minimizes the inclination correction to the kinematic observations [V_rot = V_obs/sin(i)] but to build a mass model we also need to know the face-on surface density profile of the stars, the correction for which depends on 1/cos(i). So as a practical matter, the competition between sin(i) and cos(i) makes it difficult to analyze galaxies at either extreme.

^#Dynamically cold means the random motions (quantified by the velocity dispersion of stars σ) are small compared to ordered rotation (V) in the disk, something like V/σ ≈ 10. As a disk heats (higher σ) it thickens, as some of that random motion goes in the vertical direction perpendicular to the disk. Mergers heat disks because they bring kinetic energy in from random directions. Even after an object is absorbed, the splash it made is preserved in the vertical distribution of the stars which, once displaced, never settle back into a thin disk. (Gas can settle through dissipation, but point masses like stars cannot.)

^Oval distortions are a major source of systematic error in galaxy inclination estimates, especially for dwarf Irregulars. It is an asymmetric error: a galaxy with a mild oval distortion can be inferred to have an inclination (i > 0) even when seen face-on (i = 0), but it can never have an inclination more face-on (i < 0) than exactly face-on. This is one of the common drivers of claims that low mass galaxies fall off the Tully-Fisher relation. (Other common problems include a failure to account for gas mass, bad distance estimates, or not measuring V_flat.)

^$In a field with abominable terminology, what is meant by a “dwarf” galaxy is one of the worst offenders. One of my first conference contributions thirty years ago griped about the [mis]use of this term, and matters have not improved. For this particular figure, Benevides et al. (2025) define it to mean galaxies with stellar masses in the range 9 < log(M_*) < 9.5, which seems big to me, but at least it is below the mass of a typical L* spiral, which has log(M_*) ~ 10.5. For comparison, see Fig. 6 of the review of Bullock & Boylan-Kolchin (2017), who define “bright dwarfs” to have 7 < log(M_*) < 9, and go lower from there, but not higher into the regime that we’re calling dwarf right now. So what a dwarf galaxy is depends on context.

^%Note that the intrinsic distribution peaks below q = 0.2, so arguably one should perhaps adopt as typical the mode of the distribution (q ≈ 0.17).

^&Another way in which even the thin simulated objects are not representative of reality is that they are dynamically hot, as indicated by the κ_rot parameter printed with the image. This is the fraction of kinetic energy in rotation. One of the more favorable cases with κ_rot = 0.67 corresponds to V/σ = 2.5. That happens in reality, but higher values are common. Of course, thin disks and dynamical coldness go hand in hand. Since the simulations involve a lot of mergers, the fraction of kinetic energy in rotation is naturally small. So I’m not saying the simulations are wrong in what they predict given the input physics that they assume, but I am saying that this prediction does not match reality.
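For readers who want to convert between the two measures of dynamical coldness, here is a rough sketch. It assumes that all ordered motion is rotation and that the random motion is an isotropic one-dimensional dispersion σ, so that κ_rot = V²/(V² + 3σ²). That closed form is my simplifying assumption (simulators compute κ_rot particle by particle), but it does reproduce the correspondence quoted above.

```python
import math


def kappa_rot(v_over_sigma):
    """Fraction of kinetic energy in rotation, assuming an isotropic
    one-dimensional dispersion sigma (random energy scales as 3 sigma^2)."""
    v2 = v_over_sigma**2
    return v2 / (v2 + 3.0)


def v_over_sigma(kappa):
    """Invert the relation above; valid for 0 <= kappa < 1."""
    return math.sqrt(3.0 * kappa / (1.0 - kappa))


if __name__ == "__main__":
    print(round(v_over_sigma(0.67), 2))  # the favorable simulated case
    print(round(kappa_rot(10.0), 3))     # a dynamically cold disk, V/sigma ~ 10
```

Under this assumption κ_rot = 0.67 maps to V/σ of about 2.5, while a cold disk with V/σ ≈ 10 would have nearly all of its kinetic energy in rotation.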

^*$The fraction of thin galaxies observed by DESI is slightly higher than found in the other surveys. Having looked at all these data, I am inclined to suspect the culprit is image quality: that of DESI is better. Regardless of the culprit for this small discrepancy between surveys, thin disks are much more common in reality than in the current generation of simulations.

^*#There seems to be a limit to how cold disks get, with a minimum velocity dispersion around ~7 km/s observed in face-on dwarfs when the appropriate number, according to Newton, would be more like 2 km/s, tops. I remember this number from observations in the ’80s and ’90s, along with lots of discussion then to the effect of how can it be so? but it is the new year and I’m feeling too lazy to hunt down all the citations so you get a meme instead.

^&%In an absolute sense, all other things being equal, which they’re not, disks do become thicker to lower surface brightness in both Newton and MOND. There is less restoring force for less surface mass density. It is the relative decline in restoring force and consequent thickening of the disk that is much more precipitous in Newton.

^**For the numerically curious, these models are exponential disks with surface density profiles Σ(R) = Σ₀ e^(-R/R_d). Both models have a scale length R_d = 3 kpc. The HSB has Σ₀ = 866 M_☉ pc^-2; this is a good match to the Eilers et al. (2019) Milky Way disk; see McGaugh (2019). The LSB has Σ₀ = 100 M_☉ pc^-2, which corresponds roughly to what I consider the boundary of low surface brightness, a central B-band surface brightness of ~23 mag. arcsec^-2. For the velocity dispersion profile I also assume an exponential with scale length 2R_d (that’s what’s supposed to happen). The central velocity dispersion of the HSB is 100 km/s (an educated guess that gets us in the right ballpark) and that of the LSB is 33 km/s – the mass is down by a factor of ~9 so the velocity dispersion should be lower by a factor of $\sqrt{9}$ . (I let it be inexact so the solid and dashed Newtonian lines wouldn’t exactly overlap.)
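As a rough check on these numbers, the sketch below evaluates the central thickness of each model in the purely Newtonian, self-gravitating isothermal sheet approximation, z₀ = σ_z²/(πGΣ). That formula and the value of G in these units are my assumptions for illustration; the curves in the figure need not have been computed this way.

```python
import math

G = 4.301e-3  # gravitational constant in pc (km/s)^2 / Msun


def sheet_scale_height_pc(sigma_z_kms, surface_density_msun_pc2):
    """Scale height z0 of a self-gravitating isothermal sheet (Newtonian)."""
    return sigma_z_kms**2 / (math.pi * G * surface_density_msun_pc2)


# Central values quoted in the footnote: (sigma_z in km/s, Sigma_0 in Msun/pc^2)
MODELS = {"HSB": (100.0, 866.0), "LSB": (33.0, 100.0)}
R_D_PC = 3000.0  # common scale length of both models

if __name__ == "__main__":
    for name, (sigma_z, sigma_0) in MODELS.items():
        z0 = sheet_scale_height_pc(sigma_z, sigma_0)
        print(f"{name}: z0 = {z0:.0f} pc, z0/R_d = {z0 / R_D_PC:.2f}")
```

Both models come out with a central Newtonian scale height near 0.8 kpc, consistent with the dispersions having been scaled to give very nearly the same thickness in the Newtonian regime.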

These models are crude, being single-population (there can be multiple stellar populations each with their own velocity dispersion and vertical scale height) and lacking both a bulge and gas. The velocity dispersion profile sometimes falls with a scale length twice the disk scale length as expected, sometimes not. In the Milky Way, R_d ≈ 2.5 or 3 kpc, but the velocity dispersion falls off with a scale length that is not 5 or 6 kpc but rather 21 or 25 kpc. I have also seen the velocity dispersion profile flatten out rather than continue to fall with radius. That might itself be a hint of MOND, but there are lots of different aspects of the problem to consider.

The odd primordial halo of the Milky Way

The mass distribution of dark matter halos that we infer from observations tells us where the dark matter needs to be now. This differs from the mass distribution it had to start, as it gets altered by the process of galaxy formation. It is the primordial distribution that dark matter-only simulations predict most robustly. We* reverse-engineer the collapse of the baryons that make up the visible Galaxy to infer the primordial distribution, which turns out to be… odd.

The Gaia rotation curve and the mass of the Milky Way

As we discussed a couple of years ago, Gaia DR3 data indicate a declining rotation curve for the Milky Way. This decline becomes more steep, nearly Keplerian, in the outskirts of the Milky Way (17 < R < 30 kpc). This may or may not be consistent with data further out, which gets hard to interpret as the LMC (at 50 kpc) perturbs orbits and the observed motions may not correspond to orbits in dynamical equilibrium. So how much do the data inform us about the gravitational potential?

Milky Way rotation curve (various data) including Gaia DR3 (multiple analyses). Also shown is the RAR model (blue line) that was fit to the terminal velocities from 3 < R < 8.2 kpc (gray points) and predates other data illustrated here.

I am skeptical of the Keplerian portion of this result (as discussed at length at the time) because other galaxies don’t do that. However, I am a big fan of listening to the data, and the people actually doing the work. Taken at face value, the Gaia data show a Keplerian decline with a total mass around 2 x 10¹¹ M_☉. If correct, this falsifies MOND.
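To put numbers on what a Keplerian decline means, here is a minimal sketch that treats everything interior to the radii of interest as a point mass, so V = √(GM/R). The point-mass treatment and the value of G in these units are my simplifications; the actual analyses model the disk and halo in detail.

```python
import math

G = 4.301e-6  # gravitational constant in kpc (km/s)^2 / Msun


def keplerian_speed(mass_msun, radius_kpc):
    """Circular speed (km/s) around a point mass."""
    return math.sqrt(G * mass_msun / radius_kpc)


def enclosed_mass(speed_kms, radius_kpc):
    """Mass (Msun) implied by a circular speed at a given radius."""
    return speed_kms**2 * radius_kpc / G


if __name__ == "__main__":
    total_mass = 2e11  # Msun, the total mass quoted for the Gaia result
    for radius in (17.0, 20.0, 25.0, 30.0):
        print(f"R = {radius:4.1f} kpc: V = {keplerian_speed(total_mass, radius):5.1f} km/s")
```

For a total mass of 2 x 10¹¹ M_☉ the speed drops from about 225 km/s at 17 kpc to about 169 km/s at 30 kpc, a far steeper decline than the flat outer rotation curves typical of other galaxies.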

How does dark matter fare? There is an implicit assumption made by many in the community that any failing of MOND is an automatic win for dark matter. However, it has been my experience that observations that are problematic for MOND are also problematic for dark matter. So let’s check.

Short answer: this is really weird in terms of dark matter. How weird? For starters, most recent non-Gaia dynamical analyses suggest a total mass closer to 10¹² M_☉, a factor of five higher than the Gaia value. I’m old enough to remember when the accepted mass was 2 x 10¹² M_☉, an order of magnitude higher. Yet even this larger mass is smaller than suggested by abundance matching recipes, which give more like 4 x 10¹² M_☉. So somewhere in the range 2 – 40 x 10¹¹ M_☉.

The Milky Way mass has been adjusted so often, have we finally hit it?

The guy was all over the road. I had to swerve a number of times before I hit him.
Boston Driver’s Handbook (1982 edition)^&

If it sounds like we’re all over the map, that’s because we are. It is very hard to constrain the total mass of a dark matter halo. We can’t see it, nor tell where it ends. We infer, indirectly, that the edge is way out beyond the tracers we can see. Heck, even speaking of an “edge” is ill-defined. Theoretically, we expect it to taper off with the density of dark matter falling as ρ ~ r^-3, so there is no definitive edge. Somewhat arbitrarily,** we adopt the radius that encloses a density 200 times the average density of the universe as the “virial” radius. This is all completely notional, and it gets worse, as the process of forming a galaxy changes the initial mass distribution. What we observe today is the changed form, not the primordial initial condition for which the notional mass is defined.
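To make the notional definition concrete, here is a small sketch that converts a halo mass into the radius enclosing a chosen overdensity. It assumes the overdensity is measured relative to the critical density with a Hubble constant of 70 km/s/Mpc; both choices are mine for illustration.

```python
import math

G = 4.301e-6        # gravitational constant in kpc (km/s)^2 / Msun
H0 = 70.0 / 1000.0  # assumed Hubble constant, converted to km/s/kpc


def critical_density(h0=H0):
    """Critical density in Msun / kpc^3."""
    return 3.0 * h0**2 / (8.0 * math.pi * G)


def overdensity_radius(mass_msun, delta=200.0, h0=H0):
    """Radius (kpc) within which the mean density is delta times critical."""
    rho = delta * critical_density(h0)
    return (3.0 * mass_msun / (4.0 * math.pi * rho)) ** (1.0 / 3.0)


if __name__ == "__main__":
    for mass in (2e11, 1e12, 4e12):
        r200 = overdensity_radius(mass)
        r100 = overdensity_radius(mass, delta=100.0)
        print(f"M = {mass:.0e} Msun: R200 = {r200:.0f} kpc, R100 = {r100:.0f} kpc")
```

With these assumptions a 10¹² M_☉ halo has R₂₀₀ of roughly 206 kpc, far beyond the ~30 kpc reach of the Gaia rotation curve, and halving the overdensity threshold pushes the radius out by a further factor of 2^(1/3) ≈ 1.26.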

Adiabatic compression during galaxy formation

To form a visible galaxy, baryons must dissipate and sink to the center of their parent dark matter halo. This process changes the mass distribution and alters the halo from its primordial state. In effect, the gravity of the sinking baryons drags some dark matter along^# with them.

The change to the dark matter halo is often called adiabatic compression. The actual process need not be adiabatic, but that’s how we approximate it. We’ve tested this approximation with detailed numerical simulations, and it works pretty well, at least if you do it right (there are boring debates about technique). What happens makes sense intuitively: the response of the primordial halo to the infall of baryons is to become more dense at the center. While this makes sense physically, it is problematic for LCDM as it takes an NFW halo that is already too dense at the center to be consistent with data and makes it more dense. This has been known forever, so opposing this is one thing feedback is invoked to do, which it may or may not do, depending on how it really works. Even if feedback can really turn a compressed cusp into a core, it is widely expected to be important only in low mass galaxies where the gravitational potential well isn’t too deep. It isn’t supposed to be all that important in galaxies as massive as the Milky Way, though I’m sure that can change as needed.

There are a variety of challenges to implementing an accurate compression computation, so we usually don’t bother: the standard practice is to assume a halo model and fit it to the data. That will, at best, give a description of the current dark matter halo, not what it started as, which is our closest point of comparison with theory. To give an example of the effect, here is a Milky Way model I built a decade ago:

**Figure 13** from McGaugh (2016): Milky Way rotation curve from the data of Luna et al. (2006, red points) and McClure-Griffiths & Dickey (2007, gray points) together with a bulgeless baryonic mass model (black line). The total rotation is approximately fit (blue line) with an adiabatically compressed NFW halo (solid green line) using the procedure implemented by Sellwood & McGaugh (2005). The primordial halo before compression is shown as the dashed line. The parameters of the primordial halo are a concentration c = 7 and a mass *M₂₀₀ = 6 x 10¹¹ M_☉*. *Fitting NFW to the present halo instead gives c = 14, M₂₀₀ = 4 x 10¹¹ M_☉, so the difference is appreciable and depends on the quality and radial extent of the available data.*

The change from the green dashed line to the solid green line is the difference compression makes. That’s what happens if a baryon distribution like that of the Milky Way settles in an NFW halo. The inferred mass M₂₀₀ is lower and the concentration c higher than it originally was – and it is the original version that we should compare to the expectations of LCDM.
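For a feel for how compression works, here is a minimal sketch of the simplest textbook version, the circular-orbit approximation of Blumenthal et al. (1986), in which r M(r) is conserved for each dark matter shell. This is not the Sellwood & McGaugh (2005) procedure used for the figure, and it is known to overstate the effect. The halo parameters are the primordial values quoted in the caption; the disk mass and scale length, the Hubble constant, and treating the disk as spherical are illustrative assumptions of mine.

```python
import math

G = 4.301e-6  # gravitational constant in kpc (km/s)^2 / Msun
H0 = 0.07     # assumed Hubble constant of 70 km/s/Mpc, in km/s/kpc


def r200_kpc(m200):
    """Radius enclosing 200x critical density: M200 = 100 H0^2 R200^3 / G."""
    return (G * m200 / (100.0 * H0**2)) ** (1.0 / 3.0)


def nfw_mass(r, m200, c):
    """Mass enclosed within radius r (kpc) for an NFW halo."""
    def m(x):
        return math.log(1.0 + x) - x / (1.0 + x)
    rs = r200_kpc(m200) / c
    return m200 * m(r / rs) / m(c)


def disk_mass(r, m_disk, r_d):
    """Mass of an exponential disk within r, treated as spherical for simplicity."""
    x = r / r_d
    return m_disk * (1.0 - (1.0 + x) * math.exp(-x))


def contracted_radius(r_i, m200, c, m_disk, r_d):
    """Solve r_i M_i(r_i) = r_f [M_b(r_f) + (1 - f_b) M_i(r_i)] for r_f by bisection."""
    f_b = m_disk / m200  # baryons assumed to start out mixed with the halo
    m_i = nfw_mass(r_i, m200, c)

    def g(r_f):
        return r_f * (disk_mass(r_f, m_disk, r_d) + (1.0 - f_b) * m_i) - r_i * m_i

    lo, hi = 0.0, 2.0 * r_i  # g(lo) < 0 < g(hi) as long as f_b < 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    M200, C = 6e11, 7.0        # primordial halo quoted in the caption
    M_DISK, R_D = 6e10, 3.0    # hypothetical disk: mass in Msun, scale length in kpc
    for r_i in (5.0, 10.0, 20.0, 50.0):
        r_f = contracted_radius(r_i, M200, C, M_DISK, R_D)
        print(f"shell at {r_i:5.1f} kpc ends up at {r_f:5.2f} kpc")
```

Every shell moves inward, so the dark matter enclosed within a fixed radius goes up. Fitting real data means running this kind of calculation backwards: guess a primordial halo, compress it, compare with the rotation curve, and adjust.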

When I built this model, I considered several choices for the bulge/bar fraction: something reasonable, something probably too large, and something definitely too small (zero). The model above is the last case of zero bulge/bar. I show it because it is the only case for which the compression procedure worked. If there is a larger central concentration of baryons – i.e., a bulge and/or a bar – then the compression is greater. Too great, in fact: I could not obtain a fit (see also Binney & Piffl and this related discussion).

The calculation of the compression requires knowledge of the primordial halo parameters, which is what one is trying to obtain. So one has to guess an initial state, run the code, check how close it came, then iterate the initial guess. This is computationally expensive, so I was just eyeballing the fit above. Pengfei has done a lot of work to implement a method that iteratively computes the compression and rigorously fits it to data. So we decided to apply it to the newer Gaia DR3 data.

Fitting the Gaia rotation curve with adiabatically compressed halos

We need two inputs here: one, the rotation curve to fit, and two, the baryonic distribution of the Milky Way. The latter is hard to specify given our location within the Milky Way, so there are many different estimates. We tried a dozen.

Another challenge of doing this is deciding which rotation curve data to fit. We chose to focus on the rotation curve of Jiao et al. (2023) because they made estimates of the systematic as well as random errors. The statistics of Gaia are so good it is practically impossible to fit any equilibrium model to them. There are aspects of the data for which we have to consider non-equilibrium effects (spiral arms, the bar, “snails” from external perturbations) so the usual assumptions are at best an approximation, plus there can always be systematic errors. So the approach is to believe the data, but with the uncertainty estimate of Jiao et al. (2023) that includes systematics.

For a halo model, we started with the boilerplate LCDM NFW halo^$. This doesn’t fit the data. Indeed, all attempts to fit NFW halos fail in similar ways for all of the different baryonic mass models we tried. The quasi-Keplerian part of the Gaia rotation curve simply cannot be fit: the NFW halo inevitably requires more mass further out.

Here are a few examples of the NFW fits:

**Fig. A.3** from Li et al. (2025). Fits of Galactic circular velocities using the NFW model implementing adiabatic halo contraction using 3 baryonic models. [Another 9 appear in the paper.] Data points with errors are the rotation velocities from Jiao et al. (2023), while open triangles show the data from Eilers et al. (2019), which are not fitted. [The radius ranges from 5 to 30 kpc.] Blue, purple, green and black solid lines correspond to the contributions by the stellar disk, central bar, gas (and dust if any), and compressed dark matter halo, respectively. The total contributions are shown using red solid lines. Black dashed lines are the inferred primordial halos.

LCDM as represented by NFW suffers the same failure mode as seen in MOND (plot at top): both theories overshoot the Gaia rotation curve at R > 17 kpc. This is an example of how data that are problematic for MOND are also problematic for dark matter.

We do have more freedom in the case of dark matter. So we tried a different halo model, Einasto. (For this and many other halo models, see Pengfei’s epic compendium of dark matter halo fits.) Where NFW has two parameters, a concentration c and mass M₂₀₀, Einasto has a third parameter that modulates the shape of the density profile^%. For a very specific choice of this third parameter (α = 0.17), it looks basically the same as NFW. But if we let α be free, then we can obtain a fit. Of all the baryonic models, the RAR model+compressed Einasto fits best:

**Fig. 1** from Li et al. (2025). Example of a circular velocity fit using the McGaugh19^$$ model for baryonic mass distributions. The purple, blue, and green lines represent the contributions of the bar, disk, and gas components, respectively. The solid and dashed black lines show the current and primordial dark matter halos, respectively. The solid red line indicates the total velocity profile. The black points show the latest Gaia measurements (Jiao et al. 2023), and the gray upward triangles and squares show the terminal velocities from (McClure-Griffiths & Dickey 2007, 2016), and Portail et al. (2017), respectively. The data marked with open symbols were not fit because they do not consider the systematic uncertainties.

So it is possible to obtain a fit considering adiabatic compression. But at what price? The parameters of the best-fit primordial Einasto halo shown above are c = 5.1, M₂₀₀ = 1.2 x 10¹¹ M_☉, and α = 2.75. That’s pretty far from the α = 0.17 expected in LCDM. The mass is lower than low. The concentration is also low. There are expectation values for all these quantities in LCDM, and all of them miss the mark.
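To see how different α = 2.75 is from α = 0.17, it helps to compare logarithmic density slopes. The sketch below assumes the standard Einasto form ρ ∝ exp{−(2/α)[(r/r_s)^α − 1]}, for which d ln ρ / d ln r = −2 (r/r_s)^α, and the NFW form, for which the slope is −(1 + 3x)/(1 + x) with x = r/r_s. In both cases r_s is the radius where the slope equals −2. The function names are mine.

```python
def einasto_log_slope(x, alpha):
    """d ln(rho) / d ln(r) for an Einasto profile at x = r / r_s."""
    return -2.0 * x**alpha


def nfw_log_slope(x):
    """d ln(rho) / d ln(r) for an NFW profile at x = r / r_s."""
    return -(1.0 + 3.0 * x) / (1.0 + x)


if __name__ == "__main__":
    print(" r/r_s     NFW   a=0.17   a=2.75")
    for x in (0.1, 0.3, 1.0, 2.0, 3.0):
        print(f"{x:6.1f} {nfw_log_slope(x):7.2f} "
              f"{einasto_log_slope(x, 0.17):8.2f} {einasto_log_slope(x, 2.75):8.2f}")
```

With α = 0.17 the Einasto slope tracks NFW fairly closely over this range. With α = 2.75 the profile is nearly flat well inside r_s (a core) and plunges far more steeply than r^-3 outside it, which is why so little mass is left at large radii.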

**Fig. 2** from Li et al. (2025). Halo masses and concentrations of the primordial Galactic halos derived from the Gaia circular velocity fits using 12 baryonic models. The red and blue stars with errors represent the halos with and without adiabatic contraction, respectively. The predicted halo mass-concentration relation within 1 σ from simulations (Dutton & Macciò 2014) is shown as the declining band. The vertical band shows the expected range of the MW halo mass according to the abundance-matching relation (Moster et al. 2013). The upper and lower limits are set by the highest stellar mass model plus 1 σ and the lowest stellar mass model minus 1 σ, respectively.

The expectation for mass and concentration is shown as the bands above. If the primordial halo were anything like what it should be in LCDM, the halo parameters represented by the red stars should be where the bands intersect. They’re nowhere close. The same goes for the shape parameter. The halo should have a density profile like the blue band in the plot below; instead it is more like the red band.

**Fig. 3** from Li et al. (2025). Structure of the inferred primordial and current Galactic halos, along with predictions for the cold and warm dark matter. The density profiles are scaled so that there is no need to assume or consider the masses or concentrations for these halos. The gray band indicates the range of the current halos derived from the Gaia velocity fits using the 12 baryonic models, and the red band shows their corresponding primordial halos within 1σ. The blue band presents the simulated halos with cold dark matter only (Dutton & Macciò 2014). The purple band shows the warm dark matter halos (normalized to match the primordial Galactic halo) with a core size spanning from 4.56 kpc (WDM5 in Macciò et al. 2012) to 7.0 kpc, corresponding to a particle mass of 0.05 keV and lower.

So the primordial halo of the Milky Way is pretty odd. From the perspective of LCDM, the mass is too low and the concentration is too low. The inner profile is too flat (a core rather than a cusp) and the outer profile is too steep. This outer steepness is a large part of why the mass comes out so low; there just isn’t a lot of halo out there. The characteristic density ρ_s is at least in the right ballpark, so aside from the inner slope, the outer slope, the mass, and the concentration, LCDM is doing great.

What if we ignore the naughty bits?

It is really hard for any halo model to fit the steep decline of the Gaia rotation curve at R > 17 kpc. Doing so is what makes the halo mass so small. I’m skeptical about this part of the data, so do things improve if we don’t sweat that part?

Ignoring the data at R > 17 kpc allows the mass to be larger, consistent with other dynamical determinations if not quite with abundance matching. However, the inner parts of the rotation curve still prefer a low density core. That is, something like the warm dark matter halo depicted as the purple band above rather than NFW with its dense central cusp. Or self-interacting dark matter. Or cold dark matter with just-so feedback. Or really anything that obfuscates the need to confront the dangerous question: why does MOND perform better?

*This post is based on the recently published paper by my former student Pengfei Li, who is now faculty at Nanjing University. They have a press release about it.

^&A few months after reading this in the Boston Driver’s Handbook, this exact thing happened to me.

**This goes back to BBKS in 1986 when the bedrock assumption was that the universe had Ω_m = 1, for which the virial radius encloses a mean density 178 times the critical density. 200 was close enough, and stuck, even though for LCDM the virial radius corresponds to an overdensity close to 100, which is even further out.

^#This is one of many processes that occur in simulations, which are great for examining the statistics of simulated galaxy-like objects but completely useless for modeling individual galaxies in the real universe. There may be similar objects, but one can never say “this galaxy is represented by that simulated thing.” To model a real galaxy requires a customized approach.

^$NFW halos consistently perform worse in fitting data than any other halo model, of which there are many. The NFW form has been falsified as a viable representation of reality so many times that I can’t recall them all, and yet it remains the go-to model. I think that’s partly thanks to its simplicity – it is mathematically straightforward to implement – and to the fact that it is what simulations predict: LCDM halos should look like NFW. People, including scientists, often struggle to differentiate simulation from reality, so we keep flogging the dead horse.

^%The density profile of the NFW halo model asymptotes to power laws at both small and large radii: ρ ∝ r^-1 as r → 0 and ρ ∝ r^-3 as r → ∞. The third parameter of Einasto allows a much wider range of shapes.
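To make those asymptotic slopes concrete, here is a minimal sketch that evaluates the logarithmic slope d ln ρ / d ln r for NFW and for Einasto. It assumes the standard textbook forms of both profiles; the function names are mine, chosen for illustration only.

```python
def nfw_log_slope(x):
    """Logarithmic slope d ln(rho) / d ln(r) of an NFW profile at x = r / r_s."""
    # rho is proportional to 1 / (x * (1 + x)**2), so the slope is -1 - 2x / (1 + x).
    return -1.0 - 2.0 * x / (1.0 + x)


def einasto_log_slope(x, alpha):
    """Logarithmic slope of an Einasto profile at x = r / r_-2."""
    # rho is proportional to exp(-(2 / alpha) * (x**alpha - 1)), so the slope is -2 * x**alpha.
    return -2.0 * x ** alpha


if __name__ == "__main__":
    for x in (1e-3, 1.0, 1e3):
        print(
            f"x = {x:g}: NFW {nfw_log_slope(x):+.3f}, "
            f"Einasto(alpha=0.17) {einasto_log_slope(x, 0.17):+.3f}, "
            f"Einasto(alpha=1) {einasto_log_slope(x, 1.0):+.3f}"
        )
```

The NFW slope is pinned between -1 and -3, whereas the Einasto slope keeps changing with radius at a rate set by α, which is why the extra parameter buys so much freedom in shape.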

^$$The McGaugh19 model used here is the one with a reasonable bulge/bar. This dense component can be fit in this case because we start with a halo model with a core rather than a cusp (closer to α = 1 than to the α = 0.17 of NFW/LCDM).

Non-equilibrium dynamics in galaxies that appear to lack dark matter: tidal dwarf galaxies

There are a number of galaxies that have been reported to lack dark matter. This is weird in a universe made of dark matter. It is also weird in MOND, which (if true) is what causes the inference of dark matter. So how can this happen?

In most cases, it doesn’t. These claims not only don’t make sense in either context, they are simply wrong. I don’t want to sound too harsh, as I’ve come close to making the same mistake myself. The root cause of this mistake is often a form of static thinking in dynamic situations: the assumption that the here and now is always a representative test. The basic assumption we have to make to interpret observed velocities in terms of mass is that systems are in (or close to) gravitational equilibrium so that the kinetic energy is a measure of the gravitational potential. In most places, this is a good assumption, so we tend to forget we even made it.

However, no assumption is ever perfect. For example, Gaia has revealed a wealth of subtle non-equilibrium effects in the Milky Way. These are not so large as to invalidate the basic inference of the mass discrepancy, but neither can they be entirely ignored. Even maintaining the assumption of a symmetric but non-smooth mass profile in equilibrium complicates the analysis.

Since the apparent absence of dark matter is unexpected in either theory, one needs to question the assumptions whenever this inference is made. There is one situation in which it is expected, so let’s consider that special case:

Tidal dwarf galaxies

Most dwarf galaxies are primordial – they are the way they are because they formed that way. However, it is conceivable that some dwarfs may form in the tidal debris of collisions between large galaxies. These are tidal dwarf galaxies (TDGs). Here are some examples of interacting systems containing candidate TDGs:

**Fig. 1** from Lelli et al. (2015): *images of interacting systems with TDG candidates noted in yellow.*

I say candidate TDGs because it is hard to be sure a particular object is indeed tidal in origin. A good argument can be made that TDGs require such special conditions to form that perhaps they should not be able to form at all. As debris in tidal arms is being flung about in the (~ 200 km/s) potential well of a larger system, it is rather challenging for material to condense into a knot with a much smaller potential well (< 50 km/s). It can perhaps happen if the material in the tidal stream is both lumpy (to provide a seed to condense on) and sufficiently comoving (i.e., the tidal shear of the larger system isn’t too great), so maybe it happens on rare occasions. One way to distinguish TDGs from primordial dwarfs is metallicity: typical primordial dwarfs have low metallicity while TDGs have the higher metallicity of the giant system that is the source of the parent material.

A clean test of hypotheses

TDGs provide an interesting test of dark matter and MOND. In the vast majority of dark matter models, dark matter halos are dynamically hot, quasi-spherical systems with the particles that compose the dark matter (whatever it is) on eccentric, randomly oriented orbits that sum to a big, messy blob. Arguably it has to be this way in order to stabilize the disks of spiral galaxies. In contrast, the material that composes the tidal tails in which TDGs form originates in the baryonic material of the dynamically cold spiral disks where orbits are nearly circular in the same direction in the same thin plane. The phase space – the combination of position x,y,z and momentum v_x,v_y,v_z – of disk and halo couldn’t be more different. This means that when two big galaxies collide or have a close interaction, everything gets whacked and the two components go their separate ways. Starting in orderly disks, the stars and gas make long, coherent tidal tails. The dark matter does not. The expectation from these basic phase space considerations is consistent with detailed numerical simulations.

We now have a situation in which the dark matter has been neatly segregated from the luminous matter. Consequently, if TDGs are able to form, they must do it only* with baryonic mass. The ironic prediction of a universe dominated by dark matter is that TDGs should be devoid of dark matter.

In contrast, one cannot “turn off” the force law in MOND. MOND can boost the formation of TDGs in the first place, but if said TDGs wind up in the low acceleration regime, they must evince a mass discrepancy. So the ironic prediction here is that, in ignorance of MOND, MOND means that we would infer that TDGs do have dark matter.

Got that? Dark matter predicts TDGs with no dark matter. MOND predicts TDGs that look like they do have dark matter. That’s not confusing at all.

Clean in principle, messy in practice

Tests of these predictions have a colorful history. Bournaud et al. (2007) did a lovely job of combining simulations with observations of the Seashell system (NGC 5291 above) and came to a striking conclusion: the rotation curves of TDGs exceeded that expected for the baryons alone:

**Fig. 2** from Bournaud et al. (2007) *showing the rotation curves for the three TDGs identified in the image above.*

This was a strange, intermediary result. TDGs had more dark matter than the practically zero expected in LCDM, but less than comparable primordial dwarfs as expected in MOND. That didn’t make sense in either theory. They concluded that there must be a component of some other kind of dark matter that was not the traditional dark halo, but rather part of the spiral disk to begin with, perhaps unseen baryons in the form of very cold molecular gas.

Gentile et al. (2007) reexamined the situation, and concluded that the inclinations could be better constrained. When this was done, the result was more consistent with the prediction of MOND and the baryonic Tully-Fisher relation (BTFR; see their Fig. 2).

**Fig. 1** from Gentile et al. (2007): Rotation curve data (full circles) of the 3 tidal dwarf galaxies (Bournaud et al. 2007). The lower (red) curves are the Newtonian contribution V_bar of the baryons (and its uncertainty, indicated as dotted lines). The upper (black) curves are the MOND prediction and its uncertainty (dotted lines). The top panels have as an implicit assumption (following Bournaud et al.) an inclination angle of 45 degrees. In the middle panels the inclination is a free parameter, and the bottom panels show the fits made with the first estimate for the external field effect (EFE).

Clearly there was room for improvement, both in data quality and quantity. We decided to have a go at it ourselves, ultimately leading to Lelli et al. (2015), which is the source of the pretty image above. We reanalyzed the Seashell system, along with some new TDG candidates.

Making sense of these data is not easy. TDG candidates are embedded in tidal features. It is hard to know where the dwarf ends and the tidal stream begins, or even to be sure there is a clear distinction. Here is an example of the northern knot in the Seashell system:

**Fig. 5** from Lelli et al. (2015): *Top panels*: optical image (*left*), total H I map (*middle*), and H I velocity field (*right*). The dashed ellipse corresponds to the disc model described in Sect. 5.1. The cross and dashed line illustrate the kinematical centre and major axis, respectively. In the bottom-left corner, we show the linear scale (optical image) and the H I beam (total H I map and velocity field) as given in Table 6. In the total H I map, contours are at ~4.5, 9, 13.5, 18, and 22.5 M_⊙ pc^-2. *Bottom panels*: position-velocity diagrams obtained from the observed cube (*left*), model cube (*middle*), and residual cube (*right*) along the major and minor axes. Solid contours range from 2σ to 8σ in steps of 1σ. Dashed contours range from −2σ to −4σ in steps of −1σ. The horizontal and vertical lines correspond to the systemic velocity and dynamical centre, respectively.

Both the distribution of gas and the velocities along the tidal tail often blend smoothly across TDG candidates, making it hard to be sure they have formed a separate system. In the case above, I can see what we think is the velocity field of the TDG alone (contained by the ellipse in the upper right panel), but is that really an independent system that has completely decoupled from the tidal material from which it formed? Definite maybe!

Federico Lelli did amazing work to sort through these difficult-to-interpret data. At the end of the day, he found that there was no need for dark matter in any of these TDG candidates. The amplitude of the apparent circular speed was consistent with the enclosed mass of baryons.

**Figs. 11 and 13** from Lelli et al. (2015): the enclosed dynamical-to-baryonic mass ratio (left) and baryonic Tully-Fisher relation (right). TDGs (red points) are consistent with a mass ratio of unity: the observed baryons suffice; no dark matter is inferred. Contrary to Gentile et al., this manifests as a clear offset from the BTFR followed by normal galaxies.

Taken at face value, this absence of dark matter is a win for a universe made of dark matter and a falsification of MOND.

So we were prepared to say that, and did, but as Federico checked the numbers, it occurred to him to check the timescales. Mergers like this happen over the course of a few hundred million years, maybe a billion. The interactions we observe are ongoing; just how far into the process are they? Have the TDGs had time to settle down into dynamical equilibrium? That is the necessary assumption built into the mass ratio plotted above: the dynamical mass assumes the measured speed is that of a test particle in an equilibrium orbit. But these systems are manifestly not in equilibrium, at least on large scales. Maybe the TDGs have had time to settle down?

We can ask how long it takes to make an orbit at the observed speed, which is low by the standards of such systems (hence their offset from Tully-Fisher). To quote from the conclusions of the paper,

These [TDG] discs, however, have orbital times ranging from ~1 to ~3 Gyr, which are significantly longer than the TDG formation timescales (≲1 Gyr). This raises the question as to whether TDGs have had enough time to reach dynamical equilibrium.
Lelli et al. (2015)

So no, not really. We can’t be sure the velocities are measuring the local potential well as we want them to do. A particle should have had time to go around and around a few times to settle down in a new equilibrium configuration; here they’ve made 1/3, maybe 1/2 of one orbit. Things have not had time to settle down, so there’s not really a good reason to expect that the dynamical mass calculation is reliable.
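The timescale argument is simple enough to sketch in a few lines. This is only an illustration of the arithmetic, t_orb = 2πR/V: the radius, speed, and age below are placeholder values I picked to land in the quoted range, not measurements from Lelli et al. (2015), and the function names are hypothetical.

```python
import math

# 1 kpc / (1 km/s) expressed in Gyr (unit conversion only).
GYR_PER_KPC_PER_KMS = 0.9778


def orbital_time_gyr(radius_kpc, speed_kms):
    """Time to complete one circular orbit of the given radius at the given speed."""
    return 2.0 * math.pi * radius_kpc / speed_kms * GYR_PER_KPC_PER_KMS


def orbits_completed(age_gyr, radius_kpc, speed_kms):
    """Number of orbits a particle could have completed in the available time."""
    return age_gyr / orbital_time_gyr(radius_kpc, speed_kms)


if __name__ == "__main__":
    # Placeholder inputs (assumptions for illustration, not data from the paper).
    radius_kpc, speed_kms, age_gyr = 5.0, 20.0, 0.5
    t_orb = orbital_time_gyr(radius_kpc, speed_kms)
    n_orb = orbits_completed(age_gyr, radius_kpc, speed_kms)
    print(f"orbital time ~ {t_orb:.2f} Gyr, orbits completed ~ {n_orb:.2f}")
```

Slow rotation at a radius of several kpc gives an orbital time of order a Gyr or more, so a system only a few hundred Myr old has completed a fraction of one orbit.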

It would help to study older TDGs, as these would presumably have had time to settle down. We know of a few candidates, but as systems age, it becomes harder to gauge how likely they are to be legitimate TDGs. When you see a knot in a tidal arm, the odds seem good. If there has been time for the tidal stream to dissipate, it becomes less clear. So if such a thing turns out to need dark matter, is that because it is a TDG doing as MOND predicted, or just a primordial dwarf we mistakenly guessed was a TDG?

We gave one of these previously unexplored TDG candidates to a grad student. After much hard work combining observations from both radio and optical telescopes, she has demonstrated that it isn’t a TDG at all, in either paradigm. The metallicity is low, just as it should be for a primordial dwarf. Apparently it just happens to be projected along a tidal tail where it looks like a decent candidate TDG.

This further illustrates the trials and tribulations we encounter in trying to understand our vast universe.

*One expects cold dark matter halos to have subhalos, so it seems wise to suspect that perhaps TDGs condense onto these. Phase space says otherwise. It is not sufficient for tidal debris to intersect the location of a subhalo, the material must also “dock” in velocity space. Since tidal arms are being flung out at the speed that is characteristic of the giant system, the potential wells of the subhalos are barely speed bumps. They might perturb streams, but the probability of them being the seeds onto which TDGs condense is small: the phase space just doesn’t match up for the same reasons the baryonic and dark components get segregated in the first place. TDGs are one galaxy formation scenario the baryons have to pull off unassisted.

Kinematics suggest large masses for high redshift galaxies

This is what I hope will be the final installment in a series of posts describing the results published in McGaugh et al. (2024). I started by discussing the timescale for galaxy formation in LCDM and MOND which leads to different and distinct predictions. I then discussed the observations that constrain the growth of stellar mass over cosmic time and the related observation of stellar populations that are mature for the age of the universe. I then put on an LCDM hat to try to figure out ways to wriggle out of the obvious conclusion that galaxies grew too massive too fast. Exploring all the arguments that will be made is the hardest part, not because they are difficult to anticipate, but because there are so many* options to consider. This leads to many pages of minutiae that no one ever seems to read⁺, so one of the options I’ve discussed (e.g., super-efficient star formation) will likely emerge as the standard picture even if it comes pre-debunked.

The emphasis so far has been on the evolution of the stellar masses of galaxies because that is observationally most accessible. That gives us the opportunity to wriggle, because what we really want to measure to test LCDM is the growth of [dark] mass. This is well-predicted but invisible, so we can always play games to relate light to mass.

Mass assembly in LCDM from the IllustrisTNG50 simulation. The dark matter mass assembles hierarchically in the merger tree depicted at left; the size of the circles illustrates the dark matter halo mass. The corresponding stellar mass of the largest progenitor is shown at right as the red band. This does not keep pace with the apparent assembly of stellar mass (data points), but what is the underlying mass really doing?

Galaxy Kinematics

What we really want to know is the underlying mass. It is reasonable to expect that the light traces this mass, but is there another way to assess it? Yes: kinematics. The orbital speeds of objects in galaxies trace the total potential, including the dark matter. So, how massive were early galaxies? How does that evolve with redshift?

The rotation curve of NGC 6946 traced by stars at small radii and gas farther out. This is a typical flat rotation curve (data points) that exceeds what can be explained by the observed baryonic mass (red line deduced from the stars and gas pictured at right), leading to the inference of dark matter.

The rotation curve for NGC 6946 shows a number of well-established characteristics for nearby galaxies, including the dominance of baryons at small radii in high surface brightness galaxies and the famous flat outer portion of the rotation curve. Even when stars contribute as much mass as allowed by the inner rotation curve (“maximum disk”), there is a need for something extra further out (i.e., dark matter or MOND). In the case of dark matter, the amplitude of flat rotation is typically interpreted as being indicative^& of halo mass.

So far, the rotation curves of high redshift galaxies look very much like those of low redshift galaxies. There are some fast rotators at high redshift as well. Here is an example observed by Neeleman et al. (2020), who measure a flat rotation speed of 272 km/s for DLA0817g at z = 4.26. That’s more massive than either the Milky Way (~200 km/s) or Andromeda (~230 km/s), if not quite as big as local heavyweight champion UGC 2885 (300 km/s). DLA0817g looks to be a disk galaxy that formed early and is sedately rotating only 1.4 Gyr after the Big Bang. It is already massive at this time: not at all the little nuggets we expect from the CDM merger tree above.
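For readers who want to check the cosmic age quoted for a given redshift, here is a small sketch using the closed-form age of a flat universe containing only matter and a cosmological constant. The parameter values (H₀ = 70 km/s/Mpc, Ω_m = 0.3) are my assumptions for illustration; other reasonable choices shift the numbers by a few percent.

```python
import math


def cosmic_age_gyr(z, h0=70.0, omega_m=0.3):
    """Age of a flat matter + Lambda universe at redshift z, in Gyr.

    Radiation is neglected, which is a fine approximation at these redshifts.
    """
    omega_l = 1.0 - omega_m
    hubble_time_gyr = 977.8 / h0  # 1/H0 in Gyr when H0 is in km/s/Mpc
    x = math.sqrt(omega_l / omega_m) * (1.0 + z) ** -1.5
    return 2.0 / (3.0 * math.sqrt(omega_l)) * hubble_time_gyr * math.asinh(x)


if __name__ == "__main__":
    for z in (0.0, 2.5, 4.26):
        print(f"z = {z}: age ~ {cosmic_age_gyr(z):.2f} Gyr")
```

With these assumed parameters, z = 4.26 comes out at roughly 1.4 Gyr after the Big Bang, in line with the figure quoted above.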

**Fig. 1** from Neeleman et al. (2020): the velocity field (left) and position-velocity diagram (right) of DLA0817g. The velocity field looks like that of a rotating disk, and the raw *position-velocity diagram* shows motions of ~200 km/s on either side of the center. When corrected for inclination, the flat rotation speed is 272 km/s, corresponding to a massive galaxy near the top of the Tully-Fisher relation.

This is anecdotal, of course, but there are a good number of similar cases that are already known. For example, the kinematics of ALESS 073.1 at z ≈ 5 indicate the presence of a massive stellar bulge as well as a rapidly rotating disk (Lelli et al. 2021). A similar case has been observed at z ≈ 6 (Tripodi et al. 2023). These kinematic observations indicate the presence of mature, massive disk galaxies well before they were expected to be in place (Pillepich et al. 2019; Wardlow 2021). The high rotation speeds observed in early disk galaxies sometimes exceed 250 (Neeleman et al. 2020) or even 300 km s⁻¹ (Nestor Shachar et al. 2023; Wang et al. 2024), comparable to the most massive local spirals (Noordermeer et al. 2007; Di Teodoro et al. 2021, 2023). That such rapidly rotating galaxies exist at high redshift indicates that there is a lot of mass present, not just light. We can’t just tweak the mass-to-light ratio of the stars to explain the photometry and also explain the kinematics.

In a seminal galaxy formation paper, Mo, Mao, & White (1998) predicted that “present-day disks were assembled recently (at z ≤ 1).” Today, we see that spiral galaxies are ubiquitous in JWST images up to z ∼ 6 (Ferreira et al. 2022, 2023; Kuhn et al. 2024). The early appearance of massive, dynamically cold (Di Teodoro et al. 2016; Lelli et al. 2018, 2023; Rizzo et al. 2023) disks in the first few billion years after the Big Bang contradicts the natural prediction of ΛCDM. Early disks are expected to be small and dynamically hot (Dekel & Burkert 2014; Zolotov et al. 2015; Krumholz et al. 2018; Pillepich et al. 2019), but they are observed to be massive and dynamically cold. (Hot or cold in this context means a high or low amplitude of the velocity dispersion relative to the rotation speed; the modern Milky Way is cold with σ ~ 20 km/s and V_c ~ 200 km/s.) Understanding the stability and longevity of dynamically cold spiral disks is foundational to the problem.

Kinematic Scaling Relations

Beyond anecdotal cases, we can check on kinematic scaling relations like Tully–Fisher. These are expected to emerge late and evolve significantly with redshift in LCDM (e.g., Glowacki et al. 2021). In MOND, the normalization of the baryonic Tully–Fisher relation is set by a₀, so is immutable for all time if a₀ is constant. Let’s see what the data say:

**Figure 9** from McGaugh et al. (2024): The baryonic Tully–Fisher (left) and dark matter fraction–surface brightness (right) relations. Local galaxy data (circles) are from Lelli et al. (2019; left) and Lelli et al. (2016; right). Higher-redshift data (squares) are from Nestor Shachar et al. (2023) in bins with equal numbers of galaxies color coded by redshift: 0.6 < z < 1.22 (blue), 1.22 < z < 2.14 (green), and 2.14 < z < 2.53 (red). Open squares with error bars illustrate the typical uncertainties. The relations known at low redshift also appear at higher redshift with no clear indication of evolution over a lookback time up to 11 Gyr.

Not much to see: the data from Nestor Shachar et al. (2023) show no clear indication of evolution. The same can be said for the dark matter fraction-surface brightness relation. (Glad to see that being plotted after I pointed it out.) The local relations are coincident with those at higher redshift for both relations within any sober assessment of the uncertainties – exactly what we measure and how matters at this level, and I’m not going to attempt to disentangle all that here. Neither am I about to attempt to assess the consistency (or lack thereof) with either LCDM or MOND; the data simply aren’t good enough for that yet. It is also not clear to me that everyone agrees on what LCDM predicts.

What I can do is check empirically how much evolution there is within the 100-galaxy data set of Nestor Shachar et al. (2023). To do that, I fit a line to their data (the left panel above) and measure the residuals: for a given rotation speed, how far is each galaxy from the expected mass? To compare this with the stellar masses discussed previously, I normalize those residuals to the same M_*^* = 9 x 10¹⁰ M_☉. If there is no evolution, the data will scatter around a constant value as a function of redshift:

This figure reproduces the stellar mass-redshift data for L* galaxies (black points) and the monolithic (purple line) and LCDM (red and green lines) models discussed previously. The blue squares illustrate deviations of the data of Nestor Shachar et al. (2023) from the baryonic Tully-Fisher relation (dashed line, normalized to the same mass as the monolithic model). There is no indication of evolution in the baryonic Tully-Fisher relation, which was apparently established within the first few billion years after the Big Bang (z = 2.5 corresponds to a cosmic age of about 2.6 Gyr). The data are consistent with a monolithic galaxy formation model in which all the mass had been assembled into a single object early on.

The data scatter around a constant value as a function of redshift: there is no perceptible evolution.
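The residual exercise can be sketched as follows. To be clear about the assumptions: the data here are synthetic stand-ins that I generate with no evolution built in (they are not the Nestor Shachar et al. sample), the fit is an ordinary least-squares line in log-log space, and every name in the code is hypothetical. The point is only to show the bookkeeping: fit, take residuals at fixed rotation speed, shift them to a reference mass, and look for a trend with redshift.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic stand-in sample (an assumption for illustration, not real data):
# 100 galaxies on a slope-4 relation with 0.2 dex scatter and no redshift dependence.
rng = random.Random(42)
redshift = [rng.uniform(0.6, 2.5) for _ in range(100)]
log_v = [rng.uniform(1.9, 2.5) for _ in range(100)]
log_m = [4.0 * lv + 1.7 + rng.gauss(0.0, 0.2) for lv in log_v]

# Fit the relation, then measure how far each galaxy sits from it at its rotation speed.
slope, intercept = fit_line(log_v, log_m)
residuals = [lm - (slope * lv + intercept) for lv, lm in zip(log_v, log_m)]

# Shift the residuals to a common reference mass so they can share axes with stellar masses.
LOG_M_REF = math.log10(9e10)
log_m_normalized = [LOG_M_REF + r for r in residuals]

# A trend consistent with zero means no detectable evolution.
trend, _ = fit_line(redshift, log_m_normalized)

if __name__ == "__main__":
    print(f"fitted slope = {slope:.2f}")
    print(f"trend of normalized mass with redshift = {trend:+.3f} dex per unit z")
```

Because the synthetic sample has no evolution by construction, the recovered trend is consistent with zero; real evolution would show up as a nonzero slope in that last fit.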

The kinematic data for rotating galaxies tell much the same story as the photometric data for galaxies in clusters. They are both consistent with a monolithic model that gathered together the bulk of the baryonic mass early on, and evolved as an island universe for most of the history of the cosmos. There is no hint of the decline in mass with redshift predicted by the LCDM simulations. Moreover, the kinematics trace mass, not just light. So while I am careful to consider the options for LCDM, I don’t know how we’re gonna get out of this one.

Empirically, it is an important observation that there is no apparent evolution in the baryonic Tully-Fisher relation out to z ~ 2.5. That’s a lookback time of ~11 Gyr, so most of cosmic history. That means that whatever physics sets the relation did so early. If the physics is MOND, this absence of evolution implies that a₀ is constant. There is some wiggle room in that given all the uncertainties, but this already excludes the picture in which a₀ evolves with the expansion rate through the coincidence a₀ ~ cH₀. That much evolution would be readily perceptible if H(z) evolves as it appears to do. In contrast, the coincidence a₀ ~ c²Λ^1/2 remains interesting since the cosmological constant is constant. Perhaps this is just a coincidence, or perhaps it is a hint that the anomalous acceleration of the expansion of the universe is somehow connected with the anomalous acceleration in galaxy dynamics.
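To see why an a₀ that tracked the expansion rate would be readily perceptible, here is a rough sketch. It assumes the deep-MOND form of the baryonic Tully-Fisher relation, M_b = V_f⁴/(G a₀), and a flat matter-plus-Λ expansion history with Ω_m = 0.3; the parameter value and the function names are assumptions of mine for illustration.

```python
import math


def hubble_ratio(z, omega_m=0.3):
    """H(z) / H0 for a flat matter + Lambda expansion history."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def btfr_shift_dex(z, omega_m=0.3):
    """Shift in log10(M_b) at fixed V_f if a0 scaled with H(z).

    With M_b = V_f**4 / (G * a0), a larger a0 means less mass at a given speed.
    """
    return -math.log10(hubble_ratio(z, omega_m))


if __name__ == "__main__":
    for z in (0.6, 1.22, 2.14, 2.5):
        print(f"z = {z}: H/H0 = {hubble_ratio(z):.2f}, "
              f"BTFR shift = {btfr_shift_dex(z):+.2f} dex")
```

Under these assumptions the normalization would move by more than half a dex by z ~ 2.5, which is large compared to the scatter about the relation.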

Though I see no clear evidence for evolution in Tully-Fisher to date, it remains early days. For example, a very recent paper by Amvrosiadis et al. (2025) does show a hint of evolution in the sense of an offset in the normalization of the baryonic Tully-Fisher relation. This isn’t very significant, being different by less than 2σ; and again we find ourselves in a situation where we need to take a hard look at all the assumptions and population modeling and velocity measurements just to see if we’re talking about the same quantities before we even begin to assess consistency or the lack thereof. Nevertheless, it is an intriguing result. There is also another interesting anecdotal case: one of their highest redshift objects, ALESS 071.1 at z = 3.7, is also the most massive in the sample, with an estimated stellar mass of 2 x 10¹² M_☉. That is a crazy large number, comparable to or maybe larger than the entire dark matter halo of the Milky Way. It falls off the top of any of the graphs of stellar mass we discussed before. If correct, this one galaxy is an enormous problem for LCDM regardless of any other consideration. It is of course possible that this case will turn out to be wrong for some reason, so it remains early days for kinematics at high redshift.

Cluster Kinematics

It is even earlier days for cluster kinematics. First we have to find them, which was the focus of Jay Franck’s thesis. Once identified, we have to estimate their masses with the available data, which may or may not be up to the task. And of course we have to figure out what theory predicts.

LCDM makes a clear prediction for the growth of cluster mass. This works out OK at low redshift, in the sense that the cluster X-ray mass function is in good agreement with LCDM. Where the theory struggles is in the proclivity for the most massive clusters to appear sooner in cosmic history than anticipated. Like individual galaxies, they appear too big too soon. This trend persisted in Jay’s analysis, which identified candidate protoclusters at higher redshifts than expected. It also measured velocity dispersions that were consistently higher than found in simulations. That is, when Jay applied the search algorithm he used on the data to mock data from the Millennium simulation, the structures identified there had velocity dispersions on average a factor of two lower than seen in the data. That’s a big difference in terms of mass.

**Figure 11** from McGaugh et al. (2024): Measured velocity dispersions of protocluster candidates (Franck & McGaugh 2016a, 2016b) as a function of redshift. Point size grows with the assessed probability that the identified overdensities correspond to a real structure: all objects are shown as small points, candidates with P > 50% are shown as light blue midsize points, and the large dark blue points meet this criterion and additionally have at least 10 spectroscopically confirmed members. The MOND mass for an equilibrium system in the low-acceleration regime is noted at right; these are comparable to cluster masses at low redshift.

At this juncture, there is no way to know if the protocluster candidates Jay identified are or will become bound structures. We made some probability estimates that can be summed up as “some are probably real, but some probably are not.” The relative probability is illustrated by the size of the points in the plot above; the big blue points are the most likely to be real clusters, having at least ten galaxies at the same place on the sky at the same redshift, all with spectroscopically measured redshifts. Here the spectra are critical; photometric redshifts typically are not accurate enough to indicate that galaxies that happen to be nearby to each other on the sky are also that close in redshift space.

The net upshot is that there are at least some good candidate clusters at high redshift, and these have higher velocity dispersions than expected in LCDM. I did the exercise of working out what the equivalent mass in MOND would be, and it is about the same as what we find for clusters at low redshift. This estimate assumes dynamical equilibrium, which is very far from guaranteed. But the time at which these structures appear is consistent with the timescale for cluster formation in MOND (a couple Gyr; z ~ 3), so maybe? Certainly there shouldn’t be lots of massive clusters in LCDM at z ~ 3.
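For a sense of scale, here is a sketch of what a MOND mass estimate from a velocity dispersion can look like. I am assuming the standard deep-MOND estimator for an isolated, isotropic system in equilibrium, M = (81/4) σ⁴/(G a₀), with a₀ = 1.2 × 10⁻¹⁰ m/s²; whether this matches the exact estimator used for the figure is an assumption on my part, and the input dispersion is a placeholder rather than a measured value.

```python
G = 4.30091e-6           # gravitational constant in kpc (km/s)^2 / Msun
KM_PER_KPC = 3.0857e16   # kilometers in one kiloparsec
A0_SI = 1.2e-10          # assumed MOND acceleration scale in m/s^2
A0 = A0_SI * 1e-3 * KM_PER_KPC  # same quantity in (km/s)^2 / kpc


def mond_mass_msun(sigma_kms):
    """Deep-MOND mass of an isolated, isotropic equilibrium system from its velocity dispersion."""
    return (81.0 / 4.0) * sigma_kms ** 4 / (G * A0)


if __name__ == "__main__":
    # Placeholder dispersion (assumption for illustration only).
    sigma = 500.0
    print(f"sigma = {sigma:.0f} km/s -> M ~ {mond_mass_msun(sigma):.2e} Msun")
```

The fourth-power dependence is the thing to notice: a factor of two in velocity dispersion is a factor of sixteen in mass, so modest errors in σ, or departures from equilibrium, matter a great deal.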

Kinematic Takeaways

While it remains early days for kinematic observations at high redshift, so far these data do nothing to contradict the obvious interpretation of the photometric data. There are mature, dynamically cold, fast rotating spiral galaxies in the early universe that were predicted not to be there by LCDM. Moreover, kinematics traces mass, not just light, so all the wriggling we might try to explain the latter doesn’t help with the former. The most obvious interpretation of the kinematic data to date is the same as that for the photometric data: galaxies formed early and grew massive quickly, as predicted a priori by MOND.

*The papers I write that cover both theories always seem to wind up lopsided in favor of LCDM in terms of the bulk of their content. That happens because it takes many pages to discuss all the ins and outs. In contrast, MOND just gets it right the first time, so that section is short: there’s not much more to say than “Yep, that’s what it predicted.”

⁺I’ve not yet heard any criticisms of our paper directly. The criticisms that I’ve heard second or third hand so far almost all fall in the category of things we explicitly discussed. That’s a pretty clear tell that the person leveling the critique hasn’t bothered to read it. I don’t expect everyone to agree with our take on this or that, but a competent critic would at least evince awareness that we had addressed their concern, even if not to their satisfaction. We rarely seem to reach that level: it is much easier to libel and slander than engage with the issues.

The one complaint I’ve heard so far that doesn’t fall in the category of things-we-already-discussed is that we didn’t do hydrodynamic simulations of star formation in molecular gas. That is a red herring. To predict the growth of stellar mass, all we need is a prescription for assembling mass and converting baryons into stars; this is essentially a bookkeeping exercise that can be done analytically. If this were a serious concern, it should be noted that most cosmological hydro-simulations also fail to meet this standard: they don’t resolve star formation, so they typically adopt some semi-empirical (i.e., data-informed) bookkeeping prescription for this “subgrid physics.”

Though I have not myself attempted to numerically simulate galaxy formation in MOND, Sanders (2008) did. More recently, Eappen et al. (2022) have done so, including molecular gas and feedback^$ and everything. They find a star formation history compatible with the analytic models we discuss in our paper.

^$Related detail: Eappen et al. find that different feedback schemes make little difference to the end result. The deus ex machina invoked to solve all problems in LCDM is largely irrelevant in MOND. There’s a good physical reason for this: gravity in MOND is sourced by what you see; how it came to have its observed distribution is irrelevant. If 90% of the baryons are swept entirely out of the galaxy by some intense galactic wind, then they’re gone BYE BYE and don’t matter any more. In contrast, that is one of the scenarios sometimes invoked to form cores in dark matter halos that are initially cuspy: the departure of all those baryons perturbs the orbits of the dark matter particles and rearranges the structure of the halo. While that might work to alter halo structure, how it results in MOND-like phenomenology has never been satisfactorily explained. Mostly that is not seen as even necessary; converting cusp to core is close enough!

^&Though we typically associate the observed outer velocity with halo mass, an important caveat is that the radius also matters: M ~ RV², and most data for high redshift galaxies do not extend very far out in radius. Nevertheless, it takes a lot of mass to make rotation speeds of order 200 km/s within a few kpc, so it hardly matters if this is or is not representative of the dark matter halo: if it is all stars, then the kinematics directly corroborate the interpretation of the photometric data that the stellar mass is large. If it is representative of the dark matter halo, then we expect the halo radius to scale with the halo velocity (R₂₀₀ ~ V₂₀₀) so M₂₀₀~ V₂₀₀³ and again it appears that there is too much mass in place too early.
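As a rough check of the M ~ RV² scaling in this footnote, the sketch below evaluates the enclosed dynamical mass M = RV²/G. The 3 kpc radius is an assumed stand-in for “a few kpc,” and the function names are purely illustrative.

```python
# Newton's constant in units of kpc (km/s)^2 / Msun.
G_KPC_KMS2_PER_MSUN = 4.30091e-6

def enclosed_mass(radius_kpc, velocity_kms):
    """Dynamical mass M = R V^2 / G inside radius R for circular speed V (Msun)."""
    return radius_kpc * velocity_kms**2 / G_KPC_KMS2_PER_MSUN

def halo_mass_ratio(v_ratio):
    """If R200 scales with V200, then M200 ~ V200^3: mass ratio for a velocity ratio."""
    return v_ratio**3

# Assumed illustrative values: 200 km/s reached within 3 kpc.
mass = enclosed_mass(3.0, 200.0)
print(f"M(<3 kpc) ~ {mass:.2e} Msun")
print(f"Doubling V200 multiplies M200 by {halo_mass_ratio(2.0):.0f}")
```

For these assumed inputs the enclosed mass comes out at a few times 10¹⁰ M_☉, which is why the conclusion does not hinge on whether the measured speed is representative of the halo.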

The fault in our stars: blame them, not the dark matter!

As discussed in recent posts, the appearance of massive galaxies in the early universe was predicted a priori by MOND (Sanders 1998, Sanders 2008, Eappen et al. 2022). This is problematic for LCDM. How problematic? That’s always the rub.

The data follow the evolutionary track of a monolithic model (purple line) rather than the track of the largest progenitor predicted by hierarchical LCDM (dotted lines leading to different final masses).

The problem that JWST observations pose for LCDM is that there is a population of galaxies in the high redshift universe that appear to evolve as giant monoliths rather than assembling hierarchically. Put that way, it is a fatal flaw: hierarchical assembly of mass is fundamental to the paradigm. But we don’t observe mass, we observe light. So the obvious “fix” is to adjust the mapping of observed light to predicted dark halo mass in order to match the observations. How plausible is this?

Merger trees from the Illustris-TNG50 simulation showing the hierarchical assembly of L* galaxies. The dotted lines in the preceding plot show the stellar mass growth of the largest progenitor, which is on the left of each merger tree. All progenitors were predicted to be tiny at z > 3, well short of what we observe.

Before trying to wriggle out of the basic result, note that doing so is not plausible from the outset. We need to make the curve of growth of the largest progenitors “look like” the monolithic model. They shouldn’t, by construction, so everything that follows is a fudge to try to avoid the obvious conclusion. But this sort of fudging has been done so many times before in so many ways (the “Frenk Principle” was coined nearly thirty years ago) that many scientists in the field have known nothing else. They seem to think that this is how science is supposed to work. This in turn feeds a convenient attitude that evades the duty to acknowledge that a theory is in trouble when it persistently has to be adjusted to make itself look like a competitor.

That noted, let’s wriggle!

Observational dodges

The first dodge is denial: somehow the JWST data are wrong or misleading. Early on, there were plausible concerns about the validity of some (some) photometric redshifts. There are enough spectroscopic redshifts now that this point is moot.

A related concern is that we “got lucky” with where we pointed JWST to start with, and the results so far are not typical of the universe at large. This is not quite as crazy as it sounds: the field of view of JWST is tiny, so there is no guarantee that the first snapshot will be representative. Moreover, a number of the first pointings intentionally targeted rich fields containing massive clusters, i.e., regions known to be atypical. However, as observations have accumulated, I have seen no indications of a reversal of our first impression, but rather lots of corroboration. So this hedge also now borders on reality denial.

A third observational concern that we worried a lot about in Franck & McGaugh (2017) is contamination by active galactic nuclei (AGN). Luminosity produced by accretion onto supermassive black holes (e.g., quasars) was more common in the early universe. Perhaps some of the light we are attributing to stars is actually produced by AGN. That’s a real concern, but long story short, AGN contamination isn’t enough to explain everything else away. Indeed, the AGN themselves are a problem in their own right: how do we make the supermassive black holes that power AGN so rapidly that they appear already in the early universe? Like the galaxies they inhabit, the black holes that power AGN should take a long time to assemble in the absence of the heavy seeds naturally provided by MOND but not dark matter.

An evergreen concern in astronomy is extinction by dust. Dust could play a role (Ferrara et al. 2023), but this would be a weird effect for it to have. Dust is made by stars, so we naively expect it to build up along with them. In order to explain high redshift JWST data with dust we have to do the opposite: make a lot of dust very early without a lot of stars, then eject it systematically from galaxies so that the net extinction declines with time – a galactic reveal sort of like a cosmic version of the dance of the seven veils. The rate of ejection for all galaxies must necessarily be fine-tuned to balance the barely evolving UV luminosity function with the rapidly evolving dark matter halo mass function. This evolution of the extinction has to coordinate with the dark matter evolution over a rather small window of cosmic time, there being only ∼10⁸ yr between z = 14 and 11. This seems like an implausible way to explain an unchanging luminosity density, which is more naturally explained by simply having stars form and be there for their natural lifetimes.
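The size of that window is easy to check. Here is a minimal sketch using the closed-form age of a flat universe containing only matter and a cosmological constant (radiation neglected). The values H₀ = 70 km/s/Mpc and Ω_m = 0.3 are assumed round numbers, not parameters taken from the paper.

```python
import math

def age_gyr(z, h0=70.0, omega_m=0.3):
    """Age of a flat matter + Lambda universe at redshift z, in Gyr.

    Closed-form solution that neglects radiation. h0 (km/s/Mpc) and
    omega_m are assumed round numbers chosen for illustration.
    """
    omega_l = 1.0 - omega_m
    hubble_time_gyr = 977.8 / h0  # 1/H0 in Gyr when H0 is in km/s/Mpc
    x = math.sqrt(omega_l / omega_m) * (1.0 + z) ** -1.5
    return 2.0 / (3.0 * math.sqrt(omega_l)) * hubble_time_gyr * math.asinh(x)

window_yr = (age_gyr(11.0) - age_gyr(14.0)) * 1e9
print(f"t(z=11) - t(z=14) ~ {window_yr:.1e} yr")
```

With these parameters the interval is of order 10⁸ yr, consistent with the number quoted above. Modest changes to the assumed cosmology do not alter that order of magnitude.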

**Figure 5** from McGaugh et al. (2024): The UV luminosity function (left) observed by Donnan et al. (2024; points) compared to that predicted for ΛCDM by Yung et al. (2023; lines) as a function of redshift. Lines and points are color coded by redshift, with dark blue, light blue, green, orange, and red corresponding to z = 9, 10, 11, 12, and 14, respectively. There is a clear excess in the number density of galaxies that becomes more pronounced with redshift, ranging from a factor of ∼2 at z = 9 to an order of magnitude at z ≥ 11 (right). This excess occurs because the predicted number of sources declines with redshift while the observed numbers remain nearly constant, with the data at z = 9, 10, and 11 being right on top of each other.

The basic observation is that there is too much UV light produced by galaxies at all redshifts z > 9. What we’d rather have is the stellar mass function. JWST was designed to see optical light at the redshift of galaxy formation, but the universe surprised us and formed so many stars so early that we are stuck making inferences with the UV anyway. The relation of UV light to mass is dodgy, providing a knob to twist. So up next is the physics of light production.

In our discussion to this point, we have assumed that we know how to compute the luminosity evolution of a stellar population given a prescription for its star formation history. This is no small feat. This subject has a rich history with plenty of ups and downs, like most of astronomy. I’m not going to attempt to review all that here. I think we have this figured out well enough to do what we need to do for the purposes of our discussion here, but there are some obvious knobs to turn, so let’s turn ’em.

Blame the stars!

As noted above, we predict mass but observe light. So the program now is to squeeze more light out of less mass. Early dark matter halos too small? No problem; just make them brighter. More specifically, we need to make models in which the small dark matter halos that form first are better at producing photons from the small amount of baryons that they possess than are their low-redshift descendants. We have observational constraints on the latter; local star formation is inefficient, but maybe that wasn’t always the case. So the first obvious thing to try is to make star formation more efficient.

Super Efficient Star Formation

First, note that stellar populations evolve pretty much as we expect for stars, so this is a bit tricky. We have to retain the evolution we understand well for most of cosmic time while giving a big boost at early times. One way to do that is to have two distinct modes of star formation: the one we think of as normal that persists to this day, and an additional mode of super-efficient star formation (SESF) at play in the early universe. This way we retain the usual results while potentially giving us the extra boost that we need to explain the JWST data. We argue that this is the least implausible path to preserving LCDM. We’re trying to make it work, and anticipate the arguments Dr. Z would make.

This SESF mode of star formation needs to be very efficient indeed, as there are galaxies that appear to have converted essentially all of their available baryons into stars. Let’s pause to observe that this is pretty silly. Space is very empty; it is hard to get enough mass together to form stars at all: there’s good reason that it is inefficient locally! The early universe is a bit denser by virtue of being smaller; at z = 9 the expansion factor is only 1/(1+z) = 0.1 of what it is now, so the density is (1+z)³ = 1,000 times greater. ON AVERAGE. That’s not really a big boost when it comes to forming structures like stars since the initial condition was extraordinarily uniform. The lack of early structure by far outweighs the difference in density; that is precisely why we’re having a problem. Still, I can at least imagine that there are regions that experience a cascade of violent relaxation and SESF once some threshold in gas density is exceeded that differentiates the normal mode of star formation from SESF. Why a threshold in the gas? Because there’s not anything obvious in the dark matter picture to distinguish the galaxies that result from one or the other mode. CDM itself is scale free, after all, so we have to imagine a scale set by baryons that funnels protogalaxies into one mode or the other. Why, physically, is there a particular gas density that makes that happen? That’s a great question.
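For reference, the scaling used above can be tabulated for the redshifts in play. This is only the arithmetic of the mean density; it says nothing about how clumpy the gas is. The function names are illustrative.

```python
def expansion_factor(z):
    """Scale factor relative to today, a = 1 / (1 + z)."""
    return 1.0 / (1.0 + z)

def mean_density_boost(z):
    """Mean matter density relative to today, (1 + z)^3."""
    return (1.0 + z) ** 3

for z in (9, 11, 14):
    print(f"z = {z:2d}: a = {expansion_factor(z):.3f}, "
          f"mean density x {mean_density_boost(z):.0f}")
```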

There have been observational indications that local star formation is related to a gas surface density threshold, so maybe there’s another threshold that kicks it up another notch. That’s just a plausibility argument, but that’s the straw I’m clutching at to justify SESF as the least implausible option. We know there’s at least one way in which a surface density scale might matter to star formation.

Writing out the (1+z)³ argument for the density above tickled the memory that I’d seen something similar claimed elsewhere. Looking it up, indeed Boylan-Kolchin (2024) does this, getting an extra (1+z)³ [for a total of (1+z)⁶] by invoking a surface density Σ that follows from an acceleration scale g: Σ=g/(πG). Very MONDish, that. At any rate, the extra boost is claimed to lift a corner of dark matter halo parameter space into the realm of viability. So, sure. Why not make that step two.

However we do it, making stars super-efficiently is what the data appear to require – if we confine our consideration to the mass predicted by LCDM. It’s a way of covering the lack of mass with a surplus of stars. Any mechanism that makes stars more efficiently will boost the dotted lines in the M_*-z diagram above in the right direction. Do they map into the data (and the monolithic model) as needed? Unclear! All we’ve done so far is offer plausibility arguments that maybe it could be so, not demonstrate a model that works without fine-tuning that woulda coulda shoulda made the right prediction in the first place.

The ideas become less plausible from here.

Blame the IMF!

The next obvious idea after making more stars in total is to just make more of the high mass stars that produce UV photons. The IMF is a classic boogeyman to accomplish this. I discussed this briefly before, and it came up in a related discussion in which it was suggested that “in the end what will probably happen is that the IMF will be found to be highly redshift dependent.”

OK, so, first, what is the IMF? The Initial Mass Function is the spectrum of masses with which stars form: how many stars of each mass, ranging from the brown dwarf limit (0.08 M_☉) to the most massive stars formed (around 100 M_☉). The numbers of stars formed in any star forming event is a strong function of mass: low mass stars are common, high mass stars are rare. Here, though, is the rub: integrating over the whole population, low mass stars contain most of the mass, but high mass stars produce most of the light. This makes the conversion of mass to light quite sensitive to the IMF.
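To put rough numbers on that, here is a toy calculation. It assumes a single power-law IMF with the Salpeter slope (dN/dm ∝ m^−2.35) and a crude main-sequence mass-luminosity relation L ∝ m^3.5. Both are textbook approximations that I am assuming for illustration, not choices made in the paper, and the relation overestimates the light from the very most massive stars. The mass limits are the ones quoted above.

```python
def power_integral(exponent, lo, hi):
    """Integral of m**exponent dm from lo to hi (valid for exponent != -1)."""
    p = exponent + 1.0
    return (hi**p - lo**p) / p

def imf_fractions(slope=2.35, lum_index=3.5, m_min=0.08, m_max=100.0, m_split=1.0):
    """Mass fraction below m_split and light fraction above it.

    Assumes dN/dm ~ m**-slope and L ~ m**lum_index (both illustrative).
    """
    mass_low = power_integral(1.0 - slope, m_min, m_split)
    mass_all = power_integral(1.0 - slope, m_min, m_max)
    light_high = power_integral(lum_index - slope, m_split, m_max)
    light_all = power_integral(lum_index - slope, m_min, m_max)
    return mass_low / mass_all, light_high / light_all

mass_frac, light_frac = imf_fractions()
print(f"mass in stars below 1 Msun:  {mass_frac:.0%}")
print(f"light from stars above 1 Msun: {light_frac:.2%}")
```

Under these assumptions most of the mass sits below a solar mass while essentially all of the light at birth comes from above it, which is the sensitivity described in the text.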

The number of UV photons produced by a stellar population is especially sensitive to the IMF as only the most massive and short-lived O and B stars produce them. This is low-hanging fruit for the desperate theorist: just a few more of those UV-bright, short-lived stars, please! If we adjust the IMF to produce more of these high mass stars, then they crank out lots more UV photons (which goes in the direction we need) but they don’t contribute much to the total mass. Better yet, they don’t live long. They’re like icicles as murder weapons in mystery stories: they do their damage then melt away, leaving no further evidence. (Strictly speaking that’s not true: they leave corpses in the form of neutron stars or stellar mass black holes, but those are practically invisible. They also explode as supernovae, boosting the production of metals, but the amount is uncertain enough to get away with murder.)

There is a good plausibility argument for a variable IMF. To form a star, gravity has to overcome gas pressure to induce collapse. Gas pressure depends on temperature, and interstellar gas can cool more efficiently when it contains some metals (here I mean metals in the astronomy sense, which is everything in the periodic table that’s not hydrogen or helium). It doesn’t take much; a little oxygen (one of the first products of supernova explosions) goes a long way to make cooling more efficient than a primordial gas composed of only hydrogen and helium. Consequently, low metallicity regions have higher gas temperatures, so it makes sense that gas clouds would need more gravity to collapse, leading to higher mass stars. The early universe started with zero metals, and it takes time for stars to make them and to return them to the interstellar medium, so voila: metallicity varies with time so the IMF varies with redshift.

This sound physical argument is simple enough to make that it can be done in a small part of a blog post. This has helped it persist in our collective astronomical awareness for many decades. Unfortunately, it appears to have bugger-all to do with reality.

If metallicity plays a strong role in determining the IMF, we would expect to see it in stellar populations of different metallicity. We measure the IMF for solar metallicity stars in the solar neighborhood. Globular clusters are composed of stars formed shortly after the Big Bang and have low metallicities. So following this line of argument, we anticipate that they would have a different IMF. There is no evidence that this is the case. Still, we only really need to tweak the high-mass end of the IMF, and those stars died a long time ago, so maybe this argument applies for them if not for the long-lived, low-mass stars that we observe today.

In addition to counting individual stars, we can get a constraint on the galaxy-wide average IMF from the scatter in the Tully-Fisher relation. The physical relation depends on mass, but we rely on light to trace that. So if the IMF varies wildly from galaxy to galaxy, it will induce scatter in Tully-Fisher. This is not observed; the amount of intrinsic scatter that we see is consistent with that expected for stochastic variations in the star formation history for a fixed IMF. That’s a pretty strong constraint, as it doesn’t take much variation in the IMF to cause a lot of scatter that we don’t see. This constraint applies to entire galaxies, so it tolerates variations in the IMF in individual star forming events, but whatever is setting the IMF apparently tends to the same result when averaged over the many star forming events it takes to build a galaxy.

Variation in the IMF has come up repeatedly over the years because it provides so much convenient flexibility. Early in my career, it was commonly invoked to explain the variation in spectral hardness with metallicity. If one looks at the spectra of HII regions (interstellar gas ionized by hot young stars), there is a trend for lower metallicity HII regions to be ionized by hotter stars. The argument above was invoked: clearly the IMF tended to have more high mass stars in low metallicity environments. However, the light emitted by stars also depends on metallicity; low metallicity stars are bluer than their high metallicity equivalents because there are few UV absorption lines from iron in their atmospheres. Taking care to treat the stars and interstellar gas self-consistently and integrating over a fixed IMF, I showed that the observed variation in spectral hardness was entirely explained by the variation in metallicity. There didn’t need to be more high mass stars in low metallicity regions, the stars were just hotter because that’s what happens in low metallicity stars. (I didn’t set out to do this; I was just trying to calibrate an abundance indicator that I would need for my thesis.)

Another example where excess high mass stars were invoked was to explain the apparently high optical depth to the surface of last scattering reported by WMAP. If those words don’t mean anything to you, don’t worry – all it means is that a couple of decades ago, we thought we needed lots more UV photons at high redshift (z ~ 17) than CDM naturally provided. The solution was, you guessed it, an IMF rich in high mass stars. Indeed, this result launched a thousand papers on supermassive Population III stars that didn’t pan out for reasons that were easily anticipated at the time. Nowadays, analyses of the Planck data suggest a much lower optical depth than initially inferred by WMAP, but JWST is observing too many UV photons at high redshift to remain consistent with Planck. This apparent tension for LCDM is a natural consequence of early structure formation in MOND; indeed, it is another thing that was specifically predicted (see section 3.1 of McGaugh 2004).

I relate all these stories of encounters with variations in the high mass end of the IMF because they’ve never once panned out. Maybe this time will be different.

Stochastic Star Formation

What else can we think up? There’s always another possibility. It’s a big universe, after all.

One suggestion I haven’t discussed yet is that high redshift galaxies appear overly bright from stochastic fluctuations in their early star formation. This again invokes the dubious relation between stellar mass and UV light, but in a more subtle way than simply stocking the IMF with a bunch more high mass stars. Instead, it notes that the instantaneous star formation rate is stochastic. The massive stars that produce all the UV light are short-lived, so the number present will fluctuate up and down. Over time, this averages out, but there hasn’t been much time yet in the early universe. So maybe the high redshift galaxies that seem to be over-luminous are just those that happen to be near a peak in the ups and downs of star formation. Galaxies will be brightest and most noticeable in this peak phase, so the real mass is less than it appears – albeit there must be a lot of galaxies in the off phase for every one that we see in the on phase.
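The averaging-out part of this argument can be illustrated with a toy Monte Carlo. This is a sketch under loud assumptions, not a model from the paper: each galaxy forms a lognormally distributed amount of stars in each time step, the steps are independent, and the width `sigma` and the number of steps are arbitrary choices.

```python
import random
import statistics

def fractional_mass_scatter(n_galaxies=2000, n_steps=100, sigma=1.0, seed=42):
    """Galaxy-to-galaxy scatter in accumulated stellar mass after each step.

    Toy assumption: the mass formed per step is lognormal and independent
    from step to step. Returns std/mean of the accumulated mass per step.
    """
    rng = random.Random(seed)
    masses = [0.0] * n_galaxies
    scatter = []
    for _ in range(n_steps):
        masses = [m + rng.lognormvariate(0.0, sigma) for m in masses]
        scatter.append(statistics.pstdev(masses) / statistics.fmean(masses))
    return scatter

scatter = fractional_mass_scatter()
print(f"fractional scatter after 1 step: {scatter[0]:.2f}")
print(f"fractional scatter after {len(scatter)} steps: {scatter[-1]:.2f}")
```

For independent bursts the fractional scatter in accumulated mass falls roughly as one over the square root of the number of steps, so it is large early and small late. The instantaneous rate, which is what the UV light tracks, keeps fluctuating at every step.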

One expects a lot of scatter in the inferred stellar mass in the early universe due to stochastic variations in the star formation rate. As time goes on, these average out and the inferred stellar mass becomes steady. That’s pretty much what is observed (data). The data track the monolithic model (purple line) and sometimes exceed it in the early, stochastic phase. The data bear no resemblance to hierarchical LCDM (orange line).

This makes a lot of sense to me. Indeed, it should happen at some level, especially in the chaotic early universe. It is also what I infer to be going on to explain why some measurements scatter above the monolithic line. That is the baseline star formation history for this population, with some scatter up and down at early times. Simply scattering from the orange LCDM line isn’t going to look like the purple monolithic line. The shape is wrong and the amplitude difference is too great to overcome in this fashion.

What else?

I’m sure we’ll come up with something, but I think I’ve covered everything I’ve heard so far. Indeed, most of these possibilities are obvious enough that I thought them up myself and wrote about them in McGaugh et al. (2024). I don’t see anything in the wide-ranging discussion at KITP that wasn’t already in my paper.

I note this because I want to point out that we are following a well-worn script. This is the part where I tick off all the possibilities for more complicated LCDM models and point out their shortcomings. I expect the same response:

That’s too long to read. Dr. Z says it works, so he must be right since we already know that LCDM is correct.
Triton Station, 8 February 2022

People will argue about which of these auxiliary hypotheses is preferable. MOND is not an auxiliary hypothesis, but an entirely different paradigm, so it won’t be part of the discussion. After some debate, one of the auxiliaries (SESF not IMF!) will be adopted as the “standard” picture. This will be repeated until it becomes familiar, and once it is familiar it will seem that it was always so, and then people will assert that there was never a problem, indeed, that we expected it all along. This self-gaslighting reminds me of Feynman’s warning:

The first principle is that you must not fool yourself and you are the easiest person to fool.
Richard Feynman

What is persistently lacking in the community is any willingness to acknowledge, let alone engage with, the deeper question of why we have to keep invoking ad hoc patches to somehow match what MOND correctly predicted a priori. The sociology of invoking arbitrary auxiliary hypotheses to make these sorts of excuses for LCDM has been so consistently on display for so long that I wrote this parody a year ago:

It always seems to come down to special pleading:

Please don’t falsify LCDM! I ran out of computer time. I had a disk crash. I didn’t have a grant for supercomputer time. My simulation data didn’t come back from the processing center. A senior colleague insisted on a rewrite. Someone stole my laptop. There was an earthquake, a terrible flood, locusts! It wasn’t my fault! I swear to God!

And the community loves LCDM, so we fall for it every time.

PS – to appreciate the paraphrased quotes here, you need to hear it as it would be spoken by the pictured actors. So if you do not instantly recognize this scene from the Blues Brothers, you need to correct this shortcoming in your cultural education to get the full effect of the reference.

Old galaxies in the early universe

Continuing our discussion of galaxy formation and evolution in the age of JWST, we saw previously that there appears to be a population of galaxies that grew rapidly in the early universe, attaining stellar masses like those expected in a traditional monolithic model for a giant elliptical galaxy rather than a conventional hierarchical model that builds up gradually through many mergers. The formation of galaxies at incredibly high redshift, z > 10, implies the existence of a descendant population at intermediate redshift, 3 < z < 4, at which point they should have mature stellar populations. These galaxies should not only be massive, they should also have the spectral characteristics of old stellar populations – old, at least, for how old the universe itself is at this point.

*Theoretical predictions from* **Fig. 1** of McGaugh et al. (2024) *combined with the data of* **Fig. 4**. The data follow the track of a monolithic model that forms early as a single galaxy rather than that of the largest progenitor of the hierarchical build-up expected in LCDM.

The data follow the track of stellar mass growth for an early-forming monolithic model. Do the ages of stars also look like that?

Here is a recent JWST spectrum published by de Graff et al. (2024). This appeared too recently for us to have cited in our paper, but it is a great example of what we’re talking about. This is an incredibly gorgeous spectrum of a galaxy at z = 4.9 when the universe was 1.2 Gyr old.

**Fig. 1** from de Graff et al. (2024): *JWST/NIRSpec PRISM spectrum (black line) of the massive quiescent galaxy RUBIES-EGS-QG-1 at a redshift of z = 4.8976.*

It is challenging to refrain from nerding out at great length over many of the details on display here. First, it is an incredible technical achievement. I’ve seen worse spectra of local galaxies. JWST was built to obtain images and spectra of galaxies so distant they approach the horizon of the observable universe. Its cameras are sensitive to the infrared part of the spectrum in order to capture familiar optical features that have been redshifted by a huge factor (compare the upper and lower x-axes). The telescope itself was launched into space well beyond the obscuring atmosphere of the earth, pointed precisely at a tiny, faint flicker of light in a vast, empty universe, captured photons that had been traveling for billions of years, and transmitted the data to Earth. That this is possible, and works, is an amazing feat of science, engineering, and societal commitment (it wasn’t exactly cheap).

In the raw 2D spectrum (at top) I can see by eye the basic features in the extracted, 1D spectrum (bottom). This is a useful and convincing reality check to an experienced observer even if at first glance it looks like a bug splot smeared by a windshield wiper. The essential result is apparent to the eye; the subsequent analysis simply fills in the precise numbers.

Looking from right to left, the spectrum runs from red to blue. It ramps up then crashes down around an observed wavelength of 2.3 microns. This is the 4000 Å break in the rest frame, a prominent feature of aging stellar populations. The amount of blue-to-red ramp-up and the subsequent depth of drop is a powerful diagnostic of stellar age.

In addition to the 4000 Å break, a number of prominent spectral lines are apparent. In particular, the Balmer absorption lines Hβ, Hγ, and Hδ are clear and deep. These are produced by A stars, which dominate the light of a stellar population after a few hundred million years. There’s the answer right there: the universe is only 1.2 Gyr old at this point, and the stars dominating the light aren’t much younger.

There are also some emission lines. These can be the sign of on-going star formation or an active galactic nucleus powered by a supermassive black hole. The authors attribute these to the latter, inferring that the star formation happened fast and furious early on, then basically stopped. That’s important to the rest of the spectrum; A stars only dominate for a while, and their lines are not so prominent if a population keeps making new stars. So this galaxy made a lot of stars, made them fast, then basically stopped. That is exactly the classical picture of a monolithic giant elliptical.

Here is the star formation history that de Graff et al. (2024) infer:

**Fig. 2** from de Graff et al. (2024): the star formation rate (top) and accumulated stellar mass (bottom) as a function of cosmic time (only the first 1.2 Gyr are shown). Results for stellar populations of two metallicities are shown (purple or blue lines). This affects the timing of the onset of star formation, but once going, an enormous mass of stars forms fast, in ~200 Myr.

There are all sorts of caveats about population modeling, but it is very hard to avoid the basic conclusion that lots of stars were assembled with incredible speed. A stellar mass a bit in excess of that of the Milky Way appears in the time it takes for the sun to orbit once. That number need not be exactly right to see that this is not the gradual, linear, hierarchical assembly predicted by LCDM. The typical galaxy in LCDM is predicted to take ~7 Gyr to assemble half its stellar mass, not 0.1 Gyr. It’s as if the entire mass collapsed rapidly and experienced an intense burst of star formation during violent relaxation (Lynden-Bell 1967).
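For scale, the time it takes the sun to orbit the Galaxy follows from its distance from the Galactic center and its orbital speed. The values below (8.2 kpc and 230 km/s) are assumed round numbers for illustration; the conclusion is not sensitive to them.

```python
import math

KM_PER_KPC = 3.0857e16
SECONDS_PER_MYR = 3.156e13

def orbital_period_myr(radius_kpc, speed_kms):
    """Period of a circular orbit, 2 pi R / V, in Myr."""
    circumference_km = 2.0 * math.pi * radius_kpc * KM_PER_KPC
    return circumference_km / speed_kms / SECONDS_PER_MYR

# Assumed solar orbit: R ~ 8.2 kpc, V ~ 230 km/s.
print(f"Solar orbital period ~ {orbital_period_myr(8.2, 230.0):.0f} Myr")
```

That is a couple hundred Myr, comparable to the ~200 Myr over which the stars form in the star formation history shown above.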

Collapse of shells within shells to form a massive galaxy rapidly in MOND (Sanders 2008). Note that the inner shells (inset) where most of the stars will be collapse even more rapidly than the overall monolith (dotted line).

Where MOND provides a natural explanation for this observation, the fiducial population model of de Graff et al. violates the LCDM baryon limit: there are more stars than there are baryons to make them from. It should be impossible to veer into the orange region above as the inferred star formation history does. The obvious solution is to adopt a higher metallicity (the blue model) even if that is a worse fit to the spectrum. Indeed, I find it hard to believe that so many stars could be made in such a small region of space without drastically increasing their metallicity, so there are surely things still to be worked out. But before we engage in too much excuse-making for the standard model, note that the orange region represents a double-impossibility. First, the star formation efficiency is 100%. Second, this is for an exceptionally rare, massive dark matter halo. The chances of spotting such an object in the area so far surveyed by JWST are small. So we not only need to convert all the baryons into stars, we also need to luck into seeing it happen in a halo so massive that it probably shouldn’t be there. And in the strictest reading, there still aren’t enough baryons. Does that look right to you?

Do these colors look right to you? Getting the color right is what stellar population modeling is all about.

OK, so I got carried away nerding out about this one object. There are other examples. Indeed, there are enough now to call them a population of old and massive quiescent galaxies at 3 < z < 4. These have the properties expected for the descendants of massive galaxies that form at z > 10.

Nanayakkara et al. (2024) model spectra for a dozen such galaxies. The spectra provide an estimate of the stellar mass at the redshift of observation. They also imply a star formation history from which we can estimate the age/redshift at which the galaxy had formed half of those stars, and when it quenched (stopped forming stars, or in practice here, when the 90% mark had been reached). There are, of course, large uncertainties in the modeling, but it is again hard to avoid the conclusion that lots of stars were formed early.

**Figure 7** from McGaugh et al. (2024): The stellar masses of quiescent galaxies from Nanayakkara et al. (2024). The inferred growth of stellar mass is shown for several cases, marking the time when half the stars were present (small green circles) to the quenching time when 90% of the stars were present (midsize orange circles) to the epoch of observation (large red circles). Illustrative star formation histories are shown as dotted lines with the time of formation t_i and the quenching timescale τ noted in Gyr. We omit the remaining lines for clarity, as many cross. There is a wide distribution of formation times from very early (t_i = 0.2 Gyr) to relatively late (>1 Gyr), but all of the galaxies in this sample are inferred to build their stellar mass rapidly and quench early (τ < 0.5 Gyr).

The dotted lines above are models I constructed in the spirit of monolithic models. The particular details aren’t important, but the inferred timescales are. To put galaxies in this part of the stellar mass-redshift plane, they have to start forming early (typically in the first billion years), form stars at a prolific rate, then quench rapidly (typically with e-folding timescales < 1 Gyr). I wouldn’t say any of these numbers are particularly well-measured, but they are indicative.
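To make the timescales concrete, here is a minimal sketch of one such star formation history. The functional form is my assumption for illustration: star formation starts at t_i and declines exponentially with e-folding time τ. The models plotted in the figure may differ in detail, and the example values of t_i and τ are just picked from the range quoted in the caption.

```python
import math

def mass_fraction(t_gyr, t_i, tau):
    """Fraction of the final stellar mass in place at cosmic time t.

    Assumes SFR ~ exp(-(t - t_i) / tau) for t > t_i (illustrative form).
    """
    if t_gyr <= t_i:
        return 0.0
    return 1.0 - math.exp(-(t_gyr - t_i) / tau)

def time_at_fraction(fraction, t_i, tau):
    """Cosmic time at which the given fraction of the final mass has formed."""
    return t_i - tau * math.log(1.0 - fraction)

t_i, tau = 0.2, 0.5  # Gyr; assumed example values
print(f"half the stars by  t = {time_at_fraction(0.5, t_i, tau):.2f} Gyr")
print(f"90% (quenched) by  t = {time_at_fraction(0.9, t_i, tau):.2f} Gyr")
```

In this form the half-mass time is t_i + τ ln 2 and the 90% time is t_i + τ ln 10, so a short τ forces both to follow closely on the formation time.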

What is missing from this plot is the LCDM prediction. That’s not because I omitted it, it’s because the prediction for typical L* galaxies doesn’t fall within the plot limits. LCDM does not predict that typical galaxies should become this massive this early. I emphasize typical because there is always scatter, and some galaxies will grow ahead of the typical rate.

Not only are the observed galaxies massive, they have mature stellar populations that are pretty much done forming stars. This will sound normal to anyone who has studied the stellar populations of giant elliptical galaxies. But what does LCDM predict?

I searched through the Illustris TNG50 and TNG300 simulations for objects at redshift 3 that had stellar masses in the same range as the galaxies observed by Nanayakkara et al. (2024). The choice of z = 3 is constrained by the simulation output, which comes in increments of the expansion factor. To compare to real galaxies at 3 < z < 4 one can either look at the snapshot at z = 4 or the one at z = 3. I chose z = 3 to be conservative; this gives the simulation the maximum amount of time to produce quenched, massive galaxies.

These simulations do indeed produce some objects of the appropriate stellar mass. These are rare, as they are early adopters: galaxies that got big quicker than is typical. However, they are not quenched as observed: the simulated objects are still on the star forming main sequence (the correlation between star formation rate and stellar mass). The distribution of simulated objects does not appear to encompass that of real galaxies.

**Figure 8** from McGaugh et al. (2024): The stellar masses and star formation rates of galaxies from Nanayakkara et al. (2024; red symbols). Downward-pointing triangles are upper limits; some of these fall well below the edge of the plot and so are illustrated as the line of points along the bottom. Also shown are objects selected from the TNG50 (Pillepich et al. 2019; filled squares) and TNG300 (Pillepich et al. 2018; open squares) simulations at z = 3 to cover the same range of stellar mass. Unlike the observed galaxies, simulated objects with stellar masses comparable to real galaxies are mostly forming stars at a rapid pace. In the higher-resolution TNG50, none have quenched as observed.

If we want to hedge, we can note that TNG300 has a few objects that are kinda in the right ballpark. That’s a bit misleading, as the data are mostly upper limits. Moreover, these are the rare objects among a set of objects selected to be rare: it isn’t a resounding success if we have to scrape the bottom of the simulated barrel after cherry-picking which barrel. Worse, these few semi-quenched simulated objects are not present in TNG50. TNG50 is the higher resolution simulation, so presumably provides a better handle on the star formation in individual objects. It is conceivable that TNG300 “wins” by virtue of its larger volume, but that’s just saying we have more space in which to discover very rare entities. The prediction is that massive, quenched galaxies should be exceedingly rare, but in the real universe they seem mundane.

That said, I don’t think this problem is fundamental. Hierarchical assembly is still ongoing at this epoch, bringing with it merger-induced star formation. There’s an easy fix for that: change the star formation prescription. Instead of “wet” mergers with gas that can turn into stars, we just need to form all the stars already early on so that the subsequent mergers are “dry” – at least, for those mergers that build this particular population. One winds up needing a new and different mode of star formation. In addition to what we observe locally, there needs to be a separate mode of super-efficient star formation that somehow turns all of the available baryons into stars as soon as possible. That’s basically what I advocate as the least unreasonable possibility for LCDM in our paper. This is a necessary but not sufficient condition; these early stellar nuggets also need to assemble speedy quick to make really big galaxies. While it is straightforward to mess with the star formation prescription in models (if not in nature), the merger trees dictating the assembly history are less flexible.

Putting all the data together in a single figure, we can get a sense for the evolutionary trajectory of the growth of stellar mass in galaxies across cosmic time. This figure extends from the earliest galaxies so far known at z ~ 14 when the universe was just a few hundred million years old (of order one orbital time in a mature galaxy) to the present over thirteen billion years later. In addition to data discussed previously, it also shows recent data with spectroscopic redshifts from JWST. This is important, as the sense of the figure doesn’t change if we throw away all the photometric redshifts; it just gets a little sparse around z ~ 8.

**Figure 10** from McGaugh et al. (2024): The data from Figures 4 and 6 shown together using the same symbols. Additional JWST data with spectroscopic redshifts are shown from Xiao et al. (2023; green triangles) and Carnall et al. (2024). The data of Carnall et al. (2024) distinguish between star-forming galaxies (small blue circles) and quiescent galaxies (red squares); the latter are in good agreement with the typical stellar mass determined from Schechter fits in clusters (large circles). The dashed red lines show the median growth predicted by the Illustris ΛCDM simulation (Rodriguez-Gomez et al. 2016) for model galaxies that reach final stellar masses of M_* = 10¹⁰, 10¹¹, and 10¹² M_☉. The solid lines show monolithic models with a final stellar mass of 9 x 10¹⁰ M_☉ and t_i = τ = 0.3, 0.4, and 0.5 Gyr, as might be appropriate for giant elliptical galaxies. The dotted line shows a model appropriate to a monolithic spiral galaxy with t_i = 0.5 and τ = 13.5 Gyr.

The solid lines are monolithic models we built to represent classical giant elliptical galaxies that form early and quench rapidly. These capture nicely the upper envelope of the data. They form most of their stars at z > 4, producing appropriately old populations at lower redshifts. The individual galaxy data merge smoothly into those for typical galaxies in clusters.
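The timescales quoted for these monolithic models are easy to check by hand. As a minimal sketch, assume the star formation rate declines exponentially after a start time t_i with e-folding time τ, and ignore the mass returned by dying stars. Both simplifications are assumptions made here for illustration, as are the function names; the models in the paper may differ in detail. The fraction of the final stellar mass in place at time t is then 1 − exp(−(t − t_i)/τ):

```python
import math


def stellar_mass_fraction(t_gyr, t_i, tau):
    """Fraction of the final stellar mass formed by cosmic time t_gyr (Gyr).

    Assumes SFR proportional to exp(-(t - t_i) / tau) for t >= t_i and
    neglects stellar mass loss (illustrative simplifications).
    """
    if t_gyr <= t_i:
        return 0.0
    return 1.0 - math.exp(-(t_gyr - t_i) / tau)


def time_at_fraction(frac, t_i, tau):
    """Cosmic time (Gyr) at which a fraction `frac` of the stars has formed."""
    return t_i - tau * math.log(1.0 - frac)


# The three cases quoted in the caption: t_i = tau = 0.3, 0.4, 0.5 Gyr.
for t_i, tau in [(0.3, 0.3), (0.4, 0.4), (0.5, 0.5)]:
    t_half = time_at_fraction(0.5, t_i, tau)    # half of the stars in place
    t_quench = time_at_fraction(0.9, t_i, tau)  # 90% mark, used as "quenched"
    print(f"t_i = tau = {t_i} Gyr: 50% at {t_half:.2f} Gyr, 90% at {t_quench:.2f} Gyr")
```

Under these assumptions, the slowest of the three cases (t_i = τ = 0.5 Gyr) has half its stars in place about 0.85 Gyr after the Big Bang and 90% by about 1.65 Gyr.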

The LCDM prediction as represented by the Illustris suite of simulations is shown as the dashed red lines for objects of several final masses. These are nearly linear in log(M_*)-linear z space. Objects that end up with a typical L* elliptical galaxy mass at z = 0 deviate from the data almost immediately at z > 1. They disappear above z > 6 as the largest progenitors become tiny.

What can we do to fix this? Massive galaxies get a head start, as it were, by being massive at all epochs. But the shape of the evolutionary trajectory remains wrong. The top red line (for a final stellar mass of 10¹² M_☉) corresponds to a typical galaxy at z ~ 2, but it continues to grow to be atypical locally. The data don’t do that. Even with this boost, the largest progenitor is still predicted to be too small at z > 3 where there are now many examples of massive, quiescent galaxies – known both from JWST observations and from Jay Franck’s thesis before it. Again, the distribution of the data does not look like the predictions of LCDM.

One can abandon Illustris as the exemplar of LCDM, but it doesn’t really help. Other models show similar things, differing only in minor details. That’s because the issue is the mass assembly history they all share, not the details of the star formation. The challenge now is to tweak models to make them look more monolithic; i.e., change those red dashed lines into the solid black lines. One will need super-efficient star formation, if it is even possible. I’ll leave discussion of this and other obvious fudges to a future post.

Finally, note that there are a bunch of galaxies with JWST spectroscopic redshifts from 3 < z < 4 that are not exceptionally high mass (the small blue points). These are expected in any paradigm. They can be galaxies that are intrinsically low mass and won’t grow much further, or galaxies that may still grow a lot, just with a longer fuse on their star formation timescale. Such objects are ubiquitous in the local universe as spiral and irregular galaxies. Their location in the diagram above is consistent with the LCDM predictions, but is also readily explained by monolithic models with long star formation timescales. The dotted line shows a monolithic model that forms early (t_i = 0.5 Gyr) but converts gas into stars gradually (τ = 13.5 Gyr rather than < 1 Gyr). This is a boilerplate model for a spiral that has been around for as long as the short-τ model for giant ellipticals. So while these lower mass galaxies exist, their location in the M_*-z plane doesn’t really add much to this discussion as yet. It is the massive galaxies that form early and become quiescent rapidly that most challenge LCDM.

Measuring the growth of the stellar mass of galaxies over cosmic time

This post continues the series summarizing our ApJ paper on high redshift galaxies. To keep it finite, I will focus here on the growth of stellar mass. The earlier post discussed what we expect in theory. This depends on mass assembly (slow in LCDM, fast in MOND), how the assembled mass is converted into stars, and how those stars shine in light we can detect. We know a lot about stars and their evolution, so for this post I will assume we know how to convert a given star formation history into the evolution of the light it produces. There are of course caveats to that which we discuss in the paper, and perhaps will get to in a future post. It’s exhausting to be exhaustive, so not today, Satan.

The principal assumption we are obliged to make, at least to start, is that light traces mass. As mass assembles, some of it turns into stars, and those stars produce light. The astrophysics of stars and the light they produce is the same in any structure formation theory, so with this basic assumption, we can test the build-up of mass. In another post we will discuss some of the ways in which we might break this obvious assumption in order to save a favored theory. For now, we assume the obvious assumption holds, and what we see at high redshift provides a picture of how mass assembles.

Before JWST

This is not a new project; people have been doing it for decades. We like to think in terms of individual galaxies, but there are lots out there, so an important concept is the luminosity function, which describes the number of galaxies as a function of how bright they are. Here are some examples:

**Figure 3.** from Franck & McGaugh (2017) showing the number of galaxies as a function of their brightness in the 4.5 micron band of the Spitzer Space Telescope in candidate protoclusters from z = 2 to 6. Each panel notes the number of galaxies contributing to the Schechter luminosity function⁺ fit (gray bands), the apparent magnitude m* corresponding to the typical luminosity L*, and the redshift range. The magnitude m* is *characteristic* of how bright typical galaxies are at each redshift.
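For reference, the Schechter function fit in each panel can be written per unit magnitude. The sketch below is the standard textbook form, not the fitting code used for the figure; the function name and the example values of m* and the faint-end slope α are placeholders chosen only for illustration.

```python
import math


def schechter_mag(m, m_star, alpha, phi_star=1.0):
    """Schechter luminosity function per unit apparent magnitude.

    m       : apparent magnitude at which to evaluate
    m_star  : characteristic magnitude m* (corresponds to L*)
    alpha   : faint-end power-law slope
    phi_star: normalization (arbitrary units here)
    """
    x = 10.0 ** (-0.4 * (m - m_star))  # luminosity ratio L / L*
    return 0.4 * math.log(10.0) * phi_star * x ** (alpha + 1.0) * math.exp(-x)


# Hypothetical parameters, for illustration only.
m_star, alpha = 20.0, -1.0
for m in (18.0, 19.0, 20.0, 21.0, 22.0):
    print(f"m = {m:.0f}: relative number density = {schechter_mag(m, m_star, alpha):.4f}")
```

The exponential cutoff is what makes m* a meaningful characteristic brightness: galaxies a couple of magnitudes brighter than m* are far rarer than those at m*, while fainter than m* the counts follow a power law.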

One reason to construct these luminosity functions is to quantify what is typical. Hundreds of galaxies inform each fit. The luminosity L* is representative of the typical galaxy, not just anecdotal individual examples. At each redshift, L* corresponds to an observed apparent magnitude m*, which we plot here:

**Figure 3** from McGaugh et al. (2024): The redshift dependence of the Spitzer [4.5] apparent magnitude m* of Schechter function fits to populations of galaxies in clusters and candidate protoclusters; each point represents *the characteristic* brightness of the galaxies in each cluster. The apparent brightness of galaxies gets fainter with increasing redshift because galaxies are more distant, with the amount they dim depending also on their evolution (lines). The purple line is the monolithic exponential model we discussed last time. The orange line is the prediction of the Millennium simulation (the state of the art at the time Jay Franck wrote his thesis) and the Munich galaxy formation model based on it. The open squares are the result of applying the same algorithm to the simulation as used on the data; this is what we would have observed if the universe looked like LCDM as depicted by the Munich model. The real universe does not look like that.

We plot faint to bright going up the y-axis; the numbers get smaller because of the backwards definition of the magnitude scale (which dates to ancient times in which the stars that appeared brightest to the human eye were “of the first magnitude,” then the next brightest of the second magnitude, and so on). The x-axis shows redshift. The top axis shows the corresponding age of the universe for vanilla LCDM parameters. Each point shows the apparent magnitude that is typical as informed by observations of dozens to hundreds of individual galaxies. Each galaxy has a spectroscopic redshift, which we made a requirement for inclusion in the sample. These are very accurate; no photometric redshifts are used to make the plot above.
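The mapping between redshift and cosmic age on the top axis has a closed form for a flat universe containing only matter and a cosmological constant. A minimal sketch, assuming H₀ = 70 km/s/Mpc and Ω_m = 0.3 as stand-ins for "vanilla" parameters (these particular values and the function name are assumptions, and radiation is neglected):

```python
import math

KM_PER_MPC = 3.0857e19    # kilometres in one megaparsec
SEC_PER_GYR = 3.15576e16  # seconds in one billion Julian years


def age_at_redshift(z, h0=70.0, omega_m=0.3):
    """Age of a flat matter + Lambda universe at redshift z, in Gyr.

    h0 is in km/s/Mpc; the default parameter values are assumed for illustration.
    """
    omega_l = 1.0 - omega_m                           # flatness
    hubble_time_gyr = KM_PER_MPC / h0 / SEC_PER_GYR   # 1 / H0 in Gyr
    x = math.sqrt(omega_l / omega_m) * (1.0 + z) ** -1.5
    return hubble_time_gyr * 2.0 / (3.0 * math.sqrt(omega_l)) * math.asinh(x)


for z in (0, 1, 2, 4, 6, 10, 14):
    print(f"z = {z:2d}: age = {age_at_redshift(z):5.2f} Gyr")
```

With these assumed parameters, z = 2 corresponds to an age of roughly 3 Gyr and z = 14 to roughly 0.3 Gyr.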

One thing that impressed me when Jay made the initial version of this plot is how well the models match the evolution of m* at z < 2, which is most of cosmic time (the past ten billion years). This encourages one that the assumption adopted above, that we understand the evolution of stars well enough to do this, might actually be correct. I was, and remain, especially impressed with how well the monolithic model with a simple exponential star formation history matches these data. It’s as if the inferences the community had made about the evolution of giant elliptical galaxies from local observations were correct.

The new thing that Jay’s work showed was that the evolution of typical cluster galaxies at z > 2 persists in tracking the monolithic model that formed early (z_f = 10). There is a lot of scatter in the higher redshift data even though there is little at lower redshift. This is to be expected for both observational reasons – the data get rattier at larger distances – and theoretical ones: the exponential star formation history we assume is at best a crude average; at early times when short-lived but bright massive stars are present there will inevitably be stochastic variation around this trend. At later times the law of averages takes over and the scatter should settle down. That’s pretty much what we see.

What we don’t see is the decline in typical brightness predicted by contemporaneous LCDM models. The specific example shown is the Munich galaxy formation model based on the Millennium simulation. However, the prediction is generic: galaxies get faint at high redshift because they haven’t finished assembling yet. This is not a problem of misunderstanding stellar evolution, it is a failure of the hierarchical assembly paradigm.

In order to identify [proto]clusters at high redshift, Jay devised an algorithm to identify galaxies in close proximity on the sky and in redshift space, in excess of the average density around them. One question we had was whether the trend predicted by the LCDM model (the orange line above) would be reproduced in the data when analyzed in this way. To check, Jay made mock observations of a simulated lookback cone using the same algorithm. The results (not previously published) are the open squares in the plot above. These track the “right” answer known directly in the form of the orange line. Consequently, if the universe had looked as predicted, we could tell. It doesn’t.

The above plot is in terms of apparent magnitude. It is interesting to turn this into the corresponding stellar mass. There has also been work done on the subject after Jay’s, so I wanted to include it. An early version of a plot mapping m* to stellar mass and redshift to cosmic time that I came up with was this:

*The stellar mass of L* galaxies as a function of cosmic age. Data as noted in the inset. The purple/orange lines represent the monolithic/hierarchical models, as above.*

The more recent data (which also predate JWST) follow the same trend as the preceding data. All the data follow the path of the monolithic model. Note that the bulk of the stars are formed in situ in the first few billion years; the stellar mass barely changes after that. There is quite a bit of stellar evolution during this time, which is why m* in the figure above changes in a complicated fashion while the stellar mass remains constant. This again provides some encouragement that we understand how to model stellar populations.

The data in the first billion years are not entirely self-consistent. For example, the yellow points are rather higher in mass than the cyan points. This difference is not one in population modeling, but rather in how much of a correction is made for non-stellar, nebular emission. So as not to go down that rabbit hole, I chose to adopt the lowest stellar mass estimates for the figure that appears in the paper (below). Note that this is the most conservative choice; I’m trying to be as favorable to LCDM as is reasonably plausible.

**Figure 4** from McGaugh et al. (2024): *The characteristic stellar mass as a function of time with the corresponding redshift noted at the top.*

There were more recent models as well as more recent data, so I wanted to include those. There are, in fact, way too many models to illustrate without creating a confusing forest of lines, so in the end I chose a couple of popular ones, Illustris and FIRE. Illustris is the descendant of Millennium, and shows identical behavior. FIRE has a different scheme for forming stars, and does so more rapidly than Illustris. However, its predictions still fall well short of the data. This is because both simulations share the same LCDM cosmology with the same merger tree assembly of structure. Assembling the mass promptly enough is the problem; it isn’t simply a matter of making stars faster.

I’ll show one more version of this plot to illustrate the predicted evolutionary trajectories. In the plots above, I only show models that end up with the mass of a typical local giant elliptical. Galaxies come in a variety of masses, so what does that look like?

*The stellar mass of galaxies as a function of cosmic age. Data as above. The orange lines represent the hierarchical models that result in different final masses at z = 0.*

The curves of stellar growth predicted by LCDM have pretty much the same shape, just different amplitude. The most massive case illustrated above is reasonable insofar as there are real galaxies that massive, but they are rare. They are also rare in simulations, which makes the predicted curve a bit jagged as there aren’t enough examples to define a smooth trajectory as there are for lower mass objects. More importantly, the shape is wrong. One can imagine that the galaxies we see at high redshift are abnormally massive, but even the most massive galaxies don’t start out that big at high redshift. Moreover, they continue to grow hierarchically in LCDM, so they wind up too big. In contrast, the data look like the monolithic model that we made on a lark, no muss, no fuss, no need to adjust anything.

This really shouldn’t have come as a surprise. We already knew that galaxies were impossibly massive at z ~ 4 before JWST discovered that this was also true at z ~ 10. The a priori prediction that LCDM has made since its inception (earlier models show the same thing) fails. More recent models fail, though I have faith that they will eventually succeed. This is the path theorists have always taken, and the obvious path here, as I remarked previously, is to make star formation (or at least light production) artificially more efficient so that the hierarchical model looks like the monolithic model. For completeness, I indulge in this myself in the paper (section 6.3) as an exercise in what it takes to save the phenomenon.

A two year delay

Regular readers of this blog will recall that in addition to the predictions I emphasized when JWST was launched, I also made a number of posts about the JWST results as they started to come in back in 2022. I had also prepared the above as a science paper that is now sections 1 to 3 of McGaugh et al. (2024). The idea was to have it ready to go so I could add a brief section on the new JWST results and submit right away – back in 2022. The early results were much as expected, but I did not rush to publish. Instead, it has taken over two years since then to complete what turned into a much longer manuscript. There are many reasons for this, but the scientific reason is that I didn’t believe many of the initial reports.

JWST was new and exciting and people fell all over themselves to publish things quickly. Too quickly. To do so, they relied on a calibration of the telescope plus detector system made while it was on the ground prior to launch. This is not the same as calibrating it on the sky, which is essential but takes some time. Consequently, some of the initial estimates were off.

Stellar masses and redshifts of galaxies from Labbe et al. The pink squares are the initial estimates that appeared in their first preprint in July 2022. The black squares with error bars are from the version published in February 2023. The shaded regions represent where galaxies are too massive too early for LCDM. The lighter region is where galaxies shouldn’t exist; the darker region is where they cannot exist.

In the example above, all of the galaxies had both their initial mass and redshift estimates change with the updated calibration. So I was right to be skeptical, and wait for an improved analysis. I was also right that while some cases would change, the basic interpretation would not. All that happened in the example above was that the galaxies moved from the “can’t exist in LCDM” region (dark blue) into the “really shouldn’t exist in LCDM” region (light blue). However, the widespread impression was that we couldn’t trust photometric redshifts at all, so I didn’t see what new I could justifiably add in 2022. This was, after all, the attitude Jay and I had taken in his CCPC survey where we required spectroscopic redshifts.

So I held off. But then it became impossible to keep up with the fire hose of data that ensued. Every time I got the chance to update the manuscript, I found some interesting new result had been published that I had to include. New things were being discovered faster than I could read the literature. I found myself stuck in the Red Queen’s dilemma, running as fast as possible just to stay in place.

Ultimately, I think the delay was worthwhile. Lots new was learned, and actual spectroscopic redshifts began to appear. (Spectroscopy takes more telescope time than photometry – spreading out the light reduces the signal-to-noise per pixel, necessitating longer exposure times, so it always lags behind. One also discovers the galaxies in the same images that are used for photometry, so it also gets a head start.) Consequently, there is a lot more in the paper than I had planned on. This is another long blog post, so I will end it where I had planned for the original paper to end, with the updated version of the plot above.

Massive galaxies at high redshift from JWST

The stellar masses of galaxies discovered by JWST as a function of redshift are shown below. Unlike most of the plots above, these are individual galaxies rather than typical L* galaxies. Many are based on photometric redshifts, but those in solid black have spectroscopic redshifts. There are many galaxies that reside in a region they should not, at least according to LCDM models: their mass is too large at the observed redshift.

**Figure 6** from McGaugh et al. (2024): Mass estimates for high-redshift galaxies from JWST. Colored points based on photometric redshifts are from Adams et al. (2023; dark blue triangles), Atek et al. (2023; green circles), Labbé et al. (2023; open squares), Naidu et al. (2022; open star), Harikane et al. (2023; yellow diamonds), Casey et al. (2024; light blue left-pointing triangles), and Robertson et al. (2024; orange right-pointing triangles). Black points from Wang et al. (2023; squares), Carniani et al. (2024; triangles), Harikane et al. (2024; circles) and Castellano et al. (2024; star) have spectroscopic redshifts. The upper limit for the most massive galaxy in TNG100 (Springel et al. 2018) as assessed by Keller et al. (2023) is shown by the light blue line. This is consistent with the maximum stellar mass expected from the stellar mass–halo mass relation of Behroozi et al. (2020; solid blue line). These merge smoothly into the trend predicted by Yung et al. (2019b) for galaxies with a space density of 10⁻⁵ dex⁻¹ Mpc⁻³ (dashed blue line), though L. Yung et al. (2023) have revised this upward by ∼0.4 dex (dotted blue line). This closely follows the most massive objects in TNG300 (Pillepich et al. 2018; red line). The light gray region represents the parameter space in which galaxies were not expected in LCDM. The dark gray area is excluded by the limit on the available baryon mass (Behroozi & Silk 2018; Boylan-Kolchin 2023). [Note added: I copied this from the caption in our paper, but the links all seem to go to that rather than to each of the cited papers. You can get to them from our reference list if you want, but it’ll take some extra clicks. It looks like AAS has set it up this way to combat trawling by bots.]

One can see what I mean about a fire hose of results from the number of references given here. Despite the challenges of keeping track of all this, I take heart in the fact that many different groups are finding similar results. Even the results that were initially wrong remain problematic for LCDM. Despite all the masses and redshifts changing when the calibration was updated, the bulk of the data (the white squares, which are the black squares in the preceding plot) remain in the problematic region. The same result is replicated many times over by others.

The challenge, as usual, is assessing what LCDM actually predicts. The entire region of this plot is well away from the region predicted for typical galaxies. To reside here, a galaxy must be an outlier. But how extreme an outlier?

The dark gray region is the no-go zone. This is where dark matter halos do not have enough baryons to make the observed mass of stars. It should be impossible for galaxies to be here. I can think of ways to get around this, but that’s material for a future post. For now, it suffices to know that there should be no galaxies in the dark gray region. Indeed, there are not. A few straddle the edge, but nothing is definitively in that region given the uncertainties. So LCDM is not outright falsified by these data. This bar is set very low, as the galaxies that do skirt the edge require that basically all of the available baryons have been converted into stars practically instantaneously. This is not reasonable.

*Not with ten thousand simulations could you do this.*

So what is a reasonable expectation for this diagram? That’s hard to say, but that’s what the white and light gray region attempts to depict. Galaxies might plausibly be in the white region but should not be in the light gray region for any sensible star formation efficiency.

One problem with this statement is that it isn’t clear what a sensible star formation efficiency is. We have a good idea of what it needs to be, on average, at low redshift. There is no clear indication that it changes as a function of redshift – at least until we hit results like this. Then we have to be on guard for confirmation bias in which we simply make the star formation efficiency be what we need it to be. (This is essentially what I advocate as the least unreasonable option in section 6.3 of the ApJ paper.)

OK, but what should the limit be? Keller et al. (2023) made a meta-analysis of the available simulations; I have used their analysis and my own reading of the literature to establish the lower boundary of the light gray area. It is conceivable that you would get the occasional galaxy this massive (the white region is OK), but not more so (the light gray region is not OK). The boundary is the most extreme galaxy in each simulation, so as far from typical as possible. The light gray region is really not OK; the only question is where exactly it sets in.

The exact location of this boundary is not easy to define. Different simulations give different answers for different reasons. These are extremal statistics; we’re asking what the one most massive galaxy is in an entire simulation. Higher resolution simulations perceive the formation of small structures like galaxies sooner, but large simulations have more opportunity for extreme events to happen. Which “wins” in terms of making the rare big galaxy early is a competition between these effects that appears, in my reading, to depend on details of simulation implementation that are unlikely to be representative of physical reality (even assuming LCDM is the correct underlying physics).

To make my own assessment, I reviewed the accessible simulations (they don’t all provide the necessary information) to find the very most massive simulated galaxy as a function of redshift. As ever, I am looking for the case that is most favorable to LCDM. The version I found comes from the large-box, next generation Illustris simulation TNG300. This is the red line a bit into the gray area above. Galaxies really, really should not exist above or to the right of that line. Not only have I adopted the most generous simulation estimate I could find, I have also chosen not to normalize to the area surveyed by JWST. One should do this, but the area so far surveyed is tiny, so the line slides down. Even if galaxies as massive as this exist in TNG300, we have to have been really lucky to point JWST at that spot on a first go. So the red line is doubly generous, and yet there are still galaxies that exceed this limit.

The bottom line is that yes, JWST data pose a real problem for LCDM. It has been amusing watching this break people’s brains. I’ve seen papers that say this is a problem for LCDM because you’d have to turn more than half of the available baryons into stars and that’s crazy talk, and others that say LCDM is absolutely OK because there are enough baryons. The observational result is the same – galaxies with very high stellar-to-dark halo mass ratios – but the interpretation appears to be different because one group of authors is treating the light gray region as forbidden while the other sets the bar at the dark gray region. So the difference in interpretation is not a conflict in the data, but an inconsistency in what [we think] LCDM predicts.
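The two readings differ only in where the bar is set on the fraction of a halo’s baryons that has been converted into stars. A minimal sketch of that bookkeeping, assuming the cosmic baryon fraction f_b = Ω_b/Ω_m with Ω_b = 0.04 and an assumed Ω_m = 0.3; the halo and stellar masses in the example are hypothetical numbers, not measurements:

```python
def available_baryon_mass(m_halo, omega_b=0.04, omega_m=0.3):
    """Baryon mass a halo of total mass m_halo would hold at the cosmic fraction.

    omega_m = 0.3 is an assumed value used only for illustration.
    """
    return (omega_b / omega_m) * m_halo


def stellar_efficiency(m_star, m_halo, omega_b=0.04, omega_m=0.3):
    """Fraction of the available baryons that must be in stars."""
    return m_star / available_baryon_mass(m_halo, omega_b, omega_m)


# Hypothetical example: a 1e11 Msun stellar mass in a 1e12 Msun halo.
m_halo = 1.0e12
m_star = 1.0e11
eff = stellar_efficiency(m_star, m_halo)
print(f"required efficiency = {eff:.2f}")

# Efficiency above 1 is the hard limit: more stars than available baryons.
# Efficiency below 1 but far above what is inferred locally is allowed by the
# baryon budget yet still demands implausibly efficient star formation.
print("exceeds baryon budget" if eff > 1.0 else "within baryon budget")
```

In this made-up case three quarters of the available baryons would have to be in stars: permitted by the baryon budget, yet far from what one group of authors would call sensible.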

That’s enough for today. Galaxy data at high redshift are clearly in conflict with the a priori predictions of LCDM. This was true before JWST, and remains true with JWST. Whether the observations can be reconciled with LCDM I leave as an exercise for scientists in the field, or at least until another post.

⁺A minor technical note: the Schechter function is widely used to describe the luminosity function of galaxies, so it provides a common language with which to quantify both their characteristic luminosity L* and space density Φ*. I make use of it here to quantify the brightness of the typical galaxy. It is, of course, not perfect. As we go from low to high redshift, the luminosity function becomes less Schechter-like and more power law-like, an evolution that you can see in Jay Franck’s plot. We chose to use Schechter fits for consistency with the previous work of Mancone et al. (2010) and Wylezalek et al. (2014), and also to down-weight the influence of the few very bright galaxies should they be active galactic nuclei or some other form of contaminant. Long story short, plausible contaminants (no photometric redshifts were used; sample galaxies all have spectroscopic redshifts) cannot explain the bulk of the data; our estimates of m* are robust and, if anything, underestimate how bright galaxies typically are.

On the timescale for galaxy formation

I’ve been wanting to expand on the previous post ever since I wrote it, which is over a month ago now. It has been a busy end to the semester. Plus, there’s a lot to say – nothing that hasn’t been said before, somewhere, somehow, yet still a lot to cobble together into a coherent story – if that’s even possible. This will be a long post, and there will be more after to narrate the story of our big paper in the ApJ. My sole ambition here is to express the predictions of galaxy formation theory in LCDM and MOND in the broadest strokes.

A theory is only as good as its prior. We can always fudge things after the fact, so what matters most is what we predict in advance. What do we expect for the timescale of galaxy formation? To tell you what I’m going to tell you, it takes a long time to build a massive galaxy in LCDM, but it happens much faster in MOND.

Basic Considerations

What does it take to make a galaxy? A typical giant elliptical galaxy has a stellar mass of 9 x 10¹⁰ M_☉. That’s a bit more than our own Milky Way, which has a stellar mass of 5 or 6 x 10¹⁰ M_☉ (depending who you ask) with another 10¹⁰ M_☉ or so in gas. So, in classic astronomy/cosmology style, let’s round off and say a big galaxy is about 10¹¹ M_☉. That’s a hundred billion stars, give or take.

How much of the universe does it take to make one big galaxy? The critical density of the universe is the over/under point for whether an expanding universe expands forever, or has enough self-gravity to halt the expansion and ultimately recollapse. Numerically, this quantity is ρ_crit = 3H₀²/(8πG), which for H₀ = 73 km/s/Mpc works out to 10^-29 g/cm³ or 1.5 x 10^-7 M_☉/pc³. This is a very small number, but provides the benchmark against which we measure densities in cosmology. The density of any substance X is expressed relative to this benchmark as the density parameter Ω_X = ρ_X/ρ_crit. The stars and gas in galaxies are made of baryons, and we know the baryon density pretty well from Big Bang Nucleosynthesis: Ω_b = 0.04. That means the average density of normal matter is very low, only about 4 x 10^-31 g/cm³. That’s less than one hydrogen atom per cubic meter – most of space is an excellent vacuum!
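For readers who want to check the arithmetic, here is a minimal sketch of the critical density calculation. The constants are standard rounded values, and the function and variable names are my own choices for illustration, not anything from a particular library.

```python
import math

# Physical constants in cgs units (standard values, rounded).
G_CGS = 6.674e-8           # gravitational constant, cm^3 g^-1 s^-2
CM_PER_MPC = 3.0857e24     # centimeters per megaparsec
CM_PER_PC = 3.0857e18      # centimeters per parsec
G_PER_MSUN = 1.989e33      # grams per solar mass
G_PER_H_ATOM = 1.6735e-24  # grams per hydrogen atom

def critical_density_cgs(h0_km_s_mpc):
    """Return rho_crit = 3 H0^2 / (8 pi G) in g/cm^3 for H0 given in km/s/Mpc."""
    h0_per_s = h0_km_s_mpc * 1.0e5 / CM_PER_MPC  # convert H0 to 1/s
    return 3.0 * h0_per_s**2 / (8.0 * math.pi * G_CGS)

rho_crit = critical_density_cgs(73.0)                     # g/cm^3
rho_crit_msun_pc3 = rho_crit * CM_PER_PC**3 / G_PER_MSUN  # Msun/pc^3
rho_baryon = 0.04 * rho_crit                              # Omega_b = 0.04
h_atoms_per_m3 = rho_baryon * 1.0e6 / G_PER_H_ATOM        # 1 m^3 = 1e6 cm^3

print(f"rho_crit   = {rho_crit:.2e} g/cm^3 = {rho_crit_msun_pc3:.2e} Msun/pc^3")
print(f"rho_baryon = {rho_baryon:.2e} g/cm^3 = {h_atoms_per_m3:.2f} H atoms per m^3")
```

With these inputs the baryon density comes out to roughly a quarter of a hydrogen atom per cubic meter.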

This being the case, we need to scoop up a large volume to make a big galaxy. Going through the math, to gather up enough mass to make a 10¹¹ M_☉ galaxy, we need a sphere with a radius of 1.6 Mpc. That’s in today’s universe; in the past the universe was denser by (1+z)³, so at z = 10 that’s “only” 140 kpc. Still, modern galaxies are much smaller than that; the effective edge of the disk of the Milky Way is at a radius of about 20 kpc, and most of the baryonic mass is concentrated well inside that: the typical half-light radius of a 10¹¹ M_☉ galaxy is around 6 kpc. That’s a long way to collapse.
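The "going through the math" step can be sketched in a few lines. This assumes the mean baryon density quoted above (Ω_b = 0.04 with H₀ = 73 km/s/Mpc, for which ρ_crit is about 1.48 x 10¹¹ M_☉/Mpc³) and that every baryon in the sphere ends up in the galaxy. The function name is hypothetical.

```python
import math

# Assumed inputs: critical density for H0 = 73 km/s/Mpc, and Omega_b = 0.04.
RHO_CRIT_MSUN_MPC3 = 1.48e11
OMEGA_B = 0.04

def gathering_radius_mpc(mass_msun, z=0.0):
    """Physical radius (Mpc) of a sphere at the mean baryon density holding mass_msun.

    The mean density scales as (1+z)^3, so the sphere is smaller at high redshift.
    """
    rho_b = OMEGA_B * RHO_CRIT_MSUN_MPC3 * (1.0 + z) ** 3
    return (3.0 * mass_msun / (4.0 * math.pi * rho_b)) ** (1.0 / 3.0)

r_today_mpc = gathering_radius_mpc(1.0e11, z=0.0)
r_z10_kpc = 1000.0 * gathering_radius_mpc(1.0e11, z=10.0)

print(f"z = 0:  r = {r_today_mpc:.2f} Mpc")
print(f"z = 10: r = {r_z10_kpc:.0f} kpc")
```

The z = 10 radius is just the z = 0 radius divided by (1+z) = 11, since the same mass occupies a volume smaller by (1+z)³.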

Monolithic Galaxy Formation

Given this much information, an early concept was monolithic galaxy formation. We have a big ball of gas in the early universe that collapses to form a galaxy. Why and how this got started was fuzzy. But we knew how much mass we needed and the volume it had to come from, so we can consider what happens as the gas collapses to create a galaxy.

Here we hit a big astrophysical reality check. Just how does the gas collapse? It has to dissipate energy to do so, and cool to form stars. Once stars form, they may feed energy back into the surrounding gas, reheating it and potentially preventing the formation of more stars. These processes are nontrivial to compute ab initio, and attempting to do so obsesses much of the community. We don’t agree on how these things work, so they are the knobs theorists can turn to change an answer they don’t like.

Even if we don’t understand star formation in detail, we do observe that stars have formed, and can estimate how many. Moreover, we do understand pretty well how stars evolve once formed. Hence a common approach is to build stellar population models with some prescribed star formation history and see what works. Spiral galaxies like the Milky Way formed a lot of stars in the past, and continue to do so today. To make 5 x 10¹⁰ M_☉ of stars in 13 Gyr requires an average star formation rate of 4 M_☉/yr. The current measured star formation rate of the Milky Way is estimated to be 2 ± 0.7 M_☉/yr, so the star formation rate has been nearly constant (averaging over stochastic variations) over time, perhaps with a gradual decline. Giant elliptical galaxies, in contrast, are “red and dead”: they have no current star formation and appear to have made most of their stars long ago. Rather than a roughly constant rate of star formation, they peaked early and declined rapidly. The cessation of star formation is also called quenching.

A common way to formulate the star formation rate in galaxies as a whole is the exponential star formation rate, SFR(t) = SFR₀ e^(-t/τ). A spiral galaxy has a low baseline star formation rate SFR₀ and a long burn time τ ~ 10 Gyr while an elliptical galaxy has a high initial star formation rate and a short e-folding time like τ ~ 1 Gyr. Many variations on this theme are possible, and are of great interest astronomically, but this basic distinction suffices for our discussion here. From the perspective of the observed mass and stellar populations of local galaxies, the standard picture for a giant elliptical was a large, monolithic island universe that formed the vast majority of its stars early on then quenched with a short e-folding timescale.
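To make the contrast between the two cases concrete, here is a minimal sketch of the exponential model. Integrating SFR(t) gives the mass formed by time t as SFR₀ τ (1 - e^(-t/τ)). The sketch assumes star formation runs for 13 Gyr and ignores the mass that stars return to the gas as they evolve, so the numbers are illustrative only. The function names are my own.

```python
import math

YEARS_PER_GYR = 1.0e9

def mass_formed(t_gyr, sfr0_msun_yr, tau_gyr):
    """Stellar mass (Msun) formed by time t for SFR(t) = SFR0 * exp(-t/tau)."""
    return sfr0_msun_yr * tau_gyr * YEARS_PER_GYR * (1.0 - math.exp(-t_gyr / tau_gyr))

def sfr0_for_mass(total_mass_msun, tau_gyr, age_gyr):
    """Initial rate SFR0 (Msun/yr) needed to form total_mass_msun within age_gyr."""
    return total_mass_msun / (tau_gyr * YEARS_PER_GYR * (1.0 - math.exp(-age_gyr / tau_gyr)))

def half_mass_time(tau_gyr, age_gyr):
    """Time (Gyr) by which half of the mass formed within age_gyr is in place."""
    half_fraction = 0.5 * (1.0 - math.exp(-age_gyr / tau_gyr))
    return -tau_gyr * math.log(1.0 - half_fraction)

# Spiral-like case: 5e10 Msun with tau = 10 Gyr. Elliptical-like: 9e10 Msun with tau = 1 Gyr.
for label, mass, tau in [("spiral", 5.0e10, 10.0), ("elliptical", 9.0e10, 1.0)]:
    sfr0 = sfr0_for_mass(mass, tau, 13.0)
    t_half = half_mass_time(tau, 13.0)
    print(f"{label}: SFR0 = {sfr0:.1f} Msun/yr, half the stars formed by t = {t_half:.2f} Gyr")
```

The short-τ model puts half of its stars in place within about τ ln 2 of its formation time, which is what "peaked early and declined rapidly" means quantitatively.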

Galaxies as Island Universes

The density parameter Ω provides another useful way to think about galaxy formation. As cosmologists, we obsess about the global value of Ω because it determines the expansion history and ultimate fate of the universe. Here it has a more modest application. We can think of the region in the early universe that will ultimately become a galaxy as its own little closed universe. With a density parameter Ω > 1, it is destined to recollapse.

A fun and funny fact of the Friedmann equation is that the matter density parameter Ω_m → 1 at early times, so the early universe when galaxies form is matter dominated. It is also very uniform (more on that below). So any subset that is a bit more dense than average will have Ω > 1 just because the average is very close to Ω = 1. We can then treat this region as its own little universe (a “top-hat overdensity”) and use the Friedmann equation to solve for its evolution, as in this sketch:

*The expansion of the early universe a(t) (blue line). A locally overdense region may behave as a closed universe, recollapsing in a finite time (red line) to potentially form a galaxy.*

That’s great, right? We have a simple, analytic solution derived from first principles that explains how a galaxy forms. We can plug in the numbers to find how long it takes to form our basic, big 10¹¹ M_☉ galaxy and… immediately encounter a problem. We need to know how overdense our protogalaxy starts out. Is its effective initial Ω_m = 2? 10? What value, at what time? The higher it is, the faster the evolution from initially expanding along with the rest of the universe to decoupling from the Hubble flow to collapsing. We know the math but we still need to know the initial condition.

Annoying Initial Conditions

The initial condition for galaxy formation is observed in the cosmic microwave background (CMB) at z = 1090. Where today’s universe is remarkably lumpy, the early universe is incredibly uniform. It is so smooth that it is homogeneous and isotropic to one part in a hundred thousand. This is annoyingly smooth, in fact. It would help to have some lumps – primordial seeds with Ω > 1 – from which structure can grow. The observed seeds are too tiny; the typical initial amplitude is 10^-5 so Ω_m = 1.00001. That takes forever to decouple and recollapse; it hasn’t yet had time to happen.
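To put a rough number on "takes forever," one can use the textbook cycloid solution for a closed, matter-only Friedmann model: a region with density parameter Ω_i > 1 and expansion rate H_i recollapses after a time π Ω_i / [H_i (Ω_i - 1)^(3/2)], measured from the start of its expansion. The sketch below is an order-of-magnitude illustration only. It assumes a matter-only expansion rate at z = 1090 (neglecting radiation) with H₀ = 73 km/s/Mpc and Ω_m = 0.3, which are my assumed inputs, and the function names are hypothetical.

```python
import math

def collapse_time_in_hubble_times(omega_i):
    """Recollapse time of a closed matter-only region, in units of 1/H_i."""
    if omega_i <= 1.0:
        raise ValueError("only regions with omega_i > 1 recollapse")
    return math.pi * omega_i / (omega_i - 1.0) ** 1.5

def hubble_time_years(z, h0_km_s_mpc=73.0, omega_m=0.3):
    """Approximate 1/H(z) in years, keeping only the matter term (assumed)."""
    hubble_time_today_yr = 977.8e9 / h0_km_s_mpc  # 1/H0 = 977.8 Gyr / (H0 in km/s/Mpc)
    return hubble_time_today_yr / (math.sqrt(omega_m) * (1.0 + z) ** 1.5)

# Larger initial overdensities collapse faster (times in units of 1/H_i).
for omega_i in (10.0, 2.0, 1.00001):
    print(f"Omega_i = {omega_i}: t_collapse = {collapse_time_in_hubble_times(omega_i):.3g} / H_i")

# The observed seed amplitude, starting at the epoch of the CMB.
t_collapse_cmb_yr = collapse_time_in_hubble_times(1.00001) * hubble_time_years(1090.0)
print(f"Omega_i = 1.00001 at z = 1090: t_collapse ~ {t_collapse_cmb_yr:.1e} yr")
```

Under these assumptions the collapse time for the observed seed amplitude exceeds the current age of the universe by a wide margin.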

The cosmic microwave background as observed by ESA’s Planck satellite. This is an all-sky picture of the relic radiation field – essentially a snapshot of the universe when it was just a few hundred thousand years old. The variations in color are variations in temperature which correspond to variations in density. These variations are tiny, only about one part in 100,000. The early universe was very uniform; the real picture is a boring blank grayscale. We have to crank the contrast way up to see these minute variations.

We would like to know how the big galaxies of today – enormous agglomerations of stars and gas and dust separated by inconceivably vast distances – came to be. How can this happen starting from such homogeneous initial conditions, where all the mass is equally distributed? Gravity is an attractive force that makes the rich get richer, so it will grow the slight initial differences in density, but it is also weak and slow to act. A basic result in gravitational perturbation theory is that overdensities grow at the same rate the universe expands, which is inversely related to redshift. So if we see tiny fluctuations in density with amplitude 10^-5 at z = 1000, they should have only grown by a factor of 1000 and still be small today (10^-2 at z = 0). But we see structures of much higher contrast than that. You can’t get here from there.
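The growth argument is simple enough to write down explicitly. This is a minimal sketch assuming the linear-theory scaling quoted above, in which the overdensity grows in proportion to the scale factor, 1/(1+z), in a matter-dominated universe. The function name is hypothetical.

```python
def linear_overdensity(delta_i, z_i, z):
    """Linear-theory overdensity at redshift z, given amplitude delta_i at z_i.

    Assumes delta grows in proportion to the scale factor a = 1/(1+z).
    """
    return delta_i * (1.0 + z_i) / (1.0 + z)

delta_initial = 1.0e-5
delta_today = linear_overdensity(delta_initial, z_i=1000.0, z=0.0)

# Growth actually available versus growth needed to reach order-unity contrast.
available_growth = delta_today / delta_initial
required_growth = 1.0 / delta_initial

print(f"delta today      = {delta_today:.1e}")
print(f"available growth = {available_growth:.0f}x, required growth = {required_growth:.0e}x")
```

The shortfall is about two orders of magnitude, and this is before asking for the highly nonlinear contrasts of real galaxies.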

The rich large scale structure we see today is impossible starting from the smooth observed initial conditions. Yet here we are, so we have to do something to goose the process. This is one of the original motivations for invoking cold dark matter (CDM). If there is a substance that does not interact with photons, it can start to clump up early without leaving too large a mark on the relic radiation field. In effect, the initial fluctuations in mass are larger, just in the invisible substance. (That’s not to say the CDM doesn’t leave a mark on the CMB; it does, but it is subtle and entirely another story.) So the idea is that dark matter forms gravitational structures first, and the baryons fall in later to make galaxies.

An illustration of the linear growth of overdensities. Structure can grow in the dark matter (long dashed lines) with the baryons catching up only after decoupling (short dashed line). In effect, the dark matter gives structure formation a head start, nicely explaining the apparently impossible growth factor. This has been the standard picture for what seems like forever (illustration from Schramm 1992).

With the right amount of CDM – and it has to be just the right amount of a dynamically cold form of non-baryonic dark matter (stuff we still don’t know actually exists) – we can explain how the growth factor is 10⁵ since recombination instead of a mere 10³. The dark matter got a head start over the stuff we can see; it looks like 10⁵ because the normal matter lagged behind, being entangled with the radiation field in a way the dark matter was not.

This has been the imperative need in structure formation theory for so long that it has become undisputed lore; an element of the belief system so deeply embedded that it is practically impossible to question. I risk getting ahead of the story, but it is important to point out that, like the interpretation of so much of the relevant astrophysical data, this belief assumes that gravity is normal. This assumption dictates the growth rate of structure, which in turn dictates the need to invoke CDM to allow structure to form in the available time. If we drop this assumption, then we have to work out what happens in each and every alternative that we might consider. That definitely gets ahead of the story, so first let’s understand what we should expect in LCDM.

Hierarchical Galaxy Formation in LCDM

LCDM predicts some things remarkably well but others not so much. The dark matter is well-behaved, responding only to gravity. Baryons, on the other hand, are messy – one has to worry about hydrodynamics in the gas, star formation, feedback, dust, and probably even magnetic fields. In a nutshell, LCDM simulations are very good at predicting the assembly of dark mass, but converting that into observational predictions relies on our incomplete knowledge of messy astrophysics. We know what the mass should be doing, but we don’t know so well how that translates to what we see. Mass good, light bad.

Starting with the assembly of mass, the first thing we learn is that the story of monolithic galaxy formation outlined above has to be wrong. Early density fluctuations start out tiny, even in dark matter. God didn’t plunk down island universes of galaxy mass then say “let there be galaxies!” The annoying initial conditions mean that little dark matter halos form first. These subsequently merge hierarchically to make ever bigger halos. Rather than top-down monolithic galaxy formation, we have the bottom-up hierarchical formation of dark matter halos.

The hierarchical agglomeration of dark matter halos into ever larger objects is often depicted as a merger tree. Here are four examples from the high resolution Illustris TNG50 simulation (Pillepich et al. 2019; Nelson et al. 2019).

Examples of merger trees from the TNG50-1 simulation (Pillepich et al. 2019; Nelson et al. 2019). Objects have been selected to have very nearly the same stellar mass at z=0. Mass is built up through a series of mergers. One large dark matter halo today (at top) has many antecedents (small halos at bottom). These merge hierarchically as illustrated by the connecting lines. *The size of the symbol is proportional to the halo mass.* I have added redshift *and the corresponding age of the universe for vanilla LCDM* in a more legible font. *The color bar illustrates the specific star formation rate*: the top row has objects that are still actively star forming like spirals; those in the bottom row are “red and dead” – things that have stopped forming stars, like giant elliptical galaxies. In all cases, there is a lot of merging and a modest rate of growth, with the typical object taking about half a Hubble time (~7 Gyr) to assemble half of its final stellar mass.

The hierarchical assembly of mass is generic in CDM. Indeed, it is one of its most robust predictions. Dark matter halos start small, and grow larger by a succession of many mergers. This gradual agglomeration is slow: note how tiny the dark matter halos at z = 10 are.
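The merger trees above label redshift with the corresponding age of the universe. For a flat universe containing only matter and a cosmological constant, that conversion has a closed form: t(z) = [2 / (3 H₀ √Ω_Λ)] asinh[√(Ω_Λ/Ω_m) (1+z)^(-3/2)]. Here is a minimal sketch, assuming "vanilla" parameters H₀ = 70 km/s/Mpc and Ω_m = 0.3. Those values are my assumption and not necessarily the ones used for the figure, and the function name is hypothetical.

```python
import math

def age_gyr(z, h0_km_s_mpc=70.0, omega_m=0.3):
    """Age of a flat matter + Lambda universe at redshift z, in Gyr.

    Radiation is neglected, which is a good approximation at the redshifts
    relevant to galaxies.
    """
    omega_lambda = 1.0 - omega_m
    hubble_time_gyr = 977.8 / h0_km_s_mpc  # 1/H0 in Gyr for H0 in km/s/Mpc
    x = math.sqrt(omega_lambda / omega_m) * (1.0 + z) ** -1.5
    return (2.0 / (3.0 * math.sqrt(omega_lambda))) * math.asinh(x) * hubble_time_gyr

for z in (0.0, 1.0, 4.0, 10.0, 14.0):
    print(f"z = {z:4.1f}: age = {age_gyr(z):6.2f} Gyr")
```

With these parameters, z = 10 corresponds to an age a little under half a billion years, out of roughly 13.5 Gyr in total.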

Strictly speaking, it isn’t even meaningful to talk about a single galaxy over the span of a Hubble time. It is hard to avoid this mental trap: surely the Milky Way has always been the Milky Way, so one imagines its evolution over time. This is monolithic thinking. Hierarchically, “the galaxy” refers at best to the largest progenitor, the object that traces the left edge of the merger trees above. But the other protogalactic chunks that eventually merge together are as much part of the final galaxy as the progenitor that happens to be largest.

This complicated picture is complicated further by what we can see being stars, not mass. The luminosity we observe forms through a combination of in situ growth (star formation in the largest progenitor) and ex situ growth through merging. There is no reason for some preferred set of protogalaxies to form stars faster than the others (though of course there is some scatter about the mean), so presumably the light traces the mass of stars formed, which in turn traces the underlying dark mass. Presumably.

That we should see lots of little protogalaxies at high redshift is nicely illustrated by this lookback cone from Yung et al. (2022). Here the color and size of each point corresponds to the stellar mass. Massive objects are common at low redshift but become progressively rarer at high redshift, petering out at z > 4 and basically absent at z = 10. This realization of the observable stellar mass tracks the assembly of dark mass seen in merger trees.

This is what we expect to see in LCDM: lots of small protogalaxies at high redshift; the building blocks of later galaxies that had not yet merged. The observation of galaxies much brighter than this at high redshift by JWST poses a fundamental challenge to the paradigm: mass appears not to be subdivided as expected. So it is entirely justifiable that people have been freaking out that what we see are bright galaxies that are apparently already massive. That shouldn’t happen; it wasn’t predicted to happen; how can this be happening?

That’s all background that is assumed knowledge for our ApJ paper, so we’re only now getting to its Figure 1. This combines one of the merger trees above with its stellar mass evolution. The left panel shows the assembly of dark mass; the right panel shows the growth of stellar mass in the largest progenitor. This is what we expect to see in observations.

**Fig. 1** from McGaugh et al. (2024): A merger tree for a model galaxy from the TNG50-1 simulation (Pillepich et al. 2019; Nelson et al. 2019, left panel) selected to have M_∗ ≈ 9 × 10¹⁰ M_⊙ at z = 0; i.e., the stellar mass of a local L^∗ giant elliptical galaxy (Driver et al. 2022). Mass assembles hierarchically, starting from small halos at high redshift (bottom edge) with the largest progenitor traced along the left edge of the merger tree. The growth of stellar mass of the largest progenitor is shown in the right panel. This example (jagged line) is close to the median (dashed line) of comparable mass objects (Rodriguez-Gomez et al. 2016), and within the range of the scatter (the shaded band shows the 16th – 84th percentiles). A monolithic model that forms at z_f = 10 and evolves with an exponentially declining star formation rate with τ = 1 Gyr (purple line) is shown for comparison. The latter model forms most of its stars earlier than occurs in the simulation.

For comparison, we also show the stellar mass growth of a monolithic model for a giant elliptical galaxy. This is the classic picture we had for such galaxies before we realized that galaxy formation had to be hierarchical. This particular monolithic model forms at z_f = 10 and follows an exponential star formation rate with τ = 1 Gyr. It is one of the models published by Franck & McGaugh (2017). It is, in fact, the first model I asked Jay to construct when he started the project. Not because we expected it to best describe the data, as it turns out to do, but because the simple exponential model is a touchstone of stellar population modeling. It was a starter model: do this basic thing first to make sure you’re doing it right. We chose τ = 1 Gyr because that was the typical number bandied about for elliptical galaxies, and z_f = 10 because that seemed ridiculously early for a massive galaxy to form. At the time we built the model, it was ludicrously early to imagine a massive galaxy would form, from an LCDM perspective. A formation redshift z_f = 10 was, less than a decade ago, practically indistinguishable from the beginning of time, so we expected it to provide a limit that the data would not possibly approach.

In a remarkably short period, JWST has transformed z = 10 from inconceivable to run of the mill. I’m not going to go into the data yet – this all-theory post is already a lot – but to offer one spoiler: the data are consistent with this monolithic model. If we want to “fix” LCDM, we have to make the red line into the purple line for enough objects to explain the data. That proves to be challenging. But that’s moving the goalposts; the prediction was that we should see little protogalaxies at high redshift, not massive, monolith-style objects. Just look at the merger trees at z = 10!

Accelerated Structure Formation in MOND

In order to address these issues in MOND, we have to go back to the beginning. What is the evolution of a spherical region (a top-hat overdensity) that might collapse to form a galaxy? How does a spherical region under the influence of MOND evolve within an expanding universe?

The solution to this problem was first found by Felten (1984), who was trying to play the Newtonian cosmology trick in MOND. In conventional dynamics, one can solve the equation of motion for a point on the surface of a uniform sphere that is initially expanding and recover the essence of the Friedmann equation. It was reasonable to check if cosmology might be that simple in MOND. It was not. The appearance of a₀ as a physical scale makes the solution scale-dependent: there is no general solution that one can imagine applies to the universe as a whole.

Felten reasonably saw this as a failure. There were, however, some appealing aspects of his solution. For one, there was no such thing as a critical density. All MOND universes would eventually recollapse irrespective of their density (in the absence of the repulsion provided by a cosmological constant). It could take a very long time, which depended on the density, but the ultimate fate was always the same. There was no special value of Ω, and hence no flatness problem. The latter obsessed people at the time, so I’m somewhat surprised that no one seems to have made this connection. Too soon*, I guess.

There it sat for many years, an obscure solution for an obscure theory to which no one gave credence. When I became interested in the problem a decade later, I started methodically checking all the classic results. I was surprised to find how many things we needed dark matter to explain were just as well (or better) explained by MOND. My exact quote was “surprised the bejeepers out of us.” So, what about galaxy formation?

I started with the top-hat overdensity, and had the epiphany that Felten had already obtained the solution. He had been trying to solve all of cosmology, which didn’t work. But he had solved the evolution of a spherical region that starts out expanding with the rest of the universe but subsequently collapses under the influence of MOND. The overdensity didn’t need to be large, it just needed to be in the low acceleration regime. Something like the red cycloidal line in the second plot above could happen in a finite time. But how much?

The solution depends on scale and needs to be solved numerically. I am not the greatest programmer, and I had a lot else on my plate at the time. I was in no rush, as I figured I was the only one working on it. This is usually a good assumption with MOND, but not in this case. Bob Sanders had had the same epiphany around the same time, which I discovered when I received his manuscript to referee. So all credit is due to Bob: he said these things first.

First, he noted that galaxy formation in MOND is still hierarchical. Small things form first. Crudely speaking, structure formation is very similar to the conventional case, but now the goose comes from the change in the force law rather than extra dark mass. MOND is nonlinear, so the whole process gets accelerated. To compare with the linear growth of CDM:

A sketch of how structures grow over time under the influence of cold dark matter (left, from Schramm 1992, same as above) and MOND (right, from Sanders & McGaugh 2002; see also this further discussion and previous post). The slow linear growth of CDM (long-dashed line, left panel) is replaced by a rapid, nonlinear growth in MOND (solid lines at right; numbers correspond to different scales). Nonlinear growth moderates after cosmic expansion begins to accelerate (dashed vertical line in right panel).

The net effect is the same. A cosmic web of large scale structure emerges. They look qualitatively similar, but everything happens faster in MOND. This is why observations have persistently revealed structures that are more massive and were in place earlier than expected in contemporaneous LCDM models.

*Simulated structure formation in ΛCDM (top) and MOND (bottom) showing the more rapid emergence of similar structures in MOND (note the redshift of each panel). From McGaugh (2015).*

In MOND, small objects like globular clusters form first, but galaxies of a range of masses all collapse on a relatively short cosmic timescale. How short? Let’s consider our typical 10¹¹ M_☉ galaxy. Solving Felten’s equation for the evolution of a sphere numerically, peak expansion is reached after 300 Myr and collapse happens in a similar time. The whole galaxy is in place speedy quick, and the initial conditions don’t really matter: a uniform, initially expanding sphere in the low acceleration regime will behave this way. From our distant vantage point thirteen billion years later, the whole process looks almost monolithic (the purple line above) even though it is a chaotic hierarchical mess for the first few hundred million years (z > 14). In particular, it is easy to form half of the stellar mass early on: the mass is already assembled.

This is what JWST sees: galaxies that are already massive when the universe is just half a billion years old. I’m sure I should say more but I’m exhausted now and you may be too, so I’m gonna stop here by noting that in 1998, when Bob Sanders predicted that “Objects of galaxy mass are the first virialized objects to form (by z=10),” the contemporaneous prediction of LCDM was that “present-day disc [galaxies] were assembled recently (at z<=1)” and “there is nothing above redshift 7.” One of these predictions has been realized. It is rare in science that such a clear a priori prediction comes true, let alone one that seemed so unreasonable at the time, and which took a quarter century to corroborate.

*I am not quite this old: I was still an undergraduate in 1984. I hadn’t even decided to be an astronomer at that point; I certainly hadn’t started following the literature. The first time I heard of MOND was in a graduate course taught by Doug Richstone in 1988. He only mentioned it in passing while talking about dark matter, writing the equation on the board and saying maybe it could be this. I recall staring at it for a long few seconds, then shaking my head and muttering “no way.” I then completely forgot about it, not thinking about it again until it came up in our data for low surface brightness galaxies. I expect most other professionals have the same initial reaction, which is fair. The test of character comes when it crops up in their data, as it is doing now for the high redshift galaxy community.
