Kinematics suggest large masses for high redshift galaxies

This is what I hope will be the final installment in a series of posts describing the results published in McGaugh et al. (2024). I started by discussing the timescale for galaxy formation in LCDM and MOND which leads to different and distinct predictions. I then discussed the observations that constrain the growth of stellar mass over cosmic time and the related observation of stellar populations that are mature for the age of the universe. I then put on an LCDM hat to try to figure out ways to wriggle out of the obvious conclusion that galaxies grew too massive too fast. Exploring all the arguments that will be made is the hardest part, not because they are difficult to anticipate, but because there are so many* options to consider. This leads to many pages of minutiae that no one ever seems to read+, so one of the options I’ve discussed (e.g., super-efficient star formation) will likely emerge as the standard picture even if it comes pre-debunked.

The emphasis so far has been on the evolution of the stellar masses of galaxies because that is observationally most accessible. That gives us the opportunity to wriggle, because what we really want to measure to test LCDM is the growth of [dark] mass. This is well-predicted but invisible, so we can always play games to relate light to mass.

Mass assembly in LCDM from the IllustrisTNG50 simulation. The dark matter mass assembles hierarchically in the merger tree depicted at left; the size of the circles illustrates the dark matter halo mass. The corresponding stellar mass of the largest progenitor is shown at right as the red band. This does not keep pace with the apparent assembly of stellar mass (data points), but what is the underlying mass really doing?

Galaxy Kinematics

What we really want to know is the underlying mass. It is reasonable to expect that the light traces this mass, but is there another way to assess it? Yes: kinematics. The orbital speeds of objects in galaxies trace the total potential, including the dark matter. So, how massive were early galaxies? How does that evolve with redshift?

The rotation curve of NGC 6946 traced by stars at small radii and gas farther out. This is a typical flat rotation curve (data points) that exceeds what can be explained by the observed baryonic mass (red line deduced from the stars and gas pictured at right), leading to the inference of dark matter.

The rotation curve for NGC 6946 shows a number of well-established characteristics for nearby galaxies, including the dominance of baryons at small radii in high surface brightness galaxies and the famous flat outer portion of the rotation curve. Even when stars contribute as much mass as allowed by the inner rotation curve (“maximum disk”), there is a need for something extra further out (i.e., dark matter or MOND). In the case of dark matter, the amplitude of flat rotation is typically interpreted as being indicative& of halo mass.

So far, the rotation curves of high redshift galaxies look very much like those of low redshift galaxies. There are some fast rotators at high redshift as well. Here is an example observed by Neeleman et al. (2020), who measure a flat rotation speed of 272 km/s for DLA0817g at z = 4.26. That’s more massive than either the Milky Way (~200 km/s) or Andromeda (~230 km/s), if not quite as big as local heavyweight champion UGC 2885 (300 km/s). DLA0817g looks to be a disk galaxy that formed early and is sedately rotating only 1.4 Gyr after the Big Bang. It is already massive at this time: not at all the little nuggets we expect from the CDM merger tree above.

Fig. 1 from Neeleman et al. (2020): the velocity field (left) and position-velocity diagram (right) of DLA0817g. The velocity field looks like that of a rotating disk, while the raw position-velocity diagram shows motions of ~200 km/s on either side of the center. When corrected for inclination, the flat rotation speed is 272 km/s, corresponding to a massive galaxy near the top of the Tully-Fisher relation.

This is anecdotal, of course, but there are a good number of similar cases that are already known. For example, the kinematics of ALESS 073.1 at z ≈ 5 indicate the presence of a massive stellar bulge as well as a rapidly rotating disk (Lelli et al. 2021). A similar case has been observed at z ≈ 6 (Tripodi et al. 2023). These kinematic observations indicate the presence of mature, massive disk galaxies well before they were expected to be in place (Pillepich et al. 2019; Wardlow 2021). The high rotation speeds observed in early disk galaxies sometimes exceed 250 km/s (Neeleman et al. 2020) or even 300 km/s (Nestor Shachar et al. 2023; Wang et al. 2024), comparable to the most massive local spirals (Noordermeer et al. 2007; Di Teodoro et al. 2021, 2023). That such rapidly rotating galaxies exist at high redshift indicates that there is a lot of mass present, not just light. We can’t just tweak the mass-to-light ratio of the stars to explain the photometry and also explain the kinematics.

In a seminal galaxy formation paper, Mo, Mao, & White (1998) predicted that “present-day disks were assembled recently (at z ≤ 1).” Today, we see that spiral galaxies are ubiquitous in JWST images up to z ∼ 6 (Ferreira et al. 2022, 2023; Kuhn et al. 2024). The early appearance of massive, dynamically cold (Di Teodoro et al. 2016; Lelli et al. 2018, 2023; Rizzo et al. 2023) disks in the first few billion years after the Big Bang contradicts the natural prediction of ΛCDM. Early disks are expected to be small and dynamically hot (Dekel & Burkert 2014; Zolotov et al. 2015; Krumholz et al. 2018; Pillepich et al. 2019), but they are observed to be massive and dynamically cold. (Hot or cold in this context means a high or low amplitude of the velocity dispersion relative to the rotation speed; the modern Milky Way is cold with σ ~ 20 km/s and Vc ~ 200 km/s.) Understanding the stability and longevity of dynamically cold spiral disks is foundational to the problem.

Kinematic Scaling Relations

Beyond anecdotal cases, we can check on kinematic scaling relations like Tully–Fisher. These are expected to emerge late and evolve significantly with redshift in LCDM (e.g., Glowacki et al. 2021). In MOND, the normalization of the baryonic Tully–Fisher relation is set by a0, so is immutable for all time if a0 is constant. Let’s see what the data say:

Figure 9 from McGaugh et al. (2024): The baryonic Tully–Fisher (left) and dark matter fraction–surface brightness (right) relations. Local galaxy data (circles) are from Lelli et al. (2019; left) and Lelli et al. (2016; right). Higher-redshift data (squares) are from Nestor Shachar et al. (2023) in bins with equal numbers of galaxies color coded by redshift: 0.6 < z < 1.22 (blue), 1.22 < z < 2.14 (green), and 2.14 < z < 2.53 (red). Open squares with error bars illustrate the typical uncertainties. The relations known at low redshift also appear at higher redshift with no clear indication of evolution over a lookback time up to 11 Gyr.

Not much to see: the data from Nestor Shachar et al. (2023) show no clear indication of evolution. The same can be said for the dark matter fraction-surface brightness relation. (Glad to see that being plotted after I pointed it out.) The local relations are coincident with those at higher redshift within any sober assessment of the uncertainties – exactly what is measured, and how, matters at this level, and I’m not going to attempt to disentangle all that here. Neither am I about to attempt to assess the consistency (or lack thereof) with either LCDM or MOND; the data simply aren’t good enough for that yet. It is also not clear to me that everyone agrees on what LCDM predicts.

What I can do is check empirically how much evolution there is within the 100-galaxy data set of Nestor Shachar et al. (2023). To do that, I fit a line to their data (the left panel above) and measure the residuals: for a given rotation speed, how far is each galaxy from the expected mass? To compare this with the stellar masses discussed previously, I normalize those residuals to the same M* = 9 x 10¹⁰ M☉. If there is no evolution, the data will scatter around a constant value as a function of redshift:

This figure reproduces the stellar mass-redshift data for L* galaxies (black points) and the monolithic (purple line) and LCDM (red and green lines) models discussed previously. The blue squares illustrate deviations of the data of Nestor Shachar et al. (2023) from the baryonic Tully-Fisher relation (dashed line, normalized to the same mass as the monolithic model). There is no indication of evolution in the baryonic Tully-Fisher relation, which was apparently established within the first few billion years after the Big Bang (z = 2.5 corresponds to a cosmic age of about 2.6 Gyr). The data are consistent with a monolithic galaxy formation model in which all the mass had been assembled into a single object early on.

The data scatter around a constant value as a function of redshift: there is no perceptible evolution.
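For the bookkeeping-minded, here is a minimal sketch of that fit-and-residual exercise in Python. The arrays are mock stand-ins, not the published measurements; the slope of ~4 encodes the baryonic Tully-Fisher relation, which in MOND is Mb = Vf⁴/(G a0).

```python
import numpy as np

rng = np.random.default_rng(0)

# Mock stand-ins for ~100 galaxies (NOT the published measurements):
# redshifts, rotation speeds, and baryonic masses with a BTFR-like slope.
z = rng.uniform(0.6, 2.5, 100)
logV = rng.uniform(1.9, 2.6, 100)                   # log10(V / km/s)
logM = 4.0 * logV + 2.0 + rng.normal(0, 0.25, 100)  # log10(Mb / Msun)

# Fit a line to the Tully-Fisher data and take residuals at fixed velocity.
slope, intercept = np.polyfit(logV, logM, 1)
resid = logM - (slope * logV + intercept)

# Normalize residuals to a fiducial mass (M* = 9e10 Msun) so they can be
# placed on the stellar mass-redshift diagram alongside the models.
M_equiv = 10**(np.log10(9e10) + resid)

# No evolution in the relation means no trend of residual with redshift:
print("d(resid)/dz =", np.polyfit(z, resid, 1)[0])  # ~0 for this mock
```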

The kinematic data for rotating galaxies tell much the same story as the photometric data for galaxies in clusters. They are both consistent with a monolithic model that gathered together the bulk of the baryonic mass early on, and evolved as an island universe for most of the history of the cosmos. There is no hint of the decline in mass with redshift predicted by the LCDM simulations. Moreover, the kinematics trace mass, not just light. So while I am careful to consider the options for LCDM, I don’t know how we’re gonna get out of this one.

Empirically, it is an important observation that there is no apparent evolution in the baryonic Tully-Fisher relation out to z ~ 2.5. That’s a lookback time of ~11 Gyr, so most of cosmic history. That means that whatever physics sets the relation did so early. If the physics is MOND, this absence of evolution implies that a0 is constant. There is some wiggle room in that given all the uncertainties, but this already excludes the picture in which a0 evolves with the expansion rate through the coincidence a0 ~ cH0. That much evolution would be readily perceptible if H(z) evolves as it appears to do. In contrast, the coincidence a0 ~ c²√Λ remains interesting since the cosmological constant is constant. Perhaps this is just a coincidence, or perhaps it is a hint that the anomalous acceleration of the expansion of the universe is somehow connected with the anomalous acceleration in galaxy dynamics.
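The numerology is easy to check with standard values (my own arithmetic; the factors of 2π are part of the coincidence, not a derivation):

```python
import numpy as np

c  = 2.998e8            # speed of light [m/s]
H0 = 70e3 / 3.086e22    # 70 km/s/Mpc in [1/s]
a0 = 1.2e-10            # MOND acceleration scale [m/s^2]

# Cosmological constant for Omega_Lambda ~ 0.7: Lambda = 3*OmL*H0^2/c^2.
Lam = 3 * 0.7 * H0**2 / c**2              # [1/m^2]

print(c * H0 / (2 * np.pi))               # ~1.1e-10 m/s^2, close to a0
print(c**2 * np.sqrt(Lam) / (2 * np.pi))  # ~1.6e-10 m/s^2, also close
```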

Though I see no clear evidence for evolution in Tully-Fisher to date, it remains early days. For example, a very recent paper by Amvrosiadis et al. (2025) does show a hint of evolution in the sense of an offset in the normalization of the baryonic Tully-Fisher relation. This isn’t very significant, being different by less than 2σ, and again we find ourselves in a situation where we need to take a hard look at all the assumptions and population modeling and velocity measurements just to see if we’re talking about the same quantities before we even begin to assess consistency or the lack thereof. Nevertheless, it is an intriguing result. There is also another interesting anecdotal case: one of their highest redshift objects, ALESS 071.1 at z = 3.7, is also the most massive in the sample, with an estimated stellar mass of 2 x 10¹² M☉. That is a crazy large number, comparable to or maybe larger than the entire dark matter halo of the Milky Way. It falls off the top of any of the graphs of stellar mass we discussed before. If correct, this one galaxy is an enormous problem for LCDM regardless of any other consideration. It is of course possible that this case will turn out to be wrong for some reason, so it remains early days for kinematics at high redshift.

Cluster Kinematics

It is even earlier days for cluster kinematics. First we have to find them, which was the focus of Jay Franck’s thesis. Once identified, we have to estimate their masses with the available data, which may or may not be up to the task. And of course we have to figure out what theory predicts.

LCDM makes a clear prediction for the growth of cluster mass. This works out OK at low redshift, in the sense that the cluster X-ray mass function is in good agreement with LCDM. Where the theory struggles is in the proclivity for the most massive clusters to appear sooner in cosmic history than anticipated. Like individual galaxies, they appear too big too soon. This trend persisted in Jay’s analysis, which identified candidate protoclusters at higher redshifts than expected. It also measured velocity dispersions that were consistently higher than found in simulations. That is, when Jay applied the search algorithm he used on the data to mock data from the Millennium simulation, the structures identified there had velocity dispersions on average a factor of two lower than seen in the data. That’s a big difference in terms of mass.

Figure 11 from McGaugh et al. (2024): Measured velocity dispersions of protocluster candidates (Franck & McGaugh 2016a, 2016b) as a function of redshift. Point size grows with the assessed probability that the identified overdensities correspond to a real structure: all objects are shown as small points, candidates with P > 50% are shown as light blue midsize points, and the large dark blue points meet this criterion and additionally have at least 10 spectroscopically confirmed members. The MOND mass for an equilibrium system in the low-acceleration regime is noted at right; these are comparable to cluster masses at low redshift.

At this juncture, there is no way to know if the protocluster candidates Jay identified are or will become bound structures. We made some probability estimates that can be summed up as “some are probably real, but some probably are not.” The relative probability is illustrated by the size of the points in the plot above; the big blue points are the most likely to be real clusters, having at least ten galaxies at the same place on the sky at the same redshift, all with spectroscopically measured redshifts. Here the spectra are critical; photometric redshifts typically are not accurate enough to indicate that galaxies that happen to be nearby to each other on the sky are also that close in redshift space.

The net upshot is that there are at least some good candidate clusters at high redshift, and these have higher velocity dispersions than expected in LCDM. I did the exercise of working out what the equivalent mass in MOND would be, and it is about the same as what we find for clusters at low redshift. This estimate assumes dynamical equilibrium, which is very far from guaranteed. But the time at which these structures appear is consistent with the timescale for cluster formation in MOND (a couple Gyr; z ~ 3), so maybe? Certainly there shouldn’t be lots of massive clusters in LCDM at z ~ 3.
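For the curious, here is the gist of that exercise, assuming Milgrom’s deep-MOND estimator for an isothermal system in dynamical equilibrium (the dispersion below is an illustrative value, not a specific measurement from the figure):

```python
G    = 6.674e-11   # gravitational constant [m^3 kg^-1 s^-2]
a0   = 1.2e-10     # MOND acceleration scale [m/s^2]
Msun = 1.989e30    # solar mass [kg]

# Deep-MOND mass estimator for an isothermal, isotropic system:
# M = (81/4) * sigma^4 / (G * a0).
sigma = 1000e3     # illustrative velocity dispersion of 1000 km/s [m/s]
M = (81 / 4) * sigma**4 / (G * a0)
print(M / Msun)    # ~1.3e15 Msun: comparable to a rich low-z cluster
```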

Kinematic Takeaways

While it remains early days for kinematic observations at high redshift, so far these data do nothing to contradict the obvious interpretation of the photometric data. There are mature, dynamically cold, fast rotating spiral galaxies in the early universe that were predicted not to be there by LCDM. Moreover, kinematics trace mass, not just light, so all the wriggling we might try to explain the latter doesn’t help with the former. The most obvious interpretation of the kinematic data to date is the same as that for the photometric data: galaxies formed early and grew massive quickly, as predicted a priori by MOND.


*The papers I write that cover both theories always seem to wind up lopsided in favor of LCDM in terms of the bulk of their content. That happens because it takes many pages to discuss all the ins and outs. In contrast, MOND just gets it right the first time, so that section is short: there’s not much more to say than “Yep, that’s what it predicted.”

+I’ve not yet heard any criticisms of our paper directly. The criticisms that I’ve heard second or third hand so far almost all fall in the category of things we explicitly discussed. That’s a pretty clear tell that the person leveling the critique hasn’t bothered to read it. I don’t expect everyone to agree with our take on this or that, but a competent critic would at least evince awareness that we had addressed their concern, even if not to their satisfaction. We rarely seem to reach that level: it is much easier to libel and slander than engage with the issues.

The one complaint I’ve heard so far that doesn’t fall in the category of things-we-already-discussed is that we didn’t do hydrodynamic simulations of star formation in molecular gas. That is a red herring. To predict the growth of stellar mass, all we need is a prescription for assembling mass and converting baryons into stars; this is essentially a bookkeeping exercise that can be done analytically. If this were a serious concern, it should be noted that most cosmological hydro-simulations also fail to meet this standard: they don’t resolve star formation, so they typically adopt some semi-empirical (i.e., data-informed) bookkeeping prescription for this “subgrid physics.”

Though I have not myself attempted to numerically simulate galaxy formation in MOND, Sanders (2008) did. More recently, Eappen et al. (2022) have done so, including molecular gas and feedback$ and everything. They find a star formation history compatible with the analytic models we discuss in our paper.

$Related detail: Eappen et al find that different feedback schemes make little difference to the end result. The deus ex machina invoked to solve all problems in LCDM is largely irrelevant in MOND. There’s a good physical reason for this: gravity in MOND is sourced by what you see; how it came to have its observed distribution is irrelevant. If 90% of the baryons are swept entirely out of the galaxy by some intense galactic wind, then they’re gone BYE BYE and don’t matter any more. In contrast, that is one of the scenarios sometimes invoked to form cores in dark matter halos that are initially cuspy: the departure of all those baryons perturbs the orbits of the dark matter particles and rearranges the structure of the halo. While that might work to alter halo structure, how it results in MOND-like phenomenology has never been satisfactorily explained. Mostly that is not seen as even necessary; converting cusp to core is close enough!


&Though we typically associate the observed outer velocity with halo mass, an important caveat is that the radius also matters: M ~ RV², and most data for high redshift galaxies do not extend very far out in radius. Nevertheless, it takes a lot of mass to make rotation speeds of order 200 km/s within a few kpc, so it hardly matters if this is or is not representative of the dark matter halo: if it is all stars, then the kinematics directly corroborate the interpretation of the photometric data that the stellar mass is large. If it is representative of the dark matter halo, then we expect the halo radius to scale with the halo velocity (R200 ~ V200) so M200 ~ V200³ and again it appears that there is too much mass in place too early.
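To put a number on it, a back-of-envelope enclosed mass for a fast rotator like DLA0817g, assuming an illustrative radius of 5 kpc for the last measured point (an assumption for this sketch, not the measured value):

```python
G    = 6.674e-11   # gravitational constant [m^3 kg^-1 s^-2]
Msun = 1.989e30    # solar mass [kg]
kpc  = 3.086e19    # kiloparsec [m]

V = 272e3          # flat rotation speed of DLA0817g [m/s]
R = 5 * kpc        # assumed radius of the last measured point
M = R * V**2 / G   # enclosed dynamical mass, M ~ R*V^2/G
print(M / Msun)    # ~9e10 Msun within 5 kpc: a lot of mass in place early
```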

The fault in our stars: blame them, not the dark matter!

As discussed in recent posts, the appearance of massive galaxies in the early universe was predicted a priori by MOND (Sanders 1998, Sanders 2008, Eappen et al. 2022). This is problematic for LCDM. How problematic? That’s always the rub.

The data follow the evolutionary track of a monolithic model (purple line) rather than the track of the largest progenitor predicted by hierarchical LCDM (dotted lines leading to different final masses).

The problem that JWST observations pose for LCDM is that there is a population of galaxies in the high redshift universe that appear to evolve as giant monoliths rather than assembling hierarchically. Put that way, it is a fatal flaw: hierarchical assembly of mass is fundamental to the paradigm. But we don’t observe mass, we observe light. So the obvious “fix” is to adjust the mapping of observed light to predicted dark halo mass in order to match the observations. How plausible is this?

Merger trees from the Illustris-TNG50 simulation showing the hierarchical assembly of L* galaxies. The dotted lines in the preceding plot show the stellar mass growth of the largest progenitor, which is on the left of each merger tree. All progenitors were predicted to be tiny at z > 3, well short of what we observe.

Before trying to wriggle out of the basic result, note that doing so is not plausible from the outset. We need to make the curve of growth of the largest progenitors “look like” the monolithic model. They shouldn’t, by construction, so everything that follows is a fudge to try to avoid the obvious conclusion. But this sort of fudging has been done so many times before in so many ways (the “Frenk Principle” was coined nearly thirty years ago) that many scientists in the field have known nothing else. They seem to think that this is how science is supposed to work. This in turn feeds a convenient attitude that evades the duty to acknowledge that a theory is in trouble when it persistently has to be adjusted to make itself look like a competitor.

That noted, let’s wriggle!

Observational dodges

The first dodge is denial: somehow the JWST data are wrong or misleading. Early on, there were plausible concerns about the validity of some photometric redshifts. There are enough spectroscopic redshifts now that this point is moot.

A related concern is that we “got lucky” with where we pointed JWST to start with, and the results so far are not typical of the universe at large. This is not quite as crazy as it sounds: the field of view of JWST is tiny, so there is no guarantee that the first snapshot will be representative. Moreover, a number of the first pointings intentionally targeted rich fields containing massive clusters, i.e., regions known to be atypical. However, as observations have accumulated, I have seen no indications of a reversal of our first impression, but rather lots of corroboration. So this hedge also now borders on reality denial.

A third observational concern that we worried a lot about in Franck & McGaugh (2017) is contamination by active galactic nuclei (AGN). Luminosity produced by accretion onto supermassive black holes (e.g., quasars) was more common in the early universe. Perhaps some of the light we are attributing to stars is actually produced by AGN. That’s a real concern, but long story short, AGN contamination isn’t enough to explain it all away. Indeed, the AGN themselves are a problem in their own right: how do we make the supermassive black holes that power AGN so rapidly that they appear already in the early universe? Like the galaxies they inhabit, the black holes that power AGN should take a long time to assemble in the absence of the heavy seeds naturally provided by MOND but not dark matter.

An evergreen concern in astronomy is extinction by dust. Dust could play a role (Ferrara et al. 2023), but this would be a weird effect for it to have. Dust is made by stars, so we naively expect it to build up along with them. In order to explain high redshift JWST data with dust we have to do the opposite: make a lot of dust very early without a lot of stars, then eject it systematically from galaxies so that the net extinction declines with time – a galactic reveal sort of like a cosmic version of the dance of the seven veils. The rate of ejection for all galaxies must necessarily be fine-tuned to balance the barely evolving UV luminosity function with the rapidly evolving dark matter halo mass function. This evolution of the extinction has to coordinate with the dark matter evolution over a rather small window of cosmic time, there being only ∼10⁸ yr between z = 14 and 11. This seems like an implausible way to explain an unchanging luminosity density, which is more naturally explained by simply having stars form and be there for their natural lifetimes.
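That window is easy to verify with a standard cosmology calculator (here astropy’s Planck18; the paper assumes similar but not necessarily identical parameters):

```python
from astropy.cosmology import Planck18 as cosmo

# Cosmic time elapsed between z = 14 and z = 11.
dt = cosmo.age(11) - cosmo.age(14)
print(dt.to('Myr'))   # roughly 120 Myr, i.e., ~1e8 yr
```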

Figure 5 from McGaugh et al. (2024): The UV luminosity function (left) observed by Donnan et al. (2024; points) compared to that predicted for ΛCDM by Yung et al. (2023; lines) as a function of redshift. Lines and points are color coded by redshift, with dark blue, light blue, green, orange, and red corresponding to z = 9, 10, 11, 12, and 14, respectively. There is a clear excess in the number density of galaxies that becomes more pronounced with redshift, ranging from a factor of ∼2 at z = 9 to an order of magnitude at z ≥ 11 (right). This excess occurs because the predicted number of sources declines with redshift while the observed numbers remain nearly constant, with the data at z = 9, 10, and 11 being right on top of each other.

The basic observation is that there is too much UV light produced by galaxies at all redshifts z > 9. What we’d rather have is the stellar mass function. JWST was designed to see optical light at the redshift of galaxy formation, but the universe surprised us and formed so many stars so early that we are stuck making inferences with the UV anyway. The relation of UV light to mass is dodgy, providing a knob to twist. So up next is the physics of light production.

In our discussion to this point, we have assumed that we know how to compute the luminosity evolution of a stellar population given a prescription for its star formation history. This is no small feat. This subject has a rich history with plenty of ups and downs, like most of astronomy. I’m not going to attempt to review all that here. I think we have this figured out well enough to do what we need to do for the purposes of our discussion here, but there are some obvious knobs to turn, so let’s turn ’em.

Blame the stars!

As noted above, we predict mass but observe light. So the program now is to squeeze more light out of less mass. Early dark matter halos too small? No problem; just make them brighter. More specifically, we need to make models in which the small dark matter halos that form first are better at producing photons from the small amount of baryons that they possess than are their low-redshift descendants. We have observational constraints on the latter; local star formation is inefficient, but maybe that wasn’t always the case. So the first obvious thing to try is to make star formation more efficient.

Super Efficient Star Formation

First, note that stellar populations evolve pretty much as we expect for stars, so this is a bit tricky. We have to retain the evolution we understand well for most of cosmic time while giving a big boost at early times. One way to do that is to have two distinct modes of star formation: the one we think of as normal that persists to this day, and an additional mode of super-efficient star formation (SESF) at play in the early universe. This way we retain the usual results while potentially giving us the extra boost that we need to explain the JWST data. We argue that this is the least implausible path to preserving LCDM. We’re trying to make it work, and anticipate the arguments Dr. Z would make.

This SESF mode of star formation needs to be very efficient indeed, as there are galaxies that appear to have converted essentially all of their available baryons into stars. Let’s pause to observe that this is pretty silly. Space is very empty; it is hard to get enough mass together to form stars at all: there’s good reason that it is inefficient locally! The early universe is a bit denser by virtue of being smaller; at z = 9 the expansion factor is only 1/(1+z) = 0.1 of what it is now, so the density is (1+z)³ = 1,000 times greater. ON AVERAGE. That’s not really a big boost when it comes to forming structures like stars since the initial condition was extraordinarily uniform. The lack of early structure by far outweighs the difference in density; that is precisely why we’re having a problem. Still, I can at least imagine that there are regions that experience a cascade of violent relaxation and SESF once some threshold in gas density is exceeded that differentiates the normal mode of star formation from SESF. Why a threshold in the gas? Because there’s not anything obvious in the dark matter picture to distinguish the galaxies that result from one or the other mode. CDM itself is scale free, after all, so we have to imagine a scale set by baryons that funnels protogalaxies into one mode or the other. Why, physically, is there a particular gas density that makes that happen? That’s a great question.

There have been observational indications that local star formation is related to a gas surface density threshold, so maybe there’s another threshold that kicks it up another notch. That’s just a plausibility argument, but that’s the straw I’m clutching at to justify SESF as the least implausible option. We know there’s at least one way in which a surface density scale might matter to star formation.

Writing out the (1+z)³ argument for the density above tickled the memory that I’d seen something similar claimed elsewhere. Looking it up, indeed Boylan-Kolchin (2024) does this, getting an extra (1+z)³ [for a total of (1+z)⁶] by invoking a surface density Σ that follows from an acceleration scale g: Σ = g/(πG). Very MONDish, that. At any rate, the extra boost is claimed to lift a corner of dark matter halo parameter space into the realm of viability. So, sure. Why not make that step two.
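As a numerical aside (my own, not how Boylan-Kolchin frames it): evaluating Σ = g/(πG) at g = a0 gives a characteristic surface density that is a familiar scale for galaxy disks.

```python
import math

G    = 6.674e-11   # gravitational constant [m^3 kg^-1 s^-2]
a0   = 1.2e-10     # MOND acceleration scale [m/s^2]
Msun = 1.989e30    # solar mass [kg]
pc   = 3.086e16    # parsec [m]

Sigma = a0 / (math.pi * G)    # [kg/m^2]
print(Sigma * pc**2 / Msun)   # ~270 Msun/pc^2 -- very MONDish indeed
```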

However we do it, making stars super-efficiently is what the data appear to require – if we confine our consideration to the mass predicted by LCDM. It’s a way of covering the lack of mass with a surplus of stars. Any mechanism that makes stars more efficiently will boost the dotted lines in the M*-z diagram above in the right direction. Do they map into the data (and the monolithic model) as needed? Unclear! All we’ve done so far is offer plausibility arguments that maybe it could be so, not demonstrate a model that works without fine-tuning that woulda coulda shoulda made the right prediction in the first place.

The ideas become less plausible from here.

Blame the IMF!

The next obvious idea after making more stars in total is to just make more of the high mass stars that produce UV photons. The IMF is a classic boogeyman to accomplish this. I discussed this briefly before, and it came up in a related discussion in which it was suggested that “in the end what will probably happen is that the IMF will be found to be highly redshift dependent.”

OK, so, first, what is the IMF? The Initial Mass Function is the spectrum of masses with which stars form: how many stars of each mass, ranging from the brown dwarf limit (0.08 M☉) to the most massive stars formed (around 100 M☉). The number of stars formed in any star forming event is a strong function of mass: low mass stars are common, high mass stars are rare. Here, though, is the rub: integrating over the whole population, low mass stars contain most of the mass, but high mass stars produce most of the light. This makes the conversion of mass to light quite sensitive to the IMF.
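That sensitivity is easy to demonstrate with textbook approximations: a Salpeter slope for the IMF and L ∝ m^3.5 for main sequence luminosities. Neither is exactly what population synthesis models use, but the budget is the point.

```python
import numpy as np

def integrate(y, x):
    """Trapezoid rule, written out to avoid numpy version differences."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

m = np.logspace(np.log10(0.08), 2, 100_000)   # stellar masses [Msun]
dn_dm = m**-2.35                              # Salpeter IMF, dN/dm ~ m^-2.35

mass  = integrate(m * dn_dm, m)               # total mass of the population
light = integrate(m**3.5 * dn_dm, m)          # total light, using L ~ m^3.5

hi = m > 8                                    # "high mass" stars
print(integrate((m * dn_dm)[hi], m[hi]) / mass)        # ~0.1 of the mass
print(integrate((m**3.5 * dn_dm)[hi], m[hi]) / light)  # >0.99 of the light
```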

The number of UV photons produced by a stellar population is especially sensitive to the IMF as only the most massive and short-lived O and B stars produce them. This is low-hanging fruit for the desperate theorist: just a few more of those UV-bright, short-lived stars, please! If we adjust the IMF to produce more of these high mass stars, then they crank out lots more UV photons (which goes in the direction we need) but they don’t contribute much to the total mass. Better yet, they don’t live long. They’re like icicles as murder weapons in mystery stories: they do their damage then melt away, leaving no further evidence. (Strictly speaking that’s not true: they leave corpses in the form of neutron stars or stellar mass black holes, but those are practically invisible. They also explode as supernovae, boosting the production of metals, but the amount is uncertain enough to get away with murder.)

There is a good plausibility argument for a variable IMF. To form a star, gravity has to overcome gas pressure to induce collapse. Gas pressure depends on temperature, and interstellar gas can cool more efficiently when it contains some metals (here I mean metals in the astronomy sense, which is everything in the periodic table that’s not hydrogen or helium). It doesn’t take much; a little oxygen (one of the first products of supernova explosions) goes a long way to make cooling more efficient than a primordial gas composed of only hydrogen and helium. Consequently, low metallicity regions have higher gas temperatures, so it makes sense that gas clouds would need more gravity to collapse, leading to higher mass stars. The early universe started with zero metals, and it takes time for stars to make them and to return them to the interstellar medium, so voila: metallicity varies with time so the IMF varies with redshift.

This sound physical argument is simple enough to make that it can be done in a small part of a blog post. This has helped it persist in our collective astronomical awareness for many decades. Unfortunately, it appears to have bugger-all to do with reality.

If metallicity plays a strong role in determining the IMF, we would expect to see it in stellar populations of different metallicity. We measure the IMF for solar metallicity stars in the solar neighborhood. Globular clusters are composed of stars formed shortly after the Big Bang and have low metallicities. So following this line of argument, we anticipate that they would have a different IMF. There is no evidence that this is the case. Still, we only really need to tweak the high-mass end of the IMF, and those stars died a long time ago, so maybe this argument applies for them if not for the long-lived, low-mass stars that we observe today.

In addition to counting individual stars, we can get a constraint on the galaxy-wide average IMF from the scatter in the Tully-Fisher relation. The physical relation depends on mass, but we rely on light to trace that. So if the IMF varies wildly from galaxy to galaxy, it will induce scatter in Tully-Fisher. This is not observed; the amount of intrinsic scatter that we see is consistent with that expected for stochastic variations in the star formation history for a fixed IMF. That’s a pretty strong constraint, as it doesn’t take much variation in the IMF to cause a lot of scatter that we don’t see. This constraint applies to entire galaxies, so it tolerates variations in the IMF in individual star forming events, but whatever is setting the IMF apparently tends to the same result when averaged over the many star forming events it takes to build a galaxy.

Variation in the IMF has come up repeatedly over the years because it provides so much convenient flexibility. Early in my career, it was commonly invoked to explain the variation in spectral hardness with metallicity. If one looks at the spectra of HII regions (interstellar gas ionized by hot young stars), there is a trend for lower metallicity HII regions to be ionized by hotter stars. The argument above was invoked: clearly the IMF tended to have more high mass stars in low metallicity environments. However, the light emitted by stars also depends on metallicity; low metallicity stars are bluer than their high metallicity equivalents because there are fewer UV absorption lines from iron in their atmospheres. Taking care to treat the stars and interstellar gas self-consistently and integrating over a fixed IMF, I showed that the observed variation in spectral hardness was entirely explained by the variation in metallicity. There didn’t need to be more high mass stars in low metallicity regions; the stars were just hotter because that’s what happens in low metallicity stars. (I didn’t set out to do this; I was just trying to calibrate an abundance indicator that I would need for my thesis.)

Another example where excess high mass stars were invoked was to explain the apparently high optical depth to the surface of last scattering reported by WMAP. If those words don’t mean anything to you, don’t worry – all it means is that a couple of decades ago, we thought we needed lots more UV photons at high redshift (z ~ 17) than CDM naturally provided. The solution was, you guessed it, an IMF rich in high mass stars. Indeed, this result launched a thousand papers on supermassive Population III stars that didn’t pan out for reasons that were easily anticipated at the time. Nowadays, analyses of the Planck data suggest a much lower optical depth than initially inferred by WMAP, but JWST is observing too many UV photons at high redshift to remain consistent with Planck. This apparent tension for LCDM is a natural consequence of early structure formation in MOND; indeed, it is another thing that was specifically predicted (see section 3.1 of McGaugh 2004).

I relate all these stories of encounters with variations in the high mass end of the IMF because they’ve never once panned out. Maybe this time will be different.

Stochastic Star Formation

What else can we think up? There’s always another possibility. It’s a big universe, after all.

One suggestion I haven’t discussed yet is that high redshift galaxies appear overly bright from stochastic fluctuations in their early star formation. This again invokes the dubious relation between stellar mass and UV light, but in a more subtle way than simply stocking the IMF with a bunch more high mass stars. Instead, it notes that the instantaneous star formation rate is stochastic. The massive stars that produce all the UV light are short-lived, so the number present will fluctuate up and down. Over time, this averages out, but there hasn’t been much time yet in the early universe. So maybe the high redshift galaxies that seem to be over-luminous are just those that happen to be near a peak in the ups and downs of star formation. Galaxies will be brightest and most noticeable in this peak phase, so the real mass is less than it appears – albeit there must be a lot of galaxies in the off phase for every one that we see in the on phase.

One expects a lot of scatter in the inferred stellar mass in the early universe due to stochastic variations in the star formation rate. As time goes on, these average out and the inferred stellar mass becomes steady. That’s pretty much what is observed (data). The data track the monolithic model (purple line) and sometimes exceed it in the early, stochastic phase. The data bear no resemblance to hierarchical LCDM (orange line).

This makes a lot of sense to me. Indeed, it should happen at some level, especially in the chaotic early universe. It is also what I infer to be going on to explain why some measurements scatter above the monolithic line. That is the baseline star formation history for this population, with some scatter up and down at early times. Simply scattering from the orange LCDM line isn’t going to look like the purple monolithic line. The shape is wrong and the amplitude difference is too great to overcome in this fashion.
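A toy model illustrates what stochasticity can and cannot do (my own sketch, not from the paper): bursts modulate the UV-weighted appearance around a smooth baseline, but the accumulated stellar mass still tracks that baseline.

```python
import numpy as np

rng = np.random.default_rng(42)

dt = 0.01                        # time step [Gyr]
t = np.arange(dt, 13.0, dt)      # cosmic time [Gyr]
baseline = np.exp(-t / 2.5)      # smooth exponential SFH (illustrative tau)
sfr = baseline * rng.lognormal(sigma=0.5, size=t.size)  # bursty SFR

mass = np.cumsum(sfr) * dt                          # stellar mass builds up
uv = np.convolve(sfr, np.ones(10) / 10, 'same')     # UV ~ SFR over ~100 Myr

# Early on, uv scatters around the baseline by large factors; later the
# law of averages takes over. The mass stays close to the baseline track.
```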

What else?

I’m sure we’ll come up with something, but I think I’ve covered everything I’ve heard so far. Indeed, most of these possibilities are obvious enough that I thought them up myself and wrote about them in McGaugh et al (2024). I don’t see anything in the wide-ranging discussion at KITP that wasn’t already in my paper.

I note this because I want to point out that we are following a well-worn script. This is the part where I tick off all the possibilities for more complicated LCDM models and point out their shortcomings. I expect the same response:

That’s too long to read. Dr. Z says it works, so he must be right since we already know that LCDM is correct.

Triton Station, 8 February 2022

People will argue about which of these auxiliary hypotheses is preferable. MOND is not an auxiliary hypothesis, but an entirely different paradigm, so it won’t be part of the discussion. After some debate, one of the auxiliaries (SESF not IMF!) will be adopted as the “standard” picture. This will be repeated until it becomes familiar, and once it is familiar it will seem that it was always so, and then people will assert that there was never a problem, indeed, that we expected it all along. This self-gaslighting reminds me of Feynman’s warning:

The first principle is that you must not fool yourself and you are the easiest person to fool.

Richard Feynman

What is persistently lacking in the community is any willingness to acknowledge, let alone engage with, the deeper question of why we have to keep invoking ad hoc patches to somehow match what MOND correctly predicted a priori. The sociology of invoking arbitrary auxiliary hypotheses to make these sorts of excuses for LCDM has been so consistently on display for so long that I wrote this parody a year ago:


It always seems to come down to special pleading:

Please don’t falsify LCDM! I ran out of computer time. I had a disk crash. I didn’t have a grant for supercomputer time. My simulation data didn’t come back from the processing center. A senior colleague insisted on a rewrite. Someone stole my laptop. There was an earthquake, a terrible flood, locusts! It wasn’t my fault! I swear to God!

And the community loves LCDM, so we fall for it every time.

Oh, LCDM. LCDM, honey.

PS – to appreciate the paraphrased quotes here, you need to hear it as it would be spoken by the pictured actors. So if you do not instantly recognize this scene from the Blues Brothers, you need to correct this shortcoming in your cultural education to get the full effect of the reference.

Measuring the growth of the stellar mass of galaxies over cosmic time

This post continues the series summarizing our ApJ paper on high redshift galaxies. To keep it finite, I will focus here on the growth of stellar mass. The earlier post discussed what we expect in theory. This depends on mass assembly (slow in LCDM, fast in MOND), on how the assembled mass is converted into stars, and on how those stars shine in light we can detect. We know a lot about stars and their evolution, so for this post I will assume we know how to convert a given star formation history into the evolution of the light it produces. There are of course caveats to that which we discuss in the paper, and perhaps will get to in a future post. It’s exhausting to be exhaustive, so not today, Satan.

The principal assumption we are obliged to make, at least to start, is that light traces mass. As mass assembles, some of it turns into stars, and those stars produce light. The astrophysics of stars and the light they produce is the same in any structure formation theory, so with this basic assumption, we can test the build-up of mass. In another post we will discuss some of the ways in which we might break this obvious assumption in order to save a favored theory. For now, we take this assumption to hold, so that what we see at high redshift provides a picture of how mass assembles.

Before JWST

This is not a new project; people have been doing it for decades. We like to think in terms of individual galaxies, but there are lots out there, so an important concept is the luminosity function, which describes the number of galaxies as a function of how bright they are. Here are some examples:

Figure 3. from Franck & McGaugh (2017) showing the number of galaxies as a function of their brightness in the 4.5 micron band of the Spitzer Space Telescope in candidate protoclusters from z = 2 to 6. Each panel notes the number of galaxies contributing to the Schechter luminosity function+ fit (gray bands), the apparent magnitude m* corresponding to the typical luminosity L*, and the redshift range. The magnitude m* is characteristic of how bright typical galaxies are at each redshift.
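For reference, the Schechter form being fit in these panels is the standard one, with φ* the normalization, L* the characteristic luminosity, and α the faint-end slope:

```latex
\phi(L)\,dL = \phi^{*} \left(\frac{L}{L^{*}}\right)^{\alpha} e^{-L/L^{*}} \,\frac{dL}{L^{*}}
```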

One reason to construct these luminosity functions is to quantify what is typical. Hundreds of galaxies inform each fit. The luminosity L* is representative of the typical galaxy, not just anecdotal individual examples. At each redshift, L* corresponds to an observed apparent magnitude m*, which we plot here:

Figure 3 from McGaugh et al. (2024): The redshift dependence of the Spitzer [4.5] apparent magnitude m* of Schechter function fits to populations of galaxies in clusters and candidate protoclusters; each point represents the characteristic brightness of the galaxies in each cluster. The apparent brightness of galaxies gets fainter with increasing redshift because galaxies are more distant, with the amount they dim depending also on their evolution (lines). The purple line is the monolithic exponential model we discussed last time. The orange line is the prediction of the Millennium simulation (the state of the art at the time Jay Franck wrote his thesis) and the Munich galaxy formation model based on it. The open squares are the result of applying the same algorithm to the simulation as used on the data; this is what we would have observed if the universe looked like LCDM as depicted by the Munich model. The real universe does not look like that.

We plot faint to bright going up the y-axis; the numbers get smaller because of the backwards definition of the magnitude scale (which dates to ancient times in which the stars that appeared brightest to the human eye were “of the first magnitude,” then the next brightest of the second magnitude, and so on). The x-axis shows redshift. The top axis shows the corresponding age of the universe for vanilla LCDM parameters. Each point shows the apparent magnitude that is typical as informed by observations of dozens to hundreds of individual galaxies. Each galaxy has a spectroscopic redshift, which we made a requirement for inclusion in the sample. These are very accurate; no photometric redshifts are used to make the plot above.
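In equation form, the magnitude scale is logarithmic and inverted, so that five magnitudes correspond to a factor of 100 in flux:

```latex
m_{1} - m_{2} = -2.5\,\log_{10}\!\left(\frac{F_{1}}{F_{2}}\right)
```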

One thing that impressed me when Jay made the initial version of this plot is how well the models match the evolution of m* at z < 2, which is most of cosmic time (the past ten billion years). This encourages one that the assumption adopted above, that we understand the evolution of stars well enough to do this, might actually be correct. I was, and remain, especially impressed with how well the monolithic model with a simple exponential star formation history matches these data. It’s as if the inferences the community had made about the evolution of giant elliptical galaxies from local observations were correct.

The new thing that Jay’s work showed was that the evolution of typical cluster galaxies at z > 2 persists in tracking the monolithic model that formed early (zf = 10). There is a lot of scatter in the higher redshift data even though there is little at lower redshift. This is to be expected for both observational reasons – the data get rattier at larger distances – and theoretical ones: the exponential star formation history we assume is at best a crude average; at early times when short-lived but bright massive stars are present there will inevitably be stochastic variation around this trend. At later times the law of averages takes over and the scatter should settle down. That’s pretty much what we see.

What we don’t see is the decline in typical brightness predicted by contemporaneous LCDM models. The specific example shown is the Munich galaxy formation model based on the Millennium simulation. However, the prediction is generic: galaxies get faint at high redshift because they haven’t finished assembling yet. This is not a problem of misunderstanding stellar evolution, it is a failure of the hierarchical assembly paradigm.

In order to identify [proto]clusters at high redshift, Jay devised an algorithm to identify galaxies in close proximity on the sky and in redshift space, in excess of the average density around them. One question we had was whether the trend predicted by the LCDM model (the orange line above) would be reproduced in the data when analyzed in this way. To check, Jay made mock observations of a simulated lookback cone using the same algorithm. The results (not previously published) are the open squares in the plot above. These track the “right” answer known directly in the form of the orange line. Consequently, if the universe had looked as predicted, we could tell. It doesn’t.

The above plot is in terms of apparent magnitude. It is interesting to turn this into the corresponding stellar mass. There has also been work done on the subject after Jay’s, so I wanted to include it. An early version of a plot mapping m* to stellar mass and redshift to cosmic time that I came up with was this:

The stellar mass of L* galaxies as a function of cosmic age. Data as noted in the inset. The purple/orange lines represent the monolithic/hierarchical models, as above.

The more recent data (which also predate JWST) follow the same trend as the preceding data. All the data follow the path of the monolithic model. Note that the bulk of the stars are formed in situ in the first few billion years; the stellar mass barely changes after that. There is quite a bit of stellar evolution during this time, which is why m* in the figure above changes in a complicated fashion while the stellar mass remains constant. This again provides some encouragement that we understand how to model stellar populations.

The data in the first billion years are not entirely self-consistent. For example, the yellow points are rather higher in mass than the cyan points. This difference is not one in population modeling, but rather in how much of a correction is made for non-stellar, nebular emission. So as not to go down that rabbit hole, I chose to adopt the lowest stellar mass estimates for the figure that appears in the paper (below). Note that this is the most conservative choice; I’m trying to be as favorable to LCDM as is reasonably plausible.

Figure 4 from McGaugh et al. (2024): The characteristic stellar mass as a function of time with the corresponding redshift noted at the top.

There were more recent models as well as more recent data, so I wanted to include those. There are, in fact, way too many models to illustrate without creating a confusing forest of lines, so in the end I chose a couple of popular ones, Illustris and FIRE. Illustris is the descendant of Millennium, and shows identical behavior. FIRE has a different scheme for forming stars, and does so more rapidly than Illustris. However, its predictions still fall well short of the data. This is because both simulations share the same LCDM cosmology with the same merger tree assembly of structure. Assembling the mass promptly enough is the problem; it isn’t simply a matter of making stars faster.

I’ll show one more version of this plot to illustrate the predicted evolutionary trajectories. In the plots above, I only show models that end up with the mass of a typical local giant elliptical. Galaxies come in a variety of masses, so what does that look like?

The stellar mass of galaxies as a function of cosmic age. Data as above. The orange lines represent the hierarchical models that result in different final masses at z = 0.

The curves of stellar growth predicted by LCDM have pretty much the same shape, just different amplitude. The most massive case illustrated above is reasonable insofar as there are real galaxies that massive, but they are rare. They are also rare in simulations, which makes the predicted curve a bit jagged as there aren’t enough examples to define a smooth trajectory as there are for lower mass objects. More importantly, the shape is wrong. One can imagine that the galaxies we see at high redshift are abnormally massive, but even the most massive galaxies don’t start out that big at high redshift. Moreover, they continue to grow hierarchically in LCDM, so they wind up too big. In contrast, the data look like the monolithic model that we made on a lark, no muss, no fuss, no need to adjust anything.

This really shouldn’t have come as a surprise. We already knew that galaxies were impossibly massive at z ~ 4 before JWST discovered that this was also true at z ~ 10. The a priori prediction that LCDM has made since its inception (earlier models show the same thing) fails. More recent models fail, though I have faith that they will eventually succeed. This is the path theorists have always taken, and the obvious path here, as I remarked previously, is to make star formation (or at least light production) artificially more efficient so that the hierarchical model looks like the monolithic model. For completeness, I indulge in this myself in the paper (section 6.3) as an exercise in what it takes to save the phenomenon.

A two year delay

Regular readers of this blog will recall that in addition to the predictions I emphasized when JWST was launched, I also made a number of posts about the JWST results as they started to come in back in 2022. I had also prepared the above as a science paper that is now sections 1 to 3 of McGaugh et al. (2024). The idea was to have it ready to go so I could add a brief section on the new JWST results and submit right away – back in 2022. The early results were much as expected, but I did not rush to publish. Instead, it has taken over two years since then to complete what turned into a much longer manuscript. There are many reasons for this, but the scientific reason is that I didn’t believe many of the initial reports.

JWST was new and exciting and people fell all over themselves to publish things quickly. Too quickly. To do so, they relied on a calibration of the telescope plus detector system made while it was on the ground prior to launch. This is not the same as calibrating it on the sky, which is essential but takes some time. Consequently, some of the initial estimates were off.

Stellar masses and redshifts of galaxies from Labbe et al. The pink squares are the initial estimates that appeared in their first preprint in July 2022. The black squares with error bars are from the version published in February 2023. The shaded regions represent where galaxies are too massive too early for LCDM. The lighter region is where galaxies shouldn’t exist; the darker region is where they cannot exist.

In the example above, all of the galaxies had both their initial mass and redshift estimates change with the updated calibration. So I was right to be skeptical, and wait for an improved analysis. I was also right that while some cases would change, the basic interpretation would not. All that happened in the example above was that the galaxies moved from the “can’t exist in LCDM” region (dark blue) into the “really shouldn’t exist in LCDM” region (light blue). However, the widespread impression was that we couldn’t trust photometric redshifts at all, so I didn’t see what new I could justifiably add in 2022. This was, after all, the attitude Jay and I had taken in his CCPC survey where we required spectroscopic redshifts.

So I held off. But then it became impossible to keep up with the fire hose of data that ensued. Every time I got the chance to update the manuscript, I found some interesting new result had been published that I had to include. New things were being discovered faster than I could read the literature. I found myself stuck in the Red Queen’s dilemma, running as fast as possible just to stay in place.

Ultimately, I think the delay was worthwhile. Lots new was learned, and actual spectroscopic redshifts began to appear. (Spectroscopy takes more telescope time than photometry – spreading out the light reduces the signal-to-noise per pixel, necessitating longer exposure times, so it always lags behind. One also discovers the galaxies in the same images that are used for photometry, so photometry gets a head start.) Consequently, there is a lot more in the paper than I had planned on. This is another long blog post, so I will end it where I had planned for the original paper to end, with the updated version of the plot above.

Massive galaxies at high redshift from JWST

The stellar masses of galaxies discovered by JWST as a function of redshift are shown below. Unlike most of the plots above, these are individual galaxies rather than typical L* galaxies. Many are based on photometric redshifts, but those in solid black have spectroscopic redshifts. There are many galaxies that reside in a region they should not, at least according to LCDM models: their mass is too large at the observed redshift.

Figure 6 from McGaugh et al. (2024)Mass estimates for high-redshift galaxies from JWST. Colored points based on photometric redshifts are from Adams et al. (2023; dark blue triangles), Atek et al. (2023; green circles), Labbé et al. (2023; open squares), Naidu et al. (2022; open star), Harikane et al. (2023; yellow diamonds), Casey et al. (2024; light blue left-pointing triangles), and Robertson et al. (2024; orange right-pointing triangles). Black points from Wang et al. (2023; squares), Carniani et al. (2024; triangles), Harikane et al. (2024; circles) and Castellano et al. (2024; star) have spectroscopic redshifts. The upper limit for the most massive galaxy in TNG100 (Springel et al. 2018) as assessed by Keller et al. (2023) is shown by the light blue line. This is consistent with the maximum stellar mass expected from the stellar mass–halo mass relation of Behroozi et al. (2020; solid blue line). These merge smoothly into the trend predicted by Yung et al. (2019b) for galaxies with a space density of 10−5 dex−1 Mpc−3 (dashed blue line), though L. Yung et al. (2023) have revised this upward by ∼0.4 dex (dotted blue line). This closely follows the most massive objects in TNG300 (Pillepich et al. 2018; red line). The light gray region represents the parameter space in which galaxies were not expected in LCDM. The dark gray area is excluded by the limit on the available baryon mass (Behroozi & Silk 2018; Boylan-Kolchin 2023). [Note added: I copied this from the caption in our paper, but the links all seem to go to that rather than to each of the cited papers. You can get to them from our reference list if you want, but it’ll take some extra clicks. It looks like AAS has set it up this way to combat trawling by bots.]

One can see what I mean about a fire hose of results from the number of references given here. Despite the challenges of keeping track of all this, I take heart in the fact that many different groups are finding similar results. Even the results that were initially wrong remain problematic for LCDM. Despite all the masses and redshifts changing when the calibration was updated, the bulk of the data (the white squares, which are the black squares in the preceding plot) remain in the problematic region. The same result is replicated many times over by others.

The challenge, as usual, is assessing what LCDM actually predicts. The entire region of this plot is well away from the region predicted for typical galaxies. To reside here, a galaxy must be an outlier. But how extreme an outlier?

The dark gray region is the no-go zone. This is where dark matter halos do not have enough baryons to make the observed mass of stars. It should be impossible for galaxies to be here. I can think of ways to get around this, but that’s material for a future post. For now, it suffices to know that there should be no galaxies in the dark gray region. Indeed, there are not. A few straddle the edge, but nothing is definitively in that region given the uncertainties. So LCDM is not outright falsified by these data. This bar is set very low, as the galaxies that do skirt the edge require that basically all of the available baryons were converted into stars practically instantaneously. That is not reasonable.

Not with ten thousand simulations could you do this.

So what is a reasonable expectation for this diagram? That’s hard to say, but that’s what the white and light gray regions attempt to depict. Galaxies might plausibly be in the white region but should not be in the light gray region for any sensible star formation efficiency.

One problem with this statement is that it isn’t clear what a sensible star formation efficiency is. We have a good idea of what it needs to be, on average, at low redshift. There is no clear indication that it changes as a function of redshift – at least until we hit results like this. Then we have to be on guard for confirmation bias in which we simply make the star formation efficiency be what we need it to be. (This is essentially what I advocate as the least unreasonable option in section 6.3 of the ApJ paper.)

OK, but what should the limit be? Keller et al. (2023) made a meta-analysis of the available simulations; I have used their analysis and my own reading of the literature to establish the lower boundary of the light gray area. It is conceivable that you would get the occasional galaxy this massive (the white region is OK), but no more than that (the light gray region is not OK). The boundary is set by the most extreme galaxy in each simulation, so it is as far from typical as possible. The light gray region is really not OK; the only question is where exactly it sets in.

The exact location of this boundary is not easy to define. Different simulations give different answers for different reasons. These are extremal statistics; we’re asking what the one most massive galaxy is in an entire simulation. Higher resolution simulations resolve the formation of small structures like galaxies sooner, but large simulation volumes have more opportunity for extreme events to happen. Which “wins” in terms of making the rare big galaxy early is a competition between these effects that appears, in my reading, to depend on details of simulation implementation that are unlikely to be representative of physical reality (even assuming LCDM is the correct underlying physics).

To make my own assessment, I reviewed the accessible simulations (they don’t all provide the necessary information) to find the single most massive simulated galaxy as a function of redshift. As ever, I am looking for the case that is most favorable to LCDM. The version I found comes from the large-box, next generation Illustris simulation TNG300. This is the red line a bit into the gray area above. Galaxies really, really should not exist above or to the right of that line. Not only have I adopted the most generous simulation estimate I could find, I have also chosen not to normalize to the area surveyed by JWST. One should do this, but the area surveyed so far is tiny, so normalizing would slide the line down. Even if galaxies as massive as this exist in TNG300, we would have to have been really lucky to point JWST at that spot on a first go. So the red line is doubly generous, and yet there are still galaxies that exceed this limit.

The bottom line is that yes, JWST data pose a real problem for LCDM. It has been amusing watching this break people’s brains. I’ve seen papers that say this is a problem for LCDM because you’d have to turn more than half of the available baryons into stars and that’s crazy talk, and others that say LCDM is absolutely OK because there are enough baryons. The observational result is the same – galaxies with very high stellar-to-halo mass ratios – but the interpretation appears to be different because one group of authors treats the light gray region as forbidden while the other sets the bar at the dark gray region. So the difference in interpretation is not a conflict in the data, but an inconsistency in what [we think] LCDM predicts.

That’s enough for today. Galaxy data at high redshift are clearly in conflict with the a priori predictions of LCDM. This was true before JWST, and remains true with JWST. Whether the observations can be reconciled with LCDM I leave as an exercise for scientists in the field, or at least until another post.


+A minor technical note: the Schechter function is widely used to describe the luminosity function of galaxies, so it provides a common language with which to quantify both their characteristic luminosity L* and space density Φ*. I make use of it here to quantify the brightness of the typical galaxy. It is, of course, not perfect. As we go from low to high redshift, the luminosity function becomes less Schechter-like and more power law-like, an evolution that you can see in Jay Franck’s plot. We chose to use Schechter fits for consistency with the previous work of Mancone et al. (2010) and Wylezalek et al. (2014), and also to down-weight the influence of the few very bright galaxies should they be active galactic nuclei or some other form of contaminant. Long story short, plausible contaminants (no photometric redshifts were used; sample galaxies all have spectroscopic redshifts) cannot explain the bulk of the data; our estimates of m* are robust and, if anything, underestimate how bright galaxies typically are.
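
For those who want it explicitly, the Schechter function in differential form is Φ(L) dL = Φ* (L/L*)^α e^(−L/L*) d(L/L*). Here is a minimal sketch in Python; the parameter values are illustrative, not fits from any of the papers cited here.

    import numpy as np

    def schechter(L, phi_star, L_star, alpha):
        """Schechter luminosity function: number density per unit L/L*."""
        x = L / L_star
        return phi_star * x**alpha * np.exp(-x)

    # The exponential cutoff suppresses galaxies much brighter than L*,
    # which is what down-weights rare bright contaminants
    print(schechter(np.array([0.1, 1.0, 10.0]), phi_star=1.0, L_star=1.0, alpha=-1.0))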

On the timescale for galaxy formation

I’ve been wanting to expand on the previous post ever since I wrote it, which is over a month ago now. It has been a busy end to the semester. Plus, there’s a lot to say – nothing that hasn’t been said before, somewhere, somehow, yet still a lot to cobble together into a coherent story – if that’s even possible. This will be a long post, and there will be more after to narrate the story of our big paper in the ApJ. My sole ambition here is to express the predictions of galaxy formation theory in LCDM and MOND in the broadest strokes.

A theory is only as good as its prior. We can always fudge things after the fact, so what matters most is what we predict in advance. What do we expect for the timescale of galaxy formation? To tell you what I’m going to tell you, it takes a long time to build a massive galaxy in LCDM, but it happens much faster in MOND.

Basic Considerations

What does it take to make a galaxy? A typical giant elliptical galaxy has a stellar mass of 9 × 10¹⁰ M☉. That’s a bit more than our own Milky Way, which has a stellar mass of 5 or 6 × 10¹⁰ M☉ (depending who you ask) with another 10¹⁰ M☉ or so in gas. So, in classic astronomy/cosmology style, let’s round off and say a big galaxy is about 10¹¹ M☉. That’s a hundred billion stars, give or take.

An elliptical galaxy (NGC 3379, left) and two spiral galaxies (NGC 628 and NGC 891, right).

How much of the universe does it take to make one big galaxy? The critical density of the universe is the over/under point for whether an expanding universe expands forever, or has enough self-gravity to halt the expansion and ultimately recollapse. Numerically, this quantity is ρcrit = 3H0²/(8πG), which for H0 = 73 km/s/Mpc works out to 10⁻²⁹ g/cm³ or 1.5 × 10⁻⁷ M☉/pc³. This is a very small number, but provides the benchmark against which we measure densities in cosmology. The density of any substance X is ΩX = ρX/ρcrit. The stars and gas in galaxies are made of baryons, and we know the baryon density pretty well from Big Bang Nucleosynthesis: Ωb = 0.04. That means the average density of normal matter is very low, only about 4 × 10⁻³¹ g/cm³. That’s less than one hydrogen atom per cubic meter – most of space is an excellent vacuum!

This being the case, we need to scoop up a large volume to make a big galaxy. Going through the math, to gather up enough mass to make a 10¹¹ M☉ galaxy, we need a sphere with a radius of 1.6 Mpc. That’s in today’s universe; in the past the universe was denser by (1+z)³, so at z = 10 that’s “only” 140 kpc. Still, modern galaxies are much smaller than that; the effective edge of the disk of the Milky Way is at a radius of about 20 kpc, and most of the baryonic mass is concentrated well inside that: the typical half-light radius of a 10¹¹ M☉ galaxy is around 6 kpc. That’s a long way to collapse.
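
For anyone who wants to check the arithmetic of the last two paragraphs, here is a quick Python sketch using standard constants, rounded the same way as in the text.

    import numpy as np

    # Standard constants in cgs
    G    = 6.674e-8          # gravitational constant, cm^3 g^-1 s^-2
    Mpc  = 3.086e24          # cm
    Msun = 1.989e33          # g

    H0       = 73e5 / Mpc                      # 73 km/s/Mpc in 1/s
    rho_crit = 3 * H0**2 / (8 * np.pi * G)     # ~1e-29 g/cm^3
    rho_b    = 0.04 * rho_crit                 # Omega_b = 0.04 -> ~4e-31 g/cm^3

    # Radius of the sphere containing 1e11 Msun of baryons at the mean density
    M = 1e11 * Msun
    R = (3 * M / (4 * np.pi * rho_b)) ** (1 / 3)
    print(R / Mpc)        # ~1.6 Mpc today
    print(R / Mpc / 11)   # at z = 10, denser by 11^3, so the radius is 11x smaller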

Monolithic Galaxy Formation

Given this much information, an early concept was monolithic galaxy formation. We have a big ball of gas in the early universe that collapses to form a galaxy. Why and how this got started was fuzzy. But we knew how much mass we needed and the volume it had to come from, so we can consider what happens as the gas collapses to create a galaxy.

Here we hit a big astrophysical reality check. Just how does the gas collapse? It has to dissipate energy to do so, and cool to form stars. Once stars form, they may feed energy back into the surrounding gas, reheating it and potentially preventing the formation of more stars. These processes are nontrivial to compute ab initio, and attempting to do so obsesses much of the community. We don’t agree on how these things work, so they are the knobs theorists can turn to change an answer they don’t like.

Even if we don’t understand star formation in detail, we do observe that stars have formed, and can estimate how many. Moreover, we do understand pretty well how stars evolve once formed. Hence a common approach is to build stellar population models with some prescribed star formation history and see what works. Spiral galaxies like the Milky Way formed a lot of stars in the past, and continue to do so today. To make 5 × 10¹⁰ M☉ of stars in 13 Gyr requires an average star formation rate of 4 M☉/yr. The current star formation rate of the Milky Way is estimated to be 2 ± 0.7 M☉/yr, so the star formation rate has been nearly constant (averaging over stochastic variations) over time, perhaps with a gradual decline. Giant elliptical galaxies, in contrast, are “red and dead”: they have no current star formation and appear to have made most of their stars long ago. Rather than a roughly constant rate of star formation, they peaked early and declined rapidly. The cessation of star formation is also called quenching.

A common way to formulate the star formation rate in galaxies as a whole is the exponential star formation rate, SFR(t) = SFR₀ e^(−t/τ). A spiral galaxy has a low baseline star formation rate SFR₀ and a long burn time τ ~ 10 Gyr while an elliptical galaxy has a high initial star formation rate and a short e-folding time like τ ~ 1 Gyr. Many variations on this theme are possible, and are of great interest astronomically, but this basic distinction suffices for our discussion here. From the perspective of the observed mass and stellar populations of local galaxies, the standard picture for a giant elliptical was a large, monolithic island universe that formed the vast majority of its stars early on then quenched with a short e-folding timescale.
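
Integrating the exponential gives the stellar mass formed by time t: M*(t) = SFR₀ τ (1 − e^(−t/τ)). Here is a quick sketch; the normalizations are my own, chosen to land near the Milky Way numbers quoted above, and are illustrative rather than fits.

    import numpy as np

    def mass_formed(t_gyr, sfr0, tau_gyr):
        """Stellar mass (Msun) formed by time t for SFR(t) = SFR0 exp(-t/tau)."""
        return sfr0 * tau_gyr * 1e9 * (1.0 - np.exp(-t_gyr / tau_gyr))

    t = 13.0  # Gyr
    # Spiral: low SFR0, long tau; still forming stars at ~2 Msun/yr today
    print(mass_formed(t, sfr0=7.0, tau_gyr=10.0), 7.0 * np.exp(-t / 10.0))
    # Elliptical: high SFR0, short tau; "red and dead" for the last ~12 Gyr
    print(mass_formed(t, sfr0=60.0, tau_gyr=1.0), 60.0 * np.exp(-t / 1.0))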

Galaxies as Island Universes

The density parameter Ω provides another useful way to think about galaxy formation. As cosmologists, we obsess about the global value of Ω because it determines the expansion history and ultimate fate of the universe. Here it has a more modest application. We can think of the region in the early universe that will ultimately become a galaxy as its own little closed universe. With a density parameter Ω > 1, it is destined to recollapse.

A fun and funny fact of the Friedmann equation is that the matter density parameter Ωm → 1 at early times, so the early universe when galaxies form is matter dominated. It is also very uniform (more on that below). So any subset that is a bit more dense than average will have Ω > 1 just because the average is very close to Ω = 1. We can then treat this region as its own little universe (a “top-hat overdensity”) and use the Friedmann equation to solve for its evolution, as in this sketch:

The expansion of the early universe a(t) (blue line). A locally overdense region may behave as a closed universe, recollapsing in a finite time (red line) to potentially form a galaxy.

That’s great, right? We have a simple, analytic solution derived from first principles that explains how a galaxy forms. We can plug in the numbers to find how long it takes to form our basic, big 10¹¹ M☉ galaxy and… immediately encounter a problem. We need to know how overdense our protogalaxy starts out. Is its effective initial Ωm = 2? 10? What value, at what time? The higher it is, the faster the evolution from initially expanding along with the rest of the universe to decoupling from the Hubble flow to collapsing. We know the math but we still need to know the initial condition.
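
To make the sensitivity concrete: the cycloid solution of the Friedmann equation for a closed patch gives a bang-to-crunch lifetime of t = πΩ/(H(Ω−1)^(3/2)), where Ω and H are the patch’s effective values at whatever epoch you measure them. A minimal numerical illustration:

    import numpy as np

    def collapse_time_gyr(omega, H_per_gyr):
        """Bang-to-crunch lifetime of a closed top-hat (Omega > 1) from the
        cycloid solution of the Friedmann equation: t = pi*Omega/(H*(Omega-1)^1.5)."""
        return np.pi * omega / (H_per_gyr * (omega - 1) ** 1.5)

    # Using today's H0 = 73 km/s/Mpc ~ 0.0747/Gyr just to show the trend;
    # a real protogalaxy has a much larger H at its (early) starting epoch
    for omega in (1.1, 2.0, 10.0):
        print(omega, collapse_time_gyr(omega, 0.0747))

The higher the initial overdensity, the faster the patch decouples and recollapses – which is exactly the initial condition we don’t know.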

Annoying Initial Conditions

The initial condition for galaxy formation is observed in the cosmic microwave background (CMB) at z = 1090. Where today’s universe is remarkably lumpy, the early universe is incredibly uniform. It is so smooth that it is homogeneous and isotropic to one part in a hundred thousand. This is annoyingly smooth, in fact. It would help to have some lumps – primordial seeds with Ω > 1 – from which structure can grow. The observed seeds are too tiny; the typical initial amplitude is 10⁻⁵ so Ωm = 1.00001. That takes forever to decouple and recollapse; it hasn’t yet had time to happen.

The cosmic microwave background as observed by ESA’s Planck satellite. This is an all-sky picture of the relic radiation field – essentially a snapshot of the universe when it was just a few hundred thousand years old. The variations in color are variations in temperature which correspond to variations in density. These variations are tiny, only about one part in 100,000. The early universe was very uniform; the real picture is a boring blank grayscale. We have to crank the contrast way up to see these minute variations.

We would like to know how the big galaxies of today – enormous agglomerations of stars and gas and dust separated by inconceivably vast distances – came to be. How can this happen starting from such homogeneous initial conditions, where all the mass is equally distributed? Gravity is an attractive force that makes the rich get richer, so it will grow the slight initial differences in density, but it is also weak and slow to act. A basic result in gravitational perturbation theory is that overdensities grow at the same rate the universe expands, which is inversely related to redshift. So if we see tiny fluctuations in density with amplitude 10⁻⁵ at z = 1000, they should have only grown by a factor of 1000 and still be small today (10⁻² at z = 0). But we see structures of much higher contrast than that. You can’t get there from here.

The rich large scale structure we see today is impossible starting from the smooth observed initial conditions. Yet here we are, so we have to do something to goose the process. This is one of the original motivations for invoking cold dark matter (CDM). If there is a substance that does not interact with photons, it can start to clump up early without leaving too large a mark on the relic radiation field. In effect, the initial fluctuations in mass are larger, just in the invisible substance. (That’s not to say the CDM doesn’t leave a mark on the CMB; it does, but it is subtle and entirely another story.) So the idea is that dark matter forms gravitational structures first, and the baryons fall in later to make galaxies.

An illustration of the linear growth of overdensities. Structure can grow in the dark matter (long dashed lines) with the baryons catching up only after decoupling (short dashed line). In effect, the dark matter gives structure formation a head start, nicely explaining the apparently impossible growth factor. This has been the standard picture for what seems like forever (illustration from Schramm 1992).

With the right amount of CDM – and it has to be just the right amount of a dynamically cold form of non-baryonic dark matter (stuff we still don’t know actually exists) – we can explain how the growth factor is 10⁵ since recombination instead of a mere 10³. The dark matter got a head start over the stuff we can see; it looks like 10⁵ because the normal matter lagged behind, being entangled with the radiation field in a way the dark matter was not.

This has been the imperative need in structure formation theory for so long that it has become undisputed lore; an element of the belief system so deeply embedded that it is practically impossible to question. I risk getting ahead of the story, but it is important to point out that, like the interpretation of so much of the relevant astrophysical data, this belief assumes that gravity is normal. This assumption dictates the growth rate of structure, which in turn dictates the need to invoke CDM to allow structure to form in the available time. If we drop this assumption, then we have to work out what happens in each and every alternative that we might consider. That definitely gets ahead of the story, so first let’s understand what we should expect in LCDM.

Hierarchical Galaxy Formation in LCDM

LCDM predicts some things remarkably well but others not so much. The dark matter is well-behaved, responding only to gravity. Baryons, on the other hand, are messy – one has to worry about hydrodynamics in the gas, star formation, feedback, dust, and probably even magnetic fields. In a nutshell, LCDM simulations are very good at predicting the assembly of dark mass, but converting that into observational predictions relies on our incomplete knowledge of messy astrophysics. We know what the mass should be doing, but we don’t know so well how that translates to what we see. Mass good, light bad.

Starting with the assembly of mass, the first thing we learn is that the story of monolithic galaxy formation outlined above has to be wrong. Early density fluctuations start out tiny, even in dark matter. God didn’t plunk down island universes of galaxy mass then say “let there be galaxies!” The annoying initial conditions mean that little dark matter halos form first. These subsequently merge hierarchically to make ever bigger halos. Rather than top-down monolithic galaxy formation, we have the bottom-up hierarchical formation of dark matter halos.

The hierarchical agglomeration of dark matter halos into ever larger objects is often depicted as a merger tree. Here are four examples from the high resolution Illustris TNG50 simulation (Pillepich et al. 2019; Nelson et al. 2019).

Examples of merger trees from the TNG50-1 simulation (Pillepich et al. 2019; Nelson et al. 2019). Objects have been selected to have very nearly the same stellar mass at z=0. Mass is built up through a series of mergers. One large dark matter halo today (at top) has many antecedents (small halos at bottom). These merge hierarchically as illustrated by the connecting lines. The size of the symbol is proportional to the halo mass. I have added redshift and the corresponding age of the universe for vanilla LCDM in a more legible font. The color bar illustrates the specific star formation rate: the top row has objects that are still actively star forming like spirals; those in the bottom row are “red and dead” – things that have stopped forming stars, like giant elliptical galaxies. In all cases, there is a lot of merging and a modest rate of growth, with the typical object taking about half a Hubble time (~7 Gyr) to assemble half of its final stellar mass.

The hierarchical assembly of mass is generic in CDM. Indeed, it is one of its most robust predictions. Dark matter halos start small, and grow larger by a succession of many mergers. This gradual agglomeration is slow: note how tiny the dark matter halos at z = 10 are.

Strictly speaking, it isn’t even meaningful to talk about a single galaxy over the span of a Hubble time. It is hard to avoid this mental trap: surely the Milky Way has always been the Milky Way? So one imagines its evolution over time. This is monolithic thinking. Hierarchically, “the galaxy” refers at best to the largest progenitor, the object that traces the left edge of the merger trees above. But the other protogalactic chunks that eventually merge together are as much part of the final galaxy as the progenitor that happens to be largest.

This complicated picture is complicated further by what we can see being stars, not mass. The luminosity we observe forms through a combination of in situ growth (star formation in the largest progenitor) and ex situ growth through merging. There is no reason for some preferred set of protogalaxies to form stars faster than the others (though of course there is some scatter about the mean), so presumably the light traces the mass of stars formed, which traces the underlying dark mass. Presumably.

That we should see lots of little protogalaxies at high redshift is nicely illustrated by this lookback cone from Yung et al. (2022). Here the color and size of each point corresponds to the stellar mass. Massive objects are common at low redshift but become progressively rare at high redshift, petering out at z > 4 and basically absent at z = 10. This realization of the observable stellar mass tracks the assembly of dark mass seen in merger trees.

Fig. 2 from Yung et al. (2022) illustrating what an observer would see looking back through their simulation to high redshift.

This is what we expect to see in LCDM: lots of small protogalaxies at high redshift; the building blocks of later galaxies that had not yet merged. The observation of galaxies much brighter than this at high redshift by JWST poses a fundamental challenge to the paradigm: mass appears not to be subdivided as expected. So it is entirely justifiable that people have been freaking out that what we see are bright galaxies that are apparently already massive. That shouldn’t happen; it wasn’t predicted to happen; how can this be happening?

That’s all background that is assumed knowledge for our ApJ paper, so we’re only now getting to its Figure 1. This combines one of the merger trees above with its stellar mass evolution. The left panel shows the assembly of dark mass; the right panel shows the growth of stellar mass in the largest progenitor. This is what we expect to see in observations.


Fig. 1 from McGaugh et al. (2024): A merger tree for a model galaxy from the TNG50-1 simulation (Pillepich et al. 2019; Nelson et al. 2019, left panel) selected to have M* ≈ 9 × 10¹⁰ M☉ at z = 0; i.e., the stellar mass of a local L* giant elliptical galaxy (Driver et al. 2022). Mass assembles hierarchically, starting from small halos at high redshift (bottom edge) with the largest progenitor traced along the left edge of the merger tree. The growth of stellar mass of the largest progenitor is shown in the right panel. This example (jagged line) is close to the median (dashed line) of comparable mass objects (Rodriguez-Gomez et al. 2016), and within the range of the scatter (the shaded band shows the 16th–84th percentiles). A monolithic model that forms at zf = 10 and evolves with an exponentially declining star formation rate with τ = 1 Gyr (purple line) is shown for comparison. The latter model forms most of its stars earlier than occurs in the simulation.

For comparison, we also show the stellar mass growth of a monolithic model for a giant elliptical galaxy. This is the classic picture we had for such galaxies before we realized that galaxy formation had to be hierarchical. This particular monolithic model forms at zf = 10 and follows an exponential star formation rate with τ = 1 Gyr. It is one of the models published by Franck & McGaugh (2017). It is, in fact, the first model I asked Jay to construct when he started the project. Not because we expected it to best describe the data, as it turns out to do, but because the simple exponential model is a touchstone of stellar population modeling. It was a starter model: do this basic thing first to make sure you’re doing it right. We chose τ = 1 Gyr because that was the typical number bandied about for elliptical galaxies, and zf = 10 because that seemed ridiculously early for a massive galaxy to form. At the time we built the model, it was ludicrously early to imagine a massive galaxy would form, from an LCDM perspective. A formation redshift zf = 10 was, less than a decade ago, practically indistinguishable from the beginning of time, so we expected it to provide a limit that the data would not possibly approach.

In a remarkably short period, JWST has transformed z = 10 from inconceivable to run of the mill. I’m not going to go into the data yet – this all-theory post is already a lot – but to offer one spoiler: the data are consistent with this monolithic model. If we want to “fix” LCDM, we have to make the red line into the purple line for enough objects to explain the data. That proves to be challenging. But that’s moving the goalposts; the prediction was that we should see little protogalaxies at high redshift, not massive, monolith-style objects. Just look at the merger trees at z = 10!

Accelerated Structure Formation in MOND

In order to address these issues in MOND, we have to go back to the beginning. What is the evolution of a spherical region (a top-hat overdensity) that might collapse to form a galaxy? How does a spherical region under the influence of MOND evolve within an expanding universe?

The solution to this problem was first found by Felten (1984), who was trying to play the Newtonian cosmology trick in MOND. In conventional dynamics, one can solve the equation of motion for a point on the surface of a uniform sphere that is initially expanding and recover the essence of the Friedmann equation. It was reasonable to check if cosmology might be that simple in MOND. It was not. The appearance of a0 as a physical scale makes the solution scale-dependent: there is no general solution that one can imagine applies to the universe as a whole.

Felten reasonably saw this as a failure. There were, however, some appealing aspects of his solution. For one, there was no such thing as a critical density. All MOND universes would eventually recollapse irrespective of their density (in the absence of the repulsion provided by a cosmological constant). It could take a very long time, which depended on the density, but the ultimate fate was always the same. There was no special value of Ω, and hence no flatness problem. The latter obsessed people at the time, so I’m somewhat surprised that no one seems to have made this connection. Too soon*, I guess.

There it sat for many years, an obscure solution for an obscure theory to which no one gave credence. When I became interested in the problem a decade later, I started methodically checking all the classic results. I was surprised to find how many things we needed dark matter to explain were just as well (or better) explained by MOND. My exact quote was “surprised the bejeepers out of us.” So, what about galaxy formation?

I started with the top-hat overdensity, and had the epiphany that Felten had already obtained the solution. He had been trying to solve all of cosmology, which didn’t work. But he had solved the evolution of a spherical region that starts out expanding with the rest of the universe but subsequently collapses under the influence of MOND. The overdensity didn’t need to be large, it just needed to be in the low acceleration regime. Something like the red cycloidal line in the second plot above could happen in a finite time. But how much time?

The solution depends on scale and needs to be solved numerically. I am not the greatest programmer, and I had a lot else on my plate at the time. I was in no rush, as I figured I was the only one working on it. This is usually a good assumption with MOND, but not in this case. Bob Sanders had had the same epiphany around the same time, which I discovered when I received his manuscript to referee. So all credit is due to Bob: he said these things first.

First, he noted that galaxy formation in MOND is still hierarchical. Small things form first. Crudely speaking, structure formation is very similar to the conventional case, but now the goose comes from the change in the force law rather than extra dark mass. MOND is nonlinear, so the whole process gets accelerated. To compare with the linear growth of CDM:

A sketch of how structures grow over time under the influence of cold dark matter (left, from Schramm 1992, same as above) and MOND (right, from Sanders & McGaugh 2002; see also this further discussion and previous post). The slow linear growth of CDM (long-dashed line, left panel) is replaced by a rapid, nonlinear growth in MOND (solid lines at right; numbers correspond to different scales). Nonlinear growth moderates after cosmic expansion begins to accelerate (dashed vertical line in right panel).

The net effect is the same. A cosmic web of large scale structure emerges. They look qualitatively similar, but everything happens faster in MOND. This is why observations have persistently revealed structures that are more massive and were in place earlier than expected in contemporaneous LCDM models.

Simulated structure formation in ΛCDM (top) and MOND (bottom) showing the more rapid emergence of similar structures in MOND (note the redshift of each panel). From McGaugh (2015).

In MOND, small objects like globular clusters form first, but galaxies of a range of masses all collapse on a relatively short cosmic timescale. How short? Let’s consider our typical 10¹¹ M☉ galaxy. Solving Felten’s equation for the evolution of a sphere numerically, peak expansion is reached after 300 Myr and collapse happens in a similar time. The whole galaxy is in place speedy quick, and the initial conditions don’t really matter: a uniform, initially expanding sphere in the low acceleration regime will behave this way. From our distant vantage point thirteen billion years later, the whole process looks almost monolithic (the purple line above) even though it is a chaotic hierarchical mess for the first few hundred million years (z > 14). In particular, it is easy to form half of the stellar mass early on: the mass is already assembled.

The evolution of a 10¹¹ M☉ sphere that starts out expanding with the universe but decouples and collapses under the influence of MOND (dotted line). It reaches maximum expansion after 300 Myr and recollapses in a similar time, so the entire object is in place after 600 Myr. (A version of this plot with a logarithmic time axis appears as Fig. 2 in our paper.) The inset shows the evolution of smaller shells within such an object (Fig. 2 from Sanders 2008). The inner regions collapse first followed by outer shells. These oscillate and cross, mixing and ultimately forming a reasonable size galaxy – see Sanders’s Table 1 and also his Fig. 4 for the collapse times for objects of other masses. These early results are corroborated by Eappen et al. (2022), who further demonstrate that the details of feedback are not important in MOND, unlike LCDM.
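
This is not hard to reproduce at the cartoon level. Below is a rough numerical sketch of a Felten-style sphere with simplifications of my own choosing – a baryon-only background for the initial Hubble velocity, an abrupt Newtonian-to-MOND transition, a single shell, no radiation era. It is meant only to show that turnaround happens a few hundred Myr in, in the same ballpark as the numbers quoted above; it is not Felten’s or Sanders’s calculation.

    import numpy as np
    from scipy.integrate import solve_ivp

    # cgs constants; a0 is Milgrom's constant
    G, a0 = 6.674e-8, 1.2e-8
    Msun, kpc, Myr = 1.989e33, 3.086e21, 3.156e13
    H0 = 73e5 / 3.086e24                  # 73 km/s/Mpc in 1/s

    M = 1e11 * Msun                       # baryonic mass of the sphere

    def accel(r):
        gN = G * M / r**2                 # Newtonian gravity of the enclosed mass
        return gN if gN > a0 else np.sqrt(gN * a0)   # abrupt MOND transition

    def rhs(t, y):
        r, v = y
        return [v, -accel(r)]

    # Start the sphere expanding with the Hubble flow at z = 50 in a
    # baryon-only (Omega_b = 0.04, open) background universe
    z = 50.0
    Hi = H0 * np.sqrt(0.04 * (1 + z)**3 + 0.96 * (1 + z)**2)
    ri = 1600 * kpc / (1 + z)             # the 1.6 Mpc comoving sphere from before

    def collapsed(t, y):                  # stop once the sphere has recollapsed
        return y[0] - ri / 2
    collapsed.terminal = True
    collapsed.direction = -1

    sol = solve_ivp(rhs, [0, 1000 * Myr], [ri, Hi * ri],
                    events=collapsed, max_step=Myr)
    i = sol.y[0].argmax()
    print(f"turnaround at r ~ {sol.y[0, i]/kpc:.0f} kpc, "
          f"t ~ {sol.t[i]/Myr:.0f} Myr after z = 50")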

This is what JWST sees: galaxies that are already massive when the universe is just half a billion years old. I’m sure I should say more but I’m exhausted now and you may be too, so I’m gonna stop here by noting that in 1998, when Bob Sanders predicted that “Objects of galaxy mass are the first virialized objects to form (by z=10),” the contemporaneous prediction of LCDM was that “present-day disc [galaxies] were assembled recently (at z<=1)” and “there is nothing above redshift 7.” One of these predictions has been realized. It is rare in science that such a clear a priori prediction comes true, let alone one that seemed so unreasonable at the time, and which took a quarter century to corroborate.


*I am not quite this old: I was still an undergraduate in 1984. I hadn’t even decided to be an astronomer at that point; I certainly hadn’t started following the literature. The first time I heard of MOND was in a graduate course taught by Doug Richstone in 1988. He only mentioned it in passing while talking about dark matter, writing the equation on the board and saying maybe it could be this. I recall staring at it for a long few seconds, then shaking my head and muttering “no way.” I then completely forgot about it, not thinking about it again until it came up in our data for low surface brightness galaxies. I expect most other professionals have the same initial reaction, which is fair. The test of character comes when it crops up in their data, as it is doing now for the high redshift galaxy community.

Massive Galaxies at High Redshift: we told you so

I was raised to believe that it was rude to tell people I told you so. Yet that’s pretty much the essence of the scientific method: we test hypotheses by making predictions, then checking to see which told us the correct result in advance of the experiment. So: I told you so.

Our paper on massive galaxies at high redshift is out in the Astrophysical Journal today. This is a scientific analysis of the JWST data that has accumulated to date as it pertains to testing galaxy formation as hypothesized by LCDM and MOND. That massive galaxies are observed to form early (z > 10) corroborates the long standing prediction of MOND, going back to Sanders (1998):

Objects of galaxy mass are the first virialized objects to form (by z=10), and larger structure develops rapidly

The contemporaneous LCDM prediction from Mo, Mao, & White (1998) – a touchstone of galaxy formation theory with nearly 2,000 citations – was

present-day disc [galaxies] were assembled recently (at z<=1).

This is not what JWST sees, as morphologically mature spiral galaxies are present to at least z = 6 (Ferreira et al. 2024). More generally, galaxies in LCDM were predicted to take a long time to build up their stellar mass, with the median time to reach half the final stellar mass being about half a Hubble time (seven billion years, give or take). In contrast, JWST has now observed many galaxies that meet this benchmark in the first billion years. That was not expected to happen.

In short, one theory got its prediction right, and the other got it wrong. I say expected, because we can always attempt to modify a theory to accommodate new facts. The a priori predictions of LCDM were wrong, but can it be adjusted to explain the data? Perhaps – but if so, that’s because it is incredibly flexible. That’s normally considered to be a bad thing in a theory, not a strength, especially when a competing theory got it right in the first place.

This has happened over and over and over again. After the initial shock of having MOND’s predictions come true in my own data (how can this be so?), I’ve spent the decades since devising and executing new tests of both theories. When it comes to making a priori predictions, MOND has won over and over again. It has consistently had more predictive success.

If you are a scientist reading this and that statement doesn’t sound right to you, that’s because you haven’t been paying attention. I get it: MOND seems too unlikely to pay attention to. I certainly didn’t before it reared its head in my own data. So ask yourself: what do you actually know about MOND? IT’S WRONG! OK, after that. Seriously: how many papers have you read about MOND? Do you know what its predictions are? Do you know what its successes are, or only just its failings? Can you write down its formula? If the answers to these questions do not come easily to you, it’s because you haven’t taken it seriously. Which, again, I get. But it is also an indication that you may not be playing with a complete set of facts. Ignorance is not a strong position from which to make scientific judgements.

I will expand more on the content of the science paper in future posts. For now, it boils down to I told you so.

You can also read more in SciNews, Newsweek, and the most in-depth article so far, in Courthouse News.

It is not linear

I just got back from a visit to the Carnegie Institution of Washington where I gave a talk and saw some old friends. I was a postdoc at the Department of Terrestrial Magnetism (DTM) in the ’90s. DTM is so-named because in their early days they literally traveled the world mapping the magnetic field. When I was there, DTM+ had a small extragalactic astronomy group including Vera Rubin*, Francois Schweizer, and John Graham. Working there as a Carnegie Fellow gave me great latitude to pursue whatever science I wanted, with the benefit of discussions with these great astronomers. After my initial work on low surface brightness galaxies had brought MOND to my attention, much of the follow-up work checking all (and I do mean all) the other constraints was done there, ultimately resulting in the triptych of papers showing that the bulk of the evidence available at that time favored MOND over the dark matter interpretation.

When I joined the faculty at the University of Maryland in 1998, I saw the need to develop a graduate course on cosmology, which did not exist there at that time. I began to consider how cosmic structure might form in MOND, but was taken aback when Simon White asked me to referee a paper on the subject by Bob Sanders. He had found much of what I was finding: that there was no way to avoid an early burst of speedy galaxy formation. I had been scooped!

It has taken a quarter century to test our predictions, so any concern about who said what first seems silly now. Indeed, the bigger problem is informing people that these predictions were made at all. I had a huge eye roll last month when Physics Magazine came out with

NEWS FEATURE: JWST Sees More Galaxies than Expected (February 9, 2024)

The new JWST observatory is revealing far more bright galaxies in the early Universe than anyone predicted, and astrophysicists have more than one explanation for the puzzle.

Physics Magazine

Far more bright galaxies in the early Universe than anyone predicted! Who could have predicted it? I guess I am anyone.

Joking aside, this is a great illustration of the inefficiency of scientific communication. I wrote a series of papers on the subject. I wasn’t alone; so did others. I gave talks about it. I’ve emphasized it in scientific reviews. My papers are frequently cited, ranking in the top 2% of the top 2% across all sciences. They’re cited by prominent cosmologists. Heck, I’ve even blogged about it. And yet, it comes as such a surprise that it couldn’t have possibly happened, to the extent that no one bothers to check what is in the literature. (There was a similar sociology around the prediction of the CMB second peak. It didn’t happen if we don’t look.)

So what did the Physics Magazine article talk about? More than one explanation, most of which are the conventionalist approaches we’ve talked about before – make star formation more efficient, or adjust the IMF (the mass spectrum with which stars form) to squeeze more UV photons out of fewer baryons. But there is also a paper by Sabti et al. that basically asserts “this can’t be happening!” which is exactly the point.

Sabti et al. ask whether they can boost the amplitude of structure formation in a way that satisfies both the new JWST observations and previous Hubble data. The answer is no:

We consider beyond-ΛCDM power-spectrum enhancements and show that any departure large enough to reproduce the abundance of ultramassive JWST candidates is in conflict with the HST data.

Sabti et al.

At first, this struck me as some form of reality denial, like an assertion that the luminosity density could not possibly exceed LCDM predictions, even though that is exactly what it is observed to do:

The integrated UV luminosity density as a function of redshift from Adams et al. (2023). The data exceed the expectation for z > 10, even with the goal posts in motion.

On a closer read, I realized my initial impression was wrong; they are making a much better argument. The star formation rate is what is really constrained by the UV luminosity, but if that is attributed to stellar mass, you can’t get there from here – even with some jiggering of structure formation. That appears to be correct, within the framework of their considerations. Yet an alteration of structure formation is exactly what led to the now-corroborated prediction of Sanders (1998), so something still seemed odd. Just how were they altering it?

It took a close read, but the issue is in their equation 3. They allow for more structure formation by increasing the amplitude. However, they maintain the usual linear growth rate. In effect, they boost the amplitude of the linear dashed line in the left panel below, while maintaining its shape:

The growth rate of structure in CDM (linear, at left) and MOND (nonlinear, at right).

This is strongly constrained at both higher and lower redshifts, so only a little boost in amplitude is possible, assuming linear growth. So what they’ve correctly shown is that the usual linear growth rate of LCDM cannot do what needs to be done. That just emphasizes my point: to get the rapid growth we observe in the narrow time range available above redshift ten, the rate of growth needs to be nonlinear.

It’s not linear from Star Trek DS9.

Nonlinearity is unavoidable in MOND – hence the prediction of big galaxies at high redshift. Nonlinearity is a bear to calculate, which is part of the reason nobody wants to go there. Tough noogies. They teach us in grad school that the early universe is simple. It is a mantra to many who work in the field. I’m sorry, did God promise this? I understand the reasons why the early universe should be simple in standard FLRW cosmology, but what if the universe we live in isn’t that? No one has standing to promise that the early universe is as simple as expected. That’s just a fairy tale cosmologists tell their young so they can sleep at night.


+DTM has since been merged with the Geophysical Laboratory to become the Earth and Planets Laboratory. These departments shared the Broad Branch Road campus but maintained a friendly rivalry in the soccer Mud Cup, so named because the first Mud Cup was played on a field that was such a quagmire that we all became completely covered in mud. It was great fun.

*Vera was always adamant that she was not a physicist, and yet a search returns a thumbnail describing her as an astronomer and physicist, even though the Wikipedia article itself does not (at present) make this spurious “and physicist” assertion.

The evolution of the luminosity density

The results from the high redshift universe keep pouring in from JWST. It is a full time job, and then some, just to keep track. One intriguing aspect is the luminosity density of the universe at z > 10. I had not thought this to be problematic for LCDM, as it only depends on the overall number density of stars, not whether they’re in big or small galaxies. I checked this a couple of years ago, and it was fine. At that point we were limited to z < 10, so what about higher redshift?

It helps to have in mind the contrasting predictions of distinct hypotheses, so a quick reminder. LCDM predicts a gradual build up of the dark matter halo mass function that should presumably be tracked by the galaxies within these halos. MOND predicts that galaxies of a wide range of masses form abruptly, including the biggest ones. The big distinction I’ve focused on is the formation epoch of the most massive galaxies. These take a long time to build up in LCDM: it typically takes half a Hubble time (~7 billion years; z < 1) for a giant elliptical to assemble half its final stellar mass. Baryonic mass assembly is considerably more rapid in MOND, so this benchmark can be attained much earlier, even within the first billion years after the Big Bang (z > 5).

In both theories, astrophysics plays a role. How does gas condense into galaxies, and then form into stars? Gravity just tells us when we can assemble the mass, not how it becomes luminous. So the critical question is whether the high redshift galaxies JWST sees are indeed massive. They’re much brighter than had been predicted by LCDM, and in line with the simplest evolutionary models one can build in MOND, so the latter is the more natural interpretation. However, it is much harder to predict how many galaxies form in MOND; it is straightforward to show that they should form fast but much harder to figure out how many do so – i.e., how many baryons get incorporated into collapsed objects, and how many get left behind, stranded in the intergalactic medium. Consequently, the luminosity density – the total number of stars, regardless of what size galaxies they’re in – did not seem like a straight-up test the way the masses of individual galaxies are.

It is not difficult to produce lots of stars at high redshift in LCDM. But those stars should be in many protogalactic fragments, not individually massive galaxies. As a reminder, here is the merger tree for a galaxy that becomes a bright elliptical at low redshift:

Merger tree from De Lucia & Blaizot 2007 showing the hierarchical build-up of massive galaxies from many protogalactic fragments.

At large lookback times, i.e., high redshift, galaxies are small protogalactic fragments that have not yet assembled into a large island universe. This happens much faster in MOND, so we expect that for many (not necessarily all) galaxies, this process is basically complete after a mere billion years or so, often less. In both theories, your mileage will vary: each galaxy will have its own unique formation history. Nevertheless, that’s the basic difference: big galaxies form quickly in MOND while they should still be little chunks at high z in LCDM.

The hierarchical formation of structure is a fundamental prediction of LCDM, so this is in principle a place it can break. That is why many people are following the usual script of blaming astrophysics, i.e., how stars form, not how mass assembles. The latter is fundamental while the former is fungible.

Gradual mass assembly is so fundamental that its failure would break LCDM. Indeed, it is so deeply embedded in the mental framework of people working on it that it doesn’t seem to occur to most of them to consider the possibility that it could work any other way. It simply has to work that way; we were taught so in grad school!

Here is a sketch of how structures grow over time under the influence of cold dark matter (left, from Schramm 1992) and MOND (right, from Sanders & McGaugh 2002; see also this further discussion). The slow linear growth of CDM (long-dashed line, left panel) is replaced by a rapid, nonlinear growth in MOND (solid lines at right; numbers correspond to different scales). Nonlinear growth moderates after cosmic expansion begins to accelerate (dashed vertical line in right panel).

A principal result in perturbation theory applied to density fluctuations in an expanding universe governed by General Relativity is that these proto-objects grow at the same rate that the universe expands – hence the linear long-dashed line in the left diagram. The baryons cannot match the observations by themselves because the universe has “only” expanded by a factor of a thousand since recombination while structure has grown by a factor of a hundred thousand. This was one of the primary motivations for inventing cold dark matter in the first place: it can grow at the theory-specified rate without obliterating the observed isotropy% of the microwave background. The skeletal structure of the cosmic web grows in cold dark matter first; the baryons fall in afterwards (short-dashed line in left panel).

That’s how it works. Without dark matter, structure cannot form, so we needn’t consider MOND nor speak of it ever again forever and ever, amen.

Except, of course, that isn’t necessarily how structure formation works in MOND. Like every other inference of dark matter, the slow growth of perturbations assumes that gravity is normal. If we consider a different force law, then we have to revisit this basic result. Exactly how structure formation works in MOND is not a settled subject, but the panel at right illustrates how I think it might work. One seemingly unavoidable aspect is that MOND is nonlinear, so the growth rate becomes nonlinear at some point, which is rather early on if Milgrom’s constant a0 does not evolve. Rather than needing dark matter to achieve a growth factor of 10⁵, the boost to the force law enables baryons to do it on their own. That, in a nutshell, is why MOND predicts the early formation of big galaxies.

The same nonlinearity that makes structure grow fast in MOND also makes it very hard to predict the mass function. My nominal expectation is that the present-day galaxy baryonic mass function is established early and galaxies mostly evolve as closed boxes after that. Not exclusively; mergers still occasionally happen, as might continued gas accretion. In addition to the big galaxies that form their stars rapidly and eventually become giant elliptical galaxies, there will also be a population for which gas accretion is gradual^ enough to settle into a preferred plane and evolve into a spiral galaxy. But that is all gas physics and hand waving; for the mass function I simply don’t know how to extract a prediction from a nonlinear version of the Press-Schechter formalism. Somebody smarter than me should try that.
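
For concreteness, the linear-theory ingredient that would need a nonlinear replacement is the Press-Schechter multiplicity function, which converts the rms linear density fluctuation σ(M) into the mass fraction collapsed per logarithmic mass interval. A minimal sketch; σ(M) would come from the linear power spectrum, which is exactly the piece MOND alters.

    import numpy as np

    def press_schechter_f(sigma, delta_c=1.686):
        """Press-Schechter multiplicity function f(sigma): the fraction of mass
        in collapsed halos per logarithmic mass interval, given the rms linear
        fluctuation sigma(M) and the linear collapse threshold delta_c."""
        nu = delta_c / sigma
        return np.sqrt(2.0 / np.pi) * nu * np.exp(-nu**2 / 2.0)

    # Massive scales (small sigma) are exponentially rare; small scales are not
    for sigma in (0.5, 1.0, 3.0):
        print(sigma, press_schechter_f(sigma))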

We do know how to do it for LCDM, at least for the dark matter halos, so there is a testable prediction there. The observable test depends on the messy astrophysics of forming stars and the shape of the mass function. The total luminosity density integrates over the shape, so is a rather forgiving test, as it doesn’t distinguish between stars in lots of tiny galaxies or the same number in a few big ones. Consequently, I hadn’t put much stock in it. But it is also a more robustly measured quantity, so perhaps it is more interesting than I gave it credit for, at least once we get to such high redshift that there should be hardly any stars.

Here is a plot of the ultraviolet (UV) luminosity density from Adams et al. (2023):

Fig. 8 from Adams et al. (2023) showing the integrated UV luminosity density as a function of redshift. UV light is produced by short-lived, massive stars, so makes a good proxy for the star formation rate (right axis).

The lower line is one+ a priori prediction of LCDM. I checked this back when JWST was launched, and saw no issues up to z=10, which remains true. However, the data now available at higher redshift are systematically higher than the prediction. The reason for this is simple, and the same as we’ve discussed before: dark matter halos are just beginning to get big; they don’t have enough baryons in them to make that many stars – at least not for the usual assumptions, or even just from extrapolating what we know quasi-empirically. (I say “quasi” because the extrapolation requires a theory-dependent rate of mass growth.)

The dashed line is what I consider to be a reasonable adjustment of the a priori prediction. Putting on an LCDM hat, it is actually closer to what I would have predicted myself because it has a constant star formation efficiency which is one of the knobs I prefer to fix empirically and then not touch. With that, everything is good up to z=10.5, maybe even to z=12 if we only believe* the data with uncertainties. But the bulk of the high redshift data sit well above the plausible expectation of LCDM, so grasping at the dangling ends of the biggest error bars seems unlikely to save us from a fall.

Ignoring the model lines, the data flatten out at z > 10, which is another way of saying that the UV luminosity function isn’t evolving when it should be. This redshift range does not correspond to much cosmic time, only a few hundred million years, so it makes the empiricist in me uncomfortable to invoke astrophysical causes. We have to imagine that the physical conditions change rapidly in the first sliver of cosmic time at just the right fine-tuned rate to make it look like there is no evolution at all, then settle down into a star formation efficiency that remains constant in perpetuity thereafter.

Harikane et al. (2023) also come to the conclusion that there is too much star formation going on at high redshift (their Fig. 18 is like that of Adams above, but extending all the way to z=0). Like many, they appear to be unaware that the early onset of structure formation had been predicted, so discuss three conventional astrophysical solutions as if these were the only possibilities. Translating from their section 6, the astrophysical options are:

  • Star formation was more efficient early on
  • Active Galactic Nuclei (AGN)
  • A top heavy IMF

This is a pretty broad view of the things that are being considered currently, though I’m sure people will add to this list as time goes forward and entropy increases.

Taking these in reverse order, the idea of a top heavy IMF is that preferentially more massive stars form early on. These produce more light per unit mass, so one gets brighter galaxies than predicted with a normal IMF. This is an idea that recurs every so often; see, e.g., section 3.1.1 of McGaugh (2004) where I discuss it in the related context of trying to get LCDM models to reionize the universe early enough. Supermassive Population III stars were all the rage back then. Changing the mass spectrum& with which stars form is one of those uber-free parameters that good modelers refrain from twiddling because it gives too much freedom. It is not a single knob so much as a Pandora’s box full of knobs that invoke a thousand Salpeter’s demons to do nearly anything at the price of understanding nothing.

As it happens, the option of a grossly variable IMF is already disfavored by the existence of quenched galaxies at z~3 that formed a normal stellar population at much higher redshift (z~11). These galaxies are composed of stars that have the spectral signatures appropriate for a population that formed with a normal IMF and evolved as stars do. This is exactly what we expect for galaxies that form early and evolve passively. Adjusting the IMF to explain the obvious makes a mockery of Occam’s razor.

AGN is a catchall term for objects like quasars that are powered by supermassive black holes at the centers of galaxies. This is a light source that is non-stellar, so we’ll overestimate the stellar mass if we mistake some light from AGN# as being from stars. In addition, we know that AGN were more prolific in the early universe. That in itself is also a problem: just as forming galaxies early is hard, so too is it hard to form enough supermassive black holes that early. So this just becomes the same problem in a different guise. Besides, the resolution of JWST is good enough to see where the light is coming from, and it ain’t all from unresolved AGN. Harikane et al. estimate that the AGN contribution is only ~10%.

That leaves the star formation efficiency, which is certainly another knob to twiddle. On the one hand, this is a reasonable thing to do, since we don’t really know what the star formation efficiency in the early universe was. On the other, we expected the opposite: star formation should, if anything, be less efficient at high redshift when the metallicity was low so there were few ways for gas to cool, which is widely considered to be a prerequisite for initiating star formation. Indeed, inefficient cooling was an argument in favor of a top-heavy IMF (perhaps stars need to be more massive to overcome higher temperatures in the gas from which they form), so these two possibilities contradict one another: we can have one but not both.

To me, the star formation efficiency is the most obvious knob to twiddle, but it has to be rather fine-tuned. There isn’t much cosmic time over which the variation must occur, and yet it has to change rapidly and in such a way as to precisely balance the non-evolving UV luminosity function against a rapidly evolving dark matter halo mass function. Once again, we’re in the position of having to invoke astrophysics that we don’t understand to make up for a manifest deficit in the behavior of dark matter. Funny how those messy baryons always cover up for that clean, pure, simple dark matter.

I could go on about these possibilities at great length (and did in the 2004 paper cited above). I decline to do so any further: we keep digging this hole just to fill it again. These ideas only seem reasonable as knobs to turn if one doesn’t see any other way out, which is what happens if one has absolute faith in structure formation theory and is blissfully unaware of the predictions of MOND. So I can already see the community tromping down the familiar path of persuading ourselves that the unreasonable is reasonable, that what was not predicted is what we should have expected all along, that everything is fine with cosmology when it is anything but. We’ve done it so many times before.


Initially I had the “cat stuffed back in the bag” image here, but that was really for a theoretical paper that I didn’t quite make it to in this post. You’ll see it again soon. The observations discussed here are by observers doing their best in the context they know, so the image doesn’t seem appropriate here.


%We were convinced of the need for non-baryonic dark matter before any fluctuations in the microwave background were detected; their absence at the level of one part in a thousand sufficed.

^The assembly of baryonic mass can and in most cases should be rapid. It is the settling of gas into a rotationally supported structure that takes time – this is influenced by gas physics, not just gravity. Regardless of gravity theory, gas needs to settle gently into a rotating disk in order for spiral galaxies to exist.

+There are other predictions that differ in detail, but this is a reasonable representative of the basic expectation.

*This is not necessarily unreasonable, as there is some proclivity to underestimate the uncertainties. That’s a general statement about the field; I have made no attempt to assess how reasonable these particular error bars are.

&Top-heavy refers to there being more than the usual complement of bright but short-lived (tens of millions of years) stars. These stars are individually high mass (bigger than the sun), while long-lived stars are low mass. Though individually low in mass, these faint stars are very numerous. When one integrates over the population, one finds that most of the total stellar mass resides in the faint, low mass stars while much of the light is produced by the high mass stars. So a top-heavy IMF explains high redshift galaxies by making them out of the brightest stars that require little mass to build. However, these stars will explode and go away on a short time scale, leaving little behind. If we don’t outright truncate the mass function (so many knobs here!), there could be some longer-lived stars left over, but they must be few enough for the whole galaxy to fade to invisibility or we haven’t gained anything. So it is surprising, from this perspective, to see massive galaxies that appear to have evolved normally without any of these knobs getting twiddled.

#Excess AGN were one possibility Jay Franck considered in his thesis as the explanation for what we then considered to be hyperluminous galaxies, but the known luminosity function of AGN up to z = 4 couldn’t explain the entire excess. With the clarity of hindsight, we were just seeing the same sorts of bright, early galaxies that JWST has brought into sharper focus.

Quantifying the excess masses of high redshift galaxies

As predicted, JWST has been seeing big galaxies at high redshift. There are now many papers on the subject, ranging in tone from “this is a huge problem for LCDM” to “this is not a problem for LCDM at all” – a dichotomy that persists. So – which is it?

It will take some time to sort out. There are several important aspects to the problem, one of which is agreeing on what LCDM actually predicts. It is fairly robust at predicting the number density of dark matter halos as a function of mass. To convert that into something observable requires understanding how baryons find their way into dark matter halos at early times, how those baryons condense into regions dense enough to form stars, what kinds of stars form there (thus determining observables like luminosity and spectral shape), and what happens in the immediate aftermath of early star formation (does feedback shut off star formation quickly or does it persist or is there some distribution over all possibilities). This is what simulators attempt to do. It is hard work, and they are a long way from agreeing with each other. Many of them appear to be a long way from agreeing with themselves, as their answers continue to evolve – sometimes because of genuine progress in the simulations, but sometimes in response to unanticipated* observations.

Observationally, we can hope to measure at least two distinct things: the masses of individual galaxies, and their number density – how many galaxies of a particular mass exist in a specified volume. I have mostly been worried about the first issue, as it appears that individual galaxies got too big too fast. In the hierarchical galaxy formation picture of LCDM, the massive galaxies of today were assembled from many smaller protogalaxies over an extended period of time, so big galaxies don’t emerge until comparatively late: it takes about seven billion years for a typical bright galaxy to assemble half its stellar mass. (The same hierarchical process is accelerated in MOND so galaxies can already be massive at z ≈ 10.) That there are examples of individual galaxies that are already massive in the early universe is a big issue.

How common should massive galaxies be? There are always early adopters: objects that grew faster than average for their mass. We’ll always see the brightest things first, so is what we’re seeing with JWST typical? Or is it just the bright tip of an iceberg that is perfectly reasonable in LCDM? This is what the luminosity function helps quantify: just how many galaxies of each mass are there? If we can quantify that, then we can quantify how many we should be able to see with a given survey of specified depth and sky coverage.
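As a concrete sketch of that bookkeeping (with made-up numbers for the footprint, redshift bin, and number density; astropy’s Planck18 parameters stand in for vanilla LCDM):

```python
# Toy survey-count estimate: expected number of galaxies of some number
# density Phi in the comoving volume probed by a hypothetical survey.
import numpy as np
import astropy.units as u
from astropy.cosmology import Planck18 as cosmo

area = 100 * u.arcmin**2                 # hypothetical survey footprint
sky_fraction = (area / (4 * np.pi * u.sr)).decompose()

# comoving volume of the shell between z = 7 and z = 9 over that footprint
shell = cosmo.comoving_volume(9.0) - cosmo.comoving_volume(7.0)
volume = (shell * sky_fraction).to(u.Mpc**3)

phi = 1e-5 / u.Mpc**3                    # hypothetical number density
expected = (phi * volume).decompose()
print(f"V = {volume:.3g}; expect N = {expected:.1f} galaxies")
```

The expected count is just the number density times the comoving volume the survey probes; the hard parts in practice are the completeness of the survey and the conversion of light to mass, not this arithmetic.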

Astronomers have been measuring the galaxy luminosity function for a long time. Doing so at high redshift has always been an ambition, so JWST is hardly the first telescope to contribute to the subject. It is the newest and best, opening a regime where we had hoped to see protogalactic fragments directly. Instead, the first thing we see are galaxies bigger than we expected (in LCDM). This has been building for some time, so let’s take a step back to provide some context.

Steinhardt et al. (2016) pointed out what they call “the impossibly early galaxy problem.” They quantified this by comparing the observed luminosity function in various redshift bins to that predicted by LCDM. We’ve discussed their Fig. 1 before, so let’s look now at their Fig. 4:

Figure 4 from Steinhardt et al. (2016). Colors correspond to redshift, with z = 4, 5, 6, 7, 8, 9, 10 being represented by blue, green, yellow, orange, red, pink, and black: there are fewer objects at high redshift where they’ve had less time to form. (a) Expected halo mass to monochromatic UV luminosity ratio, along with the required evolution to reconcile observation with theory, and (b) resulting corrected halo-mass functions derived as in Figure 1 with Mhalo/LUV evolving due to a stellar population starting at low metallicity at z = 12 and aging along the star-forming main sequence, as described in Section 4.1.1. Such a model would be reasonable given observational constraints, but cannot produce agreement between measured UV luminosity functions and simulated halo-mass functions.

In a perfect model, the points (data) would match the lines (theory) of the same color (redshift). This is not the case – observed galaxies are persistently brighter than predicted. Making that prediction is subject to all the conversions from dark matter mass to stellar mass to observed luminosity we mentioned above, so they also show what they expect and what it would take to match the data. These are the different lines in the top panel. There is a lot of discussion of this in their paper that boils down to this: the lines are different, and we cannot plausibly make them the same.

The word “plausibly” is doing a lot of work in that last sentence. Just because one set of authors finds something to be impossible (despite their best efforts) doesn’t mean anyone else accepts that. We usually don’t, even when we should**.

It occurs to me that not every reader may appreciate how redshift corresponds to cosmic time. So here is a graph for vanilla LCDM parameters:

The age-redshift relation for the vanilla LCDM cosmology. Everything at z > 3 is in the early universe, i.e., the first two billion years after the Big Bang. Everything at z > 10 is in the very early universe, the first half billion years when there has not yet been time to form big galaxies hierarchically.

Things don’t change much if we adopt slightly different cosmologies: this aspect of LCDM is well established. We used to think it would take at least a couple of billion years to form a big galaxy, so anything at z > 3 is surprising from that perspective. That’s not wrong, as there is an inverse relation between age and redshift, with increasing redshifts crammed into an ever smaller window of time. So while z = 5 and 10 sound very different, there is only about 700 Myr between them. That sounds like a long time to you and me, but the sun will only complete 3 orbits around the Galaxy in that time. This is why it is hard to imagine an object as large as the Milky Way starting from the near-homogeneity of the very early universe then having time to expand, decouple, recollapse, and form into something coherent so “quickly.” There is a much larger distance for material to travel than the current circumference of the solar circle, and not much time in which to do it. If we want to get it done by z = 10, there is less than 500 Myr available – about two orbits of the sun. We just can’t get there fast enough.
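If you want to check these numbers yourself, the age-redshift relation is a short calculation with astropy (Planck18 parameters are assumed here as a stand-in for vanilla LCDM; other reasonable parameter choices shift the ages only slightly):

```python
# Age of the universe at a given redshift, and the time elapsed between
# redshifts, for Planck18 parameters (a stand-in for "vanilla LCDM").
import astropy.units as u
from astropy.cosmology import Planck18 as cosmo

for z in (3, 5, 7, 10):
    print(f"z = {z:2d}: age = {cosmo.age(z).to(u.Myr):.0f}")

# time between z = 10 and z = 5, measured in solar orbits (~230 Myr each)
dt = cosmo.age(5) - cosmo.age(10)
orbits = (dt / (230 * u.Myr)).decompose()
print(f"z = 10 -> 5 spans {dt.to(u.Myr):.0f}, about {orbits:.1f} solar orbits")
```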

We’ve quickly become jaded to the absurdly high redshifts revealed by JWST, but there’s not much difference in cosmic time between these seemingly ever higher redshifts. Very early epochs were already being probed before JWST; JWST just brings them into excruciating focus. To provide some historical perspective about what “high redshift” means, here is a quote from Schramm (1992). The full text is behind a paywall, so I’ll just quote a relevant paragraph:

Pushing the opposite direction from the “zone of mystery” epoch [the dark ages] between the background radiation and the existence of objects at high redshift is the discovery of objects at higher and higher redshift. The higher the redshift of objects found, the harder it is to have the slow growth of Figure 5 [SCDM] explain their existence. Some high redshift objects can be dismissed as statistical fluctuations if the bulk of objects still formed late. In the last year, the number of quasars with redshifts > 4 has gone to 30, with one having a redshift as large as 4.9… While such constraints are not yet a serious problem for linear growth models, eventually they might be.

David Schramm, 1992

Here we have a cosmologist already concerned 30 years ago that objects exist at z > 4. Crazy, that! Back then, the standard model was SCDM; one of the reasons to switch to LCDM was to address exactly this problem. That only buys us a couple of billion years, so now we’re smack up against the same problem all over again, just shifted to higher redshift. Some people are even invoking statistical fluctuations: same as it ever was.

Consequently, a critical question is how common these massive galaxies are. Sure, massive galaxies exist before we expected them. But are they just statistical fluctuations? This is a question we can address with the luminosity function.

Here is the situation just before JWST was launched. Yung et al. (2019) made a good faith effort to establish a prior: they made predictions for what JWST would see. This is how science is supposed to work. In the figure below, I compare that to what was known (Stefanon et al. 2021) from the Spitzer Space Telescope, in many ways the predecessor to JWST:

Figure 4 from McGaugh (2024). The number density Φ of galaxies as a function of their stellar mass 𝑀∗, color coded by redshift with 𝑧=6, 7, 8, 9, 10 in dark blue, light blue, green, orange, and red, respectively. The left panel shows predicted stellar mass functions [lines] with the corresponding data [circles]. The right panel shows the ratio of the observed-to-predicted density of galaxies. There is a clear excess of massive galaxies at high redshifts.

If you just look at the mass functions in the left panel, things look pretty good. This is one of the dangers of the logarithmic plots necessary to illustrate the large dynamic range of astronomical data: large differences may look small in log-log space. So I also plot the ratio of densities at right. There one can see a clear excess in the number density of high mass galaxies. There are nearly an order of magnitude more 10¹⁰ M☉ galaxies than expected at z ≈ 8!
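To see how easily a log-log plot hides this, here is a toy comparison with entirely made-up numbers:

```python
# Two mass functions that would look nearly identical on a log-log plot,
# yet differ by large factors at the massive end. Numbers are invented
# purely to illustrate the point.
import numpy as np

logM = np.array([8.0, 9.0, 10.0])           # log10 of stellar mass
phi_pred = np.array([1e-3, 1e-4, 1e-6])     # predicted Phi, Mpc^-3 dex^-1
phi_obs = np.array([1.5e-3, 3e-4, 8e-6])    # "observed" Phi (made up)

# on the log plot these curves nearly overlap; the ratio tells the story
for m, r in zip(logM, phi_obs / phi_pred):
    print(f"log10(M*/Msun) = {m:4.1f}: observed/predicted = {r:.1f}x")
```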

For technical reasons I don’t care to delve into, it is difficult to get the volume estimate right when constructing the luminosity function. So I can imagine there might be some systematic effects to scale the ratio up or down. That wouldn’t do anything to explain the bump at high masses, and it is rather harder to get the shape wrong, especially at the bright end. The faint end of the luminosity function is the hard part!

The Spitzer data already probed the early universe before JWST reported results. As those results have come in, it has become possible to construct luminosity functions at very high redshift. Here are some measurements from Harikane et al. (2023), Finkelstein et al. (2023), and Robertson et al. (2023) together with revised predictions from Yung et al. (2024).

Figure 5 from McGaugh (2024). The number density of galaxies as a function of their rest-frame ultraviolet absolute magnitude observed by JWST, a proxy for stellar mass at high redshift. The left panel shows predicted luminosity functions [lines], color coded by redshift: blue, green, orange, red for 𝑧=9, 11, 12, 14, respectively. Data in the corresponding redshift bins are shown as squares, circles, and triangles. The right panel shows the ratio of the observed-to-predicted density of galaxies. The observed luminosity function barely evolves, in contrast to the prediction of substantial evolution as the first dark matter halos assemble. There is a large excess of bright galaxies at the highest redshifts observed.

Again, we see that there is an excess of bright galaxies at the highest redshifts.

As we look to progressively higher redshift, the light we observe shifts from familiar optical bands to the ultraviolet. This was a huge part of the motivation to build JWST: it is optimized for the infrared, so we can observe the redshifted optical light as our eyes would see it. Astronomers always push to the edge of what a telescope can do, so we start to run into this problem again at the highest redshifts. The mapping of ultraviolet light to stellar mass is one of the harder tasks in stellar population work, much less mapping that to a dark matter halo mass. So one promising conventional idea is “the up-scattering in UV luminosity of small, abundant halos due to stochastic, high efficiency star formation during the initial phases of galaxy formation (unregulated star formation)” discussed$ by Finkelstein et al. (2023). I like this because, yeah, we expect lots of little halos, star formation is messy and star formation during the first phases of galaxy formation should be especially messy, so it is easy to imagine little halos stochastically lighting up in the UV. But can this be enough?
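The basic mechanism is easy to demonstrate with a toy Monte Carlo (entirely hypothetical numbers, not the Finkelstein et al. model): sprinkle lognormal scatter on a steeply falling luminosity distribution and the bright tail inflates, because far more faint sources are available to scatter up than bright ones to scatter down:

```python
# Toy demonstration of "up-scattering": adding stochastic scatter to a
# steeply falling luminosity distribution inflates the bright tail.
import numpy as np

rng = np.random.default_rng(42)

# steeply falling distribution: many faint sources, few bright ones
logL = rng.exponential(scale=0.3, size=1_000_000)   # dex above some floor

for sigma in (0.0, 0.5):                 # dex of lognormal scatter added
    scattered = logL + rng.normal(0.0, sigma, logL.size)
    n_bright = int((scattered > 2.0).sum())
    print(f"scatter = {sigma} dex: sources 2 dex above the floor = {n_bright}")
```

With these made-up numbers, half a dex of scatter boosts the bright tail by a factor of a few – the question is whether plausible physics supplies enough scatter to cover the observed excess.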

It remains to be seen if the observations can be explained by this or any of the usual tweaks to star formation. It seems like a big gap to overcome. I mean, just look at the left panel of the final figure above. The observed UV luminosity function is barely evolving while the prediction of LCDM is dropping like a rock. The mass functions even get jagged, which may be an indication that there are so few dark matter halos in the simulation volume at the redshift in question that they do not suffice to define a smooth mass function. Indeed, Harikane et al. estimate a luminosity density of ∼7 × 10⁻⁶ mag⁻¹ Mpc⁻³ at 𝑧≈16. This point is omitted from the figure above because the corresponding prediction is NaN (not a number): there just isn’t anything big enough in the simulation to be so bright that early.

There is good reason to be skeptical of the data at 𝑧≈16. There is also good reason to be skeptical of the simulations. These have yet to converge, and even the predictions of the same group continue to evolve. Yung et al. (2019) did the right thing to establish a prior before JWST’s launch, but they haven’t stuck by it. The density of rare, massive galaxies has gone up by a factor of 2 to 2.5 in Yung et al. (2024). They attribute this to the use of higher resolution simulations, which may very well be correct: in order to track the formation of the earliest structures, you have to resolve them. But it doesn’t exactly inspire confidence that we actually know what LCDM predicts, and it feels like the same sort of moving of the goalposts that I’ve witnessed over and over and over and over and over again.

It always seems to come down to special pleading:

Please don’t falsify LCDM! I ran out of computer time. I had a disk crash. I didn’t have a grant for supercomputer time. My simulation data didn’t come back from the processing center. A senior colleague insisted on a rewrite. Someone stole my laptop. There was an earthquake, a terrible flood, locusts! It wasn’t my fault! I swear to God!

And the community loves LCDM, so we fall for it every time.

Oh, LCDM. LCDM, honey.

*There is always a danger in turning knobs to fit the data, and there are plenty of knobs to turn. So what LCDM predicts is a very serious matter – a theory is only as good as its prior, and we should be skeptical if theorists keep adjusting what that is in response to observations they failed to predict. This is true even in the absence of the existential threat of MOND which implies that the entire field of cosmological simulations is betrayed by its most fundamental assumptions, reducing it to “garbage in, garbage out.”

**When I first found that MOND had predicted our observations of low surface brightness galaxies where dark matter had not, despite my best efforts to make it work out, Ortwin Gerhard asked me if he “had to believe it.” My instant reaction was “this is astronomy, we don’t have to believe anything.” More seriously, this question applies on many levels: do we believe the data? do we believe the interpretation? is this the only possible conclusion? At the time, I had already tried very hard to fix it, and had failed. Still, I was willing to imagine there might be some way out, and maybe someone could figure out something I had not. Since that time, lots of other people have tried and also failed. This has not kept some of them from claiming that they have succeeded, but they never seem to address the underlying problem, and most of these models are mere variations on things I tried and dismissed as obviously unworkable.

Now, as then, what we are obliged to believe is the data, to the limits of their accuracy. The data have improved substantially, and at this point it is clear that the radial acceleration relation exists+ and has remarkably small intrinsic scatter. What we can always argue about is the interpretation: sure, it looks exactly like MOND, and MOND was the only theory that predicted it in advance, and we haven’t been able to come up with a reasonable explanation in terms of dark matter, but perhaps one can be found in some dark matter model that does not yet exist.

+Of course, there will always be some people behind the times and in a state of denial, as this subject seems to defeat rationalism in the hearts and minds of particle physicists in the same way Darwin still enrages some of the more religiously inclined.

$I directly quote Finkelstein’s coauthor Mauro Giavalisco from an email exchange.

Can’t be explained by science!

This clickbait title is inspired by the clickbait title of a recent story about high redshift galaxies observed by JWST. To speak in the same vernacular:

LOL!

What they mean, as I’ve discussed many times here, is that it is difficult to explain these observations in LCDM. LCDM does not encompass all of science. Science* predicted exactly this.

This story is one variation on the work of Labbe et al. that has been making the rounds since it appeared in Nature in late February. The concern is that these high redshift galaxies are big and bright. They got too big too soon.

Six high redshift galaxies from the JWST CEERS survey, as reported by Labbe et al. (2023). Not much to look at, but bear in mind that these objects are pushing the edge of the observable universe. By that standard, they are both bright and disarmingly obvious.

The paper by Labbe et al. was one of the studies informing the first concerns to emerge from JWST. Doubts were also raised about the credibility of those data. Are these galaxies really as massive as claimed, and at such high redshift? Let’s compare before and after publication:

Stellar masses and redshifts of galaxies from Labbe et al. The pink squares are the initial estimates that appeared in their first preprint in July 2022. The black squares with error bars are from the version published in February 2023. The shaded regions represent where galaxies are too massive too early for LCDM. The lighter region is where very few galaxies were expected to exist; the darker region is a hard no.

The results here are mixed. On the one hand, we were right to be concerned about the initial analysis. This was based in part on a ground-based calibration of the telescope before it was launched. That’s not the same as performance on the sky, which is usually a bit worse than in the lab. JWST breaks that mold, as it is actually performing better than expected. That means the bright-looking galaxies aren’t quite as intrinsically bright as was initially thought.

The correct calibration reduces both the masses and the redshifts of these galaxies. The change isn’t subtle: galaxies are less massive (the mass scale is logarithmic!) and at lower redshift than initially thought. Amusingly, only one galaxy is above redshift 9 when the early talking point was big galaxies at z = 10. (There are other credible candidates for that.) Nevertheless, the objects are clearly there, and bright (i.e., massive). They are also early. We like to obsess about redshift, but there is an inverse relation between redshift and time, so there is not much difference in clock time between z = 7 and 10. Redshift 10 is just under 500 million years after the big bang; redshift 7 just under 750 million years. Those are both in the first billion years out of a current age of over thirteen billion years. The universe was still in its infancy for both.

Regardless of your perspective on cosmic time scales, the observed galaxies remain well into LCDM’s danger zone, even with the revised calibration. They are no longer fully in the no-go zone, so I’m sure we’ll see lots of papers explaining how the danger zone isn’t so dangerous after all, and that we should have expected it all along. That’s why it matters more what we predict before an observation than after the answer is known.


*I emphasize science here because one of the reactions I get when I point out that this was predicted is some variation on “That doesn’t count! [because I don’t understand the way it was done.]” And yet, the predictions made and published in advance of the observations keep coming true. It’s almost as if there might be something to this so-called scientific method.

On the one hand, I understand the visceral negative reaction. It is the same reaction I had when MOND first reared its ugly head in my own data for low surface brightness galaxies. This is apparently a psychological phase through which we must pass. On the other hand, the community seems stuck in this rut: it is high time to get past it. I’ve been trying to educate a reluctant audience for over a quarter century now. I know how it pains them because I shared that pain. I got over it. If you’re a scientist still struggling to do so, that’s on you.

There are some things we have to figure out for ourselves. If you don’t believe me, fine, but then get on with doing it yourself instead of burying your head in the sand. The first thing you have to do is give MOND a chance. When I allowed that possibility, I suddenly found myself working less hard than when I was desperately trying to save dark matter. If you come to the problem sure MOND is wrong+, you’ll always get the answer you want.

+I’ve been meaning to write a post (again) about the very real problems MOND suffers in clusters of galaxies. This is an important concern. It is also just one of hundreds of things to consider in the balance. We seem willing to give LCDM infinite mulligans while any problem MOND encounters is immediately seen as fatal. If we hold them to the same standard, both are falsified. If all we care about is explanatory power, LCDM always has that covered. If we care more about successful a priori predictions, MOND is less falsified than LCDM.

There is an important debate to be had on these issues, but we’re not having it. Instead, I frequently encounter people whose first response to any mention of MOND is to cite the bullet cluster in order to shut down discussion. They are unwilling to accept that there is a debate to be had, and are inevitably surprised to learn that LCDM has trouble explaining the bullet cluster too, let alone other clusters. It’s almost as if they are just looking for an excuse to not have to engage in serious thought that might challenge their belief system.