Old galaxies in the early universe

Continuing our discussion of galaxy formation and evolution in the age of JWST, we saw previously that there appears to be a population of galaxies that grew rapidly in the early universe, attaining stellar masses like those expected in a traditional monolithic model for a giant elliptical galaxy rather than a conventional hierarchical model that builds up gradually through many mergers. The formation of galaxies at incredibly high redshift, z > 10, implies the existence of a descendant population at intermediate redshift, 3 < z < 4, at which point they should have mature stellar populations. These galaxies should not only be massive, they should also have the spectral characteristics of old stellar populations – old, at least, for how old the universe itself is at this point.

Theoretical predictions from Fig. 1 of McGaugh et al. (2024) combined with the data of Fig. 4. The data follow the track of a monolithic model that forms early as a single galaxy rather than that of the largest progenitor of the hierarchical build-up expected in LCDM.

The data follow the track of stellar mass growth for an early-forming monolithic model. Do the ages of stars also look like that?

Here is a recent JWST spectrum published by de Graaff et al. (2024). This appeared too recently for us to have cited in our paper, but it is a great example of what we’re talking about. This is an incredibly gorgeous spectrum of a galaxy at z = 4.9 when the universe was 1.2 Gyr old.

Fig. 1 from de Graaff et al. (2024): JWST/NIRSpec PRISM spectrum (black line) of the massive quiescent galaxy RUBIES-EGS-QG-1 at a redshift of z = 4.8976.

It is challenging to refrain from nerding out at great length over many of the details on display here. First, it is an incredible technical achievement. I’ve seen worse spectra of local galaxies. JWST was built to obtain images and spectra of galaxies so distant they approach the horizon of the observable universe. Its cameras are sensitive to the infrared part of the spectrum in order to capture familiar optical features that have been redshifted by a huge factor (compare the upper and lower x-axes). The telescope itself was launched into space well beyond the obscuring atmosphere of the earth, pointed precisely at a tiny, faint flicker of light in a vast, empty universe, captured photons that had been traveling for billions of years, and transmitted the data to Earth. That this is possible, and works, is an amazing feat of science, engineering, and societal commitment (it wasn’t exactly cheap).

In the raw 2D spectrum (at top) I can see by eye the basic features in the extracted, 1D spectrum (bottom). This is a useful and convincing reality check to an experienced observer even if at first glance it looks like a bug splat smeared by a windshield wiper. The essential result is apparent to the eye; the subsequent analysis simply fills in the precise numbers.

Looking from right to left, the spectrum runs from red to blue. It ramps up then crashes down around an observed wavelength of 2.3 microns. This is the 4000 Å break in the rest frame, a prominent feature of aging stellar populations. The amount of blue-to-red ramp-up and the subsequent depth of drop is a powerful diagnostic of stellar age.
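
As a quick check of where that break should land: the rest-frame 4000 Å feature redshifts to λ_obs = (1+z) × 4000 Å.

```python
# Where should the rest-frame 4000 A break land at this redshift?
z = 4.8976                    # spectroscopic redshift of RUBIES-EGS-QG-1
lam_rest_angstrom = 4000.0    # rest-frame wavelength of the break
lam_obs_micron = (1 + z) * lam_rest_angstrom * 1e-4  # 1 Angstrom = 1e-4 micron
print(f"{lam_obs_micron:.2f} microns")  # ~2.36, matching the crash near 2.3 microns
```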

In addition to the 4000 Å break, a number of prominent spectral lines are apparent. In particular, the Balmer absorption lines Hβ, Hγ, and Hδ are clear and deep. These are produced by A stars, which dominate the light of a stellar population after a few hundred million years. There’s the answer right there: the universe is only 1.2 Gyr old at this point, and the stars dominating the light aren’t much younger.

There are also some emission lines. These can be the sign of on-going star formation or an active galactic nucleus powered by a supermassive black hole. The authors attribute these to the latter, inferring that the star formation happened fast and furious early on, then basically stopped. That’s important to the rest of the spectrum; A stars only dominate for a while, and their lines are not so prominent if a population keeps making new stars. So this galaxy made a lot of stars, made them fast, then basically stopped. That is exactly the classical picture of a monolithic giant elliptical.

Here is the star formation history that de Graaff et al. (2024) infer:

Fig. 2 from de Graaff et al. (2024): the star formation rate (top) and accumulated stellar mass (bottom) as a function of cosmic time (only the first 1.2 Gyr are shown). Results for stellar populations of two metallicities are shown (purple or blue lines). This affects the timing of the onset of star formation, but once going, an enormous mass of stars forms fast, in ~200 Myr.

There are all sorts of caveats about population modeling, but it is very hard to avoid the basic conclusion that lots of stars were assembled with incredible speed. A stellar mass a bit in excess of that of the Milky Way appears in the time it takes for the sun to orbit once. That number need not be exactly right to see that this is not the gradual, linear, hierarchical assembly predicted by LCDM. The typical galaxy in LCDM is predicted to take ~7 Gyr to assemble half its stellar mass, not 0.1 Gyr. It’s as if the entire mass collapsed rapidly and experienced an intense burst of star formation during violent relaxation (Lynden-Bell 1967).

Collapse of shells within shells to form a massive galaxy rapidly in MOND (Sanders 2008). Note that the inner shells (inset), where most of the stars will be, collapse even more rapidly than the overall monolith (dotted line).

Whereas MOND provides a natural explanation for this observation, the fiducial population model of de Graaff et al. violates the LCDM baryon limit: there are more stars than there are baryons to make them from. It should be impossible to veer into the orange region above as the inferred star formation history does. The obvious solution is to adopt a higher metallicity (the blue model) even if that is a worse fit to the spectrum. Indeed, I find it hard to believe that so many stars could be made in such a small region of space without drastically increasing their metallicity, so there are surely things still to be worked out. But before we engage in too much excuse-making for the standard model, note that the orange region represents a double impossibility. First, the star formation efficiency is 100%. Second, this is for an exceptionally rare, massive dark matter halo. The chances of spotting such an object in the area so far surveyed by JWST are small. So we not only need to convert all the baryons into stars, we also need to luck into seeing it happen in a halo so massive that it probably shouldn’t be there. And in the strictest reading, there still aren’t enough baryons. Does that look right to you?

Do these colors look right to you? Getting the color right is what stellar population modeling is all about.

OK, so I got carried away nerding out about this one object. There are other examples. Indeed, there are enough now to call them a population of old and massive quiescent galaxies at 3 < z < 4. These have the properties expected for the descendants of massive galaxies that form at z > 10.

Nanayakkara et al. (2024) model spectra for a dozen such galaxies. The spectra provide an estimate of the stellar mass at the redshift of observation. They also imply a star formation history from which we can estimate the age/redshift at which the galaxy had formed half of those stars, and when it quenched (stopped forming stars, or in practice here, when the 90% mark had been reached). There are, of course, large uncertainties in the modeling, but it is again hard to avoid the conclusion that lots of stars were formed early.

Figure 7 from McGaugh et al. (2024): The stellar masses of quiescent galaxies from Nanayakkara et al. (2024). The inferred growth of stellar mass is shown for several cases, marking the time when half the stars were present (small green circles) to the quenching time when 90% of the stars were present (midsize orange circles) to the epoch of observation (large red circles). Illustrative star formation histories are shown as dotted lines with the time of formation t_i and the quenching timescale τ noted in Gyr. We omit the remaining lines for clarity, as many cross. There is a wide distribution of formation times from very early (t_i = 0.2 Gyr) to relatively late (>1 Gyr), but all of the galaxies in this sample are inferred to build their stellar mass rapidly and quench early (τ < 0.5 Gyr).

The dotted lines above are models I constructed in the spirit of monolithic models. The particular details aren’t important, but the inferred timescales are. To put galaxies in this part of the stellar mass-redshift plane, they have to start forming early (typically in the first billion years), form stars at a prolific rate, then quench rapidly (typically with e-folding timescales < 1 Gyr). I wouldn’t say any of these numbers are particularly well-measured, but they are indicative.
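
For concreteness, here is a minimal sketch (with illustrative numbers, not the exact fits from the paper) of the growth curves those dotted lines represent: an exponentially declining star formation rate that switches on at t_i and integrates up to the final stellar mass.

```python
import numpy as np

def stellar_mass(t, t_i=0.3, tau=0.5, M_final=1e11):
    """Stellar mass (Msun) at cosmic time t (Gyr) for an exponential SFH:
    SFR ~ exp(-(t - t_i)/tau) for t > t_i, normalized so the mass
    approaches M_final as t -> infinity."""
    frac = 1.0 - np.exp(-(np.asarray(t, dtype=float) - t_i) / tau)
    return M_final * np.clip(frac, 0.0, None)  # no stars before t_i

# Half-mass and 90% ("quenching") points for this toy history:
t = np.linspace(0.0, 3.0, 3001)
m = stellar_mass(t)
print("half mass at t =", t[np.searchsorted(m, 0.5e11)], "Gyr")  # ~0.65 Gyr
print("90% mass at t =", t[np.searchsorted(m, 0.9e11)], "Gyr")   # ~1.45 Gyr
```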

What is missing from this plot is the LCDM prediction. That’s not because I omitted it, it’s because the prediction for typical L* galaxies doesn’t fall within the plot limits. LCDM does not predict that typical galaxies should become this massive this early. I emphasize typical because there is always scatter, and some galaxies will grow ahead of the typical rate.

Not only are the observed galaxies massive, they have mature stellar populations that are pretty much done forming stars. This will sound normal to anyone who has studied the stellar populations of giant elliptical galaxies. But what does LCDM predict?

I searched through the Illustris TNG50 and TNG300 simulations for objects at redshift 3 that had stellar masses in the same range as the galaxies observed by Nanayakkara et al. (2024). The choice of z = 3 is constrained by the simulation output, which comes in increments of the expansion factor. To compare to real galaxies at 3 < z < 4 one can either look at the snapshot at z = 4 or the one at z = 3. I chose z = 3 to be conservative; this gives the simulation the maximum amount of time to produce quenched, massive galaxies.
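
For anyone who wants to reproduce this kind of selection, here is a rough sketch using the public illustris_python module. It assumes a local download of the TNG group catalogs; snapshot 25 corresponds to z = 3 in the TNG runs, and the mass bounds are illustrative.

```python
import numpy as np
import illustris_python as il  # public TNG analysis package

basePath = "./TNG50-1/output"  # assumed local path to the group catalogs
snap = 25                      # snapshot 25 is z = 3 in the TNG runs
h = 0.6774                     # TNG Hubble parameter

sub = il.groupcat.loadSubhalos(basePath, snap,
                               fields=["SubhaloMassType", "SubhaloSFR"])
mstar = sub["SubhaloMassType"][:, 4] * 1e10 / h  # stellar mass in Msun (type 4 = stars)
sfr = sub["SubhaloSFR"]                          # star formation rate in Msun/yr

# Select the stellar mass range of the observed quiescent galaxies (illustrative bounds)
sel = (mstar > 1e10) & (mstar < 2e11)
print(f"{sel.sum()} subhalos; median SFR = {np.median(sfr[sel]):.1f} Msun/yr")
```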

These simulations do indeed produce some objects of the appropriate stellar mass. These are rare, as they are early adopters: galaxies that got big quicker than is typical. However, they are not quenched as observed: the simulated objects are still on the star forming main sequence (the correlation between star formation rate and stellar mass). The distribution of simulated objects does not appear to encompass that of real galaxies.

Figure 8 from McGaugh et al. (2024): The stellar masses and star formation rates of galaxies from Nanayakkara et al. (2024; red symbols). Downward-pointing triangles are upper limits; some of these fall well below the edge of the plot and so are illustrated as the line of points along the bottom. Also shown are objects selected from the TNG50 (Pillepich et al. 2019; filled squares) and TNG300 (Pillepich et al. 2018; open squares) simulations at z = 3 to cover the same range of stellar mass. Unlike the observed galaxies, simulated objects with stellar masses comparable to real galaxies are mostly forming stars at a rapid pace. In the higher-resolution TNG50, none have quenched as observed.

If we want to hedge, we can note that TNG300 has a few objects that are kinda in the right ballpark. That’s a bit misleading, as the data are mostly upper limits. Moreover, these are the rare objects among a set of objects selected to be rare: it isn’t a resounding success if we have to scrape the bottom of the simulated barrel after cherry-picking which barrel. Worse, these few semi-quenched simulated objects are not present in TNG50. TNG50 is the higher resolution simulation, so presumably provides a better handle on the star formation in individual objects. It is conceivable that TNG300 “wins” by virtue of its larger volume, but that’s just saying we have more space in which to discover very rare entities. The prediction is that massive, quenched galaxies should be exceedingly rare, but in the real universe they seem mundane.

That said, I don’t think this problem is fundamental. Hierarchical assembly is still ongoing at this epoch, bringing with it merger-induced star formation. There’s an easy fix for that: change the star formation prescription. Instead of “wet” mergers with gas that can turn into stars, we just need to form all the stars already early on so that the subsequent mergers are “dry” – at least, for those mergers that build this particular population. One winds up needing a new and different mode of star formation. In addition to what we observe locally, there needs to be a separate mode of super-efficient star formation that somehow turns all of the available baryons into stars as soon as possible. That’s basically what I advocate as the least unreasonable possibility for LCDM in our paper. This is a necessary but not sufficient condition; these early stellar nuggets also need to assemble speedy quick to make really big galaxies. While it is straightforward to mess with the star formation prescription in models (if not in nature), the merger trees dictating the assembly history are less flexible.

Putting all the data together in a single figure, we can get a sense for the evolutionary trajectory of the growth of stellar mass in galaxies across cosmic time. This figure extends from the earliest galaxies so far known at z ~ 14, when the universe was just a few hundred million years old (of order one orbital time in a mature galaxy), to the present over thirteen billion years later. In addition to data discussed previously, it also shows recent data with spectroscopic redshifts from JWST. This is important, as the sense of the figure doesn’t change if we throw away all the photometric redshifts, it just gets a little sparse around z ~ 8.

Figure 10 from McGaugh et al. (2024): The data from Figures 4 and 6 shown together using the same symbols. Additional JWST data with spectroscopic redshifts are shown from Xiao et al. (2023; green triangles) and Carnall et al. (2024). The data of Carnall et al. (2024) distinguish between star-forming galaxies (small blue circles) and quiescent galaxies (red squares); the latter are in good agreement with the typical stellar mass determined from Schechter fits in clusters (large circles). The dashed red lines show the median growth predicted by the Illustris ΛCDM simulation (Rodriguez-Gomez et al. 2016) for model galaxies that reach final stellar masses of M* = 10^10, 10^11, and 10^12 M☉. The solid lines show monolithic models with a final stellar mass of 9 × 10^10 M☉ and t_i = τ = 0.3, 0.4, and 0.5 Gyr, as might be appropriate for giant elliptical galaxies. The dotted line shows a model appropriate to a monolithic spiral galaxy with t_i = 0.5 and τ = 13.5 Gyr.

The solid lines are monolithic models we built to represent classical giant elliptical galaxies that form early and quench rapidly. These capture nicely the upper envelope of the data. They form most of their stars at z > 4, producing appropriately old populations at lower redshifts. The individual galaxy data merge smoothly into those for typical galaxies in clusters.

The LCDM prediction as represented by the Illustris suite of simulations is shown as the dashed red lines for objects of several final masses. These are nearly linear in log(M*) versus z. Objects that end up with a typical L* elliptical galaxy mass at z = 0 deviate from the data almost immediately at z > 1. They disappear at z > 6 as the largest progenitors become tiny.

What can we do to fix this? Massive galaxies get a head start, as it were, by being massive at all epochs. But the shape of the evolutionary trajectory remains wrong. The top red line (for a final stellar mass of 10^12 M☉) corresponds to a typical galaxy at z ~ 2, but it continues to grow to be atypical locally. The data don’t do that. Even with this boost, the largest progenitor is still predicted to be too small at z > 3 where there are now many examples of massive, quiescent galaxies – known both from JWST observations and from Jay Franck’s thesis before it. Again, the distribution of the data does not look like the predictions of LCDM.

One can abandon Illustris as the exemplar of LCDM, but it doesn’t really help. Other models show similar things, differing only in minor details. That’s because the issue is the mass assembly history they all share, not the details of the star formation. The challenge now is to tweak models to make them look more monolithic; i.e., change those red dashed lines into the solid black lines. One will need super-efficient star formation, if it is even possible. I’ll leave discussion of this and other obvious fudges to a future post.

Finally, note that there are a bunch of galaxies with JWST spectroscopic redshifts from 3 < z < 4 that are not exceptionally high mass (the small blue points). These are expected in any paradigm. They can be galaxies that are intrinsically low mass and won’t grow much further, or galaxies that may still grow a lot, just with a longer fuse on their star formation timescale. Such objects are ubiquitous in the local universe as spiral and irregular galaxies. Their location in the diagram above is consistent with the LCDM predictions, but is also readily explained by monolithic models with long star formation timescales. The dotted line shows a monolithic model that forms early (t_i = 0.5 Gyr) but converts gas into stars gradually (τ = 13.5 Gyr rather than < 1 Gyr). This is a boilerplate model for a spiral that has been around for as long as the short-τ model for giant ellipticals. So while these lower mass galaxies exist, their location in the M*-z plane doesn’t really add much to this discussion as yet. It is the massive galaxies that form early and become quiescent rapidly that most challenge LCDM.

Measuring the growth of the stellar mass of galaxies over cosmic time

This post continues the series summarizing our ApJ paper on high redshift galaxies. To keep it finite, I will focus here on the growth of stellar mass. The earlier post discussed what we expect in theory. This depends on mass assembly (slow in LCDM, fast in MOND), on how the assembled mass is converted into stars, and on how those stars shine in light we can detect. We know a lot about stars and their evolution, so for this post I will assume we know how to convert a given star formation history into the evolution of the light it produces. There are of course caveats to that which we discuss in the paper, and perhaps will get to in a future post. It’s exhausting to be exhaustive, so not today, Satan.

The principal assumption we are obliged to make, at least to start, is that light traces mass. As mass assembles, some of it turns into stars, and those stars produce light. The astrophysics of stars and the light they produce is the same in any structure formation theory, so with this basic assumption, we can test the build-up of mass. In another post we will discuss some of the ways in which we might break this obvious assumption in order to save a favored theory. For now, we assume it holds, and what we see at high redshift provides a picture of how mass assembles.

Before JWST

This is not a new project; people have been doing it for decades. We like to think in terms of individual galaxies, but there are lots out there, so an important concept is the luminosity function, which describes the number of galaxies as a function of how bright they are. Here are some examples:

Figure 3 from Franck & McGaugh (2017) showing the number of galaxies as a function of their brightness in the 4.5 micron band of the Spitzer Space Telescope in candidate protoclusters from z = 2 to 6. Each panel notes the number of galaxies contributing to the Schechter luminosity function+ fit (gray bands), the apparent magnitude m* corresponding to the typical luminosity L*, and the redshift range. The magnitude m* is characteristic of how bright typical galaxies are at each redshift.
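
For reference, the Schechter function being fit (see the footnote at the end of this post) has the form Φ(L) dL = Φ* (L/L*)^α e^(−L/L*) d(L/L*). A minimal sketch, with illustrative parameter values:

```python
import numpy as np

def schechter(L, L_star, phi_star, alpha):
    """Schechter luminosity function: number density of galaxies per unit
    luminosity. L_star marks the characteristic 'knee' (the typical bright
    galaxy); alpha sets the faint-end slope."""
    x = L / L_star
    return (phi_star / L_star) * x**alpha * np.exp(-x)

L = np.logspace(8, 12, 200)  # luminosities in Lsun
phi = schechter(L, L_star=1e11, phi_star=5e-3, alpha=-1.0)  # illustrative values
```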

One reason to construct these luminosity functions is to quantify what is typical. Hundreds of galaxies inform each fit. The luminosity L* is representative of the typical galaxy, not just anecdotal individual examples. At each redshift, L* corresponds to an observed apparent magnitude m*, which we plot here:

Figure 3 from McGaugh et al. (2024): The redshift dependence of the Spitzer [4.5] apparent magnitude m* of Schechter function fits to populations of galaxies in clusters and candidate protoclusters; each point represents the characteristic brightness of the galaxies in each cluster. The apparent brightness of galaxies gets fainter with increasing redshift because galaxies are more distant, with the amount they dim depending also on their evolution (lines). The purple line is the monolithic exponential model we discussed last time. The orange line is the prediction of the Millennium simulation (the state of the art at the time Jay Franck wrote his thesis) and the Munich galaxy formation model based on it. The open squares are the result of applying the same algorithm to the simulation as used on the data; this is what we would have observed if the universe looked like LCDM as depicted by the Munich model. The real universe does not look like that.

We plot faint to bright going up the y-axis; the numbers get smaller because of the backwards definition of the magnitude scale (which dates to ancient times in which the stars that appeared brightest to the human eye were “of the first magnitude,” then the next brightest of the second magnitude, and so on). The x-axis shows redshift. The top axis shows the corresponding age of the universe for vanilla LCDM parameters. Each point shows the apparent magnitude that is typical as informed by observations of dozens to hundreds of individual galaxies. Each galaxy has a spectroscopic redshift, which we made a requirement for inclusion in the sample. These are very accurate; no photometric redshifts are used to make the plot above.
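
Both axis conversions in that plot, redshift to cosmic age along the top and the dimming with distance, are standard cosmography. A sketch with astropy, assuming vanilla parameters (the exact choices used for the plot may differ slightly):

```python
from astropy.cosmology import FlatLambdaCDM

cosmo = FlatLambdaCDM(H0=73, Om0=0.3)  # vanilla LCDM, illustrative parameters

for z in (0.5, 1, 2, 4, 6):
    age = cosmo.age(z)      # age of the universe at redshift z
    mu = cosmo.distmod(z)   # distance modulus m - M in magnitudes
    print(f"z = {z}: age = {age:.2f}, m - M = {mu:.2f}")
```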

One thing that impressed me when Jay made the initial version of this plot is how well the models match the evolution of m* at z < 2, which is most of cosmic time (the past ten billion years). This encourages one that the assumption adopted above, that we understand the evolution of stars well enough to do this, might actually be correct. I was, and remain, especially impressed with how well the monolithic model with a simple exponential star formation history matches these data. It’s as if the inferences the community had made about the evolution of giant elliptical galaxies from local observations were correct.

The new thing that Jay’s work showed was that the evolution of typical cluster galaxies at z > 2 persists in tracking the monolithic model that formed early (z_f = 10). There is a lot of scatter in the higher redshift data even though there is little at lower redshift. This is to be expected for both observational reasons – the data get rattier at larger distances – and theoretical ones: the exponential star formation history we assume is at best a crude average; at early times when short-lived but bright massive stars are present there will inevitably be stochastic variation around this trend. At later times the law of averages takes over and the scatter should settle down. That’s pretty much what we see.

What we don’t see is the decline in typical brightness predicted by contemporaneous LCDM models. The specific example shown is the Munich galaxy formation model based on the Millennium simulation. However, the prediction is generic: galaxies get faint at high redshift because they haven’t finished assembling yet. This is not a problem of misunderstanding stellar evolution, it is a failure of the hierarchical assembly paradigm.

In order to identify [proto]clusters at high redshift, Jay devised an algorithm to identify galaxies in close proximity on the sky and in redshift space, in excess of the average density around them. One question we had was whether the trend predicted by the LCDM model (the orange line above) would be reproduced in the data when analyzed in this way. To check, Jay made mock observations of a simulated lookback cone using the same algorithm. The results (not previously published) are the open squares in the plot above. These track the “right” answer known directly in the form of the orange line. Consequently, if the universe had looked as predicted, we could tell. It doesn’t.

The above plot is in terms of apparent magnitude. It is interesting to turn this into the corresponding stellar mass. There has also been work done on the subject after Jay’s, so I wanted to include it. An early version of a plot mapping m* to stellar mass and redshift to cosmic time that I came up with was this:

The stellar mass of L* galaxies as a function of cosmic age. Data as noted in the inset. The purple/orange lines represent the monolithic/hierarchical models, as above.

The more recent data (which also predate JWST) follow the same trend as the preceding data. All the data follow the path of the monolithic model. Note that the bulk of the stars are formed in situ in the first few billion years; the stellar mass barely changes after that. There is quite a bit of stellar evolution during this time, which is why m* in the figure above changes in a complicated fashion while the stellar mass remains constant. This again provides some encouragement that we understand how to model stellar populations.

The data in the first billion years are not entirely self-consistent. For example, the yellow points are rather higher in mass than the cyan points. This difference is not one in population modeling, but rather in how much of a correction is made for non-stellar, nebular emission. So as not to go down that rabbit hole, I chose to adopt the lowest stellar mass estimates for the figure that appears in the paper (below). Note that this is the most conservative choice; I’m trying to be as favorable to LCDM as is reasonably plausible.

Figure 4 from McGaugh et al. (2024): The characteristic stellar mass as a function of time with the corresponding redshift noted at the top.

There were more recent models as well as more recent data, so I wanted to include those. There are, in fact, way too many models to illustrate without creating a confusing forest of lines, so in the end I chose a couple of popular ones, Illustris and FIRE. Illustris is the descendant of Millennium, and shows identical behavior. FIRE has a different scheme for forming stars, and does so more rapidly than Illustris. However, its predictions still fall well short of the data. This is because both simulations share the same LCDM cosmology with the same merger tree assembly of structure. Assembling the mass promptly enough is the problem; it isn’t simply a matter of making stars faster.

I’ll show one more version of this plot to illustrate the predicted evolutionary trajectories. In the plots above, I only show models that end up with the mass of a typical local giant elliptical. Galaxies come in a variety of masses, so what does that look like?

The stellar mass of galaxies as a function of cosmic age. Data as above. The orange lines represent the hierarchical models that result in different final masses at z = 0.

The curves of stellar growth predicted by LCDM have pretty much the same shape, just different amplitude. The most massive case illustrated above is reasonable insofar as there are real galaxies that massive, but they are rare. They are also rare in simulations, which makes the predicted curve a bit jagged as there aren’t enough examples to define a smooth trajectory as there are for lower mass objects. More importantly, the shape is wrong. One can imagine that the galaxies we see at high redshift are abnormally massive, but even the most massive galaxies don’t start out that big at high redshift. Moreover, they continue to grow hierarchically in LCDM, so they wind up too big. In contrast, the data look like the monolithic model that we made on a lark, no muss, no fuss, no need to adjust anything.

This really shouldn’t have come as a surprise. We already knew that galaxies were impossibly massive at z ~ 4 before JWST discovered that this was also true at z ~ 10. The a priori prediction that LCDM has made since its inception (earlier models show the same thing) fails. More recent models fail, though I have faith that they will eventually succeed. This is the path theorists have always taken, and the obvious path here, as I remarked previously, is to make star formation (or at least light production) artificially more efficient so that the hierarchical model looks like the monolithic model. For completeness, I indulge in this myself in the paper (section 6.3) as an exercise in what it takes to save the phenomenon.

A two-year delay

Regular readers of this blog will recall that in addition to the predictions I emphasized when JWST was launched, I also made a number of posts about the JWST results as they started to come in back in 2022. I had also prepared the above as a science paper that is now sections 1 to 3 of McGaugh et al. (2024). The idea was to have it ready to go so I could add a brief section on the new JWST results and submit right away – back in 2022. The early results were much as expected, but I did not rush to publish. Instead, it has taken over two years since then to complete what turned into a much longer manuscript. There are many reasons for this, but the scientific reason is that I didn’t believe many of the initial reports.

JWST was new and exciting and people fell all over themselves to publish things quickly. Too quickly. To do so, they relied on a calibration of the telescope plus detector system made while it was on the ground prior to launch. This is not the same as calibrating it on the sky, which is essential but takes some time. Consequently, some of the initial estimates were off.

Stellar masses and redshifts of galaxies from Labbé et al. The pink squares are the initial estimates that appeared in their first preprint in July 2022. The black squares with error bars are from the version published in February 2023. The shaded regions represent where galaxies are too massive too early for LCDM. The lighter region is where galaxies shouldn’t exist; the darker region is where they cannot exist.

In the example above, all of the galaxies had both their initial mass and redshift estimates change with the updated calibration. So I was right to be skeptical, and wait for an improved analysis. I was also right that while some cases would change, the basic interpretation would not. All that happened in the example above was that the galaxies moved from the “can’t exist in LCDM” region (dark blue) into the “really shouldn’t exist in LCDM” region (light blue). However, the widespread impression was that we couldn’t trust photometric redshifts at all, so I didn’t see what new I could justifiably add in 2022. This was, after all, the attitude Jay and I had taken in his CCPC survey where we required spectroscopic redshifts.

So I held off. But then it became impossible to keep up with the fire hose of data that ensued. Every time I got the chance to update the manuscript, I found some interesting new result had been published that I had to include. New things were being discovered faster than I could read the literature. I found myself stuck in the Red Queen’s dilemma, running as fast as possible just to stay in place.

Ultimately, I think the delay was worthwhile. Lots new was learned, and actual spectroscopic redshifts began to appear. (Spectroscopy takes more telescope time than photometry – spreading out the light reduces the signal-to-noise per pixel, necessitating longer exposure times, so it always lags behind. One also discovers the galaxies in the same images that are used for photometry, so photometry gets a head start.) Consequently, there is a lot more in the paper than I had planned on. This is another long blog post, so I will end it where I had planned for the original paper to end, with the updated version of the plot above.

Massive galaxies at high redshift from JWST

The stellar masses of galaxies discovered by JWST as a function of redshift are shown below. Unlike most of the plots above, these are individual galaxies rather than typical L* galaxies. Many are based on photometric redshifts, but those in solid black have spectroscopic redshifts. There are many galaxies that reside in a region they should not, at least according to LCDM models: their mass is too large at the observed redshift.

Figure 6 from McGaugh et al. (2024): Mass estimates for high-redshift galaxies from JWST. Colored points based on photometric redshifts are from Adams et al. (2023; dark blue triangles), Atek et al. (2023; green circles), Labbé et al. (2023; open squares), Naidu et al. (2022; open star), Harikane et al. (2023; yellow diamonds), Casey et al. (2024; light blue left-pointing triangles), and Robertson et al. (2024; orange right-pointing triangles). Black points from Wang et al. (2023; squares), Carniani et al. (2024; triangles), Harikane et al. (2024; circles) and Castellano et al. (2024; star) have spectroscopic redshifts. The upper limit for the most massive galaxy in TNG100 (Springel et al. 2018) as assessed by Keller et al. (2023) is shown by the light blue line. This is consistent with the maximum stellar mass expected from the stellar mass–halo mass relation of Behroozi et al. (2020; solid blue line). These merge smoothly into the trend predicted by Yung et al. (2019b) for galaxies with a space density of 10^-5 dex^-1 Mpc^-3 (dashed blue line), though L. Yung et al. (2023) have revised this upward by ∼0.4 dex (dotted blue line). This closely follows the most massive objects in TNG300 (Pillepich et al. 2018; red line). The light gray region represents the parameter space in which galaxies were not expected in LCDM. The dark gray area is excluded by the limit on the available baryon mass (Behroozi & Silk 2018; Boylan-Kolchin 2023). [Note added: I copied this from the caption in our paper, but the links all seem to go to that rather than to each of the cited papers. You can get to them from our reference list if you want, but it’ll take some extra clicks. It looks like AAS has set it up this way to combat trawling by bots.]

One can see what I mean about a fire hose of results from the number of references given here. Despite the challenges of keeping track of all this, I take heart in the fact that many different groups are finding similar results. Even the results that were initially wrong remain problematic for LCDM. Despite all the masses and redshifts changing when the calibration was updated, the bulk of the data (the white squares, which are the black squares in the preceding plot) remain in the problematic region. The same result is replicated many times over by others.

The challenge, as usual, is assessing what LCDM actually predicts. The entire region of this plot is well away from the region predicted for typical galaxies. To reside here, a galaxy must be an outlier. But how extreme an outlier?

The dark gray region is the no-go zone. This is where dark matter halos do not have enough baryons to make the observed mass of stars. It should be impossible for galaxies to be here. I can think of ways to get around this, but that’s material for a future post. For now, it suffices to know that there should be no galaxies in the dark gray region. Indeed, there are not. A few straddle the edge, but nothing is definitively in that region given the uncertainties. So LCDM is not outright falsified by these data. This bar is set very low, as the galaxies that do skirt the edge require that basically all of the available baryons have been converted into stars practically instantaneously. This is not reasonable.

Not with ten thousand simulations could you do this.
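
The edge of that no-go zone is simple bookkeeping: the cosmic baryon fraction times the halo mass caps the stellar mass a halo can possibly host. A minimal sketch (the numbers are illustrative):

```python
f_b = 0.16  # cosmic baryon fraction, roughly Omega_b / Omega_m

def max_stellar_mass(M_halo, efficiency=1.0):
    """Upper limit on the stellar mass (Msun) a halo of mass M_halo (Msun)
    can host: the available baryons times the star formation efficiency."""
    return efficiency * f_b * M_halo

# Even at 100% efficiency, a claimed M* = 5e10 Msun galaxy requires
M_star = 5e10
print(f"a halo of at least {M_star / f_b:.1e} Msun")  # ~3e11 Msun
```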

So what is a reasonable expectation for this diagram? That’s hard to say, but that’s what the white and light gray region attempts to depict. Galaxies might plausibly be in the white region but should not be in the light gray region for any sensible star formation efficiency.

One problem with this statement is that it isn’t clear what a sensible star formation efficiency is. We have a good idea of what it needs to be, on average, at low redshift. There is no clear indication that it changes as a function of redshift – at least until we hit results like this. Then we have to be on guard for confirmation bias in which we simply make the star formation efficiency be what we need it to be. (This is essentially what I advocate as the least unreasonable option in section 6.3 of the ApJ paper.)

OK, but what should the limit be? Keller et al. (2023) made a meta-analysis of the available simulations; I have used his analysis and my own reading of the literature to establish the lower boundary of the light gray area. It is conceivable that you would get the occasional galaxy this massive (the white region is OK), but not more so (the light gray region is not OK). The boundary is the most extreme galaxy in each simulation, so as far from typical as possible. The light gray region is really not OK; the only question is where exactly it sets in.

The exact location of this boundary is not easy to define. Different simulations give different answers for different reasons. These are extremal statistics; we’re asking what the one most massive galaxy is in an entire simulation. Higher resolution simulations perceive the formation of small structures like galaxies sooner, but large simulations have more opportunity for extreme events to happen. Which “wins” in terms of making the rare big galaxy early is a competition between these effects that appears, in my reading, to depend on details of simulation implementation that are unlikely to be representative of physical reality (even assuming LCDM is the correct underlying physics).

To make my own assessment, I reviewed the accessible simulations (they don’t all provide the necessary information) to find the very most massive simulated galaxy as a function of redshift. As ever, I am looking for the case that is most favorable to LCDM. The version I found comes from the large-box, next generation Illustris simulation TNG300. This is the red line a bit into the gray area above. Galaxies really, really should not exist above or to the right of that line. Not only have I adopted the most generous simulation estimate I could find, I have also chosen not to normalize to the area surveyed by JWST. One should do this, but the area so far surveyed is tiny, so the line slides down. Even if galaxies as massive as this exist in TNG300, we have to have been really lucky to point JWST at that spot on a first go. So the red line is doubly generous, and yet there are still galaxies that exceed this limit.

The bottom line is that yes, JWST data pose a real problem for LCDM. It has been amusing watching this break people’s brains. I’ve seen papers that say this is a problem for LCDM because you’d have to turn more than half of the available baryons into stars and that’s crazy talk, and others that say LCDM is absolutely OK because there are enough baryons. The observational result is the same – galaxies with very high stellar-to-dark halo mass ratios, but the interpretation appears to be different because one group of authors is treating the light gray region as forbidden while the other sets the bar at the dark gray region. So the difference in interpretation is not a conflict in the data, but an inconsistency in what [we think] LCDM predicts.

That’s enough for today. Galaxy data at high redshift are clearly in conflict with the a priori predictions of LCDM. This was true before JWST, and remains true with JWST. Whether the observations can be reconciled with LCDM I leave as an exercise for scientists in the field, or at least until another post.


+A minor technical note: the Schechter function is widely used to describe the luminosity function of galaxies, so it provides a common language with which to quantify both their characteristic luminosity L* and space density Φ*. I make use of it here to quantify the brightness of the typical galaxy. It is, of course, not perfect. As we go from low to high redshift, the luminosity function becomes less Schechter-like and more power law-like, an evolution that you can see in Jay Franck’s plot. We chose to use Schechter fits for consistency with the previous work of Mancone et al. (2010) and Wylezalek et al. (2014), and also to down-weight the influence of the few very bright galaxies should they be active galactic nuclei or some other form of contaminant. Long story short, plausible contaminants (no photometric redshifts were used; sample galaxies all have spectroscopic redshifts) cannot explain the bulk of the data; our estimates of m* are robust and, if anything, underestimate how bright galaxies typically are.

A few videos for the new year

Happy new year to those who observe the Gregorian calendar. I will write a post on the observations that test the predictions discussed last time. It has been over a quarter century since Bob Sanders correctly predicted that massive galaxies would form by z = 10, and three years since I reiterated on this blog what JWST would see. This is a testament to both the scientific method and the inefficiency of communication.

Here I provide links to some recent interviews on the subject. These are listed in chronological order, which happens to flow in order of increasing technical detail.

The first entry is from my colleague Federico Lelli. It is in Italian rather than English, but short and easy on the ears. If nothing else, appreciate that Dr. Lelli did this despite the absence of sleep afforded a new father.

Next is an interview I did with EarthSky. I thought this went well, and should be reasonably accessible.

Next is Scientific Sense:

Most recently, there is the entry from the AAS Journal Author Series. These are based on papers published in the journals of the American Astronomical Society in which authors basically narrate their papers, so this goes through it at an appropriately high (ApJ) level.

We discuss the “little red dots” some, which touches on the issues of size evolution that were discussed in the comments previously. I won’t add to that here beyond noting again that the apparent size evolution is proportional to (1+z), in the sense that high redshift galaxies are apparently smaller than those of similar stellar mass locally. This (1+z) is the factor that relates the angular diameter distance of the Robertson-Walker metric to that of Euclidean geometry. Consequently, we would not infer any size evolution if the geometry were Euclidean. It’s as if cosmology flunks the Tolman test. Weird.
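
For the record, the (1+z) factor mentioned here is just the relation between comoving and angular diameter distance. A quick check with astropy (vanilla parameters assumed):

```python
from astropy.cosmology import FlatLambdaCDM

cosmo = FlatLambdaCDM(H0=73, Om0=0.3)  # illustrative vanilla parameters

z = 5
d_C = cosmo.comoving_distance(z)          # the 'Euclidean-like' distance
d_A = cosmo.angular_diameter_distance(z)  # the distance governing apparent sizes
print(d_C / d_A)  # = (1 + z) = 6, the factor in the apparent size evolution
```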

There is a further element of mystery towards the end where the notion that “we don’t know why” comes up repeatedly. This is always true at some deep philosophical level, but it is also why we construct and test hypotheses. Why does MOND persistently make successful predictions that LCDM did not? Usually we say the reason why has to do with the successful hypothesis coming closer to the truth.

That’s it for now. There will be more to come as time permits.

On the timescale for galaxy formation

I’ve been wanting to expand on the previous post ever since I wrote it, which is over a month ago now. It has been a busy end to the semester. Plus, there’s a lot to say – nothing that hasn’t been said before, somewhere, somehow, yet still a lot to cobble together into a coherent story – if that’s even possible. This will be a long post, and there will be more after to narrate the story of our big paper in the ApJ. My sole ambition here is to express the predictions of galaxy formation theory in LCDM and MOND in the broadest strokes.

A theory is only as good as its prior. We can always fudge things after the fact, so what matters most is what we predict in advance. What do we expect for the timescale of galaxy formation? To tell you what I’m going to tell you, it takes a long time to build a massive galaxy in LCDM, but it happens much faster in MOND.

Basic Considerations

What does it take to make a galaxy? A typical giant elliptical galaxy has a stellar mass of 9 × 10^10 M☉. That’s a bit more than our own Milky Way, which has a stellar mass of 5 or 6 × 10^10 M☉ (depending who you ask) with another 10^10 M☉ or so in gas. So, in classic astronomy/cosmology style, let’s round off and say a big galaxy is about 10^11 M☉. That’s a hundred billion stars, give or take.

An elliptical galaxy (NGC 3379, left) and two spiral galaxies (NGC 628 and NGC 891, right).

How much of the universe does it take to make one big galaxy? The critical density of the universe is the over/under point for whether an expanding universe expands forever, or has enough self-gravity to halt the expansion and ultimately recollapse. Numerically, this quantity is ρ_crit = 3H_0^2/(8πG), which for H_0 = 73 km/s/Mpc works out to 10^-29 g/cm^3 or 1.5 × 10^-7 M☉/pc^3. This is a very small number, but provides the benchmark against which we measure densities in cosmology. The density of any substance X is Ω_X = ρ_X/ρ_crit. The stars and gas in galaxies are made of baryons, and we know the baryon density pretty well from Big Bang Nucleosynthesis: Ω_b = 0.04. That means the average density of normal matter is very low, only about 4 × 10^-31 g/cm^3. That’s less than one hydrogen atom per cubic meter – most of space is an excellent vacuum!
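
Plugging in the numbers as a quick check of the arithmetic (cgs units):

```python
import numpy as np

G = 6.674e-8          # gravitational constant in cm^3 g^-1 s^-2
H0 = 73.0 / 3.086e19  # 73 km/s/Mpc in 1/s (1 Mpc = 3.086e19 km)

rho_crit = 3 * H0**2 / (8 * np.pi * G)
print(f"rho_crit   = {rho_crit:.1e} g/cm^3")  # ~1.0e-29 g/cm^3

Omega_b = 0.04
print(f"rho_baryon = {Omega_b * rho_crit:.1e} g/cm^3")  # ~4e-31 g/cm^3
```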

This being the case, we need to scoop up a large volume to make a big galaxy. Going through the math, to gather up enough mass to make a 10^11 M☉ galaxy, we need a sphere with a radius of 1.6 Mpc. That’s in today’s universe; in the past the universe was denser by (1+z)^3, so at z = 10 that’s “only” 140 kpc. Still, modern galaxies are much smaller than that; the effective edge of the disk of the Milky Way is at a radius of about 20 kpc, and most of the baryonic mass is concentrated well inside that: the typical half-light radius of a 10^11 M☉ galaxy is around 6 kpc. That’s a long way to collapse.
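
Going through that math explicitly (same illustrative cgs arithmetic as above):

```python
import numpy as np

M_gal = 1e11 * 1.989e33  # 1e11 Msun in grams
rho_b = 4e-31            # mean baryon density today in g/cm^3
Mpc = 3.086e24           # one megaparsec in cm

# Radius of the sphere that contains M_gal worth of baryons at the mean density
r = (3 * M_gal / (4 * np.pi * rho_b)) ** (1.0 / 3.0)
print(f"today:     r = {r / Mpc:.1f} Mpc")             # ~1.6 Mpc
print(f"at z = 10: r = {r / Mpc / 11 * 1e3:.0f} kpc")  # smaller by (1+z); ~140-150 kpc
```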

Monolithic Galaxy Formation

Given this much information, an early concept was monolithic galaxy formation. We have a big ball of gas in the early universe that collapses to form a galaxy. Why and how this got started was fuzzy. But we knew how much mass we needed and the volume it had to come from, so we can consider what happens as the gas collapses to create a galaxy.

Here we hit a big astrophysical reality check. Just how does the gas collapse? It has to dissipate energy to do so, and cool to form stars. Once stars form, they may feed energy back into the surrounding gas, reheating it and potentially preventing the formation of more stars. These processes are nontrivial to compute ab initio, and attempting to do so obsesses much of the community. We don’t agree on how these things work, so they are the knobs theorists can turn to change an answer they don’t like.

Even if we don’t understand star formation in detail, we do observe that stars have formed, and can estimate how many. Moreover, we do understand pretty well how stars evolve once formed. Hence a common approach is to build stellar population models with some prescribed star formation history and see what works. Spiral galaxies like the Milky Way formed a lot of stars in the past, and continue to do so today. To make 5 × 10^10 M☉ of stars in 13 Gyr requires an average star formation rate of 4 M☉/yr. The current star formation rate of the Milky Way is estimated to be 2 ± 0.7 M☉/yr, so the star formation rate has been nearly constant (averaging over stochastic variations) over time, perhaps with a gradual decline. Giant elliptical galaxies, in contrast, are “red and dead”: they have no current star formation and appear to have made most of their stars long ago. Rather than a roughly constant rate of star formation, they peaked early and declined rapidly. The cessation of star formation is also called quenching.

A common way to formulate the star formation rate in galaxies as a whole is the exponential star formation rate, SFR(t) = SFR_0 e^(-t/τ). A spiral galaxy has a low baseline star formation rate SFR_0 and a long burn time τ ~ 10 Gyr while an elliptical galaxy has a high initial star formation rate and a short e-folding time like τ ~ 1 Gyr. Many variations on this theme are possible, and are of great interest astronomically, but this basic distinction suffices for our discussion here. From the perspective of the observed mass and stellar populations of local galaxies, the standard picture for a giant elliptical was a large, monolithic island universe that formed the vast majority of its stars early on then quenched with a short e-folding timescale.
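
A minimal sketch of these two limiting star formation histories (the parameter values are illustrative):

```python
import numpy as np

def mass_formed(t_gyr, sfr0, tau_gyr):
    """Stellar mass (Msun) formed by time t for SFR(t) = SFR_0 e^(-t/tau),
    ignoring mass returned to the gas by dying stars:
    M(t) = SFR_0 * tau * (1 - e^(-t/tau)), with the Gyr -> yr conversion."""
    return sfr0 * tau_gyr * 1e9 * (1 - np.exp(-t_gyr / tau_gyr))

# Spiral-like: low steady rate, long timescale -> a few x 10^10 Msun, like the Milky Way
print(f"spiral:     {mass_formed(13, sfr0=5, tau_gyr=10):.1e} Msun")
# Elliptical-like: intense early burst, rapid quench -> ~9e10 Msun, a giant elliptical
print(f"elliptical: {mass_formed(13, sfr0=90, tau_gyr=1):.1e} Msun")
```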

Galaxies as Island Universes

The density parameter Ω provides another useful way to think about galaxy formation. As cosmologists, we obsess about the global value of Ω because it determines the expansion history and ultimate fate of the universe. Here it has a more modest application. We can think of the region in the early universe that will ultimately become a galaxy as its own little closed universe. With a density parameter Ω > 1, it is destined to recollapse.

A fun and funny fact of the Friedmann equation is that the matter density parameter Ω_m → 1 at early times, so the early universe when galaxies form is matter dominated. It is also very uniform (more on that below). So any subset that is a bit more dense than average will have Ω > 1 just because the average is very close to Ω = 1. We can then treat this region as its own little universe (a “top-hat overdensity”) and use the Friedmann equation to solve for its evolution, as in this sketch:

The expansion of the early universe a(t) (blue line). A locally overdense region may behave as a closed universe, recollapsing in a finite time (red line) to potentially form a galaxy.

That’s great, right? We have a simple, analytic solution derived from first principles that explains how a galaxy forms. We can plug in the numbers to find how long it takes to form our basic, big 10^11 M☉ galaxy and… immediately encounter a problem. We need to know how overdense our protogalaxy starts out. Is its effective initial Ω_m = 2? 10? What value, at what time? The higher it is, the faster the evolution from initially expanding along with the rest of the universe to decoupling from the Hubble flow to collapsing. We know the math but we still need to know the initial condition.
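
The standard parametric solution for such a closed, matter-only region gives a recollapse time t_coll = π Ω_i / [H_i (Ω_i − 1)^(3/2)], which makes the sensitivity to the initial overdensity explicit:

```python
import numpy as np

def t_collapse(Omega_i, H_i):
    """Recollapse time of a closed, matter-only top-hat region with initial
    density parameter Omega_i > 1 and Hubble rate H_i (in 1/Gyr):
    t_coll = pi * Omega_i / (H_i * (Omega_i - 1)**1.5)."""
    return np.pi * Omega_i / (H_i * (Omega_i - 1) ** 1.5)

H_i = 1.0  # illustrative Hubble rate of the little universe, in 1/Gyr
for Omega_i in (2.0, 1.1, 1.00001):
    print(f"Omega_i = {Omega_i}: t_coll = {t_collapse(Omega_i, H_i):.3g} Gyr")
# Omega_i barely above 1 takes essentially forever -- the problem of the next section.
```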

Annoying Initial Conditions

The initial condition for galaxy formation is observed in the cosmic microwave background (CMB) at z = 1090. Where today’s universe is remarkably lumpy, the early universe is incredibly uniform. It is so smooth that it is homogeneous and isotropic to one part in a hundred thousand. This is annoyingly smooth, in fact. It would help to have some lumps – primordial seeds with Ω > 1 – from which structure can grow. The observed seeds are too tiny; the typical initial amplitude is 10^-5, so Ω_m = 1.00001. That takes forever to decouple and recollapse; it hasn’t yet had time to happen.

The cosmic microwave background as observed by ESA’s Planck satellite. This is an all-sky picture of the relic radiation field – essentially a snapshot of the universe when it was just a few hundred thousand years old. The variations in color are variations in temperature which correspond to variations in density. These variations are tiny, only about one part in 100,000. The early universe was very uniform; the real picture is a boring blank grayscale. We have to crank the contrast way up to see these minute variations.

We would like to know how the big galaxies of today – enormous agglomerations of stars and gas and dust separated by inconceivably vast distances – came to be. How can this happen starting from such homogeneous initial conditions, where all the mass is equally distributed? Gravity is an attractive force that makes the rich get richer, so it will grow the slight initial differences in density, but it is also weak and slow to act. A basic result in gravitational perturbation theory is that overdensities grow at the same rate the universe expands, which is inversely related to redshift. So if we see tiny fluctuations in density with amplitude 10^-5 at z = 1000, they should have only grown by a factor of 1000 and still be small today (10^-2 at z = 0). But we see structures of much higher contrast than that. You can’t get here from there.
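
That growth-factor arithmetic in one line:

```python
# Linear growth: overdensities grow in proportion to the expansion factor,
# so delta today ~ delta_initial * (1 + z_initial).
delta_cmb, z_cmb = 1e-5, 1000
print(delta_cmb * (1 + z_cmb))  # ~1e-2: still tiny, nothing like the structure we see
```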

The rich large scale structure we see today is impossible starting from the smooth observed initial conditions. Yet here we are, so we have to do something to goose the process. This is one of the original motivations for invoking cold dark matter (CDM). If there is a substance that does not interact with photons, it can start to clump up early without leaving too large a mark on the relic radiation field. In effect, the initial fluctuations in mass are larger, just in the invisible substance. (That’s not to say the CDM doesn’t leave a mark on the CMB; it does, but it is subtle and entirely another story.) So the idea is that dark matter forms gravitational structures first, and the baryons fall in later to make galaxies.

An illustration of the linear growth of overdensities. Structure can grow in the dark matter (long dashed lines) with the baryons catching up only after decoupling (short dashed line). In effect, the dark matter gives structure formation a head start, nicely explaining the apparently impossible growth factor. This has been the standard picture for what seems like forever (illustration from Schramm 1992).

With the right amount of CDM – and it has to be just the right amount of a dynamically cold form of non-baryonic dark matter (stuff we still don’t know actually exists) – we can explain how the growth factor is 10^5 since recombination instead of a mere 10^3. The dark matter got a head start over the stuff we can see; it looks like 10^5 because the normal matter lagged behind, being entangled with the radiation field in a way the dark matter was not.

This has been the imperative need in structure formation theory for so long that it has become undisputed lore; an element of the belief system so deeply embedded that it is practically impossible to question. I risk getting ahead of the story, but it is important to point out that, like the interpretation of so much of the relevant astrophysical data, this belief assumes that gravity is normal. This assumption dictates the growth rate of structure, which in turn dictates the need to invoke CDM to allow structure to form in the available time. If we drop this assumption, then we have to work out what happens in each and every alternative that we might consider. That definitely gets ahead of the story, so first let’s understand what we should expect in LCDM.

Hierarchical Galaxy Formation in LCDM

LCDM predicts some things remarkably well but others not so much. The dark matter is well-behaved, responding only to gravity. Baryons, on the other hand, are messy – one has to worry about hydrodynamics in the gas, star formation, feedback, dust, and probably even magnetic fields. In a nutshell, LCDM simulations are very good at predicting the assembly of dark mass, but converting that into observational predictions relies on our incomplete knowledge of messy astrophysics. We know what the mass should be doing, but we don’t know so well how that translates to what we see. Mass good, light bad.

Starting with the assembly of mass, the first thing we learn is that the story of monolithic galaxy formation outlined above has to be wrong. Early density fluctuations start out tiny, even in dark matter. God didn’t plunk down island universes of galaxy mass then say “let there be galaxies!” The annoying initial conditions mean that little dark matter halos form first. These subsequently merge hierarchically to make ever bigger halos. Rather than top-down monolithic galaxy formation, we have the bottom-up hierarchical formation of dark matter halos.

The hierarchical agglomeration of dark matter halos into ever larger objects is often depicted as a merger tree. Here are four examples from the high resolution Illustris TNG50 simulation (Pillepich et al. 2019; Nelson et al. 2019).

Examples of merger trees from the TNG50-1 simulation (Pillepich et al. 2019; Nelson et al. 2019). Objects have been selected to have very nearly the same stellar mass at z=0. Mass is built up through a series of mergers. One large dark matter halo today (at top) has many antecedents (small halos at bottom). These merge hierarchically as illustrated by the connecting lines. The size of the symbol is proportional to the halo mass. I have added redshift and the corresponding age of the universe for vanilla LCDM in a more legible font. The color bar illustrates the specific star formation rate: the top row has objects that are still actively star forming like spirals; those in the bottom row are “red and dead” – things that have stopped forming stars, like giant elliptical galaxies. In all cases, there is a lot of merging and a modest rate of growth, with the typical object taking about half a Hubble time (~7 Gyr) to assemble half of its final stellar mass.

The hierarchical assembly of mass is generic in CDM. Indeed, it is one of its most robust predictions. Dark matter halos start small, and grow larger by a succession of many mergers. This gradual agglomeration is slow: note how tiny the dark matter halos at z = 10 are.

Strictly speaking, it isn’t even meaningful to talk about a single galaxy over the span of a Hubble time. It is hard to avoid this mental trap: surely the Milky Way has always been the Milky Way? so one imagines its evolution over time. This is monolithic thinking. Hierarchically, “the galaxy” refers at best to the largest progenitor, the object that traces the left edge of the merger trees above. But the other protogalactic chunks that eventually merge together are as much part of the final galaxy as the progenitor that happens to be largest.

This complicated picture is complicated further by what we can see being stars, not mass. The luminosity we observe forms through a combination of in situ growth (star formation in the largest progenitor) and ex situ growth through merging. There is no reason for some preferred set of protogalaxies to form stars faster than the others (though of course there is some scatter about the mean), so presumably the light traces the mass of stars formed, which in turn traces the underlying dark mass. Presumably.

That we should see lots of little protogalaxies at high redshift is nicely illustrated by this lookback cone from Yung et al (2022). Here the color and size of each point corresponds to the stellar mass. Massive objects are common at low redshift but become progressively rare at high redshift, petering out at z > 4 and basically absent at z = 10. This realization of the observable stellar mass tracks the assembly of dark mass seen in merger trees.

Fig. 2 from Yung et al. (2022) illustrating what an observer would see looking back through their simulation to high redshift.

This is what we expect to see in LCDM: lots of small protogalaxies at high redshift; the building blocks of later galaxies that had not yet merged. The observation of galaxies much brighter than this at high redshift by JWST poses a fundamental challenge to the paradigm: mass appears not to be subdivided as expected. So it is entirely justifiable that people have been freaking out that what we see are bright galaxies that are apparently already massive. That shouldn’t happen; it wasn’t predicted to happen; how can this be happening?

That’s all background that is assumed knowledge for our ApJ paper, so we’re only now getting to its Figure 1. This combines one of the merger trees above with its stellar mass evolution. The left panel shows the assembly of dark mass; the right panel shows the growth of stellar mass in the largest progenitor. This is what we expect to see in observations.


Fig. 1 from McGaugh et al (2024): A merger tree for a model galaxy from the TNG50-1 simulation (Pillepich et al. 2019; Nelson et al. 2019, left panel) selected to have M* ≈ 9 × 10¹⁰ M☉ at z = 0, i.e., the stellar mass of a local L* giant elliptical galaxy (Driver et al. 2022). Mass assembles hierarchically, starting from small halos at high redshift (bottom edge), with the largest progenitor traced along the left edge of the merger tree. The growth of the stellar mass of the largest progenitor is shown in the right panel. This example (jagged line) is close to the median (dashed line) of comparable mass objects (Rodriguez-Gomez et al. 2016), and within the range of the scatter (the shaded band shows the 16th – 84th percentiles). A monolithic model that forms at z_f = 10 and evolves with an exponentially declining star formation rate with τ = 1 Gyr (purple line) is shown for comparison. The latter model forms most of its stars earlier than occurs in the simulation.

For comparison, we also show the stellar mass growth of a monolithic model for a giant elliptical galaxy. This is the classic picture we had for such galaxies before we realized that galaxy formation had to be hierarchical. This particular monolithic model forms at z_f = 10 and follows an exponential star formation rate with τ = 1 Gyr. It is one of the models published by Franck & McGaugh (2017). It is, in fact, the first model I asked Jay to construct when he started the project. Not because we expected it to best describe the data, as it turns out to do, but because the simple exponential model is a touchstone of stellar population modeling. It was a starter model: do this basic thing first to make sure you’re doing it right. We chose τ = 1 Gyr because that was the typical number bandied about for elliptical galaxies, and z_f = 10 because that seemed ridiculously early for a massive galaxy to form. At the time we built the model, it was, from an LCDM perspective, ludicrously early to imagine that a massive galaxy would form. A formation redshift z_f = 10 was, less than a decade ago, practically indistinguishable from the beginning of time, so we expected it to provide a limit that the data would not possibly approach.

In a remarkably short period, JWST has transformed z = 10 from inconceivable to run of the mill. I’m not going to go into the data yet – this all-theory post is already a lot – but to offer one spoiler: the data are consistent with this monolithic model. If we want to “fix” LCDM, we have to make the red line into the purple line for enough objects to explain the data. That proves to be challenging. But that’s moving the goalposts; the prediction was that we should see little protogalaxies at high redshift, not massive, monolith-style objects. Just look at the merger trees at z = 10!

Accelerated Structure Formation in MOND

In order to address these issues in MOND, we have to go back to the beginning. What is the evolution of a spherical region (a top-hat overdensity) that might collapse to form a galaxy? How does a spherical region under the influence of MOND evolve within an expanding universe?

The solution to this problem was first found by Felten (1984), who was trying to play the Newtonian cosmology trick in MOND. In conventional dynamics, one can solve the equation of motion for a point on the surface of a uniform sphere that is initially expanding and recover the essence of the Friedmann equation. It was reasonable to check if cosmology might be that simple in MOND. It was not. The appearance of a₀ as a physical scale makes the solution scale-dependent: there is no general solution that one can imagine applies to the universe as a whole.

Felten reasonably saw this as a failure. There were, however, some appealing aspects of his solution. For one, there was no such thing as a critical density. All MOND universes would eventually recollapse irrespective of their density (in the absence of the repulsion provided by a cosmological constant). It could take a very long time, which depended on the density, but the ultimate fate was always the same. There was no special value of Ω, and hence no flatness problem. The latter obsessed people at the time, so I’m somewhat surprised that no one seems to have made this connection. Too soon*, I guess.

There it sat for many years, an obscure solution for an obscure theory to which no one gave credence. When I became interested in the problem a decade later, I started methodically checking all the classic results. I was surprised to find how many things we needed dark matter to explain were just as well (or better) explained by MOND. My exact quote was “surprised the bejeepers out of us.” So, what about galaxy formation?

I started with the top-hat overdensity, and had the epiphany that Felten had already obtained the solution. He had been trying to solve all of cosmology, which didn’t work. But he had solved the evolution of a spherical region that starts out expanding with the rest of the universe but subsequently collapses under the influence of MOND. The overdensity didn’t need to be large, it just needed to be in the low acceleration regime. Something like the red cycloidal line in the second plot above could happen in a finite time. But how long?

The solution depends on scale and needs to be solved numerically. I am not the greatest programmer, and I had a lot else on my plate at the time. I was in no rush, as I figured I was the only one working on it. This is usually a good assumption with MOND, but not in this case. Bob Sanders had had the same epiphany around the same time, which I discovered when I received his manuscript to referee. So all credit is due to Bob: he said these things first.

First, he noted that galaxy formation in MOND is still hierarchical. Small things form first. Crudely speaking, structure formation is very similar to the conventional case, but now the boost comes from the change in the force law rather than extra dark mass. MOND is nonlinear, so the whole process gets accelerated. To compare with the linear growth of CDM:

A sketch of how structures grow over time under the influence of cold dark matter (left, from Schramm 1992, same as above) and MOND (right, from Sanders & McGaugh 2002; see also this further discussion and previous post). The slow linear growth of CDM (long-dashed line, left panel) is replaced by a rapid, nonlinear growth in MOND (solid lines at right; numbers correspond to different scales). Nonlinear growth moderates after cosmic expansion begins to accelerate (dashed vertical line in right panel).

The net effect is the same. A cosmic web of large scale structure emerges. They look qualitatively similar, but everything happens faster in MOND. This is why observations have persistently revealed structures that are more massive and were in place earlier than expected in contemporaneous LCDM models.

Simulated structure formation in ΛCDM (top) and MOND (bottom) showing the more rapid emergence of similar structures in MOND (note the redshift of each panel). From McGaugh (2015).

In MOND, small objects like globular clusters form first, but galaxies of a range of masses all collapse on a relatively short cosmic timescale. How short? Let’s consider our typical 10¹¹ M☉ galaxy. Solving Felten’s equation for the evolution of a sphere numerically, peak expansion is reached after 300 Myr and collapse happens in a similar time. The whole galaxy is in place speedy quick, and the initial conditions don’t really matter: a uniform, initially expanding sphere in the low acceleration regime will behave this way. From our distant vantage point thirteen billion years later, the whole process looks almost monolithic (the purple line above) even though it is a chaotic hierarchical mess for the first few hundred million years (z > 14). In particular, it is easy to form half of the stellar mass early on: the mass is already assembled.

The evolution of a 10¹¹ M☉ sphere that starts out expanding with the universe but decouples and collapses under the influence of MOND (dotted line). It reaches maximum expansion after 300 Myr and recollapses in a similar time, so the entire object is in place after 600 Myr. (A version of this plot with a logarithmic time axis appears as Fig. 2 in our paper.) The inset shows the evolution of smaller shells within such an object (Fig. 2 from Sanders 2008). The inner regions collapse first, followed by outer shells. These oscillate and cross, mixing and ultimately forming a reasonable size galaxy – see Sanders’s Table 1 and also his Fig. 4 for the collapse times for objects of other masses. These early results are corroborated by Eappen et al. (2022), who further demonstrate that the details of feedback are not important in MOND, unlike LCDM.
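
For the numerically curious, here is a minimal sketch in Python of the flavor of this calculation. It is not Felten’s full equation (which also tracks the cosmological background); it just integrates the deep-MOND deceleration of a shell enclosing a fixed galaxy mass, with made-up initial conditions, to show that an initially expanding region turns around and recollapses in a finite time:

```python
# Minimal sketch, not Felten's full equation: a shell enclosing fixed mass M,
# initially expanding, decelerated by deep-MOND gravity g = sqrt(G*M*a0)/r.
import numpy as np
from scipy.integrate import solve_ivp

G, a0, Msun = 6.674e-11, 1.2e-10, 1.989e30   # SI units
kpc, Myr = 3.086e19, 3.156e13
M = 1e11 * Msun                              # the galaxy-mass sphere discussed above

def shell(t, y):
    r, v = y
    return [v, -np.sqrt(G * M * a0) / r]     # deep-MOND regime: g = sqrt(gN * a0)

def collapsed(t, y):                         # stop the integration at recollapse
    return y[0] - 0.1 * kpc
collapsed.terminal = True

# illustrative (made-up) initial conditions: a 20 kpc region expanding at 300 km/s
sol = solve_ivp(shell, [0, 2000 * Myr], [20 * kpc, 3.0e5],
                events=collapsed, max_step=Myr, rtol=1e-8)
i = np.argmax(sol.y[0])
print("turnaround at ~%.0f Myr, recollapse by ~%.0f Myr"
      % (sol.t[i] / Myr, sol.t[-1] / Myr))
```

With these illustrative numbers, the turnaround and recollapse each take a few hundred Myr, the same ballpark as the figure above; the qualitative behavior is insensitive to the initial conditions so long as the region is in the low acceleration regime.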

This is what JWST sees: galaxies that are already massive when the universe is just half a billion years old. I’m sure I should say more but I’m exhausted now and you may be too, so I’m gonna stop here by noting that in 1998, when Bob Sanders predicted that “Objects of galaxy mass are the first virialized objects to form (by z=10),” the contemporaneous prediction of LCDM was that “present-day disc [galaxies] were assembled recently (at z<=1)” and “there is nothing above redshift 7.” One of these predictions has been realized. It is rare in science that such a clear a priori prediction comes true, let alone one that seemed so unreasonable at the time, and which took a quarter century to corroborate.


*I am not quite this old: I was still an undergraduate in 1984. I hadn’t even decided to be an astronomer at that point; I certainly hadn’t started following the literature. The first time I heard of MOND was in a graduate course taught by Doug Richstone in 1988. He only mentioned it in passing while talking about dark matter, writing the equation on the board and saying maybe it could be this. I recall staring at it for a long few seconds, then shaking my head and muttering “no way.” I then completely forgot about it, not thinking about it again until it came up in our data for low surface brightness galaxies. I expect most other professionals have the same initial reaction, which is fair. The test of character comes when it crops up in their data, as it is doing now for the high redshift galaxy community.

Massive Galaxies at High Redshift: we told you so

I was raised to believe that it was rude to tell people I told you so. Yet that’s pretty much the essence of the scientific method: we test hypotheses by making predictions, then checking to see which told us the correct result in advance of the experiment. So: I told you so.

Our paper on massive galaxies at high redshift is out in the Astrophysical Journal today. This is a scientific analysis of the JWST data that has accumulated to date as it pertains to testing galaxy formation as hypothesized by LCDM and MOND. That massive galaxies are observed to form early (z > 10) corroborates the long-standing prediction of MOND, going back to Sanders (1998):

Objects of galaxy mass are the first virialized objects to form (by z=10), and larger structure develops rapidly

The contemporaneous LCDM prediction from Mo, Mao, & White (1998) – a touchstone of galaxy formation theory with nearly 2,000 citations – was

present-day disc [galaxies] were assembled recently (at z<=1).

This is not what JWST sees, as morphologically mature spiral galaxies are present to at least z = 6 (Ferreira et al 2024). More generally, LCDM was predicted to take a long time to build up the stellar mass of large galaxies, with the median time to reach half the final stellar mass being about half a Hubble time (seven billion years, give or take). In contrast, JWST has now observed many galaxies that meet this benchmark in the first billion years. That was not expected to happen.

In short, one theory got its prediction right, and the other got it wrong. I say expected, because we can always attempt to modify a theory to accommodate new facts. The a priori predictions of LCDM were wrong, but can it be adjusted to explain the data? Perhaps – but if so, that’s because it is incredibly flexible. That’s normally considered to be a bad thing in a theory, not a strength, especially when a competing theory got it right in the first place.

This has happened over and over and over again. After the initial shock of having MOND’s predictions come true in my own data (how can this be so?), I’ve spent the decades since devising and executing new tests of both theories. When it comes to making a priori predictions, MOND has won over and over again. It has consistently had more predictive success.

If you are a scientist reading this and that statement doesn’t sound right to you, that’s because you haven’t been paying attention. I get it: MOND seems too unlikely to pay attention to. I certainly didn’t before it reared its head in my own data. So ask yourself: what do you actually know about MOND? IT’S WRONG! OK, after that. Seriously: how many papers have you read about MOND? Do you know what its predictions are? Do you know what its successes are, or only just its failings? Can you write down its formula? If the answers to these questions do not come easily to you, it’s because you haven’t taken it seriously. Which, again, I get. But it is also an indication that you may not be playing with a complete set of facts. Ignorance is not a strong position from which to make scientific judgements.

I will expand more on the content of the science paper in future posts. For now, it boils down to I told you so.

You can also read more in SciNews, Newsweek, and the most in-depth article so far, in Courthouse News.

What if we never find dark matter?

Some people have asked me to comment on the Scientific American article What if We Never Find Dark Matter? by Slatyer & Tait. For the most part, I find it unobjectionable – from a certain point of view. It is revealing to examine this point of view, starting with the title, which frames the subject in a way that gives us permission to believe in dark matter while never finding it. This framing is profoundly unscientific, as it invites a form of magical thinking that could usher in a thousand years of dark epicycles (feedback being the modern epicycle) on top of the decades it has already sustained.

The article does recognize that a modification of gravity is at least a logical possibility. The mere mention of this is progress, if grudging and slow. They can’t bring themselves to name a specific theory: they never say MOND and only allude obliquely to a single relativistic theory as if saying its name out loud would bring a curse% upon their house.

Of course, they mention modified gravity merely to dismiss it:

A universe without dark matter would require striking modifications to the laws of gravity… [which] seems exceptionally difficult.

Yes it is. But it has also proven exceptionally difficult to detect dark matter. That hasn’t stopped people from making valiant efforts to do so. So the argument is that we should try really hard to accomplish the exceptionally difficult task of detecting dark matter, but we shouldn’t bother trying to modify gravity because doing so would be exceptionally difficult.

This speaks to motivations – is one idea better motivated? In the 1980s, cold dark matter was motivated by both astronomical observations and physical theory. Absent the radical thought of modifying gravity, we had a clear need for unseen mass. Some of that unseen mass could simply have been undetected normal matter, but most of it needed to be some form of non-baryonic dark matter that exceeded the baryon density allowed by Big Bang Nucleosynthesis and did not interact directly with photons. That meant entirely new physics from beyond the Standard Model of particle physics: no particle in the known stable of particles suffices. This new physics was seen as a good thing, because particle physicists already had the feeling that there should be something more than the Standard Model. There was a desire for Grand Unified Theories (GUTs) and supersymmetry (SUSY). SUSY naturally provides a home for particles that could be the dark matter, in particular the Weakly Interacting Massive Particles (WIMPs) that are the prime target for the vast majority of experiments that are working to achieve the exceptionally difficult task of detecting them. So there was a confluence of reasons from very different perspectives to make the search for WIMPs very well motivated.

That was then. Fast forward a few decades, and the search for WIMPs has failed. Repeatedly. Continuing to pursue it is an example of the sunk cost fallacy. We keep doing it because we’ve already done so much of it that surely we should keep going. So I feel the need to comment on this seemingly innocuous remark:

although many versions of supersymmetry predict WIMP dark matter, the converse isn’t true; WIMPs are viable dark matter candidates even in a universe without supersymmetry.

Strictly speaking, this is correct. It is also weak sauce. The neutrino is an example of a weakly interacting particle that has some mass. We know neutrinos exist, and they reside in the Standard Model – no need for supersymmetry. We also know that they cannot be the dark matter, so it would be disingenuous to conflate the two. Beyond that, it is possible to imagine a practically infinite variety of particles that are weakly interacting but not part of supersymmetry. That’s just throwing mud at the wall. SUSY WIMPs were extraordinarily well motivated, with the WIMP miracle being the beautiful argument that launched a thousand experiments. But lacking SUSY – which seems practically dead at this juncture – WIMPs as originally motivated are dead along with it. The motivation for more generic WIMPs is lacking, so the above statement is nothing more than an assertion that runs interference for the fact that we no longer have good reason to expect WIMPs at all.

There is also an element of disciplinary-centric thinking: if you’re a particle physicist, you can build a dark matter detector and maybe make a major discovery or at least get great gobs of grants in the effort to do so. If instead what is going on is really a modification of gravity, then your expertise is irrelevant and there is no reason to keep shoveling money into your field. Worse, a career spent at the bottom of a mine shaft working on dark matter detectors is a waste of effort. I can understand why people don’t want to hear that message, but that just brings us back to the sunk cost fallacy.

Speaking of money, I occasionally get scientists who come up to me Big Mad that grant money gets spent on MOND research, as that would be a waste of taxpayer money. I can assure them that no government dollars have been harmed in the pursuit of MOND research. Certainly not in the U.S., at any rate. But lots and lots of tax dollars have been burned in the search for dark matter, and the article we’re discussing advocates spending a whole lot more to search for dark matter candidates that are nowhere near as well motivated as WIMPs were. That’s why I keep asking: how do we know when to stop? I don’t expect other scientists to agree with my interpretation of the data, but I do expect them to have a criterion whereby they would concede that dark matter is incorrect. If we lack any notion of how we could figure out that we are wrong, then we’ve made the leap from science to religion. So far, such criteria are sadly lacking, and I see precious little evidence of people rising to the challenge. Indeed, I frequently get the opposite, as other scientists have frequently asserted to me that they would only consider MOND as a last resort. OK, when does that happen? There’s always another particle we can think up, so the answer seems to be “never.”

I wrote long ago that “After WIMPs, the next obvious candidate is axions.” Sure enough, this article spills a lot of ink discussing axions. Rather than dwell on this different doomed idea for dark matter, let’s take a gander at the remarkable art made to accompany the article, because we are visual animals and graphical representations are important.

Artwork by Olena Shmahalo that accompanies the article by Slatyer & Tait.

Where to start? Right in the center is a scroll of an old-timey star chart. On top of that are several depictions of what I guess are meant to be galaxies*. Around those is an ethereal dragon representing the unknown dark matter. The depiction of dark matter as an unfathomable monster is at once both spot on and weirdly anthropomorphic. Is this a fabled beast the adventurous hero is supposed to seek out and slay? or befriend? or maybe it is a tale in which he grows during the journey to realize he has been on the wrong path the whole time? I love the dragon as art, but as a representation of a scientific subject it imparts an aura of teleological biology to something that is literally out of this world, residing in a dark sector that is not part of our daily experience and may be entirely inaccessible to our terrestrial experimentation. Off the edge of the map and on into extra dimensions: here there be monsters.

The representations here are fantastic. There is the coffee mug and the candle to represent the hard work of those of us who burn the candle at both ends wrestling with the dark matter problem. There’s a magnifying glass to represent how hard the experimentalists have looked for the dark matter. Scattered around are various totems, like the Polaroid-style picture at right depicting the gravitational lensing around a black hole. This is cool, but has squat to do with the missing mass problem. It’s more a nod to General Relativity and the Faith we have therein, albeit in a regime many orders of magnitude removed from the one that concerns us here. On the left is an old newspaper article about WIMPs, complete with a sketch of a Feynman diagram that depicts how we might detect them. And at the top, peeking out of a book, as if it were a thought made long ago now seeking new relevance, a note saying Axions!

I can save everyone a lot of time, effort, and expense. It ain’t WIMPs and it ain’t axions. Nor is the dark matter any of the plethora of other ideas illustrated in the eye-watering depiction of the landscape of particle possibilities in the article. These simply add mass while providing no explanation of the observed MOND phenomenology. This phenomenology is fundamental to the problem, so any approach that ignores it is doomed to failure. I’m happy to consider explanations based on dark matter, but these need to have a direct connection to baryons baked-in to be viable. None of the ideas they discuss meet this minimum criterion.

Of course it could be that MOND – either as modified gravity or modified inertia, an important possibility that usually gets overlooked – is essentially correct and that’s why it keeps having predictions come true. That’s what motivates considering it now: repeated and sustained predictive success, particularly for phenomena that dark matter does not provide a satisfactory explanation for.

Of course, this article advocating dark matter is at pains to dismiss modified gravity as a possibility:

The changes [of modified gravity] would have to mimic the effects of dark matter in astrophysical systems ranging from giant clusters of galaxies to the Milky Way’s smallest satellite galaxies. In other words, they would need to apply across an enormous range of scales in distance and time, without contradicting the host of other precise measurements we’ve gathered about how gravity works. The modifications would also need to explain why, if dark matter is just a modification to gravity—which is universally associated with all matter—not all galaxies and clusters appear to contain dark matter. Moreover, the most sophisticated attempts to formulate self-consistent theories of modified gravity to explain away dark matter end up invoking a type of dark matter anyway, to match the ripples we observe in the cosmic microwave background, leftover light from the big bang.

That’s a lot, so let’s break it down. First, that modified gravity “would have to mimic the effects of dark matter” gets it exactly backwards. It is dark matter that has to mimic the effects of MOND. That’s an easy call: dark matter plus baryons could combine in a large variety of ways that might bear no resemblance to MOND. Indeed, they should do that: the obvious prediction of LCDM-like theories is an exponential disk in an NFW halo. In contrast, there is one and only one thing that can happen in MOND since there is a single effective force law that connects the dynamics to the observed distribution of baryons. Galaxies didn’t have to do that, shouldn’t do that, but remarkably they do. The uniqueness of this relation poses a problem for dark matter that has been known since the previous century:

Reluctant conclusions from McGaugh & de Blok (1998). As we said at the time, “This result surprised the bejeepers out of us, too.”

This basic conclusion has not changed over the years, only gotten stronger. The equation coupling dark to luminous matter I wrote down in all generality in McGaugh (2004) and again in McGaugh et al. (2016). The latter paper is published in Physical Review Letters, arguably the most prominent physics journal, and is in the top percentile of citation rates, so it isn’t some minuscule detail buried in an obscure astronomical journal that might have eluded the attention of particle physicists. It is the implication that conclusion [1] could be correct that bounces off a protective shell of cognitive dissonance so hard that the necessary corollary [2] gets overlooked.
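
For readers who want the explicit form, the relation published in McGaugh et al. (2016) is

g_obs = g_bar / (1 − e^(−√(g_bar/g†)))

with the one fitted parameter g† = 1.2 × 10⁻¹⁰ m/s² – Milgrom’s constant. A few lines of Python (my illustration, not code from the paper) suffice to check its limiting behavior:

```python
import numpy as np

g_dagger = 1.2e-10   # m/s^2, the one fitted parameter (Milgrom's constant)

def g_obs(g_bar):
    """Radial acceleration relation of McGaugh et al. (2016)."""
    return g_bar / (1.0 - np.exp(-np.sqrt(g_bar / g_dagger)))

# High accelerations: g_obs -> g_bar, i.e., pure Newton with no discrepancy.
print(g_obs(1e-8) / 1e-8)                          # ~1.0001
# Low accelerations: g_obs -> sqrt(g_bar * g_dagger), the deep-MOND behavior.
print(g_obs(1e-12) / np.sqrt(1e-12 * g_dagger))    # ~1.05, approaching 1
```

One function of one variable with one fitted parameter describes the dynamics of galaxies spanning many decades in mass; that is the regularity any dark matter model has to mimic.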

OK, that’s just the first sentence. Let’s carry on with “[the modification] would need to apply across an enormous range of scales in distance and time, without contradicting the host of other precise measurements we’ve gathered about how gravity works.” Well, duh. That’s the first thing I checked. Thoroughly and repeatedly. I’ve written many reviews on the subject. They’re either unaware of some well-established results, or choose to ignore them.

The reason MOND doesn’t contradict the host of other constraints about how gravity works is simple. It happens in the low acceleration regime, where the only test of gravity is provided by the data that evince the mass discrepancy. If we had posed galaxy observations as a test of GR, we would have concluded that it fails at low accelerations. Of course we didn’t do that; we observed galaxies because we were interested in how they worked, then inferred the need for dark matter when gravity as we currently know it failed to explain the data. Other tests, regardless of how precise, are irrelevant if they probe accelerations higher than Milgrom’s constant (1.2 × 10⁻¹⁰ m/s²).

Continuing on, there is the complaint that “modifications would also need to explain why… not all galaxies and clusters appear to contain dark matter.” Yep, you gotta explain all the data. That starts with the vast majority of the data that do follow the radial acceleration relation, which is not satisfactorily explained by dark matter. They skip+ past that part, preferring to ignore the forest in order to complain about a few outlying trees. There are some interesting cases, to be sure, but this complaint about objects lacking dark matter is misplaced for deeper reasons. It makes no sense in terms of dark matter that there are objects without dark matter. That shouldn’t happen in LCDM any more than in MOND$. One winds up invoking non-equilibrium effects, which we can do in MOND just as we do in dark matter. It is not satisfactory in either case, but it is weird to complain about it for one theory while not for the other. This line of argument is perilously close to the a priori fallacy.

The last line, “the most sophisticated attempts to formulate self-consistent theories of modified gravity to explain away dark matter end up invoking a type of dark matter anyway, to match the ripples we observe in the cosmic microwave background” actually has some merit. The theory they’re talking about is Aether-Scalar-Tensor (AeST) theory, which I guess earns the badge of “most sophisticated” because it fits the power spectrum of the cosmic microwave background (CMB).

I’ve discussed the CMB in detail before, so won’t belabor it here. I will note that the microwave background is only one piece of many lines of evidence, and the conclusion one reaches depends on how one chooses to weigh the various incommensurate evidence. That they choose to emphasize this one thing while entirely eliding the predictive successes of MOND is typical, but does not encourage me to take this as a serious argument, especially when I had more success predicting important aspects of the microwave background than did the entire community that persistently cites the microwave background to the exclusion of all else.

It is also a bit strange to complain that AeST “explain[s] away dark matter [but] end[s] up invoking a type of dark matter.” I think what they mean here is true at the level of quantum field theory where all particles are fields and all fields are particles, but beyond that, they aren’t the same thing at all. It is common for modified gravity theories to invoke scalar fields#, and this is an important degree of freedom that enables AeST to fit the CMB. TeVeS also added a scalar and tensor field, but could not fit the CMB, so this approach isn’t guaranteed to work. But are these a type of dark matter? Or are our ideas of dark matter mimicking a scalar field? It seems like this argument could cut either way, and we’re just granting dark matter priority as a concept because we thought of it first. I don’t think nature cares about the order of our thoughts.

None of this addresses the question of the year. Why does MOND get any predictions right? Just saying “dark matter does it” is not sufficient. Until scientists engage seriously with this question, they’re doomed to chasing phantoms that aren’t there to catch.


%From what I’ve seen, they’re probably right to fear the curses of their colleagues for such blasphemy. Very objective, very scientific.

*Galaxies are nature’s artwork; human imitations never seem adequate. These look more like fried eggs to me. On the whole, this art is exceptionally well informed by science, or at least by particle physics, but not so much by astronomy. And therein lies the greater problem: there is a whole field of physics devoted to dark matter that is entirely motivated by astronomical observations yet its practitioners are, by and large, remarkably ignorant of anything more than the most rudimentary aspects of the data that motivate their field’s existence.

+There seems to be a common misconception that anything we observe is automatically explained by dark matter. That’s only true at the level of inference: any excess gravity is attributable to unseen mass. That’s why a hypothesis is only as good as its prior; a mere inference isn’t science, you have to make a prediction. Once you do that, you find dark matter might do lots of things that are not at all like the MONDian phenomenology that we observe. While I would hope the need for predictions is obvious, many scientists seem to conflate observation with prediction – if we observe it, that’s what dark matter must predict!

$The discrepancy should only appear below the critical acceleration scale in MOND. So strictly speaking, MOND does predict that there should be objects without dark matter: systems that are high acceleration. The central regions of globular clusters and elliptical galaxies are such regions, and MOND fares well there. In contrast, it is rather hard to build a sensible dark matter model that is as baryon dominated as observed. So this is an example of MOND explaining the absence of dark matter better than dark matter theory. This is related to the observation that the apparent need for dark matter only appears at low accelerations, at a scale that dark matter knows nothing about.

#I, personally, am skeptical of this approach, as it seems too generic (let’s add some new freedom!) when it feels like we’re missing something fundamental, perhaps along the lines of Mach’s Principle. However, I also recognize that this is a feeling on my part; it is outside my training to have a meaningful opinion.

A Nobel prize in physics for something that is not physics

When I wrote about Nobel prizes a little while back, I did not expect to return to the subject. I assumed the prize this year would be awarded for some meritorious advance in laboratory physics, like last year’s prize “for experimental methods that generate attosecond pulses of light for the study of electron dynamics in matter.” Instead, we find that the 2024 prize has been awarded to John Hopfield and Geoffrey Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks.” This is the Nobel prize in physics we’re talking about.

One small issue: that’s not physics.

I’ve been concerned for a long time with the interface between astronomy and physics – where they are distinct fields and where they overlap. One of the reasons I left physics as a grad student was because the string theorists were taking over. They were talking about phenomena that were tens of orders of magnitude beyond any conceivable experimental test. That sort of theoretical speculation is often fun, sometimes important, and very rarely relevant to physical reality. Lacking exposure to experimental tests or observational consequences, to my mind it was just that: speculation, not physics.

Nearly forty years on, my concerns about string theory have not been misplaced. And while, in the strictest sense, I don’t think it qualifies as physics – it’s more of a physics-adjacent branch of mathematics – it is at least attempting to be physical theory. But machine learning is not physics. It’s computer science. Computers are a useful tool, to be sure. But programming them is no more physics than teaching a horse to count.

I’m not sure we should even consider machine learning to be meritorious. It can be useful, but it is also a gateway drug to artificial intelligence (AI). I remember the more earnest proponents of early AI propounding on the virtues of LISP and how it would bring us AI – in the 1980s. All it brought us then was dystopian fantasies about killer robots nuking the world. Despite the current hype, we have not now developed intelligent machines – what we’re calling AI is certainly artificial but not at all intelligent. It uses machine “learning” to reprocess existing information into repackaged forms. There is zero original thought, nothing resembling intelligence. Modern AI is, in essence, a bullshit generator. Now, we can all think of people who qualify as organic bullshit generators, but that begs the question:

Why is the Nobel prize in physics being awarded for something that is clearly not physics?

Maybe it has something to do with the hype around AI. I don’t know what the decision process was, but I do know that I am not the only scientist to have this reaction.

Myself, I’m not mad, just disappointed. I’m not unique in feeling that physics has lost its way. This just emphasizes how far it has strayed.

Apparently the Nobel committee is sensing the blow-back, as this poll currently appears on the award page:

I… don’t think this helps their case. Did you know that molecules are made of atoms? Ergo all of chemistry is just applied atomic physics. I mean, it is a long-standing trope that physicists think every other science is just a lesser, applied form of physics. At the level of being based on the equations of physics, that’s almost kinda if not really true. So asserting that machine learning models are based on physics equations comes nowhere near to making machine learning into physics. It’s fancy programming, not physics.

Well, there will be complaints about this one for a while, so I won’t pile on more. I guess if you give out 118 prizes since 1901, one of them has to rank 118th.

Sociology in the hunt for dark matter

Who we give prizes to is more a matter of sociology than science. Good science is a prerequisite, but after that it is a matter of which results we value in the here and now. Results that are guaranteed to get a Nobel prize, like the detection of dark matter, attract many suitors who pursue them vigorously. Results that come as a surprise can be more important than the expected results, but it takes a lot longer to recognize and appreciate them.

When there are expected results with big stakes, sociology kicks into hyperdrive. Let’s examine the attitudes in some recent quotes:

In Science, Hunt for dark matter particles bags nothing—again (24 Aug 2024): Chamkaur Ghag says

If WIMPs were there, we have the sensitivity to have seen them

which is true. WIMP detection experiments have succeeded in failing. They have explored the predicted parameter space. But in the same paragraph, it is said that it is too early to “give up hope of detecting WIMPs.” That is a pretty vague assertion, and is precisely why I’ve been asking other scientists to define a criterion by which we could agree that enough was enough already. How do we know when to stop looking?

The same paragraph ends with

This is our first real foray into discovery territory

which is not true. We’ve explored the region in which WIMPs were predicted to reside over and over and over again. This was already excruciatingly old news when I wrote about it in 2008. The only way to spin this as a factual statement is to admit that the discovery territory is practically infinite, in which case we can assert that every foray is our first “real” foray because we’ll never get anywhere relative to infinity. It sounds bad when put that way, which is the opposite of the positivity the spokespeople for huge experiments are appointed to project.

And that’s where the sociology kicks in. The people who do the experiments want to keep doing the experiments until they discover dark matter and win the Nobel prize. It’s disappointing that this hasn’t happened already, but it is an expected result. It’s what they do, so it’s natural to want to keep at it.

On the one hand, I’d like to see these experiments continue until they reach the neutrino fog, at which point they will provide interesting astrophysical information. Says Michael Murra (in Science News, 25 July 2024)

It’s very cool to see that we can turn this detector into a neutrino observatory

Yes, it is. But that wasn’t the point, was it?

On the other hand, I do not expect these experiments to ever detect dark matter. That’s because I understand that the astronomical data contain self-contradictions to their interpretation in terms of dark matter. Any particle physicist will tell you that astronomical data require dark matter. But they’re not experts on that topic, I am. I’ve talked to enough of them at this point to conclude that the typical physicist working on dark matter has only a cartoonish understanding of the data that motivates their whole field. After all,

It is difficult to get a man to understand something, when his salary depends on his not understanding it.

Upton Sinclair

Nobel prizes that were, that might have been, and others that have not yet come to pass

The time is approaching when Nobel prizes are awarded. This inevitably leads to a lot of speculation and chattering rumor. Last year one publication, I think it was Physics Today, went so far as to publish a list of things various people thought should be recognized. This aspirational list was led, of course, by dark matter. It was even formatted the way prize awards are phrased, saying something like “the prize goes to [blank] for the discovery of dark matter.” This would certainly be a prize-worthy discovery, if made. So far it hasn’t been, and I expect it never will be: blank will remain blank forever. I’d be happy to be proved wrong, as forever is a long time to wait for corroboration of this prediction.

While the laboratory detection of dark matter is a slam-dunk for a Nobel prize, there are plenty of discoveries that drive the missing mass problem that are already worthy of this recognition. The issue is too big for a single prize. Laboratory detection would be the culmination of a search that has been motivated by astronomical observations. The Nobel prize in physics has sometimes been awarded for astronomical discoveries – and should be, for those that impact fundamental physics or motivate entire fields like the search for dark matter – so let’s think about what those might be.

An obvious historical example would be Kepler’s Laws. Kepler predates Nobel by a few centuries, but there is no doubt that his identification of the eponymous laws of planetary motion impacted fundamental physics, being one of the key set of facts that led Newton to his universal law of gravity. Whether Tycho Brahe should also be named as the person who made the observations on which Kepler’s work is based is the sort of question the prize committee has to wrestle with. I would say yes: the prize is for “the person who shall have made the most important discovery or invention within the field of physics.” In this case, the discovery that led to gravity was a set of rules – how the orbits of planets behave – that required both observational work (Brahe’s) and numerical analysis (Kepler’s) to achieve.

One could of course also give a prize to Newton some decades later, though theories are not generally considered discoveries. The line can be hazy. For example, the Nobel Prize in Physics 1921 was awarded to Albert Einstein “for his services to Theoretical Physics, and especially for his discovery of the law of the photoelectric effect.” The “especially” is reserved for the empirical law, not relativity, though I guess “services to theoretical physics” is doing a lot of work there.

Reading up on that I was mildly surprised to learn that the committee had a hard time finding deserving recipients, initially skipping 1918 and 1921 but awarding those prizes in the subsequent year to Planck and Einstein, respectively. I wonder if they struggled with the definition of discovery: need it be experimental? For many, the answer is yes. A theory by itself, untethered from experimental or observational corroboration, does not a discovery make.

I don’t think they need to skip years any more, as the list of plausible nominees has grown so long that deserving people die waiting to be recognized: the Nobel prize is not awarded posthumously. The story is that this is what happened to both Henrietta Leavitt (who discovered the Cepheid period-luminosity relation) and Edwin Hubble (who used Leavitt’s relation for Cepheids to measure distances to other galaxies, thereby changing the course of cosmology). There is also the issue of what counts as physics. At the time, these were very astronomical discoveries. In retrospect, it is obvious that the impact Hubble had on cosmology counts as physics as well.

The same can be said for the discovery of flat rotation curves. I have made the case before that Vera Rubin and Albert Bosma (and arguably others) deserve the Nobel prize for this discovery. Note that I do not say the discovery of dark matter, because (1) that’s not what they did*, and (2) flat rotation curves are enough. Flat rotation curves are a de facto law of nature. That’s enough, every bit as much as Einstein’s “discovery of the law of the photoelectric effect.” A laboratory detection of dark matter would be another discovery worthy of a Nobel prize, but we already missed out on recognizing Rubin for this one.

Conflating discoveries with their interpretation has precluded recognition of other important astronomical discoveries – discoveries that implicate basic physics regardless of their ultimate interpretation, be it cold dark matter or MOND or something else we have yet to figure out. So, what are some others?

One obvious one is the Tully-Fisher relation. This is another de facto law of nature. Tully has been recognized for his work with the Gruber prize, so it’s not like it hasn’t been recognized. What remains lacking is recognition that this is a fundamental law of physics, at least the baryonic version when flat rotation speeds are measured.

Philip Mannheim pointed out to me that Milgrom deserves the prize for the discovery of the acceleration scale a₀. This is a new constant of nature. That’s enough.

Milgrom went further, developing the whole MOND paradigm around this new scale. But that is extra credit material that needn’t be correct. Unfortunately, the controversial nature of MOND, deserved or not, serves to obscure that there is a new constant of nature whose discovery is analogous to Planck’s discovery of his eponymous constant. People argue over whether a₀ is a single constant (it is) or whether it evolves over cosmic time (not so far as I can tell). The latter objection could be raised for Planck’s constant or Newton’s constant; these were established when it wasn’t possible to test whether their values might have varied over cosmic time. Now that we can, we do check! and so far, no: h, G, and a₀ all appear to be constants of nature, to the extent we are able to perceive.

The above discoveries are all worthy of recognition by a Nobel prize. They are all connected by the radial acceleration relation, which is another worthy observational discovery in its own right. This is one that clearly transgresses the boundaries of physics and astronomy, as the early versions (Sanders 1990, McGaugh 1999, 2004) appeared in the astronomical literature, but more recent ones in the physics literature (McGaugh et al. 2016, Mistele et al. 2024). Sadly, the community seems perpetually stuck looping through the stages of Louis Agassiz’s progression of responses to scientific discoveries. It shouldn’t be: this is an empirical relation that has long been well established and repeatedly confirmed. It suffers from association with MOND, but no reference to MOND is made in the construction of the observed relation. It’s right there in the data:

The radial acceleration relation as traced by both early (red) and late (cyan) type galaxies via both kinematics and gravitational lensing. The low acceleration behavior maps smoothly onto the Newtonian behavior seen in the solar system at higher accelerations. If Newton’s discovery of the inverse square force law would warrant a Nobel prize, as surely it would had the prize existed in Newton’s time, then so does the discovery of a systematically new behavior.

*Rubin and Bosma both argued, sensibly, that the interpretation of flat rotation curves required dark matter. That’s an interpretation, not a discovery. That rotation curves were flat, over and over again in every galaxy examined, to indefinitely large radii, was the observational discovery.

Progressive Approximations in Mass Modeling

I have said I wasn’t going to attempt to teach an entire graduate course on galaxy dynamics in this forum, and I’m not. But I can give some pointers for those who want to try it for themselves. It also provides some useful context for fans of Deur’s approach.

The go-to textbook for this topic is Galactic Dynamics by Binney & Tremaine. The first edition was published in 1987, conveniently when I switched to grad school in astronomy. It was already a deep and well-developed field at that time; this is a compendium of considerable scientific knowledge.

Fun story: a colleague in a joint physics & astronomy department once complained to me that she wanted to develop a course in galaxy dynamics, which is a staple of graduate programs in astronomy & astrophysics. However, there was a certain senior colleague who objected, saying that since it was astronomy, it couldn’t possibly be a rigorous course worthy of a full semester graduate course. This is a casual bias that astronomers often encounter when talking to physicists, many of whom have attitudes about the subject that were trapped in amber sometime in the Jurassic. I suggested that she walk into his office and drop a copy of Galactic Dynamics on his desk from on high, as (1) it would make a hefty impact, and (2) no one who so much as skims this book could persist in this toxic attitude.

She later reported that she had done this, and it had worked.

Galactic Dynamics is not a starter book. It is the textbook we use when teaching the graduate course that this is not. A useful how-to guide for the specific material I’ll discuss here is provided by Federico Lelli. In brief, to model the gravitational potential of an observed distribution of matter, we can make one of the following series of approximations:

This is a slide I sometimes use to introduce mass modeling in science talks as a reminder for expert audiences.

All science is an approximation at some level. The most crude approximation we can employ here is to imagine that all of the mass resides at a central point. In this limit, the potential is simply

V² = GM/R

where V is the orbital speed of a test particle on a circular orbit, G is Newton’s constant, M is the mass, and R is the distance from the point mass. Galaxies are not point masses, so this is a terrible approximation, as can be seen by the divergent V ∝ R^(−1/2) behavior as R → 0 (the dotted line above).

The next bad approximation one can make is a spherical cow: assume the mass is distributed in a sphere that is projected as the image we see on the sky. This at least incorporates the fact that the mass is not all concentrated at a point, so

V² = GM(R)/R

acknowledges that the mass M is spread out as a function of radius. This is a spherical cow. Since we cannot see dark matter, we almost always assume it to be a spherical cow.

For the luminous disk of a spiral galaxy, a common approximation is the so-called exponential disk:

Σ(R) = Σ₀ e^(−R/R_d)

where Σ₀ is the central surface density of stars and R_d is the scale length of the disk – the characteristic size over which the surface brightness declines exponentially. This can be integrated by parts to obtain an expression for the enclosed mass M(R), which I leave as an exercise for the eager reader. This provides a handy analytic formula, the rotation curve of which is illustrated above by the dashed line.
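
For readers who would rather check their work than take my word for it, the integral is standard: M(R) = ∫₀^R 2πR′ Σ(R′) dR′ gives

M(R) = 2πΣ₀R_d² [1 − (1 + R/R_d) e^(−R/R_d)],

so the total disk mass is M_d = 2πΣ₀R_d².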

Spiral galaxies are fairly thin when seen edge-on, so the spherical cow is not a great approximation. In a classic paper, Freeman (1970) solved the Poisson equation for the case of a razor-thin exponential disk, where one meets modified Bessel functions of the first and second kind (the I and K functions in the slide above). These must be evaluated numerically, but one can tabulate them once for use with any choice of disk mass and scale length. Such a thin disk is illustrated by the grey line above for a choice of stellar mass and scale length appropriate to NGC 6946.
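
To make the progression concrete, here is a short Python sketch (my illustration; the disk mass and scale length are round numbers loosely appropriate to a galaxy like NGC 6946, not fitted values) comparing the point-mass, spherical-cow, and razor-thin Freeman curves:

```python
import numpy as np
from scipy.special import i0, i1, k0, k1   # modified Bessel functions I and K

G = 4.301e-6                 # Newton's constant in kpc (km/s)^2 / Msun
Md, Rd = 5.0e10, 3.0         # disk mass (Msun) and scale length (kpc): round numbers
Sigma0 = Md / (2 * np.pi * Rd**2)   # central surface density of the exponential disk

R = np.linspace(0.1, 20.0, 200)     # radii in kpc

# 1. point mass: all the mass at the center; diverges as R -> 0
V_point = np.sqrt(G * Md / R)

# 2. spherical cow: the enclosed mass M(R) derived above, treated as spherical
M_enc = 2 * np.pi * Sigma0 * Rd**2 * (1 - (1 + R / Rd) * np.exp(-R / Rd))
V_sphere = np.sqrt(G * M_enc / R)

# 3. razor-thin exponential disk (Freeman 1970), with y = R/(2 Rd);
#    for R >> Rd, use the scaled i0e/k0e forms to avoid numerical overflow
y = R / (2 * Rd)
V_thin = np.sqrt(4 * np.pi * G * Sigma0 * Rd * y**2
                 * (i0(y) * k0(y) - i1(y) * k1(y)))
```

The thin-disk curve peaks near R ≈ 2.2 R_d, a standard property of the Freeman solution, and differs noticeably from the spherical cow even though both enclose the same mass.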

The spiral galaxy NGC 6946, aka the fireworks galaxy.

Spiral galaxies are not razor thin of course. We only see a projected image on the sky, so for a galaxy like NGC 6946, we may have a good measurement of its azimuthally averaged light (and presumed stellar mass) distribution Σ(R), but we have no idea how thick it is. Here, we have to make an educated guess based on observations of edge-on galaxies. A ballpark average is R:z = 8:1, but some galaxies are thicker and others thinner, so this becomes an approximation with an associated uncertainty. This uncertainty cannot be unambiguously eliminated; it is one of the known unknowns that comprise the inevitable systematic errors in astronomy. Fortunately, allowing for a finite thickness only takes the harsh edge off the thin disk case, and the assumption one chooses makes little difference to the result (compare the lines labeled thick and thin above).

The exponential disk formula Σ(R) is an azimuthal average over an image like that of NGC 6946. This approximation captures none of the spiral structure: it only tells us about the average rate at which the surface brightness falls off. It also imposes a smooth shape on that falloff which, as our eyes can see, is not necessarily a great approximation. So the next level of approximation is to solve the Poisson equation numerically for the observed surface brightness profile Σ(R), not just the exponential approximation thereto. This is the blue line in the bottom right graph above.

There are important differences between using the numerical solution for the observed light distribution and the exponential disk approximation. This has been known since the 1980s, but the analytic expression is so convenient that people need an occasional reminder not to trust it too much. Jerry Sellwood felt the need to provide this reminder in 1999:

Small apparent differences in the shape of the mass profile (left) correspond to pronounced differences in the rotation curve (right). I chose the example of NGC 6946 in part because the exponential approximation for it is pretty good. Nevertheless, the details matter, so the best practice is to build numerical mass models, as we did for SPARC.

Building numerical mass models is tractable for external galaxies, where we can see the entire light distribution. It is not possible for our own Milky Way, since we are located within it and cannot see it as a whole. Consequently, the vast majority of Milky Way models rely on the exponential approximation; so far as I’m aware, I’m the only one who has built a model that attempts to get beyond this.

Numerical mass models are still an approximation. We’re assuming that the gravitational potential is static and azimuthally symmetric. Taking the next step would require abandoning these assumptions to model the spiral arms. The Poisson equation can handle that, but it becomes dicey because the arms rotate with some pattern speed (generally unknown) and may grow or dissolve or reform on some unknown timescale. The potential at any given point is time variable even in equilibrium, so we need not just a numerical solution but a live numerical simulation to keep track of it. That can be done, but it has to be done on a case by case basis, and the answer will depend somewhat on additional assumptions that have to be introduced to run the simulation, like specifying a dark matter halo.

One can generalize further to consider the full 3D potential, e.g., to allow for asymmetry in the z-direction as well as in azimuth. One can further imagine non-equilibrium processes, such as external perturbations. There is good evidence that the Milky Way suffers both of these effects, the passage of the Large Magellanic Cloud being one obvious and apparently large perturbation. So we are in the awkward position that the Gaia data now oblige us to consider the entire run of possible effects through non-equilibrium processes in a mass distribution that is not completely symmetric in any of the three spatial dimensions, but for the main mass component we are stuck with the inadequate approximation of an exponential disk.

Geometry appears to play a crucial role in the approach of Deur to the acceleration discrepancy problem. The essential claim is that the discrepancy correlates with flattening, with highly flattened systems like spirals evincing the classic discrepancy while spherical systems like E0 galaxies showing none. Big if true!

A useful plot appears on slide 44:

Some measure of the discrepancy as a function of apparent ellipticity.

This is the one example shown that goes into the plot of many determinations of the slope a on the following slide. It being the only one, it is the only thing I have to evaluate without chasing down every other case. Looking at this, I am not inclined to do so.

At first it looks persuasive: the best fit slope is clear. There is no reason why the discrepancy should depend on the projected ellipticity of a triaxial 3D blob of stars, so this must be telling us something important. I’d be on board with that if it were true, but I’ve seen too many non-correlations masquerading as correlations to believe this one. The fitted slope is strongly influenced by the one point at large ellipticity; absent that, a slope of zero works fine. Mostly what I see here is a lot of scatter, which is normal in extragalactic astronomy. Since there are only a few points at high and low ellipticity, we don’t know what would happen if we went out and got more data. But I bet that what would happen is that the high ellipticity points would wind up looking like those in the middle: a big blob of scatter, with no significant correlation.

I’d kinda like to be wrong about this one, so I won’t even get into the theory side, which I find sorta compelling but ultimately unpersuasive. Why are gravitons confined to a disk? What happens way far out? Surely the flatness of the disk at tens of kpc is not dictating the flatness at 1000 kpc.

Surely.