Continuing our discussion of galaxy formation and evolution in the age of JWST, we saw previously that there appears to be a population of galaxies that grew rapidly in the early universe, attaining stellar masses like those expected in a traditional monolithic model for a giant elliptical galaxy rather than a conventional hierarchical model that builds up gradually through many mergers. The formation of galaxies at incredibly high redshift, z > 10, implies the existence of a descendant population at intermediate redshift, 3 < z < 4, at which point they should have mature stellar populations. These galaxies should not only be massive, they should also have the spectral characteristics of old stellar populations – old, at least, for how old the universe itself is at this point.

Theoretical predictions from Fig. 1 of McGaugh et al (2024) combined with the data of Fig. 4. The data follow the track of a monolithic model that forms early as a single galaxy rather than that of the largest progenitor of the hierarchical build-up expected in LCDM.

The data follow the track of stellar mass growth for an early-forming monolithic model. Do the ages of stars also look like that?

Here is a recent JWST spectrum published by de Graff et al. (2024). This appeared too recently for us to have cited in our paper, but it is a great example of what we’re talking about. This is an incredibly gorgeous spectrum of a galaxy at z = 4.9 when the universe was 1.2 Gyr old.

Fig. 1 from de Graff et al. (2024): JWST/NIRSpec PRISM spectrum (black line) of the massive quiescent galaxy RUBIES-EGS-QG-1 at a redshift of z = 4.8976.
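As a sanity check on the age quoted above, here is a minimal sketch of the age of the universe as a function of redshift. It assumes a flat universe with only matter and a cosmological constant (radiation is neglected), and the round-number parameters H0 = 70 km/s/Mpc and Ωm = 0.3 are my assumptions, not values taken from any of the papers discussed. The function name is likewise just illustrative.

```python
import math

def age_at_redshift(z, h0=70.0, omega_m=0.3):
    """Age in Gyr of a flat matter + Lambda universe at redshift z.

    ASSUMPTIONS: h0 (km/s/Mpc) and omega_m are illustrative round numbers;
    radiation is ignored, which is fine for z of order 10 or less.
    """
    omega_l = 1.0 - omega_m
    hubble_time_gyr = 977.8 / h0  # 1/H0 in Gyr when H0 is in km/s/Mpc
    x = math.sqrt(omega_l / omega_m) * (1.0 + z) ** -1.5
    return (2.0 / (3.0 * math.sqrt(omega_l))) * hubble_time_gyr * math.asinh(x)

for z in (14, 10, 4.8976, 4, 3, 0):
    print(f"z = {z:>7}: age = {age_at_redshift(z):5.2f} Gyr")
```

With these assumed parameters, z = 4.8976 comes out close to 1.2 Gyr, consistent with the number quoted for this galaxy. Other reasonable choices of H0 and Ωm shift the result by of order ten percent.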

It is challenging to refrain from nerding out at great length over many of the details on display here. First, it is an incredible technical achievement. I’ve seen worse spectra of local galaxies. JWST was built to obtain images and spectra of galaxies so distant they approach the horizon of the observable universe. Its cameras are sensitive to the infrared part of the spectrum in order to capture familiar optical features that have been redshifted by a huge factor (compare the upper and lower x-axes). The telescope itself was launched into space well beyond the obscuring atmosphere of the Earth, pointed precisely at a tiny, faint flicker of light in a vast, empty universe, captured photons that had been traveling for billions of years, and transmitted the data to Earth. That this is possible, and works, is an amazing feat of science, engineering, and societal commitment (it wasn’t exactly cheap).

In the raw 2D spectrum (at top) I can see by eye the basic features in the extracted, 1D spectrum (bottom). This is a useful and convincing reality check to an experienced observer even if at first glance it looks like a bug splot smeared by a windshield wiper. The essential result is apparent to the eye; the subsequent analysis simply fills in the precise numbers.

Looking from right to left, the spectrum runs from red to blue. It ramps up then crashes down around an observed wavelength of 2.3 microns. This is the 4000 Å break in the rest frame, a prominent feature of aging stellar populations. The amount of blue-to-red ramp-up and the subsequent depth of drop is a powerful diagnostic of stellar age.

In addition to the 4000 Å break, a number of prominent spectral lines are apparent. In particular, the Balmer absorption lines Hβ, Hγ, and Hδ are clear and deep. These are produced by A stars, which dominate the light of a stellar population after a few hundred million years. There’s the answer right there: the universe is only 1.2 Gyr old at this point, and the stars dominating the light aren’t much younger.
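The mapping between the rest-frame features discussed above and where they land in the observed spectrum is just a factor of (1 + z). A minimal sketch, using the redshift from the figure caption and approximate textbook rest wavelengths for the Balmer lines (those wavelengths and the function name are my additions, not taken from the paper):

```python
Z = 4.8976  # redshift of RUBIES-EGS-QG-1 from the figure caption

# Approximate rest-frame wavelengths in Angstroms (standard values, not from the post)
REST_ANGSTROM = {
    "4000 A break": 4000.0,
    "H-delta": 4101.7,
    "H-gamma": 4340.5,
    "H-beta": 4861.3,
}

def observed_micron(rest_angstrom, z):
    """Redshift a rest-frame wavelength and convert Angstroms to microns."""
    return rest_angstrom * (1.0 + z) * 1.0e-4

for name, rest in REST_ANGSTROM.items():
    print(f"{name:>13}: {observed_micron(rest, Z):.2f} micron")
```

The break lands near 2.36 microns, matching the drop seen in the observed spectrum, with the Balmer lines strung out redward of it.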

There are also some emission lines. These can be the sign of on-going star formation or an active galactic nucleus powered by a supermassive black hole. The authors attribute these to the latter, inferring that the star formation happened fast and furious early on, then basically stopped. That’s important to the rest of the spectrum; A stars only dominate for a while, and their lines are not so prominent if a population keeps making new stars. So this galaxy made a lot of stars, made them fast, then basically stopped. That is exactly the classical picture of a monolithic giant elliptical.

Here is the star formation history that de Graff et al. (2024) infer:

Fig. 2 from de Graff et al. (2024): the star formation rate (top) and accumulated stellar mass (bottom) as a function of cosmic time (only the first 1.2 Gyr are shown). Results for stellar populations of two metallicities are shown (purple or blue lines). This affects the timing of the onset of star formation, but once going, an enormous mass of stars forms fast, in ~200 Myr.

There are all sorts of caveats about population modeling, but it is very hard to avoid the basic conclusion that lots of stars were assembled with incredible speed. A stellar mass a bit in excess of that of the Milky Way appears in the time it takes for the sun to orbit once. That number need not be exactly right to see that this is not the gradual, linear, hierarchical assembly predicted by LCDM. The typical galaxy in LCDM is predicted to take ~7 Gyr to assemble half its stellar mass, not 0.1 Gyr. It’s as if the entire mass collapsed rapidly and experienced an intense burst of star formation during violent relaxation (Lynden-Bell 1967).
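To get a feel for the numbers, here is a back-of-the-envelope sketch of the average star formation rate implied by forming a Milky Way's worth of stars in ~200 Myr. The stellar mass used is an assumed round number for illustration only, not the measured mass of this galaxy; the timescales are the ones quoted above.

```python
def mean_sfr(stellar_mass_msun, duration_gyr):
    """Average star formation rate (Msun/yr) needed to form a mass in a given time."""
    return stellar_mass_msun / (duration_gyr * 1.0e9)

M_STAR = 1.0e11   # ASSUMED illustrative stellar mass in Msun, not a measured value
BURST_GYR = 0.2   # ~200 Myr formation window quoted in the text

print(f"mean SFR ~ {mean_sfr(M_STAR, BURST_GYR):.0f} Msun/yr")

# Half-mass assembly times quoted in the text: ~7 Gyr (typical LCDM galaxy) vs ~0.1 Gyr
print(f"half-mass time ratio ~ {7.0 / 0.1:.0f}x")
```

Under these assumptions the average rate is several hundred solar masses per year, and the half-mass time is shorter than the typical LCDM expectation by a factor of about 70.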

Collapse of shells within shells to form a massive galaxy rapidly in MOND (Sanders 2008). Note that the inner shells (inset), where most of the stars will be, collapse even more rapidly than the overall monolith (dotted line).

Where MOND provides a natural explanation for this observation, the fiducial population model of de Graff et al. violates the LCDM baryon limit: there are more stars than there are baryons to make them from. It should be impossible to veer into the orange region above as the inferred star formation history does. The obvious solution is to adopt a higher metallicity (the blue model) even if that is a worse fit to the spectrum. Indeed, I find it hard to believe that so many stars could be made in such a small region of space without drastically increasing their metallicity, so there are surely things still to be worked out. But before we engage in too much excuse-making for the standard model, note that the orange region represents a double-impossibility. First, the star formation efficiency is 100%. Second, this is for an exceptionally rare, massive dark matter halo. The chance of spotting such an object in the area so far surveyed by JWST is small. So we not only need to convert all the baryons into stars, we also need to luck into seeing it happen in a halo so massive that it probably shouldn’t be there. And in the strictest reading, there still aren’t enough baryons. Does that look right to you?

Do these colors look right to you? Getting the color right is what stellar population modeling is all about.
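The baryon limit discussed above is simple bookkeeping: a halo only gets its cosmic share of baryons, so the stellar mass cannot exceed that share. A minimal sketch, in which the baryon fraction of 0.16 is an approximate assumed value and the halo and stellar masses are hypothetical numbers chosen purely to illustrate a violation (they are not the values from de Graff et al.):

```python
F_BARYON = 0.16  # ASSUMED cosmic baryon fraction, Omega_b / Omega_m (approximate)

def max_stellar_mass(halo_mass_msun, efficiency=1.0, f_baryon=F_BARYON):
    """Largest stellar mass a halo can host if a fraction `efficiency`
    of its share of baryons is converted into stars."""
    return efficiency * f_baryon * halo_mass_msun

def required_efficiency(stellar_mass_msun, halo_mass_msun, f_baryon=F_BARYON):
    """Star formation efficiency implied by a stellar mass; > 1 is impossible."""
    return stellar_mass_msun / (f_baryon * halo_mass_msun)

M_HALO = 5.0e11  # HYPOTHETICAL halo mass in Msun
M_STAR = 1.0e11  # HYPOTHETICAL stellar mass in Msun

print(f"baryon limit: {max_stellar_mass(M_HALO):.1e} Msun")
print(f"implied efficiency: {required_efficiency(M_STAR, M_HALO):.2f}")
```

An implied efficiency above 1 is the code's version of veering into the forbidden region: more stars than baryons available to make them.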

OK, so I got carried away nerding out about this one object. There are other examples. Indeed, there are enough now to call them a population of old and massive quiescent galaxies at 3 < z < 4. These have the properties expected for the descendants of massive galaxies that form at z > 10.

Nanayakkara et al. (2024) model spectra for a dozen such galaxies. The spectra provide an estimate of the stellar mass at the redshift of observation. They also imply a star formation history from which we can estimate the age/redshift at which the galaxy had formed half of those stars, and when it quenched (stopped forming stars, or in practice here, when the 90% mark had been reached). There are, of course, large uncertainties in the modeling, but it is again hard to avoid the conclusion that lots of stars were formed early.

Figure 7 from McGaugh et al. (2024): The stellar masses of quiescent galaxies from Nanayakkara et al. (2024). The inferred growth of stellar mass is shown for several cases, marking the time when half the stars were present (small green circles) to the quenching time when 90% of the stars were present (midsize orange circles) to the epoch of observation (large red circles). Illustrative star formation histories are shown as dotted lines with the time of formation ti and the quenching timescale τ noted in Gyr. We omit the remaining lines for clarity, as many cross. There is a wide distribution of formation times from very early (ti = 0.2 Gyr) to relatively late (>1 Gyr), but all of the galaxies in this sample are inferred to build their stellar mass rapidly and quench early (τ < 0.5 Gyr).

The dotted lines above are models I constructed in the spirit of monolithic models. The particular details aren’t important, but the inferred timescales are. To put galaxies in this part of the stellar mass-redshift plane, they have to start forming early (typically in the first billion years), form stars at a prolific rate, then quench rapidly (typically with e-folding timescales < 1 Gyr). I wouldn’t say any of these numbers are particularly well-measured, but they are indicative.
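To make the timescales concrete, here is a minimal sketch of a monolithic-style growth curve. It assumes star formation switches on at ti and declines exponentially with e-folding time τ, so the mass in place grows as 1 − exp(−(t − ti)/τ). That functional form is an assumption for illustration and may not be the exact one used for the dotted lines; the function names and the example values are likewise illustrative.

```python
import math

def stellar_mass_fraction(t_gyr, t_i, tau):
    """Fraction of the final stellar mass in place at cosmic time t (Gyr),
    ASSUMING star formation starts at t_i and declines with e-folding time tau."""
    if t_gyr <= t_i:
        return 0.0
    return 1.0 - math.exp(-(t_gyr - t_i) / tau)

def time_at_fraction(fraction, t_i, tau):
    """Cosmic time (Gyr) at which a given fraction of the final mass has formed."""
    return t_i - tau * math.log(1.0 - fraction)

# Illustrative early-forming case: t_i = 0.2 Gyr with an assumed tau = 0.3 Gyr
T_I, TAU = 0.2, 0.3
print(f"half the stars by t = {time_at_fraction(0.5, T_I, TAU):.2f} Gyr")
print(f"90% (quenched) by t = {time_at_fraction(0.9, T_I, TAU):.2f} Gyr")
```

In this form the half-mass time is ti + τ ln 2 and the 90% time is ti + τ ln 10, so a short τ places both marks within the first billion years.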

What is missing from this plot is the LCDM prediction. That’s not because I omitted it, it’s because the prediction for typical L* galaxies doesn’t fall within the plot limits. LCDM does not predict that typical galaxies should become this massive this early. I emphasize typical because there is always scatter, and some galaxies will grow ahead of the typical rate.

Not only are the observed galaxies massive, they have mature stellar populations that are pretty much done forming stars. This will sound normal to anyone who has studied the stellar populations of giant elliptical galaxies. But what does LCDM predict?

I searched through the Illustris TNG50 and TNG300 simulations for objects at redshift 3 that had stellar masses in the same range as the galaxies observed by Nanayakkara et al. (2024). The choice of z = 3 is constrained by the simulation output, which comes in increments of the expansion factor. To compare to real galaxies at 3 < z < 4 one can either look at the snapshot at z = 4 or the one at z = 3. I chose z = 3 to be conservative; this gives the simulation the maximum amount of time to produce quenched, massive galaxies.

These simulations do indeed produce some objects of the appropriate stellar mass. These are rare, as they are early adopters: galaxies that got big quicker than is typical. However, they are not quenched as observed: the simulated objects are still on the star forming main sequence (the correlation between star formation rate and stellar mass). The distribution of simulated objects does not appear to encompass that of real galaxies.

Figure 8 from McGaugh et al. (2024): The stellar masses and star formation rates of galaxies from Nanayakkara et al. (2024; red symbols). Downward-pointing triangles are upper limits; some of these fall well below the edge of the plot and so are illustrated as the line of points along the bottom. Also shown are objects selected from the TNG50 (Pillepich et al. 2019; filled squares) and TNG300 (Pillepich et al. 2018; open squares) simulations at z = 3 to cover the same range of stellar mass. Unlike the observed galaxies, simulated objects with stellar masses comparable to real galaxies are mostly forming stars at a rapid pace. In the higher-resolution TNG50, none have quenched as observed.
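Whether an object counts as quenched comes down to its star formation rate relative to its stellar mass. A minimal sketch of one way to draw that line, using the specific star formation rate compared to a threshold that scales with the inverse age of the universe. The threshold convention (a factor of 0.2) is an assumption on my part and not necessarily the criterion behind the figure; the example objects are hypothetical.

```python
def specific_sfr(sfr_msun_per_yr, stellar_mass_msun):
    """Specific star formation rate in 1/yr."""
    return sfr_msun_per_yr / stellar_mass_msun

def is_quenched(sfr_msun_per_yr, stellar_mass_msun, age_universe_gyr, factor=0.2):
    """ASSUMED convention: quenched if sSFR < factor / (age of the universe)."""
    threshold = factor / (age_universe_gyr * 1.0e9)
    return specific_sfr(sfr_msun_per_yr, stellar_mass_msun) < threshold

# HYPOTHETICAL objects at z = 3, when the universe is roughly 2 Gyr old
AGE_Z3_GYR = 2.1
print(is_quenched(100.0, 1.0e11, AGE_Z3_GYR))  # vigorous star former
print(is_quenched(1.0, 1.0e11, AGE_Z3_GYR))    # nearly dead
```

The first hypothetical object sits on the star-forming side of the line and the second on the quenched side; real classifications depend on the chosen threshold and on how upper limits are handled.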

If we want to hedge, we can note that TNG300 has a few objects that are kinda in the right ballpark. That’s a bit misleading, as the data are mostly upper limits. Moreover, these are the rare objects among a set of objects selected to be rare: it isn’t a resounding success if we have to scrape the bottom of the simulated barrel after cherry-picking which barrel. Worse, these few semi-quenched simulated objects are not present in TNG50. TNG50 is the higher resolution simulation, so presumably provides a better handle on the star formation in individual objects. It is conceivable that TNG300 “wins” by virtue of its larger volume, but that’s just saying we have more space in which to discover very rare entities. The prediction is that massive, quenched galaxies should be exceedingly rare, but in the real universe they seem mundane.

That said, I don’t think this problem is fundamental. Hierarchical assembly is still ongoing at this epoch, bringing with it merger-induced star formation. There’s an easy fix for that: change the star formation prescription. Instead of “wet” mergers with gas that can turn into stars, we just need to form all the stars already early on so that the subsequent mergers are “dry” – at least, for those mergers that build this particular population. One winds up needing a new and different mode of star formation. In addition to what we observe locally, there needs to be a separate mode of super-efficient star formation that somehow turns all of the available baryons into stars as soon as possible. That’s basically what I advocate as the least unreasonable possibility for LCDM in our paper. This is a necessary but not sufficient condition; these early stellar nuggets also need to assemble speedy quick to make really big galaxies. While it is straightforward to mess with the star formation prescription in models (if not in nature), the merger trees dictating the assembly history are less flexible.

Putting all the data together in a single figure, we can get a sense for the evolutionary trajectory of the growth of stellar mass in galaxies across cosmic time. This figure extends from the earliest galaxies so-far known at z ~ 14 when the universe was just a few hundred million years old (of order one orbital time in a mature galaxy) to the present over thirteen billion years later. In addition to data discussed previously, it also shows recent data with spectroscopic redshifts from JWST. This is important, as the sense of the figure doesn’t change if we throw away all the photometric redshifts, it just gets a little sparse around z ~ 8.

Figure 10 from McGaugh et al. (2024): The data from Figures 4 and 6 shown together using the same symbols. Additional JWST data with spectroscopic redshifts are shown from Xiao et al. (2023; green triangles) and Carnall et al. (2024). The data of Carnall et al. (2024) distinguish between star-forming galaxies (small blue circles) and quiescent galaxies (red squares); the latter are in good agreement with the typical stellar mass determined from Schechter fits in clusters (large circles). The dashed red lines show the median growth predicted by the Illustris ΛCDM simulation (Rodriguez-Gomez et al. 2016) for model galaxies that reach final stellar masses of M* = 10^10, 10^11, and 10^12 M☉. The solid lines show monolithic models with a final stellar mass of 9 × 10^10 M☉ and ti = τ = 0.3, 0.4, and 0.5 Gyr, as might be appropriate for giant elliptical galaxies. The dotted line shows a model appropriate to a monolithic spiral galaxy with ti = 0.5 and τ = 13.5 Gyr.

The solid lines are monolithic models we built to represent classical giant elliptical galaxies that form early and quench rapidly. These capture nicely the upper envelope of the data. They form most of their stars at z > 4, producing appropriately old populations at lower redshifts. The individual galaxy data merge smoothly into those for typical galaxies in clusters.

The LCDM prediction as represented by the Illustris suite of simulations is shown as the dashed red lines for objects of several final masses. These are nearly linear in log(M*)-linear z space. Objects that end up with a typical L* elliptical galaxy mass at z = 0 deviate from the data almost immediately at z > 1. They disappear at z > 6 as the largest progenitors become tiny.

What can we do to fix this? Massive galaxies get a head start, as it were, by being massive at all epochs. But the shape of the evolutionary trajectory remains wrong. The top red line (for a final stellar mass of 10^12 M☉) corresponds to a typical galaxy at z ~ 2, but it continues to grow to be atypical locally. The data don’t do that. Even with this boost, the largest progenitor is still predicted to be too small at z > 3 where there are now many examples of massive, quiescent galaxies – known both from JWST observations and from Jay Franck’s thesis before it. Again, the distribution of the data does not look like the predictions of LCDM.

One can abandon Illustris as the exemplar of LCDM, but it doesn’t really help. Other models show similar things, differing only in minor details. That’s because the issue is the mass assembly history they all share, not the details of the star formation. The challenge now is to tweak models to make them look more monolithic; i.e., change those red dashed lines into the solid black lines. One will need super-efficient star formation, if it is even possible. I’ll leave discussion of this and other obvious fudges to a future post.

Finally, note that there are a bunch of galaxies with JWST spectroscopic redshifts from 3 < z < 4 that are not exceptionally high mass (the small blue points). These are expected in any paradigm. They can be galaxies that are intrinsically low mass and won’t grow much further, or galaxies that may still grow a lot, just with a longer fuse on their star formation timescale. Such objects are ubiquitous in the local universe as spiral and irregular galaxies. Their location in the diagram above is consistent with the LCDM predictions, but is also readily explained by monolithic models with long star formation timescales. The dotted line shows a monolithic model that forms early (ti = 0.5) but converts gas into stars gradually (τ = 13.5 Gyr rather than < 1 Gyr). This is a boilerplate model for a spiral that has been around for as long as the short-τ model for giant ellipticals. So while these lower mass galaxies exist, their location in the M*-z plane doesn’t really add much to this discussion as yet. It is the massive galaxies that form early and become quiescent rapidly that most challenge LCDM.

16 thoughts on “Old galaxies in the early universe”

  1. Thanks for a very good post. It’s a major challenge to LCDM, and the detail shows it clearly. MOND does considerably better, but as you probably know as well as anyone, there are things that are hard to explain in any paradigm. One observation was of a merger between two compact early galaxies that should not have held together – I don’t know if MOND can explain that.

    For LCDM’s attempts to explain very rapid galaxy formation, you point out that ‘Forming enough stars is not the problem. The problem is assembling them into a single object’. Hypothetically, if there’s a faster time rate as well as a proportional higher overall mass, this is doubly helpful in allowing it.

    It’s not ruled out by SNe data. In relation to recent discussion, it’s worth pointing out that although with galaxies ‘high redshift’ can mean z ~ 10, with SNe it often means z ~ 1. One study of ‘high redshift SNe’ went out to z = 0.62. This covers an area where the time rate has to be much the same as ours anyway.

    In my picture the curve flattens out, with a similar time rate to the present one, somewhere (in the expansion sequence) after z = 6, and as I’ve said in the paper, I don’t have that curve, only a nearby approximation. That’s why I asked if you, or anyone else, could get to it – a number of things should click into place if it’s found. There are things that set bounds on it – what you said about the Tully-Fisher relation helps, it seems clear that the time rate is close to the present one out to z = 2.5, which goes beyond most SNe data anyway.

    With GRBs the data is less reliable at higher redshifts; distant GRBs are generally less than 5 s in duration, but that’s at least partly the tip of the iceberg effect. But in one study it was shown – they say with 99% certainty – that there’s some unknown, fundamental difference between nearby populations of GRBs and distant ones (refs are in the paper). As I said there, VTC is a spinoff, less certain and less complete than PST. All I could do, looking at the measurements, was try to make sure there was room for a varying time rate, with varying mass – and point out reasons to expect one. Perhaps you could find out more than that.

  2. Thanks for sharing, this is excellent. When you say structure formation must have happened very quickly and very efficiently, what does that mean . . . that gravitation was more dominant than radiation? If so, wouldn’t the stars appear younger? What are the main factors that normally influence this efficiency?
    My first impression was that if something is happening both much more quickly than expected AND much more efficiently, then that again seems consistent with time dilation.

    1. Let’s be careful that we’re talking about the same thing. To me, structure formation means the formation of galaxies themselves and the larger structures they trace out. What I talked about above was the efficiency of star formation within galaxies. One could, and presumably does, have galaxies that form as primordial gas but that don’t convert all that gas into stars rapidly, though some apparently do.

      So if you mean galaxy formation, then that is slow in LCDM and fast in MOND. That process is *not* efficient; most of the baryons remain in the intergalactic medium rather than accreting into galaxies. I don’t even know how to address what you mean by time dilation.

      If instead you mean star formation, what I’m saying above is that one path to attempt to salvage LCDM is for there to sometimes be super-efficient star formation that promptly turns a lot of the available baryons into stars. That’s not what it looks like so much as an inference we’re forced into: not that many baryons are predicted to have formed into galaxies yet, but gee there sure are a lot of stars already. So it is post-facto fill in, in which yes, I guess gravity would dominate over radiation (and other forms of feedback) that usually inhibit star formation and keep its efficiency low.

      1. Thanks Stacy for clarifying the distinction for me between rapid structure formation and super efficient star formation. I was lumping those two processes together, and it just didn’t sit right with me that an assembly process should somehow be much faster and also much more efficient at the same time. I assumed that may be counterintuitive, but could be avoided if the time interval for formation was longer than currently assumed . . . due to additional time dilation. However, you are pointing out MOND predicts the early structure formation and plenty of older stars, so these JWST observations are not surprising in terms of MOND. Is that right?

  3. Incidentally, the suggestion of time dilation can’t be something novel . . . I can’t believe it could be. If one comes up with inflationary cosmology where the universe should expand in unrealistic fashion to such extremes in a tiny fraction of a second, nobody thought to say, well maybe what we call a tiny fraction of a second was much much longer locally in the early universe. That just seems so obvious to me, I’m sure there must have been such a discussion and attempts to disprove it.

  4. The idea of a faster time rate early on is easy to arrive at, and we know time varies in other situations. But those are local variations, and no-one had a mechanism for an overall, cosmological time rate. The conceptual picture I linked to a summary of (on the post before last) makes it a lot more of a possibility, as it explains both kinds of variations via the same basis.

    Incidentally, some results published last year suggest that ‘dark energy may be getting weaker’, there’s a New Scientist article. This was a surprise, but it could potentially (very loosely at present) be explained in the VTC picture, because the loss of mass that causes the accelerating expansion would definitely be slowing down. There’d be a need to see how the value of H changes over time, and relate that to the changing total mass, which is proportional to the time rate, so clues about how the time rate changes could be related to that. In 2022 I tried for months to get to the curve, but couldn’t (I’m from the conceptual department, but departments can work together…). If anyone wants to try it, get in touch and I’ll let you know the clues I have. The third link on the page I linked to has an email address and a summary of the conceptual basis, thanks.

    1. Jonathan,
      It is not clear that one needs to change anything in the physics models of the early universe, as time dilation wouldn’t be a factor to a local model. Accounting for time dilation would just potentially bring certain observations and model outputs closer together if the divergence was due to unaccounted for relativistic effects.

  5. I would think the question “Why does MOND work at all?” provides a lot of motivation for rewriting MOND into a time dilation equation. The assumption being that the acceleration discrepancy is not really felt by the object in the low acceleration regime, but appears to the observer due to the time dilation. It would then make sense that the effect appears at an acceleration scale and can be described by the observable baryons. The apparent need to modify either the force of gravity or the inertial mass could be a consequence of something more fundamental, and one of the most fundamental properties of both SR and GR is the concept of time dilation.

  6. Well for MOND I have self interaction of the emitted medium, which always starts at a ≈ 2e-9 (as in the preprint) and changes the dissipation pattern, arriving at the new pattern slowly in galaxies, quickly in clusters, because of how aligned, or random, the travelling directions of the small-scale waves are. That was one of several possibilities for what might change the pattern of the field, which started to seem the likely one last year because of two different speeds for the transition.

    1. Yes, it’s all part and parcel of the same thing. I’ve emphasized galaxy formation lately because of the JWST results, but that even larger structures would appear earlier than expected was part of the same prediction.

  7. Do you think these are old galaxies in the early universe, or old galaxies in the distant universe? Do you think “early universe” is a physically true concept? (I recall your hyperlink to Lerner’s research.)

    1. Yes, I mean early universe, which is the same as distant universe in the context of a hot big bang – the basic premise of which I’m not disputing. Citation is acknowledgement not endorsement: I am open to the possibility that we are even more wrong than I think, but I don’t think we’re *that* wrong. I’ll write about it if I think we have to go there.

  8. It’s worth pointing out that it’s hard to find a mechanism that boosts accelerations effortlessly (i.e., without many assumptions). It’s easier to reduce them, and make the effect of gravity fall off quicker. To boost gravity, whatever causes gravity might have to be boosted. In PSG it’s not far-fetched. If the medium starts dissipating faster at some radius for whatever reason, that boosts accelerations, because an acceleration is matter detecting a local rate of change for the ‘density’ of the medium. That slope affects the helical refraction mechanism – if it gets steeper, gravity gets stronger. So the inverse square law in the Newtonian regime comes not from the state of the medium, but how fast its state is changing (and a rate of change with radius, with r^2 underneath it, is not too surprising).

    So faster dissipation means higher accelerations, and self-interaction is a likely cause, for one thing because it can indeed increase dissipation. There’s certainly a radial aspect to how the field changes: one way to express MOND is that accelerations are boosted at any point in the MOND regime by r/r[M], where r[M] is the MOND radius, (GM/a0)^(1/2).

    I know you sometimes investigate theories that try to shed light on MOND – I don’t know if you’d want to look at this one, but why MOND works was the question of the year at one point. You probably prefer more mathematical theory and less concepts, but there’s always the near-proof. It doesn’t prove that all masses emit a refractive medium – it only proves that all masses are surrounded by one, which thins out in the radial direction.

  9. My hunch is that unaccounted for time dilation provides a compelling explanation for much of the crisis in cosmology, and choosing a more suitable metric for cosmological observations may show this. Homogeneity is perhaps the current limiting assumption, along with the current lack of complementarity on cosmological scales. These concepts seem to me like the most fertile areas to construct a better theoretical framework.
