The baryonic sizes and masses of late type galaxies, and a bit about their angular momentum

I have always been interested in the extremes of galaxy properties, especially at the low surface brightness (LSB) end. LSB galaxies are hard to find and observe, so they present an evergreen opportunity for discovery. They also expose theories built to explain bright galaxies to novel tests.

Fundamental properties of galaxies include their size and luminosity. The luminosity L is a proxy for stellar mass while the size R is one measure of how those stars are distributed. The surface brightness S is the luminosity spread over an area 2πR², so S = L/(2πR²). One may define different types of radii and corresponding surface brightnesses, but whatever the choice, only two of these three quantities are independent. One needs at least two parameters to quantitatively describe a galaxy, as galaxies of the same luminosity* can have their light spread over different areas.
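
To make the bookkeeping concrete, here is a minimal sketch (with made-up numbers) of how any two of luminosity, size, and surface brightness fix the third:

```python
import math

def surface_brightness(L, R):
    """Mean surface brightness S = L / (2 * pi * R**2).

    L : luminosity in solar luminosities
    R : a characteristic radius in parsecs (e.g., a half-light radius)
    Returns S in L_sun / pc^2.
    """
    return L / (2 * math.pi * R**2)

# Two galaxies of the same luminosity with different sizes
# (illustrative numbers, not real galaxies):
L = 1e10                              # 10^10 L_sun
print(surface_brightness(L, 2000))    # compact disk: ~400 L_sun/pc^2
print(surface_brightness(L, 8000))    # diffuse LSB disk: ~25 L_sun/pc^2
```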

Since galaxies are composed of tens of billions of stars, it ought to take a lot more than two parameters to describe one. A useful shorthand for galaxy appearance is provided by morphological types. I’m not a huge fan (they’re not quantitative and don’t relate simply to quantitative measures), but saying a spiral galaxy is an Sa or an Sc does efficiently evoke its appearance.

Fig. 9 from Buta (2011): Examples of spiral galaxy morphologies Sa, Sb, Sc, Sd, and Sm (from left to right). The corresponding Hubble stages are T = 1, 3, 5, 7, 9. As one proceeds from early (Sa) to late (Sm) types, the bulge component becomes less prominent and the winding of spiral arms less tight until the appearance becomes irregular (T ≥ 9).

If we step back from the detailed difference in the appearance of the spiral arms of Sb and Sbc and Sc galaxies, there are some interesting physical distinctions between early type spirals (Sa – Sc) and later types (Sd on through Irr). These are all late type galaxies (LTGs) that are thin, rotationally supported disks of stars and gas. I’m not going to talk about pressure supported early type galaxies (ETGs) here, just early (Sa – Sc) and late (Sd – Irr) LTGs+.

My colleague Jim Schombert pointed out in 2006 that LTGs segregated into two sequences in size and stellar mass if not in gas mass. So early LTGs are more compact for their mass and late LTGs more diffuse.

Fig. 2 from Schombert (2006): Stellar and gas mass vs. optical scale length (α) in kiloparsecs. The open symbols are from the LSB dwarf catalog, crosses show disks from de Jong (1996), and asterisks show Sc galaxies from Courteau (1996). The separation of dwarfs and disks into two sequences is evident in the left panel. Sm class galaxies from de Jong are shown as filled symbols and are typically found on the dwarf sequence. Biweight fits to each sample are shown as dashed lines.

Another distinction is in the gas fraction. This correlates with surface brightness: early LTGs tend to be star-dominated while late LTGs tend to be gas-dominated.

Gas fraction as a function of effective surface brightness (stellar surface density). Red points are early type spirals (T < 5); blue points are later type (T > 6) spirals and irregular galaxies. Orange points are Sc (T = 5) spirals, which reside mostly with the early types. Green points are Scd (T = 6) galaxies, which reside mostly with the later types. There is a steady trend of increasing gas fraction with decreasing surface brightness. Early type spirals are star-dominated, high surface brightness galaxies; late types are gas-rich, low surface brightness galaxies!.

There are early LTGs with such low gas fractions that their current star formation rate risks using up all the available gas in just a Gyr or so. This seems a short time for a galaxy that has been forming stars for the past 13 Gyr, which has led to a whole subfield obsessed with how such galaxies may be resupplied with fresh gas from the IGM to keep things going. That may happen, and I’m sure it does at some level, but I think the concern with this being a terrible timing coincidence is misplaced, as there are lots of late LTGs with ample gas. The median gas fraction is 2/3 for the late LTGs above: they have twice as much gas as stars, and they can sustain their observed star formation rates for tens of Gyr, sometimes hundreds of Gyr. There are plenty of galaxies that need no injection of fresh gas. Similarly, there are genuine ETGs that are “red and dead”: some galaxies do stop forming stars. So perhaps those with short depletion times are just weary giants near the end of the road?
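
The arithmetic behind these depletion times is worth a quick sketch; the masses and star formation rates below are illustrative, not measurements of any particular galaxy:

```python
def depletion_time_gyr(M_gas, SFR):
    """Gas depletion time tau = M_gas / SFR, returned in Gyr.

    M_gas : gas mass in solar masses
    SFR   : star formation rate in solar masses per year
    """
    return M_gas / SFR / 1e9

# A gas-poor early LTG: 1e9 Msun of gas consumed at 1 Msun/yr
print(depletion_time_gyr(1e9, 1.0))   # 1 Gyr: the worrisome "timing" case

# A gas-rich late LTG with twice as much gas as stars, forming stars slowly
print(depletion_time_gyr(2e9, 0.05))  # 40 Gyr: no fresh gas required
```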

That paragraph may cause an existential crisis for an entire subfield, but I didn’t come here to talk about star formation winding down. No, I wanted to highlight an update to the size-mass relation provided by student Zichen Hua. No surprise, Schombert was right. Here is the new size-mass relation for gas, stars, and baryons (considering both stars and gas together):

Fig. 2 from Hua et al. (2025): The mass-size relations of SPARC galaxies in gas (left), stars (middle), and both together (baryons, right). Data points are color-coded by the gas fraction: red means gas poor, blue gas rich. The three panels span the same dynamic range on both axes. Two sequences are evident in the stellar and baryonic mass-size relations.

The half-mass radius R50 is a distinct quantity for each component: gas alone, stars alone, or both$ together. All the galaxies are on the same sequence if we only look at the gas: the surface density of atomic gas is similar in all of them#. When we look at the stars, there are two clear groups: the star-dominated early LTGs (red points) and the gas-rich late LTGs (blue points). This difference in the stars persists when translated into baryons – since the stars dominate the baryonic mass budget of the early LTGs, the gas makes little difference to their baryonic size. The opposite is the case for the gas rich galaxies, and the scatter is reduced as gas is included in the baryonic size. There are some intermediate cases, but the gap between distinct groups is real, as best we can tell. Certainly it has become more clear than it was in 2006 when Schombert had only optical data (the near-IR helps for getting at stellar mass), and the two sequences are more clearly defined in baryons than in stars alone.

A related result is that of Tully & Verheijen (1997), who found a bimodality in surface brightness. Remember above, only two of luminosity, size, and surface brightness are independent. So a bimodality in surface brightness would be two parallel lines cutting diagonally across the size-stellar mass plane. That’s pretty much what we see in the two sequences.

Full disclosure: I was the referee of Tully & Verheijen (1997), and I didn’t want to believe it. I did not see such an effect in the data available to me, and they were looking at the Ursa Major cluster, which I suspected might be a special environment. However, they were the first to have near-IR data, something I did not have at the time. Moreover, they showed that the segregation into different groups was not apparent with optical data; it only emerged in the near-IR K-band. I had no data to contradict that, so while it seemed strange to me, I recommended the paper for publication. Turns out they were right^.

I do not understand why there are two sequences. Tully & Verheijen (1997) suggest that there are different modes of disk stability, so galaxies fall into one or the other. That seems reasonable in principle, but I don’t grasp how it works. I am not alone. There is an enormous literature on disk stability; it is largely focused on bars and spirals in star-dominated systems. It’s a fascinating and complex subject that people have been arguing about for decades. Rather less has been done for gas-dominated systems.

It is straightforward to simulate stellar dynamics. Not easy, mind you, but at least stars are very well approximated as point masses on the scale of galaxies. Not so the gas, for which one needs a hydro code. These are notoriously messy. One persistent result is that systems tend to become unstable when there is too much gas. And yet, nature seems to have figured it out as we see lots of gas rich galaxies. Their morphology is different, so there seems to be an interplay between surface brightness, gas content, and disk stability. Perhaps Tully & Verheijen’s supposition about stability modes is related to the gas content.

That brings us to other scaling relations. Whatever is going on to segregate galaxies in the size-mass plane is not doing it in the velocity-mass plane (the BTFR). There should be a dependence on radius or surface brightness along the BTFR. There really should be, but there is not. Another, related scaling relation is that of specific angular momentum with mass. These three are shown together here:

Fig. 5 from Hua et al. (2025): Scaling relations of galaxy disks: the baryonic Tully-Fisher relation (left panel), the baryonic mass-size relation (middle panel), and the baryonic angular-momentum relation (right panel). The crosses and circles are early and late type spirals, respectively, color-coded by the effective baryonic surface density. The blue and gold solid lines are the best-fit lines for LSD galaxies and HSD galaxies, respectively. The dashed black line in the right panel shows the best-fit line considering all 147 galaxies together.

As with luminosity, size, and surface brightness, only two of these three plots are independent. Velocity and size specify the specific angular momentum j ~ V·R, so the right panel is essentially a convolution of the left and middle panels. There is very little scatter in the BTFR (left) but a lot in size-mass (middle), so you wind up with something intermediate in the j-M plane (right).
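
A toy Monte Carlo makes the point; the slopes and scatter values below are illustrative choices, not fits to the SPARC data:

```python
import numpy as np

rng = np.random.default_rng(42)
logM = rng.uniform(8, 11, 5000)   # baryonic masses spanning 10^8 to 10^11 Msun

# BTFR: V ~ M^(1/4) with very little scatter (~0.03 dex here, illustrative)
logV = 0.25 * logM + rng.normal(0, 0.03, logM.size)
# size-mass: R ~ M^(1/2) with much larger scatter (~0.2 dex, illustrative)
logR = 0.50 * logM + rng.normal(0, 0.20, logM.size)
# specific angular momentum: j ~ V*R, so log j = log V + log R
logj = logV + logR

for name, y in [("BTFR", logV), ("size-mass", logR), ("j-M", logj)]:
    r = np.corrcoef(logM, y)[0, 1]
    print(f"{name}: correlation with mass r = {r:.3f}")
# The j-M correlation comes out between the tight BTFR and the loose
# size-mass relation, just by construction.
```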

I hope that sounds trivial, because it is. It hardly warrants mention, in my opinion. However, my opinion on this point is not widely shared; there are a lot of people who make a lot of hay about the specific angular momentum of disk galaxies.

In principle this attention to j-M makes sense. Angular momentum is a conserved quantity, after all. Real physics, not just astronomical scaling relations. Moreover, one can quantify the angular momentum acquired by dark matter halos in simulations. The spin parameter thus defined seems to do a good job of explaining the size-mass relation, which appears to follow if angular momentum is conserved. In this picture, LSB galaxies form in halos with large initial spin, so they end up spread out, while HSB galaxies form in low spin halos. How far the baryons collapse just depends on that initial angular momentum.
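
For reference, the spin parameter usually meant in this context is the Peebles (1969) definition, or the practical variant of Bullock et al. (2001); the text doesn’t commit to a convention, so take these as the standard forms:

```latex
\lambda = \frac{J\,|E|^{1/2}}{G\,M^{5/2}}
\qquad \mathrm{or} \qquad
\lambda' = \frac{J}{\sqrt{2}\,M\,V_{200}\,R_{200}}
```

Here J, E, and M are the total angular momentum, energy, and mass of the halo.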

This is one of those compelling ideas that nature declined to implement. First, an objection in principle: this hinges on the baryons conserving their share of the angular momentum. The angular momentum of the whole must be conserved (absent external torques), but the whole includes both baryons and dark matter. These two components are free to exchange angular momentum with each other, and there is every reason to expect they do so. In that case, the angular momentum of the baryons need not appear to be conserved: some could be acquired from or lost to the dark matter, where it becomes invisible. As baryons collapse to form a visible galaxy at the center of a dark matter halo, it is easy for them to lose angular momentum to the dark matter. That’s exactly what happens in simulations, even in the first simulations to look into this: it was an eye-opening result to me in 1993, and yet in 2025 people still pretend like baryon-only angular momentum conservation has something to do with galaxy formation. They tend to argue that it gets the size-mass relation right, so it must work out, no?

Does it though? I’ve written about this before, and the answer is not really. Models that predict about the right size-mass relation predict the wrong Tully-Fisher relation, and vice-versa. You can squeeze the toothpaste tube on one end to make it flat, but the bulge simply moves somewhere else. So I find the apparent agreement between disk sizes and angular momenta to be more illusory than compelling. Heck, even Frank van den Bosch agrees with me that you can’t get a realistic disk from the initial distribution of angular momentum j(r). Frank built his career& contradicting me, so if we agree about something y’all should take note.

That was all before the current results. The distribution of initial spins is a continuous function that is lognormal: it has a peak and a width. Translating that% into the size distribution predicts a single size-mass relation with finite scatter. It does not predict two distinct families for gas-poor and gas-rich disk galaxies. The new results are completely at odds with this picture.
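
To see why, push a log-normal spin distribution through the simplest size scaling, Rd ∝ λR200 at fixed mass (à la Mo, Mao & White); the parameters below are typical values quoted for simulated halos, used here for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Log-normal spin distribution: median ~0.035, width ~0.5 in ln(lambda)
lam = rng.lognormal(mean=np.log(0.035), sigma=0.5, size=100_000)

R200 = 200.0                    # kpc; one fixed halo mass for illustration
Rd = lam * R200 / np.sqrt(2)    # disk size if baryons keep their ang. mom.

p5, p50, p95 = np.percentile(np.log10(Rd), [5, 50, 95])
print(f"log10(Rd/kpc): 5% = {p5:.2f}, median = {p50:.2f}, 95% = {p95:.2f}")
# One broad, continuous peak spanning ~0.7 dex in size at fixed mass:
# a single size-mass relation with scatter, not two distinct sequences.
```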

That might not be apparent to advocates of the spin-size interpretation. If one looks at the j-M (right) panel, it seems like a pretty good correlation by the standards of extragalactic astronomy. So if you’re thinking in those terms, all may seem well, and the little kink between families is no big deal. Those are the wrong terms to think in. The correlation in j-M is good because that in the BTFR plane is great. The BTFR is the more fundamental relation; j is not fundamental, it’s just the BTFR diluted by the messier size-mass relation. That’s it.

One can work out the prediction for angular momentum in MOND. That’s the dotted line in the j-M panel above. MOND gets the angular momentum right: the observed trend follows the dotted line. It is possible for galaxies to have more or less angular momentum at a given mass, so there is some scatter, as observed. Again, that’s it.


*A common assertion I frequently hear, mostly from theorists, is that mass is the only galaxy parameter that matters. This is wrong now just as it was thirty years ago. I never cease to be amazed at the extent to which a simple, compelling concept outweighs actual evidence.

+So there are “early” late types. I suppose the earliest of LTGs is the S0, which is also the latest of ETGs. There are only a few S0’s in the SPARC sample, so I’m just gonna lump them in with the other early LTGs. Morphology is reproducible – experts can train others who subsequently perform as well as the experts – but it’s not like all experts agree about all classifications, and S0 is the most confounding designation.

$I recall giving a talk about LSB galaxies at UC Santa Cruz in the ’90s. In the discussion afterwards, Sandy Faber asked whether, instead of optical scale lengths, we should be talking about baryonic scale lengths. Both the audience and I were like

wut?

All that we had then were measures of the scale size of the stars in optical light, so the phrasing didn’t even compute at the time. But of course she was right, and R50,bar above is such a measure.

#A result I recall from my thesis is that the dynamic range in stellar surface brightness was huge while that in the gas surface density was small: a factor of 1,000 in Σ* might correspond to a factor of 2 or maybe 3 in Σg.

^It happens a lot in astronomy that a seemingly unlikely result later proves to be correct. That’s why we need to be open-minded as referees. Today’s blasphemy is tomorrow’s obvious truth.

&Career advice for grad students: find some paper of mine from 15 – 20 years ago. Update it with a pro-LCDM spin. You’ll go far.


%There was a time when the narrow distribution of spins in simulations was alleged to explain the narrow distribution of surface brightness known as Freeman’s Law. This wasn’t right. Doing the actual math, the “narrow” spin distribution maps to a broad surface brightness distribution – not a single value, nor a bimodal distribution. Here is an example spin distribution:

The spin distribution for galaxy and cluster mass dark matter halos from Eisenstein & Loeb (1995).

Rather than a narrow Freeman’s Law, there should be galaxies of all different surface brightness, over a broad range. The spin distribution above maps into the dashed line below:

Fig. 8 from McGaugh & de Blok (1998): Surface brightness distribution (data points from various sources) together with the distribution expected from the variation of spin parameters. Dotted line: Efstathiou & Jones (1979). Dashed line: Eisenstein & Loeb (1995). Theory predicts a very broad distribution with curvature inconsistent with observations. Worse, a cutoff must be inserted by hand to reconcile the high surface brightness end of the distribution.

Mapping spin to surface brightness predicts galaxies that are well above the Freeman value. Such very HSB galaxies do not exist, at least not as disks, so one had to insert a cutoff by hand in dark matter models that would otherwise support such galaxies.

In contrast, an upper limit to galaxy surface brightness arises naturally in MOND. Only disks with surface density less than a0/G are stable.
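
For a sense of scale, here is that limit converted into astronomers’ units, assuming a0 = 1.2 × 10⁻¹⁰ m s⁻²:

```python
a0 = 1.2e-10        # MOND acceleration scale, m/s^2
G = 6.674e-11       # m^3 kg^-1 s^-2
Msun = 1.989e30     # kg
pc = 3.086e16       # m

sigma_max = (a0 / G) / (Msun / pc**2)
print(f"a0/G ~ {sigma_max:.0f} Msun/pc^2")   # ~860 Msun/pc^2
```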


!OK, I guess an obvious question is how surface brightness correlates with morphological type. I didn’t want to get into how the morphological T-type does or doesn’t correlate with quantitative measures, but here is this one example. Yes, there’s a correlation, but there is also a lot of meaningless scatter. LSBs tend to be late LTGs, but can be found among the early LTGs, and vice-versa for HSBs. Despite the clear trend, a galaxy with a central baryonic surface density of 1,000 M☉ pc⁻² could be in any bin of morphology.

The central surface density of baryons as a function of morphological type. Colors have the same meaning as in the gas fraction plot. (This measure of surface density is different from Σ50,bar used by Hua et al. above (see McGaugh 2006), but the details are irrelevant here.)

This messy correlation is par for the course for plots involving morphology, and for extragalactic astronomy in general. This is why the small scatter in the BTFR and the RAR is so amazing – that never happens!

Non-equilibrium dynamics in galaxies that appear to have lots of dark matter: ultrafaint dwarfs

This is a long post. It started focused on ultrafaint dwarfs, but can’t avoid more general issues. In order to diagnose non-equilibrium effects, we have to have some expectation for what equilibrium would be. The Tully-Fisher relation is a useful empirical touchstone for that. How the Tully-Fisher relation comes about is itself theory-dependent. These issues are intertwined, so in addition to discussing the ultrafaints, I also review some of the many predictions for Tully-Fisher, and how our theoretical expectation for it has evolved (or not) over time.

In the last post, we discussed how non-equilibrium dynamics might make a galaxy look like it had less dark matter than similar galaxies. That pendulum swings both ways: sometimes non-equilibrium effects might stir up the velocity dispersion above what it would nominally be. Some galaxies where this might be relevant are the so-called ultrafaint dwarfs (not to be confused with ultradiffuse galaxies, which are themselves often dwarfs). I’ve talked about these before, but more keep being discovered, so an update seems timely.

Galaxies and ultrafaint dwarfs

It’s a big universe, so there’s a lot of awkward terminology, and the definition of an ultrafaint dwarf is somewhat debatable. Most often I see them defined as having an absolute magnitude limit MV > -8, which corresponds to a luminosity less than 100,000 suns. I’ve also seen attempts at something more physical, like being a “fossil” whose star formation was entirely before cosmic reionization, which ended way back at z ~ 6 so all the stars would be at least*&^# 12.5 Gyr old. While such physics-based definitions are appealing, these are often tied up with theoretical projection: the UV photons that reionized the universe should have evaporated the gas in small dark matter halos, so these tiny galaxies can only be fossils from before that time. This thinking pervades much of the literature despite it being obviously wrong, as counterexamples! exist. For example, Leo P is practically an ultrafaint dwarf by luminosity, but has ample gas (so a larger baryonic mass) and is currently forming stars.

A luminosity-based definition is good enough for us here; I don’t really care exactly where we make the cut. Note that ultrafaint is an appropriate moniker: a luminosity of 10⁵ L☉ is tiny by galaxy standards. This is a low-grade globular cluster, and some ultrafaints are only a few hundred solar luminosities, which is barely even# a star cluster. At this level, one has to worry about stochastic effects in stellar evolution. If there are only a handful of stars, the luminosity of the entire system changes markedly as a single star evolves up the red giant branch. Consequently, our mapping from observed quantities to stellar mass is extremely dodgy. For consistency, to compare with brighter dwarfs, I’ve adopted the same boilerplate M*/LV = 2 M☉/L☉. That makes for a fair comparison luminosity-to-luminosity, but the uncertainty in the actual stellar mass is ginormous.

It gets worse, as the ultrafaints that we know about so far are all very nearby satellites of the Milky Way. They are not discovered in the same way as other galaxies, where one plainly sees a galaxy on survey plates. For example, NGC 7757:

The spiral galaxy NGC 7757 as seen on plates of the Palomar Sky Survey.

While bright, high surface brightness galaxies like NGC 7757 are easy to see, lower surface brightness galaxies are not. However, they can usually still be seen, if you know where to look:

UGC 1230 as seen on the Palomar Sky Survey. It’s in the middle.

I like to use this pair as an illustration, as they’re about the same distance from us and about the same angular size on the sky – at least, once you crank up the gain for the low surface brightness UGC 1230:

Zoom in on deep CCD images of NGC 7757 (left) and UGC 1230 (right) with the contrast of the latter enhanced. The chief difference between the two is surface brightness – how spread out their stars are. They have a comparable physical diameter, they both have star forming regions that appear as knots in their spiral arms, etc. These galaxies are clearly distinct from the emptiness of the cosmic void around them, being examples of giant stellar systems that gave rise to the term “island universe.”

In contrast to objects that are obvious on the sky as independent island universes, ultrafaint dwarfs are often invisible to the eye. They are recognized as a subset of stars near each other on the sky that also share the same distance and direction of motion in a field that might otherwise be crowded with miscellaneous, unrelated stars. For example, here is Leo IV:

The ultrafaint dwarf Leo IV as identified by the Sloan Digital Sky Survey and the Hubble Space Telescope.

See it?

I don’t. I do see a number of background galaxies, including an edge-on spiral near the center of the square. Those are not the ultrafaint dwarf, which is some subset of the stars in this image. To decide which ones are potentially a part of such a dwarf, one examines the color magnitude diagram of all the stars to identify those that are consistent with being at the same distance, and assigns membership in a probabilistic way. It helps if one can also obtain radial velocities and/or proper motions for the stars to see which hang together – more or less – in phase space.

Part of the trick here is deciding what counts as hanging together. A strong argument in favor of these things residing in dark matter halos is that the velocity differences between the apparently-associated stars are too great for them to remain together for any length of time otherwise. This is essentially the same situation that confronted Zwicky in his observations of galaxies in clusters in the 1930s. Here are these objects that appear together in the sky, but they should fly apart unless bound together by some additional, unseen force. But perhaps some of these ultrafaints are not hanging together; they may be in the process of coming apart. Indeed, they may have so few stars because they are well down the path of dissolution.

Since one cannot see an ultrafaint dwarf in the same way as an island universe, I’ve heard people suggest that being bound by a dark matter halo be included in the definition of a galaxy. I see where they’re coming from, but find it unworkable. I know a galaxy when I see one. As did Hubble, as did thousands of other observers since, as can you when you look at the pictures above. It is absurd to make the definition of an object that is readily identifiable by visual inspection be contingent on the inferred presence of invisible stuff.

So are ultrafaints even galaxies? Yes and no. Some of the probabilistic identifications may be mere coincidences, not real objects. However, they can’t all be fakes, and I think that if you put them in the middle of intergalactic space, we would recognize them as galaxies – provided we could detect them at all. At present we can’t, but hopefully that situation will improve with the Rubin Observatory. In the meantime, what we have to work with are these fragmentary systems deep in the potential well of the seventy billion solar mass cosmic gorilla that is the Milky Way. We have to be cognizant that they might have gotten knocked around, as we can see in more massive systems like the Sagittarius dwarf. Of course, if they’ve gotten knocked around too much, then they shouldn’t be there at all. So how do these systems evolve under the influence of a cosmic gorilla?

Let’s start by looking at the size-mass diagram, as we did before. Ultrafaint dwarfs extend this relation to much lower mass, and also to rather small sizes – some approaching those of star clusters. They approximately follow a line of constant surface density, ~0.1 M☉ pc⁻² (dotted line).

The size and stellar mass of Local Group dwarfs as discussed previously, with the addition of ultrafaint dwarfs$ (small gray squares).

This looks weird to me. All other types of galaxies scatter all over the place in this diagram. The ultrafaints are unique in following a tight line in the size-mass plane, and one that follows a line of constant surface brightness. Every element of my observational experience screams that this is likely to be an artifact. Given how these “galaxies” are identified as the loose association of a handful of stars, it is easy to imagine that this trend might be an artifact of how we define the characteristic size of a system that is essentially invisible. It might also arise for physical reasons to do with the cosmic gorilla; i.e., it is a consequence of dynamical evolution. So maybe this correlation is real, but the warning lights that it is not are flashing red.

The Baryonic Tully-Fisher relation as a baseline

Ideally, we would measure accelerations to test theories, particularly MOND. Here, we would need to use the size to estimate the acceleration, but I straight up don’t believe these sizes are physically meaningful. The stellar mass, dodgy as it is, seems robust by comparison. So we’ll proceed as if we know that much – which we don’t, really – but let’s at least try.

With the stellar mass (there is no gas in these things), we are halfway to constructing the baryonic Tully-Fisher relation (BTFR), which is the simplest test of the dynamics that we can make with the available data. The other quantity we need is the characteristic circular speed of the gravitational potential. For rotating galaxies, that is the flat rotation speed, Vf. For pressure supported dwarfs, what is usually measured is the velocity dispersion σ. We’ve previously established that for brighter dwarfs in the Local Group, a decent approximation is Vf = 2σ, so we’ll start by assuming that this should apply to the ultrafaints as well. This allows us to plot the BTFR:

The baryonic mass and characteristic circular speeds of both rotationally supported galaxies (circles) and pressure supported dwarfs (squares). The colored points follow the same baryonic Tully-Fisher relation (BTFR), but the data for low mass ultrafaint dwarfs (gray squares) flattens out, having nearly the same characteristic speed over several decades in mass.
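
To be explicit about how the pressure supported points get placed on this plot, here is the bookkeeping in code; the example numbers are illustrative:

```python
def btfr_point(L_V, sigma):
    """Place a pressure supported dwarf on the BTFR.

    L_V   : V-band luminosity in L_sun
    sigma : line-of-sight velocity dispersion in km/s
    Uses the boilerplate M*/L_V = 2 and the Vf = 2*sigma calibration
    adopted in the text; both are conventions, not measurements.
    """
    Mb = 2.0 * L_V      # no gas in these things, so Mb is just M*
    Vf = 2.0 * sigma
    return Mb, Vf

# An ultrafaint with 10^5 L_sun and sigma = 4 km/s (made-up numbers):
print(btfr_point(1e5, 4.0))   # -> (2e5 Msun, 8 km/s)
```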

The BTFR is an empirical relation of the form Vf ~ Mb^(1/4) over about six decades in mass. Somewhere around the ultrafaint scale, this no longer appears to hold, with the observed velocity flattening out to become approximately constant for these lowest mass galaxies. I’m not sure this is real, as there are many practical caveats to interpreting the observations. Measuring stellar velocities is straightforward but demanding at this level of accuracy. There are many potential systematics, pretty much all of which cause the intrinsic velocity dispersion to be overestimated. For example, observations made with multislit masks tend to return larger dispersions than observations of the same object with fibers. That’s likely because it is hard to build a mask so well that all of the stars perfectly hit the centers of the slitlets assigned to them; offsets within the slit shift the spectrum in a way that artificially adds to the apparent velocity dispersion. Fibers are less efficient in their throughput, but have the virtue of blending the input light in a way that precludes this particular systematic. Another concern is physical – some of the stars that are observed are presumably binaries, and some of the velocity will be due to motion within the binary pair and nothing to do with the gravitational potential of the larger system. This can be addressed with repeated observations to see if some velocities change, but it is hard to do that for each and every system, especially when it is way more fun to discover and explore new systems than follow up on the same one over and over and over again.
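
To illustrate the binary problem, here is a toy experiment; the binary fraction and orbital velocity scale are invented for illustration, not drawn from a population model:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 30                                  # stars with measured velocities
sigma_true = 2.0                        # km/s, intrinsic dispersion
v_sys = rng.normal(0, sigma_true, N)    # center-of-mass motions

# Some stars are unresolved binaries whose orbital motion adds to the
# measured velocity (5 km/s line-of-sight scale, invented for illustration):
is_binary = rng.random(N) < 0.4
v_obs = v_sys + np.where(is_binary, rng.normal(0, 5.0, N), 0.0)

print(f"true sigma = {sigma_true:.1f} km/s; "
      f"measured sigma = {v_obs.std(ddof=1):.1f} km/s")
# Repeat-epoch velocities would catch the movers, but that takes telescope time.
```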

There are lots of other things that can go wrong. At some level, some of them probably do – that’s the nature of observational astronomy&. While it seems likely that some of the velocity dispersions are systematically overestimated, it seems unlikely that all of them are. Let’s proceed as if the bulk of the data is telling us something, even if we treat individual objects with suspicion.

MOND

MOND makes a clear prediction for the BTFR of isolated galaxies: the baryonic mass goes as the fourth power of the flat rotation speed. Contrary to Newtonian expectation, this holds irrespective of surface brightness, which is what attracted my attention to the theory in the first place. So how does it do here?

The same data as above with the addition of the line predicted by MOND (Milgrom 1983).
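
For reference, the plotted prediction follows from Milgrom’s Vf⁴ = G Mb a0; here is a minimal sketch of that line, assuming a0 = 1.2 × 10⁻¹⁰ m s⁻²:

```python
G_a0 = 6.674e-11 * 1.2e-10          # G * a0 in SI units (m^4 s^-4 kg^-1)
Msun, km_s = 1.989e30, 1e3

def mond_btfr_mass(Vf):
    """Baryonic mass (Msun) predicted by MOND for a flat rotation speed
    Vf (km/s): Mb = Vf^4 / (G * a0)."""
    return (Vf * km_s) ** 4 / G_a0 / Msun

for v in [10, 50, 200]:
    print(f"Vf = {v:>3} km/s -> Mb ~ {mond_btfr_mass(v):.1e} Msun")
# 200 km/s gives ~1e11 Msun; a single power law over ~6 decades in mass.
```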

Low surface density means low acceleration, so low surface brightness galaxies would make great tests of MOND if they were isolated. Oh, right – they already did. Repeatedly. MOND also correctly predicted the velocities of low mass, gas-rich dwarfs that were unknown when the prediction was made. These are highly nontrivial successes of the theory.

The ultrafaints we’re discussing here are not isolated, so they do not provide the clean tests that isolated galaxies provide. However, galaxies subject to external fields should have low velocities relative to the BTFR, while the ultrafaints have higher velocities. They’re on the wrong side of the relation! Taking this at face value (i.e., assuming equilibrium), MOND fails here.

Whenever MOND has a problem, it is widely seen as a success of dark matter. In my experience, this is rarely true: observations that are problematic for MOND usually don’t make sense in terms of dark matter either. For each observational test we also have to check how LCDM fares.

LCDM

How LCDM fares is often hard to judge because its predictions for the same phenomena are not always clear. Different people predict different things for the same theory. There have been lots of LCDM-based predictions made for both dwarf satellite galaxies and the Tully-Fisher relation. Too many, in fact – it is a practical impossibility to examine them all. Nevertheless, some common themes emerge if we look at enough examples.

The halo mass-velocity relation

The most basic prediction of LCDM is that the mass of a dark matter halo scales with the cube of the circular velocity of a test particle at the virial radius (conventionally taken to be the radius R200 that encompasses an average density 200 times the critical density of the universe. If that sounds like gobbledygook to you, just read “halo” for “200”): M200 ~ V200³. This is a very basic prediction that everyone seems to agree to.

There is a tiny problem with testing this prediction: it refers to the dark matter halo that we cannot see. In order to test it, we have to introduce some scaling factors to relate the dark to the light. Specifically, Mb = fd M200 and Vf = fv V200, where fd is the observed fraction of mass in baryons and fv relates the observed flat velocity to the circular speed of our notional test particle at the virial radius. The obvious assumptions to make are that fd is a constant (perhaps as much as but not more than the cosmic baryon fraction of 16%) and fv is close to unity. The latter requirement stems from the need for dark matter to explain the amplitude of the flat rotation speed, but fv could be slightly different; plausible values range from 0.9 < fv < 1.4. Values larger than one indicate a rotation curve that declines before the virial radius is reached, which is the natural expectation for NFW halos.

Here is a worked example with fd = 0.025 and fv = 1:

The same data as above with the addition of the nominal prediction of LCDM. The dotted line is the halo mass-circular velocity relation; the gray band is a simple model with fd = 0.025 and fv = 1 (e.g., Mo, Mao, & White 1998).
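
For those who want to trace the band, here is a minimal sketch of its construction under the standard virial scaling M200 = V200³/(10 G H0), as in Mo, Mao & White (1998); the choice H0 = 70 km/s/Mpc is illustrative:

```python
G = 4.301e-9      # gravitational constant in Mpc (km/s)^2 / Msun
H0 = 70.0         # Hubble constant in km/s/Mpc (illustrative choice)

def btfr_lcdm(V200, fd=0.025, fv=1.0):
    """Naive LCDM scaling: M200 = V200^3 / (10 G H0), with Mb = fd * M200
    and Vf = fv * V200. Returns (Vf in km/s, Mb in Msun)."""
    M200 = V200**3 / (10.0 * G * H0)
    return fv * V200, fd * M200

for V in [20.0, 100.0, 250.0]:
    Vf, Mb = btfr_lcdm(V)
    print(f"Vf = {Vf:>5.0f} km/s -> Mb ~ {Mb:.1e} Msun")
# Mb ~ Vf^3: slope 3 in the log-log plane, shallower than the observed 4.
```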

I have illustrated the model with a fat grey line because fd = 0.025 is an arbitrary choice* I made to match the data. It could be more, it could be less. The detected baryon fraction can be anything up to or less than the cosmic value, fd < fb = 0.16, as not all of the baryons available in a halo cool and condense into cold gas that forms visible stars. That’s fine; there’s no requirement that all of the baryons have to become readily observable, but there is also no reason to expect all halos to cool exactly the same fraction of baryons. Naively one would expect at least some variation in fd from halo to halo, so there could and probably should be a lot of scatter: the gray line could easily be a much wider band than depicted.

In addition to the rather arbitrary value of fd, this reasoning also predicts a Tully-Fisher relation with the wrong slope. Picking a favorable value of fd only matches the data over a narrow range of mass. It was nevertheless embraced for many years by many people. Selection effects bias samples to bright galaxies. Consequently, the literature is rife with TF samples dominated by galaxies with Mb > 10¹⁰ M☉ (the top right corner of the plot above); with so little dynamic range, a slope of 3 looks fine. Once you look outside that tiny box, it does not look fine.

Personally, I think a slope of 3 is an oversimplification. That is the prediction for dark matter halos; there can be effects that vary systematically with mass. An obvious one is adiabatic compression, the effect by which baryons drag some dark matter along with them as they settle to the center of their halos. This increases fv by an amount that depends on the baryonic surface density. Surface density correlates with mass, so I would nominally expect higher velocities in brighter galaxies; this drives up the slope. There are various estimates of this effect; typically one gets a slope like 3.3, not the observed 4. Worse, it predicts an additional effect: at a given mass, galaxies of higher surface brightness should also have higher velocity. Surface brightness should be a second parameter in the Tully-Fisher relation, but this is not observed.

The easiest way to reconcile the predicted and observed slopes is to make fd a function of mass. Since Mb = fd M200 and M200 ~ V200³, Mb ~ fd V200³. Adopting fv = 1 for simplicity, Mb ~ Vf⁴ follows if fd ~ Vf. Problem solved, QED.

There are [at least] two problems with this argument. One is that the scaling fd ~ Vf must hold perfectly without introducing any scatter. This is a fine-tuning problem: we need one parameter to vary precisely with another, unrelated parameter. There is no good reason to expect this; we just have to insert the required dependence by hand. This is much worse than choosing an arbitrary value for fd: now we’re making it a rolling fudge factor to match whatever we need it to. We can make it even more complicated by invoking some additional variation in fv, but this just makes the fine-tuning worse as the product fd fv⁻³ has to vary just so. Another problem is that we’re doing all this to adjust the prediction of one theory (LCDM) to match that of a different theory (MOND). It is never a good sign when we have to do that, whether we admit it or not.

Abundance matching

The reasoning leading to a slope 3 Tully-Fisher relation assumes a one-to-one relation between baryonic and halo mass (fd = constant). This is an eminently reasonable assumption. We spent a couple of decades trying to avoid having to break this assumption. Once we do so and make fd a freely variable parameter, then it can become a rolling fudge factor that can be adjusted to fit anything. Everyone agrees that is Bad. However, it might be tolerable if there is an independent way of estimating this variation. Rather than make fd just be what we need it to be as described above, we can instead estimate it with abundance matching.

Abundance matching comes from equating the observed number density of galaxies as a function of mass with the number density of dark matter halos. This process gives fd, or at least the stellar fraction, f*, which is close to fd for bright galaxies. Critically, it provides a way to assign dark matter halo masses to galaxies independently of their kinematics. This replaces an arbitrary, rolling fudge factor with a predictive theory.

Abundance matching models generically introduce curvature into the prediction for the BTFR. This stems from the mismatch in the shape of the galaxy stellar mass function (a Schechter function) and the dark halo mass function (a power law on galaxy scales). This leads to a bend in relations that map between visible and dark mass.
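
A toy version of the exercise shows where the curvature comes from; every parameter below is an illustrative placeholder, not a fit to real mass functions:

```python
import numpy as np

# Galaxy side: a Schechter stellar mass function (illustrative parameters)
logMs = np.linspace(7, 12, 500)
Ms = 10**logMs
Mstar, alpha, phistar = 10**10.7, -1.3, 5e-3
x = Ms / Mstar
dn_dlnM = phistar * x**(alpha + 1) * np.exp(-x)

# Cumulative number density n(>Ms), integrated from the high-mass end
dlnM = np.log(10) * (logMs[1] - logMs[0])
n_gal = np.cumsum((dn_dlnM * dlnM)[::-1])[::-1]

# Halo side: a power-law cumulative mass function on galaxy scales,
# n(>Mh) = A * (Mh/1e12)^(-beta); invert it to match the abundances
A, beta = 1e-3, 0.9
Mh = 1e12 * (n_gal / A) ** (-1.0 / beta)

fstar = Ms / Mh
i = int(np.argmax(fstar))
print(f"f* peaks near Ms ~ {Ms[i]:.1e} Msun and falls off on both sides,")
print("so the stellar-to-halo mass mapping is curved, not a power law.")
```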

The transition from the M ~ V³ reasoning to abundance matching occurred gradually, but became pronounced circa 2010. There are many abundance matching models; I already faced the problem of the multiplicity of LCDM predictions when I wrote a lengthy article on the BTFR in 2012. To get specific, let’s start with an example from then, the model of Trujillo-Gomez et al. (2011):

The same data as above with the addition of the line predicted by LCDM in the model of Trujillo-Gomez et al. (2011).

One thing Trujillo-Gomez et al. (2011) say in their abstract is “The data present a clear monotonic LV relation from ∼50 km s⁻¹ to ∼500 km s⁻¹, with a bend below ∼80 km s⁻¹”. By LV they mean luminosity-velocity, i.e., the regular Tully-Fisher relation. The bend they note is real; that’s what happens when you consider only the starlight and ignore the gas. The bend goes away if you include that gas. This was already known at the time – our original BTFR paper from 2000 has nearly a thousand citations, so it isn’t exactly obscure. Ignoring the gas is a choice that makes no sense empirically but makes a lot of sense from the perspective of LCDM simulations. By 2010, these had become reasonably good at matching the numbers of stars observed in galaxies, but the gas properties of simulated galaxies remained, hmmmmmmm, wanting. It makes sense to utilize the part that works. It makes less sense to pretend that this bend is something physically meaningful rather than an artifact of ignoring the gas. The pressure-supported dwarfs are all star dominated, so this distinction doesn’t matter here, and they follow the BTFR, not the stars-only version.

An old problem in galaxy formation theory is how to calibrate the number density of dark matter halos to that of observed galaxies. For a long time, a choice that people made was to match either the luminosity function or the kinematics. These didn’t really match up, so there was occasional discussion of the virtues and vices of the “luminosity function calibration” vs. the “Tully-Fisher calibration.” These differed by a factor of ~2. This tension remains with us. Mostly simulations have opted to adopt the luminosity function calibration, updated and rebranded as abundance matching. Again, this makes sense from the perspective of LCDM simulations, because the number density of dark matter halos is something that simulations can readily quantify while the kinematics of individual galaxies are much harder to resolve**.

The nonlinear relation between stellar mass and halo mass obtained from abundance matching inevitably introduces curvature into the corresponding Tully-Fisher relation predicted by such models. That’s what you see in the curved line of Trujillo-Gomez et al. (2011) above. They weren’t the first to obtain such a result, and they certainly weren’t the last: this is a feature of LCDM with abundance matching, not a bug.

The line of Trujillo-Gomez et al. (2011) matches the data pretty well at intermediate masses. It diverges to higher velocities at both small and large galaxy masses. I’ve written about this tension at high masses before; it appears to be real, but let’s concentrate on low masses here. At low masses, the velocity of galaxies with Mb < 10⁸ M☉ appears to be overestimated. But the divergence between model and reality has just begun, and it is hard to resolve small things in simulations, so this doesn’t seem too bad. Yet.

Moving ahead, there are the “Latte” simulations of Wetzel et al. (2016) that use the well-regarded FIRE code to look specifically at simulated dwarfs, both isolated and satellites – specifically satellites of Milky Way-like systems. (Milky Way. Latte. Get it? Nerd humor.) So what does that find?

The same data as above with the addition of simulated dwarfs (orange triangles) from the Latte LCDM simulation of Wetzel et al. (2016), specifically the simulated satellites in the top panel of their Fig. 3. Note that we plot Vf = 2σ for pressure supported systems, both real and simulated.

The individual simulated dwarf satellites of Wetzel et al. (2016) follow the extrapolation of the line predicted by Trujillo-Gomez et al. (2011). To first order, it is the same result to higher resolution (i.e., smaller galaxy mass). Most of the simulated objects have velocity dispersions that are higher than observed in real galaxies. Intriguingly, there are a couple of simulated objects with M* ~ 5 × 10⁶ M☉ that fall nicely among the data where there are both star-dominated and gas-rich galaxies. However, these two are exceptions; the rule appears to be characteristic speeds that are higher than observed.

The lowest mass simulated satellite objects begin to approach the ultrafaint regime, but resolution continues to be an issue: they’re not really there yet. This hasn’t precluded many people from assuming that dark matter will work where MOND fails, which seems like a heck of a presumption given that MOND has been consistently more successful up until that point. Where MOND underpredicts the characteristic velocity of ultrafaints, LCDM hasn’t yet made a clear prediction, and it overpredicts velocities for objects of slightly larger mass. Ain’t no theory covering itself in glory here, but this is a good example where objects that are a problem for MOND are also a problem for dark matter, and it seems likely that non-equilibrium dynamics play a role in either case.

Comparing apples with apples

A persistent issue with comparing simulations to reality is extracting comparable measures. Where circular velocities are measured from velocity fields in rotating galaxies and estimated from measured velocity dispersions in pressure supported galaxies, the most common approach to deriving rotation curves from simulated objects is to sum up particles in spherical shells and assume V² = GM/R. These are not the same quantities. They should be proxies for one another, but equality holds only in the limit of isotropic orbits in spherical symmetry. Reality is messier than that, and simulations aren’t that simple either%.
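
Here is the simulators’ convention in a nutshell; the mock particle set is made up so the answer is known analytically (ρ ∝ r⁻² gives a flat rotation curve):

```python
import numpy as np

def vcirc_spherical(r_particles, m_particles, R):
    """Circular velocity as often quoted from simulations:
    V(R) = sqrt(G * M(<R) / R), summing particle masses inside radius R.
    Exact only for isotropic orbits in spherical symmetry."""
    G = 4.301e-6     # kpc (km/s)^2 / Msun
    M_enc = m_particles[r_particles < R].sum()
    return np.sqrt(G * M_enc / R)

# Mock halo: radii uniform in r out to 100 kpc, i.e., rho ~ r^-2
rng = np.random.default_rng(0)
r = 100.0 * rng.random(100_000)          # kpc
m = np.full(r.size, 1e7)                 # Msun per particle
for R in [10.0, 50.0, 100.0]:
    print(f"V({R:5.1f} kpc) = {vcirc_spherical(r, m, R):.0f} km/s")  # ~flat
```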

Sales et al. (2017) take care to compare what is observed, given how it is observed, with what the simulations would show for that quantity. Others have made a similar effort; a common finding is that the apparent rotation speeds of simulated gas disks do not trace the gravitational potential as simply as GM/R. That’s no surprise, but most simulated rotation curves do not look like those of real galaxies^, so the comparison is not straightforward. Those caveats aside, Sales et al. (2017) are doing the right thing in trying to make an apples-to-apples comparison between simulated and observed quantities. They extract from simulations a quantity Vout that is appropriate for comparison with what we observe in the outer parts of rotation curves. So here is the resulting prediction for the BTFR:

The same data as above with the addition of the line predicted by LCDM in the model of Sales et al. (2017), specifically the formula for Vout in their Table 2 which is their proxy for the observable rotation speed.

That’s pretty good. It still misses at high masses (those two big blue points at the top are Andromeda and the Milky Way) and it still bends away from the data at low masses where there are both star-dominated and gas-rich galaxies. (There are a lot more examples of the latter that I haven’t used here because the plot gets overcrowded.) Despite the overshoot, the use of an observable aspect of the simulations gets closer to the data, and the prediction flattens out in the same qualitative sense. That’s good, so one might see cause for hope that this problem is simply a matter of making a fair comparison between simulations and data. We should also be careful not to over-interpret it: I’ve simply plotted the formula they give; the simulations to which they fit it surely do not resolve ultrafaint dwarfs, so really the line should stop at some appropriate mass scale.

Nevertheless, it makes sense to look more closely at what is observed vs. what is simulated. This has recently been done in greater detail by Ruan et al. (2025). They consider two simulations that implement rather different feedback; both wind up producing rotating, gas rich dwarfs that actually fall on the BTFR.

The same data as above with the addition of simulated dwarfs of Ruan et al. (2025), specifically from the top right panel of their Fig. 6. The orange circles are their “massives” and the red triangles the “marvels” (the distinction refers to different feedback models).

Finally some success after all these years! Looking at this, it is tempting to declare victory: problem solved. It was just a matter of doing the right simulation all along, and making an apples-to-apples comparison with the data.

That sounds too good to be true. Is it repeatable in other simulations? What works now that didn’t before?

These are high resolution simulations, but they still don’t resolve ultrafaints. We’re talking here about gas-rich dwarfs. That’s also an important topic, so let’s look more closely. What works now is the apples-to-apples assessment: what we would measure for Vout is less than Vmax (related to V200) of the halo:

Two panels from Fig. 7 of Ruan et al. (2025) showing the ratio of the velocity we might observe relative to the characteristic circular velocity of the halo (top) and the ratio of the radii where these occur (bottom).

The treatment of cold gas in simulations has improved. In these simulations, Vout(Rout) is measured where the gas surface density falls to 1 M☉ pc⁻², which is typical of many observations. But the true rotation curve is still rising for objects with Mb < a few × 10⁸ M☉; it has not yet reached a value that is characteristic of the halo. So the apparent velocity is low, even if the dark matter halos are doing basically the same thing as before:

As above, but with the addition of the true Vmax (small black dots) of the simulated halos discussed by Ruan et al. (2025), which follow the relation of Sales et al. (2017) (line for Vmax in their Table 2).
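
The measurement convention at work can be sketched with made-up profiles; the only point is that where the gas detection threshold makes you stop reading the curve matters:

```python
import numpy as np

R = np.linspace(0.1, 30, 300)              # kpc
Sigma_gas = 20.0 * np.exp(-R / 3.0)        # Msun/pc^2, a toy exponential gas disk
V_true = 40.0 * R / (R + 5.0)              # km/s, a toy slowly rising curve

def v_at_threshold(threshold):
    """Read off the rotation speed at the outermost radius where the
    gas surface density still exceeds the detection threshold."""
    Rout = R[Sigma_gas >= threshold].max()
    return Rout, np.interp(Rout, R, V_true)

for thresh in [1.0, 0.1]:                  # typical vs. deep (MHONGOOSE-like)
    Rout, Vout = v_at_threshold(thresh)
    print(f"Sigma = {thresh} Msun/pc^2 -> Rout = {Rout:4.1f} kpc, "
          f"Vout = {Vout:.0f} km/s")
# The deeper threshold probes farther out the still-rising toy curve, closer
# to its Vmax, which is what Ruan et al. would predict we should see.
```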

I have mixed feelings about this. On the one hand, there are many dwarf galaxies with rising rotation curves that we don’t see flatten out, so it is easy to imagine they might keep going up, and I find it plausible that this is what we would find if we looked harder. So plausible that I’ve spent a fair amount of time doing exactly this. Not all observations terminate at 1 M☉ pc⁻², and whenever we push further out, we see the same damn thing over and over: the rotation curve flattens out and stays flat!!. That’s been my anecdotal experience; getting beyond that systematically is the point of the MHONGOOSE survey. This was constructed to detect much lower atomic gas surface densities, and routinely detects gas at the 0.1 M☉ pc⁻² level where Ruan et al. suggest we should see something closer to Vmax. So far, we don’t.

I don’t want to sound too negative, because how we map what we predict in simulations to what we measure in observations is a serious issue. But it seems a bit of a stretch for a low-scatter power law BTFR to be the happenstance of observational sensitivity that cuts in at a convenient mass scale. So far, we see no indication of that in more sensitive observations. I’ll certainly let you know if that changes.

Survey says…

At this juncture, we’ve examined enough examples that the reader can appreciate my concern that LCDM models can predict rather different things. What does the theory really predict? We can’t really test it until we agree what it should do!!!.

I thought it might be instructive to combine some of the models discussed above. It is.

Some of the LCDM predictions discussed above, shown together. The dotted line to the right of the data is the halo mass-velocity relation, which is the one thing we all agree LCDM predicts but which is observationally inaccessible. The grey band is a Mo, Mao, & White-type model with fd = 0.025. The red dotted line is the model of Trujillo-Gomez et al. (2011); the solid red line that of Sales et al. (2017) for Vmax.

The models run together, more or less, for high mass galaxies. Thanks to observational selection effects, these are the objects we’ve always known about and matched our theories to. In order to test a theory, one wants to force it to make predictions in new regimes it wasn’t built for. Low mass galaxies do that, as do low surface brightness galaxies, which are often but not always low mass. MOND has done well for both, down to the ultrafaints we’re discussing here. LCDM does not yet explain those, or really any of the intermediate mass dwarfs.

What really disturbs me about LCDM models is their flexibility. It’s not just that they miss, it’s that it is possible to miss the data on either side of the BTFR. The older fd = constant models predict velocities that are too low for low mass galaxies. The more recent abundance matching models predict velocities that are too high for low mass galaxies. I have no doubt that a model can be constructed that gets it right, because there is obviously enough flexibility to do pretty much anything. Adding new parameters until we get it right is an example of epicyclic thinking, as I’ve been pointing out for thirty years. I don’t know what could be worse for an idea like dark matter that is not falsifiable.

We still haven’t come anywhere close to explaining the ultrafaints in either theory. In LCDM, we don’t even know if we should draw a curved line that catches them as if they’re in equilibrium, or start from a power-law BTFR and look for departures from that due to tidal effects. Both are possible in LCDM, both are plausible, as is some combination of both. I expect theorists will pick an option and argue about it indefinitely.

Tidal effects

The typical velocity dispersion of the ultrafaint dwarfs is too high for them to be in equilibrium in MOND. But there’s also pretty much no way these tiny things could be in equilibrium, being in the rough neighborhood dominated by our home, the cosmic gorilla. That by itself doesn’t make an explanation; we need to work out what happens to such things as they evolve dynamically under the influence of a pronounced external field. To my knowledge, this hasn’t been addressed in detail in MOND any more than in LCDM, though Brada & Milgrom addressed some of the relevant issues.

There is a difference in approach required for the two theories. In LCDM, we need to increase the resolution of simulations to see what happens to the tiniest of dark matter halos and their resident galaxies within the larger dark matter halos of giant galaxies. In MOND we have to simulate the evolution along the orbit of each unique individual. This is challenging on multiple levels, as each possible realization of a MOND theory requires its own code. Writing a simulation code for AQUAL requires a different numerical approach than QUMOND, and those are both modifications of gravity via the Poisson equation. We don’t know which might be closer to reality; heck, we don’t even know [yet] if MOND is a modification of gravity or inertia, the latter being even harder to code.

Cold dark matter is scale-free, so crudely I expect ultrafaint dwarfs in LCDM to do the same as larger dwarf satellites that have been simulated: their outer dark matter halos are gradually whittled away by tidal stripping for many Gyr. At first the stars are unaffected, but eventually so little dark matter is left that the stars start to be lost impulsively during pericenter passages. Though the dark matter is scale free, the stars and the baryonic physics that made them are not, so that’s where it gets tricky. The apparent dark-to-luminous mass ratio is huge, so one possibility is that the ultrafaints are in equilibrium despite their environment; they just made ridiculously few stars from the amount of mass available. That’s consistent with a wild extrapolation of abundance matching models, but how it comes about physically is less clear. For example, at some low mass, a galaxy would make so few stars that none are massive enough to result in a supernova, so there is no feedback, which is what is preventing too many stars from forming. Awkward. Alternately, the constant exposure to tidal perturbation might stir things up, with the velocity dispersion growing and stars getting stripped to form tidal streams, so they may have started as more massive objects. Or some combination of both, plus the evergreen possibility of things that don’t occur to me offhand.

Equilibrium for ultrafaint satellites is not an option in MOND, but tidal stirring and stripping is. As a thought experiment, let’s imagine what happens to a low mass dwarf typical of the field that falls towards the Milky Way from some large distance. Initially gas-rich, the first environmental effect that it is likely to experience is ram pressure stripping by the hot coronal gas around the Milky Way. That’s a baryonic effect that happens in either theory; it has nothing to do with the effective law of gravity. A galaxy thus deprived of much of its mass will be out of equilibrium; its internal velocities will remain typical of the original mass even though the remaining mass is less. Consequently, its structure must adjust to compensate; perhaps dwarf Irregulars puff up and are transformed into dwarf Spheroidals in this way. Our notional infalling dwarf may have time to equilibrate to its new mass before being subject to strong tidal perturbation by the Milky Way, or it may not. If not, it will have characteristic internal velocities that are too high for its new mass, and reside above the BTFR. I doubt this suffices to explain [m]any of the ultrafaints, as their masses are so tiny that some stellar mass loss is also likely to have occurred.

Let’s suppose that our infalling dwarf has time to [approximately] equilibrate, or it simply formed nearby to begin with. Now it is a pressure supported system [more or less] on the BTFR. As it orbits the Milky Way, it feels an extra force from the external field. If it stays far enough out to remain in quasi-equilibrium in the EFE regime, then it will oscillate in size and velocity dispersion in phase with the strength of the external field it feels along its orbit.

If instead a satellite dips too close, it will be tidally disturbed and depart from equilibrium. The extra energy may stir it up, increasing its velocity dispersion. It doesn’t have the mass to sustain that, so stars will start to leak out. Tidal disruption will eventually happen, with the details depending on the initial mass and structure of the dwarf and on the eccentricity of its orbit, the distance of closest approach (pericenter), whether the orbit is prograde or retrograde relative to any angular momentum the dwarf may have… it’s complicated, so it is hard to generalize##. Nevertheless, we (McGaugh & Wolf 2010) anticipated that “the deviant dwarfs [ultrafaints] should show evidence of tidal disruption while the dwarfs that adhere to the BTFR should not.” Unlike LCDM where most of the damage is done at closest approach, we anticipate for MOND that “stripping of the deviant dwarfs should be ongoing and not restricted to pericenter passage” because tides are stronger and there is no cocoon of dark matter to shelter the stars. The effect is still maximized at pericenter, it’s just not as impulsive as in some of the dark matter simulations I’ve seen.

This means that there should be streams of stars all over the sky. As indeed there are. For example:

Stellar streams in the Milky Way identified using Gaia (Malhan et al. 2018).

As a tidally influenced dwarf dissolves, the stars will leak out and form a trail. This happens in LCDM too, but there are differences in the rate, coherence, and symmetry of the resulting streams. Perhaps ultrafaint dwarfs are just the last dregs of the tidal disruption process. From this perspective, it hardly matters if they originated as external satellites or as internal star clusters: globular clusters native to the Milky Way should undergo a similar evolution.

Evolutionary tracks

Perhaps some of the ultrafaint dwarfs are the nuggets of disturbed systems that have suffered mass loss through tidal stripping. That may be the case in either LCDM or MOND, and has appealing aspects in either case – we went through all the possibilities in McGaugh & Wolf (2010). In MOND, the BTFR provides a reference point for what a stable system in equilibrium should do. That’s the starting point for the evolutionary tracks suggested here:

BTFR with conceptual evolutionary tracks (red lines) for tidally-stirred ultrafaint dwarfs.

Objects start in equilibrium on the BTFR. As they become subject to the external field, their velocity dispersions first decrease as they transition through the quasi-Newtonian regime. As tides kick in, stars are stripped and stretched along the satellite’s orbit: mass is lost, but the apparent velocity dispersion increases as the stars gradually separate into a stream. Their relative velocities no longer represent a measure of the internal gravitational potential; rather than a cohesive dwarf satellite, they’re more an association of stars on similar orbits around the Milky Way.

This is crudely what I imagine might be happening in some of the ultrafaint dwarfs that reside above the BTFR. Reality can be more complicated, and probably is. For example, objects that are not yet disrupted may oscillate around and below the BTFR before becoming completely unglued. Moreover, some individual ultrafaints probably are not real, while the data for others may suffer from systematic uncertainties. There’s a lot to sort out, and we’ve reached the point where the possibility of non-equilibrium effects cannot be ignored.

As a test of theories, the better course remains to look for new galaxies free from environmental perturbation. Ultrafaint dwarfs in the field, far from cosmic gorillas like the Milky Way, would be ideal. Hopefully many will be discovered in current and future surveys.


!Other examples exist and continue to be discovered. More pertinent to my thinking is that the mass threshold at which reionization is supposed to suppress star formation has been a constantly moving goal post. To give an amusing anecdote, while I was junior faculty at the University of Maryland (so at least twenty years ago), Colin Norman called me up out of the blue. Colin is an expert on star formation, and had a burning question he thought I could answer. “Stacy,” he says as soon as I pick up, “what is the lowest mass star forming galaxy?” Uh, Hi, Colin. Off the cuff and totally unprepared for this inquiry, I said “um, a stellar mass of a few times 10⁷ solar masses.” Colin’s immediate response was to laugh long and loud, as if I had made the best nerd joke ever. When he regained his composure, he said “We know that can’t be true as reionization will prevent star formation in potential wells that small.” So, after this abrupt conversation, I did some fact-checking, and indeed, the number I had pulled out of my arse on the spot was basically correct, at that time. I also looked up the predictions, and of course Colin knew his business too; galaxies that small shouldn’t exist. Yet they do, and now the minimum known is two orders of magnitude lower in mass, with still no indication that a lower limit has been reached. So far, the threshold of our knowledge has been imposed by observational selection effects (low luminosity galaxies are hard to see), not by any discernible physics.

More recently, McQuinn et al. (2024) have made a study of the star formation histories of Leo P and a few similar galaxies that are near enough to see individual stars so as to work out the star formation rate over the course of cosmic history. They argue that there seems to be a pause in star formation after reionization, so a more nuanced version of the hypothesis may be that reionization did suppress star forming activity for a while, but these tiny objects were subsequently able to re-accrete cold gas and get started again. I find that appealing as a less simplistic thing that might have happened in the real universe, and not just a simple on/off switch that leaves only a fossil. However, it isn’t immediately clear to me that this more nuanced hypothesis should happen in LCDM. Once those baryons have evaporated, they’re gone, and it is far from obvious that they’ll ever come back to the weak gravity of such a small dark matter halo. It is also not clear to me that this interpretation, appealing as it is, is unique: the reconstructed star formation histories also look consistent with stochastic star formation, with fluctuations in the star formation rate being a matter of happenstance that have nothing to do with the epoch of reionization.

#So how are ultrafaint dwarfs different from star clusters? Great question! Wish we had a great answer.

Some ultrafaints probably are star clusters rather than independent satellite galaxies. How do we tell the difference? Chiefly, the velocity dispersion: star clusters show no need for dark matter, while ultrafaint dwarfs generally appear to need a lot. This of course assumes that their measured velocity dispersions represent an equilibrium measure of their gravitational potential, which is what we’re questioning here, so the opportunity for circular reasoning is rife.

$Rather than apply a strict luminosity cut, for convenience I’ve kept the same “not safe from tidal disruption” distinction that we’ve used before. Some of the objects in the 10⁵–10⁶ M☉ range might belong more with the classical dwarfs than with the ultrafaints. This is more a reminder that our nomenclature is terrible than anything physically meaningful.

&Astronomy is an observational science, not a laboratory science. We can only detect the photons nature sends our way. We cannot control all the potential systematics as can be done in an enclosed, finite, carefully controlled laboratory. That means there is always the potential for systematic uncertainties whose magnitude can be difficult to estimate, or sometimes even to be aware of, like how local variations impact Jeans analyses. This means we have to take our error bars with a grain of salt, often such a big grain as to make statistical tests unreliable: goodness of fit is only as meaningful as the error bars.

I say this because it seems to be the hardest thing for physicists to understand. I also see many younger astronomers turning the crank on fancy statistical machinery as if astronomical error bars can be trusted. Garbage in, garbage out.
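A toy illustration of the point, with entirely made-up numbers: the same residuals pass or fail a goodness-of-fit test depending on the error bars you divide by.

```python
import numpy as np

rng = np.random.default_rng(42)
resid = rng.normal(0.0, 0.2, size=50)   # true scatter: 0.2 dex

for quoted_err in (0.2, 0.1):           # honest vs optimistic error bars
    chi2_nu = np.mean((resid / quoted_err) ** 2)
    print(f"quoted error = {quoted_err}: reduced chi^2 ~ {chi2_nu:.1f}")
```

Underestimate the uncertainties by a factor of two and the reduced chi-squared quadruples; the fit looks terrible even though nothing about the data changed.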

*This is an example of setting a parameter in a model “by hand.”

**The transition to thinking in terms of the luminosity function rather than Tully-Fisher is so complete that the most recent, super-large, Euclid flagship simulation doesn’t even attempt to address the kinematics of individual galaxies while giving extraordinarily detailed and extensive information about their luminosity distributions. I can see why they’d do that – they want to focus on what the Euclid mission might observe – but it is also symptomatic of the growing tendency I’ve witnessed to just not talk about those pesky kinematics.

%Halos in dark matter simulations tend to be rather triaxial, i.e., a 3D bloboid that is neither spherical like a soccer ball nor oblate like a frisbee nor prolate like an American football: each principal axis has a different length. If real halos were triaxial, it would lead to non-circular orbits in dark matter-dominated galaxies that are not observed.

The triaxiality of halos is a result from dark matter-only simulations. Personally, I suspect that the condensation of gas within a dark matter halo (presuming such things exist) during the process of galaxy formation rounds out the inner halo, making it nearly spherical where we are able to make measurements. So I don’t see this as necessarily a failure of LCDM, but rather an example of how more elaborate simulations that include baryonic physics are sometimes warranted. Sometimes. There’s a big difference between this process, which also compresses the halo (making it more dense when it already starts out too dense), and the various forms of feedback, which may or may not further alter the structure of the halo.

^There are many failure modes in simulated rotation curves, the two most common being the cusp-core problem in dwarfs and sub-maximal disks in giants. It is common for the disks of bright spiral galaxies to be nearly maximal in the sense that the observed stars suffice to explain the inner rotation curve. They may not be completely maximal in this sense, but they come close for normal stellar populations. (Our own Milky Way is a good example.) In contrast, many simulations produce bright galaxies that are absurdly sub-maximal; EAGLE and SIMBA being two examples I remember offhand.

Another common problem is that LCDM simulations often don’t produce rotation curves that are as flat as observed. This was something I also found in my early attempts at model-building with dark matter halos. It is easy to fit a flat rotation curve given the data, but it is hard to predict a priori that rotation curves should be flat.

!!Gravitational lensing indicates that rotation curves remain flat to even larger radii. However, these observations are only sensitive to galaxies more massive than those under discussion here. So conceivably there could be another coincidence wherein flatness persists for galaxies with Mb > 10¹⁰ M☉, but not those with Mb < 10⁹ M☉.

!!!Many in the community seem to agree that it will surely work out.

##I’ve tried to estimate dissolution timescales, but find the results wanting. For plausible assumptions, one finds timescales that seem reasonable (a few Gyr), but with some minor fiddling one can also find results that are no-way-that’s-too-short (a few tens of millions of years), depending on the dwarf and its orbit. These are crude analytic estimates; I’m not satisfied that these numbers are particularly meaningful. Still, this is a worry with the tidal-stirring hypothesis: will perturbed objects persist long enough to be observed as they are? This is another reason we need detailed simulations tailored to each object.


*&^#Note added after initial publication: While I was writing this, a nice paper appeared on exactly this issue of the star formation history of a good number of ultrafaint dwarfs. They find that 80% of the stellar mass formed 12.48 ± 0.18 Gyr ago, so 12.5 was a good guess. Formally, at the one sigma level, this is a little after reionization, but only a tiny bit, so close enough: the bulk of the stars formed long ago, like a classical globular cluster, and these ultrafaints are consistent with being fossils.

Intriguingly, there is a hint of an age difference by kinematic grouping, with things that have been in the Milky Way being the oldest, those on first infall being a little younger (but still very old), and those infalling with the Large Magellanic Cloud a tad younger still. If so, then there is more to the story than quenching by cosmic reionization.

They also show a nice collection of images so you can see more examples. The ellipses trace out the half-light radii, so you can see the proclivity for many (not all!) of these objects to be elongated, perhaps as a result of tidal perturbation:

Figure 2 from Durbin et al. (2025): Footprints of all HST observations (blue filled patches) overlaid on DSS2 imaging cutouts. Open black ellipses show the galaxy profiles at one half-light radius.

Kinematics suggest large masses for high redshift galaxies


This is what I hope will be the final installment in a series of posts describing the results published in McGaugh et al. (2024). I started by discussing the timescale for galaxy formation in LCDM and MOND, which make distinct predictions. I then discussed the observations that constrain the growth of stellar mass over cosmic time and the related observation of stellar populations that are mature for the age of the universe. I then put on an LCDM hat to try to figure out ways to wriggle out of the obvious conclusion that galaxies grew too massive too fast. Exploring all the arguments that will be made is the hardest part, not because they are difficult to anticipate, but because there are so many* options to consider. This leads to many pages of minutiae that no one ever seems to read+, so one of the options I’ve discussed (e.g., super-efficient star formation) will likely emerge as the standard picture even if it comes pre-debunked.

The emphasis so far has been on the evolution of the stellar masses of galaxies because that is observationally most accessible. That gives us the opportunity to wriggle, because what we really want to measure to test LCDM is the growth of [dark] mass. This is well-predicted but invisible, so we can always play games to relate light to mass.

Mass assembly in LCDM from the IllustrisTNG50 simulation. The dark matter mass assembles hierarchically in the merger tree depicted at left; the size of the circles illustrates the dark matter halo mass. The corresponding stellar mass of the largest progenitor is shown at right as the red band. This does not keep pace with the apparent assembly of stellar mass (data points), but what is the underlying mass really doing?

Galaxy Kinematics

What we really want to know is the underlying mass. It is reasonable to expect that the light traces this mass, but is there another way to assess it? Yes: kinematics. The orbital speeds of objects in galaxies trace the total potential, including the dark matter. So, how massive were early galaxies? How does that evolve with redshift?

The rotation curve of NGC 6946 traced by stars at small radii and gas farther out. This is a typical flat rotation curve (data points) that exceeds what can be explained by the observed baryonic mass (red line deduced from the stars and gas pictured at right), leading to the inference of dark matter.

The rotation curve for NGC 6946 shows a number of well-established characteristics for nearby galaxies, including the dominance of baryons at small radii in high surface brightness galaxies and the famous flat outer portion of the rotation curve. Even when stars contribute as much mass as allowed by the inner rotation curve (“maximum disk”), there is a need for something extra further out (i.e., dark matter or MOND). In the case of dark matter, the amplitude of flat rotation is typically interpreted as being indicative& of halo mass.

So far, the rotation curves of high redshift galaxies look very much like those of low redshift galaxies. There are some fast rotators at high redshift as well. Here is an example observed by Neeleman et al. (2020), who measure a flat rotation speed of 272 km/s for DLA0817g at z = 4.26. That’s more massive than either the Milky Way (~200 km/s) or Andromeda (~230 km/s), if not quite as big as local heavyweight champion UGC 2885 (300 km/s). DLA0817g looks to be a disk galaxy that formed early and is sedately rotating only 1.4 Gyr after the Big Bang. It is already massive at this time: not at all the little nuggets we expect from the CDM merger tree above.

Fig. 1 from Neeleman et al. (2020): the velocity field (left) and position-velocity diagram (right) of DLA0817g. The velocity field looks like that of a rotating disk, while the raw position-velocity diagram shows motions of ~200 km/s on either side of the center. When corrected for inclination, the flat rotation speed is 272 km/s, corresponding to a massive galaxy near the top of the Tully-Fisher relation.

This is anecdotal, of course, but there are a good number of similar cases that are already known. For example, the kinematics of ALESS 073.1 at z ≈ 5 indicate the presence of a massive stellar bulge as well as a rapidly rotating disk (Lelli et al. 2021). A similar case has been observed at z ≈ 6 (Tripodi et al. 2023). These kinematic observations indicate the presence of mature, massive disk galaxies well before they were expected to be in place (Pillepich et al. 2019; Wardlow 2021). The high rotation speeds observed in early disk galaxies sometimes exceed 250 (Neeleman et al. 2020) or even 300 km s⁻¹ (Nestor Shachar et al. 2023; Wang et al. 2024), comparable to the most massive local spirals (Noordermeer et al. 2007; Di Teodoro et al. 2021, 2023). That such rapidly rotating galaxies exist at high redshift indicates that there is a lot of mass present, not just light. We can’t just tweak the mass-to-light ratio of the stars to explain the photometry and also explain the kinematics.
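To put rough numbers on what such speeds imply, one can invert the baryonic Tully-Fisher relation, Mb = A Vf⁴ with A ≈ 47 M☉ km⁻⁴ s⁴ (the normalization found for gas-rich galaxies; treat this as a back-of-the-envelope sketch, not a rigorous mass estimate):

```python
A = 47.0  # BTFR normalization, Msun (km/s)^-4

for name, vf in (("Milky Way", 200), ("Andromeda", 230),
                 ("DLA0817g", 272), ("UGC 2885", 300)):
    print(f"{name:10s} Vf = {vf} km/s -> Mb ~ {A * vf**4:.1e} Msun")
```

So a 272 km/s rotator at z = 4.26 implies a few times 10¹¹ M☉ of baryons already in place only 1.4 Gyr after the Big Bang.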

In a seminal galaxy formation paper, Mo, Mao, & White (1998) predicted that “present-day disks were assembled recently (at z ≤ 1).” Today, we see that spiral galaxies are ubiquitous in JWST images up to z ∼ 6 (Ferreira et al. 2022, 2023; Kuhn et al. 2024). The early appearance of massive, dynamically cold (Di Teodoro et al. 2016; Lelli et al. 2018, 2023; Rizzo et al. 2023) disks in the first few billion years after the Big Bang contradicts the natural prediction of ΛCDM. Early disks are expected to be small and dynamically hot (Dekel & Burkert 2014; Zolotov et al. 2015; Krumholz et al. 2018; Pillepich et al. 2019), but they are observed to be massive and dynamically cold. (Hot or cold in this context means a high or low amplitude of the velocity dispersion relative to the rotation speed; the modern Milky Way is cold with σ ~ 20 km/s and Vc ~ 200 km/s.) Understanding the stability and longevity of dynamically cold spiral disks is foundational to the problem.

Kinematic Scaling Relations

Beyond anecdotal cases, we can check on kinematic scaling relations like Tully–Fisher. These are expected to emerge late and evolve significantly with redshift in LCDM (e.g., Glowacki et al. 2021). In MOND, the normalization of the baryonic Tully–Fisher relation is set by a0, so is immutable for all time if a0 is constant. Let’s see what the data say:

Figure 9 from McGaugh et al. (2024): The baryonic Tully–Fisher (left) and dark matter fraction–surface brightness (right) relations. Local galaxy data (circles) are from Lelli et al. (2019; left) and Lelli et al. (2016; right). Higher-redshift data (squares) are from Nestor Shachar et al. (2023) in bins with equal numbers of galaxies color coded by redshift: 0.6 < z < 1.22 (blue), 1.22 < z < 2.14 (green), and 2.14 < z < 2.53 (red). Open squares with error bars illustrate the typical uncertainties. The relations known at low redshift also appear at higher redshift with no clear indication of evolution over a lookback time up to 11 Gyr.

Not much to see: the data from Nestor Shachar et al. (2023) show no clear indication of evolution. The same can be said for the dark matter fraction-surface brightness relation. (Glad to see that being plotted after I pointed it out.) The local relations are coincident with those at higher redshift within any sober assessment of the uncertainties – exactly what we measure and how we measure it matters at this level, and I’m not going to attempt to disentangle all that here. Neither am I about to attempt to assess the consistency (or lack thereof) with either LCDM or MOND; the data simply aren’t good enough for that yet. It is also not clear to me that everyone agrees on what LCDM predicts.

What I can do is check empirically how much evolution there is within the 100-galaxy data set of Nestor Shachar et al. (2023). To do that, I fit a line to their data (the left panel above) and measure the residuals: for a given rotation speed, how far is each galaxy from the expected mass? To compare this with the stellar masses discussed previously, I normalize those residuals to the same M* = 9 × 10¹⁰ M☉. If there is no evolution, the data will scatter around a constant value as a function of redshift:

This figure reproduces the stellar mass-redshift data for L* galaxies (black points) and the monolithic (purple line) and LCDM (red and green lines) models discussed previously. The blue squares illustrate deviations of the data of Nestor Shachar et al. (2023) from the baryonic Tully-Fisher relation (dashed line, normalized to the same mass as the monolithic model). There is no indication of evolution in the baryonic Tully-Fisher relation, which was apparently established within the first few billion years after the Big Bang (z = 2.5 corresponds to a cosmic age of about 2.6 Gyr). The data are consistent with a monolithic galaxy formation model in which all the mass had been assembled into a single object early on.

The data scatter around a constant value as a function of redshift: there is no perceptible evolution.
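For anyone who wants to see the mechanics of this residual test, here is a minimal sketch; the randomly generated arrays are stand-ins for where the measured velocities, masses, and redshifts of the Nestor Shachar et al. (2023) sample would go:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the measured sample: log rotation speed, log baryonic
# mass (faked to lie on a BTFR with 0.25 dex scatter), and redshift.
logV = rng.uniform(1.9, 2.5, 100)
z = rng.uniform(0.6, 2.5, 100)
logM = np.log10(47.0) + 4.0 * logV + rng.normal(0.0, 0.25, 100)

slope, intercept = np.polyfit(logV, logM, 1)  # fit log Mb = a log Vf + b
resid = logM - (slope * logV + intercept)     # dex offsets from the fit

# Normalize the residuals to a fiducial mass, here 9e10 Msun
mass_equiv = 9e10 * 10.0 ** resid

# Evolution would appear as a trend of the residuals with redshift
print("residual-redshift correlation:", np.corrcoef(resid, z)[0, 1])
```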

The kinematic data for rotating galaxies tell much the same story as the photometric data for galaxies in clusters. They are both consistent with a monolithic model that gathered together the bulk of the baryonic mass early on, and evolved as an island universe for most of the history of the cosmos. There is no hint of the decline in mass with redshift predicted by the LCDM simulations. Moreover, the kinematics trace mass, not just light. So while I am careful to consider the options for LCDM, I don’t know how we’re gonna get out of this one.

Empirically, it is an important observation that there is no apparent evolution in the baryonic Tully-Fisher relation out to z ~ 2.5. That’s a lookback time of ~11 Gyr, so most of cosmic history. That means that whatever physics sets the relation did so early. If the physics is MOND, this absence of evolution implies that a0 is constant. There is some wiggle room in that given all the uncertainties, but this already excludes the picture in which a0 evolves with the expansion rate through the coincidence a0 ~ cH0. That much evolution would be readily perceptible if H(z) evolves as it appears to do. In contrast, the coincidence a0 ~ c²Λ^(1/2) remains interesting since the cosmological constant is constant. Perhaps this is just a coincidence, or perhaps it is a hint that the anomalous acceleration of the expansion of the universe is somehow connected with the anomalous acceleration in galaxy dynamics.
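The numbers behind these coincidences are easy to check (approximate fiducial values assumed for H0 and Λ):

```python
import numpy as np

C = 2.998e8      # speed of light, m/s
H0 = 2.27e-18    # Hubble constant, 1/s (about 70 km/s/Mpc)
LAM = 1.1e-52    # cosmological constant, 1/m^2 (approximate)
A0 = 1.2e-10     # MOND acceleration scale, m/s^2

print(f"c H0          = {C * H0:.1e} m/s^2")
print(f"c^2 sqrt(Lam) = {C**2 * np.sqrt(LAM):.1e} m/s^2")
print(f"a0            = {A0:.1e} m/s^2")
```

Both combinations land within a small factor of a0 (the usual statement of the coincidence includes a factor of order 2π), which is what makes them intriguing.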

Though I see no clear evidence for evolution in Tully-Fisher to date, it remains early days. For example, a very recent paper by Amvrosiadis et al. (2025) does show a hint of evolution in the sense of an offset in the normalization of the baryonic Tully-Fisher relation. This isn’t very significant, being different by less than 2σ, and again we find ourselves in a situation where we need to take a hard look at all the assumptions and population modeling and velocity measurements just to see if we’re talking about the same quantities before we even begin to assess consistency or the lack thereof. Nevertheless, it is an intriguing result. There is also another interesting anecdotal case: one of their highest redshift objects, ALESS 071.1 at z = 3.7, is also the most massive in the sample, with an estimated stellar mass of 2 × 10¹² M☉. That is a crazy large number, comparable to or maybe larger than the entire dark matter halo of the Milky Way. It falls off the top of any of the graphs of stellar mass we discussed before. If correct, this one galaxy is an enormous problem for LCDM regardless of any other consideration. It is of course possible that this case will turn out to be wrong for some reason, so it remains early days for kinematics at high redshift.

Cluster Kinematics

It is even earlier days for cluster kinematics. First we have to find them, which was the focus of Jay Franck’s thesis. Once identified, we have to estimate their masses with the available data, which may or may not be up to the task. And of course we have to figure out what theory predicts.

LCDM makes a clear prediction for the growth of cluster mass. This works out OK at low redshift, in the sense that the cluster X-ray mass function is in good agreement with LCDM. Where the theory struggles is in the proclivity for the most massive clusters to appear sooner in cosmic history than anticipated. Like individual galaxies, they appear too big too soon. This trend persisted in Jay’s analysis, which identified candidate protoclusters at higher redshifts than expected. It also measured velocity dispersions that were consistently higher than found in simulations. That is, when Jay applied the search algorithm he used on the data to mock data from the Millennium simulation, the structures identified there had velocity dispersions on average a factor of two lower than seen in the data. That’s a big difference in terms of mass.

Figure 11 from McGaugh et al. (2024): Measured velocity dispersions of protocluster candidates (Franck & McGaugh 2016a, 2016b) as a function of redshift. Point size grows with the assessed probability that the identified overdensities correspond to a real structure: all objects are shown as small points, candidates with P > 50% are shown as light blue midsize points, and the large dark blue points meet this criterion and additionally have at least 10 spectroscopically confirmed members. The MOND mass for an equilibrium system in the low-acceleration regime is noted at right; these are comparable to cluster masses at low redshift.

At this juncture, there is no way to know if the protocluster candidates Jay identified are or will become bound structures. We made some probability estimates that can be summed up as “some are probably real, but some probably are not.” The relative probability is illustrated by the size of the points in the plot above; the big blue points are the most likely to be real clusters, having at least ten galaxies at the same place on the sky at the same redshift, all with spectroscopically measured redshifts. Here the spectra are critical; photometric redshifts typically are not accurate enough to indicate that galaxies that happen to lie near each other on the sky are also that close in redshift space.

The net upshot is that there are at least some good candidate clusters at high redshift, and these have higher velocity dispersions than expected in LCDM. I did the exercise of working out what the equivalent mass in MOND would be, and it is about the same as what we find for clusters at low redshift. This estimate assumes dynamical equilibrium, which is very far from guaranteed. But the time at which these structures appear is consistent with the timescale for cluster formation in MOND (a couple Gyr; z ~ 3), so maybe? Certainly there shouldn’t be lots of massive clusters in LCDM at z ~ 3.

Kinematic Takeaways

While it remains early days for kinematic observations at high redshift, so far these data do nothing to contradict the obvious interpretation of the photometric data. There are mature, dynamically cold, fast rotating spiral galaxies in the early universe that were predicted not to be there by LCDM. Moreover, kinematics trace mass, not just light, so all the wriggling we might try to explain the latter doesn’t help with the former. The most obvious interpretation of the kinematic data to date is the same as that for the photometric data: galaxies formed early and grew massive quickly, as predicted a priori by MOND.


*The papers I write that cover both theories always seem to wind up lopsided in favor of LCDM in terms of the bulk of their content. That happens because it takes many pages to discuss all the ins and outs. In contrast, MOND just gets it right the first time, so that section is short: there’s not much more to say than “Yep, that’s what it predicted.”

+I’ve not yet heard any criticisms of our paper directly. The criticisms that I’ve heard second or third hand so far almost all fall in the category of things we explicitly discussed. That’s a pretty clear tell that the person leveling the critique hasn’t bothered to read it. I don’t expect everyone to agree with our take on this or that, but a competent critic would at least evince awareness that we had addressed their concern, even if not to their satisfaction. We rarely seem to reach that level: it is much easier to libel and slander than engage with the issues.

The one complaint I’ve heard so far that doesn’t fall in the category of things-we-already-discussed is that we didn’t do hydrodynamic simulations of star formation in molecular gas. That is a red herring. To predict the growth of stellar mass, all we need is a prescription for assembling mass and converting baryons into stars; this is essentially a bookkeeping exercise that can be done analytically. If this were a serious concern, it should be noted that most cosmological hydro-simulations also fail to meet this standard: they don’t resolve star formation, so they typically adopt some semi-empirical (i.e., data-informed) bookkeeping prescription for this “subgrid physics.”

Though I have not myself attempted to numerically simulate galaxy formation in MOND, Sanders (2008) did. More recently, Eappen et al. (2022) have done so, including molecular gas and feedback$ and everything. They find a star formation history compatible with the analytic models we discuss in our paper.

$Related detail: Eappen et al. find that different feedback schemes make little difference to the end result. The deus ex machina invoked to solve all problems in LCDM is largely irrelevant in MOND. There’s a good physical reason for this: gravity in MOND is sourced by what you see; how it came to have its observed distribution is irrelevant. If 90% of the baryons are swept entirely out of the galaxy by some intense galactic wind, then they’re gone BYE BYE and don’t matter any more. In contrast, that is one of the scenarios sometimes invoked to form cores in dark matter halos that are initially cuspy: the departure of all those baryons perturbs the orbits of the dark matter particles and rearranges the structure of the halo. While that might work to alter halo structure, how it results in MOND-like phenomenology has never been satisfactorily explained. Mostly that is not seen as even necessary; converting cusp to core is close enough!


&Though we typically associate the observed outer velocity with halo mass, an important caveat is that the radius also matters: M ~ RV², and most data for high redshift galaxies do not extend very far out in radius. Nevertheless, it takes a lot of mass to make rotation speeds of order 200 km/s within a few kpc, so it hardly matters if this is or is not representative of the dark matter halo: if it is all stars, then the kinematics directly corroborate the interpretation of the photometric data that the stellar mass is large. If it is representative of the dark matter halo, then we expect the halo radius to scale with the halo velocity (R200 ~ V200), so M200 ~ V200³ and again it appears that there is too much mass in place too early.
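For a sense of scale, the crude spherical estimate M = RV²/G already implies a lot of mass (illustrative numbers):

```python
G = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2
KPC = 3.086e19   # one kiloparsec in meters
MSUN = 1.989e30  # solar mass in kg

def enclosed_mass(v_kms, r_kpc):
    """Crude spherical enclosed mass, M = R V^2 / G, in Msun."""
    return (r_kpc * KPC) * (v_kms * 1e3) ** 2 / G / MSUN

for r in (2, 5, 10):
    print(f"V = 200 km/s at R = {r:2d} kpc -> M ~ {enclosed_mass(200, r):.1e} Msun")
```

Even at a couple of kpc, 200 km/s means upwards of 10¹⁰ M☉ enclosed, whatever it is made of.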

The fault in our stars: blame them, not the dark matter!


As discussed in recent posts, the appearance of massive galaxies in the early universe was predicted a priori by MOND (Sanders 1998, Sanders 2008, Eappen et al. 2022). This is problematic for LCDM. How problematic? That’s always the rub.

The data follow the evolutionary track of a monolithic model (purple line) rather than the track of the largest progenitor predicted by hierarchical LCDM (dotted lines leading to different final masses).

The problem that JWST observations pose for LCDM is that there is a population of galaxies in the high redshift universe that appear to evolve as giant monoliths rather than assembling hierarchically. Put that way, it is a fatal flaw: hierarchical assembly of mass is fundamental to the paradigm. But we don’t observe mass, we observe light. So the obvious “fix” is to adjust the mapping of observed light to predicted dark halo mass in order to match the observations. How plausible is this?

Merger trees from the Illustris-TNG50 simulation showing the hierarchical assembly of L* galaxies. The dotted lines in the preceding plot show the stellar mass growth of the largest progenitor, which is on the left of each merger tree. All progenitors were predicted to be tiny at z > 3, well short of what we observe.

Before trying to wriggle out of the basic result, note that doing so is not plausible from the outset. We need to make the curve of growth of the largest progenitors “look like” the monolithic model. They shouldn’t, by construction, so everything that follows is a fudge to try to avoid the obvious conclusion. But this sort of fudging has been done so many times before in so many ways (the “Frenk Principle” was coined nearly thirty years ago) that many scientists in the field have known nothing else. They seem to think that this is how science is supposed to work. This in turn feeds a convenient attitude that evades the duty to acknowledge that a theory is in trouble when it persistently has to be adjusted to make itself look like a competitor.

That noted, let’s wriggle!

Observational dodges

The first dodge is denial: somehow the JWST data are wrong or misleading. Early on, there were plausible concerns about the validity of some (some) photometric redshifts. There are enough spectroscopic redshifts now that this point is moot.

A related concern is that we “got lucky” with where we pointed JWST to start with, and the results so far are not typical of the universe at large. This is not quite as crazy as it sounds: the field of view of JWST is tiny, so there is no guarantee that the first snapshot will be representative. Moreover, a number of the first pointings intentionally targeted rich fields containing massive clusters, i.e., regions known to be atypical. However, as observations have accumulated, I have seen no indications of a reversal of our first impression, but rather lots of corroboration. So this hedge also now borders on reality denial.

A third observational concern that we worried a lot about in Franck & McGaugh (2017) is contamination by active galactic nuclei (AGN). Luminosity produced by accretion onto supermassive black holes (e.g., quasars) was more common in the early universe. Perhaps some of the light we are attributing to stars is actually produced by AGN. That’s a real concern, but long story short, AGN contamination isn’t enough to explain everything else away. Indeed, the AGN themselves are a problem in their own right: how do we make the supermassive black holes that power AGN so rapidly that they appear already in the early universe? Like the galaxies they inhabit, the black holes that power AGN should take a long time to assemble in the absence of the heavy seeds naturally provided by MOND but not dark matter.

An evergreen concern in astronomy is extinction by dust. Dust could play a role (Ferrara et al. 2023), but this would be a weird effect for it to have. Dust is made by stars, so we naively expect it to build up along with them. In order to explain high redshift JWST data with dust we have to do the opposite: make a lot of dust very early without a lot of stars, then eject it systematically from galaxies so that the net extinction declines with time – a galactic reveal sort of like a cosmic version of the dance of the seven veils. The rate of ejection for all galaxies must necessarily be fine-tuned to balance the barely evolving UV luminosity function with the rapidly evolving dark matter halo mass function. This evolution of the extinction has to coordinate with the dark matter evolution over a rather small window of cosmic time, there being only ∼10⁸ yr between z = 14 and 11. This seems like an implausible way to explain an unchanging luminosity density, which is more naturally explained by simply having stars form and be there for their natural lifetimes.
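The narrowness of that window is easy to verify with a standard cosmology calculator; a quick check assuming the Planck 2018 parameters in astropy:

```python
from astropy.cosmology import Planck18

t11 = Planck18.age(11)  # cosmic age at z = 11
t14 = Planck18.age(14)  # cosmic age at z = 14
print(f"age(z=14) = {t14:.2f}, age(z=11) = {t11:.2f}")
print(f"window between them: {(t11 - t14).to('Myr'):.0f}")
```

The universe ages by only about a hundred million years between those redshifts, which is not much time to choreograph a coordinated dust reveal.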

Figure 5 from McGaugh et al. (2024): The UV luminosity function (left) observed by Donnan et al. (2024; points) compared to that predicted for ΛCDM by Yung et al. (2023; lines) as a function of redshift. Lines and points are color coded by redshift, with dark blue, light blue, green, orange, and red corresponding to z = 9, 10, 11, 12, and 14, respectively. There is a clear excess in the number density of galaxies that becomes more pronounced with redshift, ranging from a factor of ∼2 at z = 9 to an order of magnitude at z ≥ 11 (right). This excess occurs because the predicted number of sources declines with redshift while the observed numbers remain nearly constant, with the data at z = 9, 10, and 11 being right on top of each other.

The basic observation is that there is too much UV light produced by galaxies at all redshifts z > 9. What we’d rather have is the stellar mass function. JWST was designed to see optical light at the redshift of galaxy formation, but the universe surprised us and formed so many stars so early that we are stuck making inferences with the UV anyway. The relation of UV light to mass is dodgy, providing a knob to twist. So up next is the physics of light production.

In our discussion to this point, we have assumed that we know how to compute the luminosity evolution of a stellar population given a prescription for its star formation history. This is no small feat. This subject has a rich history with plenty of ups and downs, like most of astronomy. I’m not going to attempt to review all that here. I think we have this figured out well enough to do what we need to do for the purposes of our discussion here, but there are some obvious knobs to turn, so let’s turn ’em.

Blame the stars!

As noted above, we predict mass but observe light. So the program now is to squeeze more light out of less mass. Early dark matter halos too small? No problem; just make them brighter. More specifically, we need to make models in which the small dark matter halos that form first are better at producing photons from the small amount of baryons that they possess than are their low-redshift descendants. We have observational constraints on the latter; local star formation is inefficient, but maybe that wasn’t always the case. So the first obvious thing to try is to make star formation more efficient.

Super Efficient Star Formation

First, note that stellar populations evolve pretty much as we expect for stars, so this is a bit tricky. We have to retain the evolution we understand well for most of cosmic time while giving a big boost at early times. One way to do that is to have two distinct modes of star formation: the one we think of as normal that persists to this day, and an additional mode of super-efficient star formation (SESF) at play in the early universe. This way we retain the usual results while potentially giving us the extra boost that we need to explain the JWST data. We argue that this is the least implausible path to preserving LCDM. We’re trying to make it work, and anticipate the arguments Dr. Z would make.

This SESF mode of star formation needs to be very efficient indeed, as there are galaxies that appear to have converted essentially all of their available baryons into stars. Let’s pause to observe that this is pretty silly. Space is very empty; it is hard to get enough mass together to form stars at all: there’s good reason that it is inefficient locally! The early universe is a bit denser by virtue of being smaller; at z = 9 the expansion factor is only 1/(1+z) = 0.1 of what it is now, so the density is (1+z)³ = 1,000 times greater. ON AVERAGE. That’s not really a big boost when it comes to forming structures like stars since the initial condition was extraordinarily uniform. The lack of early structure by far outweighs the difference in density; that is precisely why we’re having a problem. Still, I can at least imagine that there are regions that experience a cascade of violent relaxation and SESF once some threshold in gas density is exceeded that differentiates the normal mode of star formation from SESF. Why a threshold in the gas? Because there’s not anything obvious in the dark matter picture to distinguish the galaxies that result from one or the other mode. CDM itself is scale free, after all, so we have to imagine a scale set by baryons that funnels protogalaxies into one mode or the other. Why, physically, is there a particular gas density that makes that happen? That’s a great question.

There have been observational indications that local star formation is related to a gas surface density threshold, so maybe there’s another threshold that kicks it up another notch. That’s just a plausibility argument, but that’s the straw I’m clutching at to justify SESF as the least implausible option. We know there’s at least one way in which a surface density scale might matter to star formation.

Writing out the (1+z)³ argument for the density above tickled the memory that I’d seen something similar claimed elsewhere. Looking it up, indeed Boylan-Kolchin (2024) does this, getting an extra (1+z)³ [for a total of (1+z)⁶] by invoking a surface density Σ that follows from an acceleration scale g: Σ = g/(πG). Very MONDish, that. At any rate, the extra boost is claimed to lift a corner of dark matter halo parameter space into the realm of viability. So, sure. Why not make that step two.
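For what it’s worth, the surface density scale you get by setting the acceleration to a0 is straightforward to evaluate (a sketch with approximate constants):

```python
import numpy as np

G = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2
A0 = 1.2e-10     # MOND acceleration scale, m/s^2
PC = 3.086e16    # one parsec in meters
MSUN = 1.989e30  # solar mass in kg

sigma = A0 / (np.pi * G)  # kg/m^2 when the acceleration scale is a0
print(f"Sigma ~ {sigma:.2f} kg/m^2 ~ {sigma * PC**2 / MSUN:.0f} Msun/pc^2")
```

A few hundred M☉/pc² is of the order of the characteristic surface density that recurs in MOND phenomenology, which is what makes the invocation MONDish.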

However we do it, making stars super-efficiently is what the data appear to require – if we confine our consideration to the mass predicted by LCDM. It’s a way of covering the lack of mass with a surplus of stars. Any mechanism that makes stars more efficiently will boost the dotted lines in the M*-z diagram above in the right direction. Do they map into the data (and the monolithic model) as needed? Unclear! All we’ve done so far is offer plausibility arguments that maybe it could be so, not demonstrate a model that works without fine-tuning that woulda coulda shoulda made the right prediction in the first place.

The ideas become less plausible from here.

Blame the IMF!

The next obvious idea after making more stars in total is to just make more of the high mass stars that produce UV photons. The IMF is a classic boogeyman to accomplish this. I discussed this briefly before, and it came up in a related discussion in which it was suggested that “in the end what will probably happen is that the IMF will be found to be highly redshift dependent.”

OK, so, first, what is the IMF? The Initial Mass Function is the spectrum of masses with which stars form: how many stars of each mass, ranging from the brown dwarf limit (0.08 M☉) to the most massive stars formed (around 100 M☉). The number of stars formed in any star forming event is a strong function of mass: low mass stars are common, high mass stars are rare. Here, though, is the rub: integrating over the whole population, low mass stars contain most of the mass, but high mass stars produce most of the light. This makes the conversion of mass to light quite sensitive to the IMF.
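That sensitivity is easy to demonstrate with a toy integration over a Salpeter IMF (dN/dm ∝ m⁻²·³⁵) with a crude main-sequence luminosity scaling L ∝ m³·⁵; both power laws are standard textbook approximations, used here purely for illustration:

```python
import numpy as np

m = np.logspace(np.log10(0.08), 2, 10_000)  # stellar masses, Msun
xi = m ** -2.35                             # Salpeter IMF, dN/dm

mass_int = m * xi          # mass carried per unit mass interval
light_int = m ** 3.5 * xi  # light produced per unit mass interval

frac_mass_low = np.trapz(mass_int[m < 1], m[m < 1]) / np.trapz(mass_int, m)
frac_light_high = np.trapz(light_int[m > 10], m[m > 10]) / np.trapz(light_int, m)
print(f"mass fraction in stars < 1 Msun:     {frac_mass_low:.2f}")
print(f"light fraction from stars > 10 Msun: {frac_light_high:.2f}")
```

Roughly two thirds of the mass resides in stars below a solar mass, while effectively all of the light comes from stars above ten solar masses, so tinkering with the top of the IMF moves the light a lot while barely touching the mass.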

The number of UV photons produced by a stellar population is especially sensitive to the IMF as only the most massive and short-lived O and B stars produce them. This is low-hanging fruit for the desperate theorist: just a few more of those UV-bright, short-lived stars, please! If we adjust the IMF to produce more of these high mass stars, then they crank out lots more UV photons (which goes in the direction we need) but they don’t contribute much to the total mass. Better yet, they don’t live long. They’re like icicles as murder weapons in mystery stories: they do their damage then melt away, leaving no further evidence. (Strictly speaking that’s not true: they leave corpses in the form of neutron stars or stellar mass black holes, but those are practically invisible. They also explode as supernovae, boosting the production of metals, but the amount is uncertain enough to get away with murder.)

There is a good plausibility argument for a variable IMF. To form a star, gravity has to overcome gas pressure to induce collapse. Gas pressure depends on temperature, and interstellar gas can cool more efficiently when it contains some metals (here I mean metals in the astronomy sense, which is everything in the periodic table that’s not hydrogen or helium). It doesn’t take much; a little oxygen (one of the first products of supernova explosions) goes a long way to make cooling more efficient than a primordial gas composed of only hydrogen and helium. Consequently, low metallicity regions have higher gas temperatures, so it makes sense that gas clouds would need more gravity to collapse, leading to higher mass stars. The early universe started with zero metals, and it takes time for stars to make them and to return them to the interstellar medium, so voila: metallicity varies with time so the IMF varies with redshift.
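The scaling behind this argument is the Jeans mass, M_J ∝ T^(3/2) ρ^(-1/2): hotter gas needs more mass to collapse. A quick illustration with fiducial molecular-cloud numbers (assumed, not fitted to anything):

```python
import numpy as np

G, K_B, M_H, MSUN = 6.674e-11, 1.381e-23, 1.673e-27, 1.989e30

def jeans_mass(T, n_cm3=100.0, mu=2.3):
    """Jeans mass in Msun for gas at temperature T (K) and number
    density n (cm^-3): M_J = (5kT/(G mu m_H))^1.5 (3/(4 pi rho))^0.5."""
    rho = mu * M_H * n_cm3 * 1e6  # mass density, kg/m^3
    mj = (5 * K_B * T / (G * mu * M_H)) ** 1.5 * (3 / (4 * np.pi * rho)) ** 0.5
    return mj / MSUN

for T in (10, 50, 200):  # cold, metal-enriched gas vs hot, metal-poor gas
    print(f"T = {T:3d} K -> M_J ~ {jeans_mass(T):.0f} Msun")
```

A factor of twenty in temperature is a factor of ninety in Jeans mass at fixed density, hence the expectation of top-heavy star formation in hot, metal-poor gas.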

This sound physical argument is simple enough to make that it can be done in a small part of a blog post. This has helped it persist in our collective astronomical awareness for many decades. Unfortunately, it appears to have bugger-all to do with reality.

If metallicity plays a strong role in determining the IMF, we would expect to see it in stellar populations of different metallicity. We measure the IMF for solar metallicity stars in the solar neighborhood. Globular clusters are composed of stars formed shortly after the Big Bang and have low metallicities. So following this line of argument, we anticipate that they would have a different IMF. There is no evidence that this is the case. Still, we only really need to tweak the high-mass end of the IMF, and those stars died a long time ago, so maybe this argument applies for them if not for the long-lived, low-mass stars that we observe today.

In addition to counting individual stars, we can get a constraint on the galaxy-wide average IMF from the scatter in the Tully-Fisher relation. The physical relation depends on mass, but we rely on light to trace that. So if the IMF varies wildly from galaxy to galaxy, it will induce scatter in Tully-Fisher. This is not observed; the amount of intrinsic scatter that we see is consistent with that expected for stochastic variations in the star formation history for a fixed IMF. That’s a pretty strong constraint, as it doesn’t take much variation in the IMF to cause a lot of scatter that we don’t see. This constraint applies to entire galaxies, so it tolerates variations in the IMF in individual star forming events, but whatever is setting the IMF apparently tends to the same result when averaged over the many star forming events it takes to build a galaxy.

Variation in the IMF has come up repeatedly over the years because it provides so much convenient flexibility. Early in my career, it was commonly invoked to explain the variation in spectral hardness with metallicity. If one looks at the spectra of HII regions (interstellar gas ionized by hot young stars), there is a trend for lower metallicity HII regions to be ionized by hotter stars. The argument above was invoked: clearly the IMF tended to have more high mass stars in low metallicity environments. However, the light emitted by stars also depends on metallicity; low metallicity stars are bluer than their high metallicity equivalents because there are fewer UV absorption lines from iron in their atmospheres. Taking care to treat the stars and interstellar gas self-consistently and integrating over a fixed IMF, I showed that the observed variation in spectral hardness was entirely explained by the variation in metallicity. There didn’t need to be more high mass stars in low metallicity regions; the stars were just hotter because that’s what happens in low metallicity stars. (I didn’t set out to do this; I was just trying to calibrate an abundance indicator that I would need for my thesis.)

Another example where excess high mass stars were invoked was to explain the apparently high optical depth to the surface of last scattering reported by WMAP. If those words don’t mean anything to you, don’t worry – all it means is that a couple of decades ago, we thought we needed lots more UV photons at high redshift (z ~ 17) than CDM naturally provided. The solution was, you guessed it, an IMF rich in high mass stars. Indeed, this result launched a thousand papers on supermassive Population III stars that didn’t pan out for reasons that were easily anticipated at the time. Nowadays, analyses of the Planck data suggest a much lower optical depth than initially inferred by WMAP, but JWST is observing too many UV photons at high redshift to remain consistent with Planck. This apparent tension for LCDM is a natural consequence of early structure formation in MOND; indeed, it is another thing that was specifically predicted (see section 3.1 of McGaugh 2004).

I relate all these stories of encounters with variations in the high mass end of the IMF because they’ve never once panned out. Maybe this time will be different.

Stochastic Star Formation

What else can we think up? There’s always another possibility. It’s a big universe, after all.

One suggestion I haven’t discussed yet is that high redshift galaxies appear overly bright from stochastic fluctuations in their early star formation. This again invokes the dubious relation between stellar mass and UV light, but in a more subtle way than simply stocking the IMF with a bunch more high mass stars. Instead, it notes that the instantaneous star formation rate is stochastic. The massive stars that produce all the UV light are short-lived, so the number present will fluctuate up and down. Over time, this averages out, but there hasn’t been much time yet in the early universe. So maybe the high redshift galaxies that seem to be over-luminous are just those that happen to be near a peak in the ups and downs of star formation. Galaxies will be brightest and most noticeable in this peak phase, so the real mass is less than it appears – though there must be a lot of galaxies in the off phase for every one that we see in the on phase.
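A toy version of the resulting selection effect, with made-up lognormal burst amplitudes: a UV-selected sample is dominated by galaxies caught in the on phase, so the typical burst factor among detected galaxies sits well above the population mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# Each galaxy's UV output is boosted by a lognormal burst factor.
boost = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

mean_boost = boost.mean()                       # the typical galaxy
uv_selected = np.average(boost, weights=boost)  # what a UV survey sees

# The flux-weighted mean exceeds the population mean by ~e^(sigma^2) ~ 2.7x
print(f"population mean burst factor:  {mean_boost:.2f}")
print(f"UV-selected mean burst factor: {uv_selected:.2f}")
```

So galaxies we detect in a burst look a few times brighter than is typical of the population, and masses inferred from UV light are overestimated by a similar factor.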

One expects a lot of scatter in the inferred stellar mass in the early universe due to stochastic variations in the star formation rate. As time goes on, these average out and the inferred stellar mass becomes steady. That’s pretty much what is observed (data). The data track the monolithic model (purple line) and sometimes exceed it in the early, stochastic phase. The data bear no resemblance to hierarchical LCDM (orange line).

This makes a lot of sense to me. Indeed, it should happen at some level, especially in the chaotic early universe. It is also what I infer to be going on to explain why some measurements scatter above the monolithic line. That is the baseline star formation history for this population, with some scatter up and down at early times. Simply scattering from the orange LCDM line isn’t going to look like the purple monolithic line. The shape is wrong and the amplitude difference is too great to overcome in this fashion.

What else?

I’m sure we’ll come up with something, but I think I’ve covered everything I’ve heard so far. Indeed, most of these possibilities are obvious enough that I thought them up myself and wrote about them in McGaugh et al. (2024). I don’t see anything in the wide-ranging discussion at KITP that wasn’t already in my paper.

I note this because I want to point out that we are following a well-worn script. This is the part where I tick off all the possibilities for more complicated LCDM models and point out their shortcomings. I expect the same response:

That’s too long to read. Dr. Z says it works, so he must be right since we already know that LCDM is correct.

Triton Station, 8 February 2022

People will argue about which of these auxiliary hypotheses is preferable. MOND is not an auxiliary hypothesis, but an entirely different paradigm, so it won’t be part of the discussion. After some debate, one of the auxiliaries (SESF, not IMF!) will be adopted as the “standard” picture. This will be repeated until it becomes familiar, and once it is familiar it will seem that it was always so, and then people will assert that there was never a problem, indeed, that we expected it all along. This self-gaslighting reminds me of Feynman’s warning:

The first principle is that you must not fool yourself – and you are the easiest person to fool.

Richard Feynman

What is persistently lacking in the community is any willingness to acknowledge, let alone engage with, the deeper question of why we have to keep invoking ad hoc patches to somehow match what MOND correctly predicted a priori. The sociology of invoking arbitrary auxiliary hypotheses to make these sorts of excuses for LCDM has been so consistently on display for so long that I wrote this parody a year ago:


It always seems to come down to special pleading:

Please don’t falsify LCDM! I ran out of computer time. I had a disk crash. I didn’t have a grant for supercomputer time. My simulation data didn’t come back from the processing center. A senior colleague insisted on a rewrite. Someone stole my laptop. There was an earthquake, a terrible flood, locusts! It wasn’t my fault! I swear to God!

And the community loves LCDM, so we fall for it every time.

Oh, LCDM. LCDM, honey.

PS – to appreciate the paraphrased quotes here, you need to hear them as they would be spoken by the pictured actors. So if you do not instantly recognize this scene from the Blues Brothers, you need to correct this shortcoming in your cultural education to get the full effect of the reference.

Old galaxies in the early universe

Old galaxies in the early universe

Continuing our discussion of galaxy formation and evolution in the age of JWST, we saw previously that there appears to be a population of galaxies that grew rapidly in the early universe, attaining stellar masses like those expected in a traditional monolithic model for a giant elliptical galaxy rather than a conventional hierarchical model that builds up gradually through many mergers. The formation of galaxies at incredibly high redshift, z > 10, implies the existence of a descendant population at intermediate redshift, 3 < z < 4, at which point they should have mature stellar populations. These galaxies should not only be massive, they should also have the spectral characteristics of old stellar populations – old, at least, for how old the universe itself is at this point.

Theoretical predictions from Fig. 1 of McGaugh et al (2024) combined with the data of Fig. 4. The data follow the track of a monolithic model that forms early as a single galaxy rather than that of the largest progenitor of the hierarchical build-up expected in LCDM.

The data follow the track of stellar mass growth for an early-forming monolithic model. Do the ages of stars also look like that?

Here is a recent JWST spectrum published by de Graaff et al. (2024). This appeared too recently for us to have cited in our paper, but it is a great example of what we’re talking about. This is an incredibly gorgeous spectrum of a galaxy at z = 4.9 when the universe was 1.2 Gyr old.

Fig. 1 from de Graaff et al. (2024): JWST/NIRSpec PRISM spectrum (black line) of the massive quiescent galaxy RUBIES-EGS-QG-1 at a redshift of z = 4.8976.

It is challenging to refrain from nerding out at great length over many of the details on display here. First, it is an incredible technical achievement. I’ve seen worse spectra of local galaxies. JWST was built to obtain images and spectra of galaxies so distant they approach the horizon of the observable universe. Its cameras are sensitive to the infrared part of the spectrum in order to capture familiar optical features that have been redshifted by a huge factor (compare the upper and lower x-axes). The telescope itself was launched into space well beyond the obscuring atmosphere of the earth, pointed precisely at a tiny, faint flicker of light in a vast, empty universe, captured photons that had been traveling for billions of years, and transmitted the data to Earth. That this is possible, and works, is an amazing feat of science, engineering, and societal commitment (it wasn’t exactly cheap).

In the raw 2D spectrum (at top) I can see by eye the basic features in the extracted, 1D spectrum (bottom). This is a useful and convincing reality check to an experienced observer even if at first glance it looks like a bug splat smeared by a windshield wiper. The essential result is apparent to the eye; the subsequent analysis simply fills in the precise numbers.

Looking from right to left, the spectrum runs from red to blue. It ramps up then crashes down around an observed wavelength of 2.3 microns. This is the 4000 Å break in the rest frame, a prominent feature of aging stellar populations. The amount of blue-to-red ramp-up and the subsequent depth of drop is a powerful diagnostic of stellar age.
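As a quick sanity check on those numbers (my own arithmetic), a rest-frame 4000 Å feature at z ≈ 4.9 lands at

$$\lambda_{\mathrm{obs}} = (1+z)\,\lambda_{\mathrm{rest}} \approx 5.9 \times 4000\,\mathrm{\AA} \approx 2.4\,\mu\mathrm{m},$$

right about where the spectrum crashes down.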

In addition to the 4000 Å break, a number of prominent spectral lines are apparent. In particular, the Balmer absorption lines Hβ, Hγ, and Hδ are clear and deep. These are produced by A stars, which dominate the light of a stellar population after a few hundred million years. There’s the answer right there: the universe is only 1.2 Gyr old at this point, and the stars dominating the light aren’t much younger.

There are also some emission lines. These can be the sign of ongoing star formation or an active galactic nucleus powered by a supermassive black hole. The authors attribute these to the latter, inferring that the star formation happened fast and furious early on, then basically stopped. That’s important to the rest of the spectrum; A stars only dominate for a while, and their lines are not so prominent if a population keeps making new stars. So this galaxy made a lot of stars, made them fast, then basically stopped. That is exactly the classical picture of a monolithic giant elliptical.

Here is the star formation history that de Graaff et al. (2024) infer:

Fig. 2 from de Graaff et al. (2024): The star formation rate (top) and accumulated stellar mass (bottom) as a function of cosmic time (only the first 1.2 Gyr are shown). Results for stellar populations of two metallicities are shown (purple or blue lines). This affects the timing of the onset of star formation, but once going, an enormous mass of stars forms fast, in ~200 Myr.

There are all sorts of caveats about population modeling, but it is very hard to avoid the basic conclusion that lots of stars were assembled with incredible speed. A stellar mass a bit in excess of that of the Milky Way appears in the time it takes for the sun to orbit once. That number need not be exactly right to see that this is not the gradual, linear, hierarchical assembly predicted by LCDM. The typical galaxy in LCDM is predicted to take ~7 Gyr to assemble half its stellar mass, not 0.1 Gyr. It’s as if the entire mass collapsed rapidly and experienced an intense burst of star formation during violent relaxation (Lynden-Bell 1967).
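For a rough sense of scale (my arithmetic, order of magnitude only): a Milky Way mass of stars formed in roughly one solar orbital time implies

$$\langle \mathrm{SFR} \rangle \sim \frac{10^{11}\,M_{\odot}}{2\times10^{8}\,\mathrm{yr}} \sim 500\,M_{\odot}\,\mathrm{yr}^{-1},$$

a couple hundred times the current star formation rate of the Milky Way.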

Collapse of shells within shells to form a massive galaxy rapidly in MOND (Sanders 2008). Note that the inner shells (inset) where most of the stars will be collapse even more rapidly than the overall monolith (dotted line).

Where MOND provides a natural explanation for this observation, the fiducial population model of de Graaff et al. violates the LCDM baryon limit: there are more stars than there are baryons to make them from. It should be impossible to veer into the orange region above as the inferred star formation history does. The obvious solution is to adopt a higher metallicity (the blue model) even if that is a worse fit to the spectrum. Indeed, I find it hard to believe that so many stars could be made in such a small region of space without drastically increasing their metallicity, so there are surely things still to be worked out. But before we engage in too much excuse-making for the standard model, note that the orange region represents a double-impossibility. First, the star formation efficiency is 100%. Second, this is for an exceptionally rare, massive dark matter halo. The chances of spotting such an object in the area so far surveyed by JWST are small. So we not only need to convert all the baryons into stars, we also need to luck into seeing it happen in a halo so massive that it probably shouldn’t be there. And in the strictest reading, there still aren’t enough baryons. Does that look right to you?

Do these colors look right to you? Getting the color right is what stellar population modeling is all about.

OK, so I got carried away nerding out about this one object. There are other examples. Indeed, there are enough now to call them a population of old and massive quiescent galaxies at 3 < z < 4. These have the properties expected for the descendants of massive galaxies that form at z > 10.

Nanayakkara et al. (2024) model spectra for a dozen such galaxies. The spectra provide an estimate of the stellar mass at the redshift of observation. They also imply a star formation history from which we can estimate the age/redshift at which the galaxy had formed half of those stars, and when it quenched (stopped forming stars, or in practice here, when the 90% mark had been reached). There are, of course, large uncertainties in the modeling, but it is again hard to avoid the conclusion that lots of stars were formed early.

Figure 7 from McGaugh et al. (2024): The stellar masses of quiescent galaxies from Nanayakkara et al. (2024). The inferred growth of stellar mass is shown for several cases, from the time when half the stars were present (small green circles) to the quenching time when 90% of the stars were present (midsize orange circles) to the epoch of observation (large red circles). Illustrative star formation histories are shown as dotted lines with the time of formation ti and the quenching timescale τ noted in Gyr. We omit the remaining lines for clarity, as many cross. There is a wide distribution of formation times from very early (ti = 0.2 Gyr) to relatively late (>1 Gyr), but all of the galaxies in this sample are inferred to build their stellar mass rapidly and quench early (τ < 0.5 Gyr).

The dotted lines above are models I constructed in the spirit of monolithic models. The particular details aren’t important, but the inferred timescales are. To put galaxies in this part of the stellar mass-redshift plane, they have to start forming early (typically in the first billion years), form stars at a prolific rate, then quench rapidly (typically with e-folding timescales < 1 Gyr). I wouldn’t say any of these numbers are particularly well-measured, but they are indicative.
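For concreteness, models in this spirit delay an exponential burst until the formation time ti (my notation here; the implementation in the paper may differ in detail):

$$\mathrm{SFR}(t) = \mathrm{SFR}_0\,e^{-(t-t_i)/\tau} \;\;\mathrm{for}\;\; t \ge t_i, \qquad M_*(t) = \mathrm{SFR}_0\,\tau\left[1 - e^{-(t-t_i)/\tau}\right],$$

neglecting the mass returned to the gas by dying stars. A small τ means the galaxy sprints to its final mass and quenches; a large τ means a long, slow burn.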

What is missing from this plot is the LCDM prediction. That’s not because I omitted it, it’s because the prediction for typical L* galaxies doesn’t fall within the plot limits. LCDM does not predict that typical galaxies should become this massive this early. I emphasize typical because there is always scatter, and some galaxies will grow ahead of the typical rate.

Not only are the observed galaxies massive, they have mature stellar populations that are pretty much done forming stars. This will sound normal to anyone who has studied the stellar populations of giant elliptical galaxies. But what does LCDM predict?

I searched through the Illustris TNG50 and TNG300 simulations for objects at redshift 3 that had stellar masses in the same range as the galaxies observed by Nanayakkara et al. (2024). The choice of z = 3 is constrained by the simulation output, which comes in increments of the expansion factor. To compare to real galaxies at 3 < z < 4 one can either look at the snapshot at z = 4 or the one at z = 3. I chose z = 3 to be conservative; this gives the simulation the maximum amount of time to produce quenched, massive galaxies.
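For those who want to poke at this themselves, the selection looks something like the sketch below using the public illustris_python package. Treat the snapshot number for z = 3 and the mass cuts as my assumptions; check the TNG documentation before trusting them.

```python
import numpy as np
import illustris_python as il   # public TNG analysis package

basePath = "./TNG300-1/output"  # local path to the downloaded group catalogs
snap = 25                       # snapshot nearest z = 3 (assumed; verify!)
h = 0.6774                      # TNG Hubble parameter

# Load subhalo stellar masses (particle type 4) and instantaneous SFRs
sub = il.groupcat.loadSubhalos(basePath, snap,
                               fields=["SubhaloMassType", "SubhaloSFR"])
mstar = sub["SubhaloMassType"][:, 4] * 1e10 / h   # Msun
sfr = sub["SubhaloSFR"]                           # Msun/yr

# Objects in roughly the stellar mass range of the observed galaxies
sel = (mstar > 1e10) & (mstar < 2e11)             # assumed range
print(f"{sel.sum()} objects selected at z = 3")
print(f"fraction with SFR < 1 Msun/yr: {(sfr[sel] < 1).mean():.3f}")
```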

These simulations do indeed produce some objects of the appropriate stellar mass. These are rare, as they are early adopters: galaxies that got big quicker than is typical. However, they are not quenched as observed: the simulated objects are still on the star forming main sequence (the correlation between star formation rate and stellar mass). The distribution of simulated objects does not appear to encompass that of real galaxies.

Figure 8 from McGaugh et al. (2024): The stellar masses and star formation rates of galaxies from Nanayakkara et al. (2024; red symbols). Downward-pointing triangles are upper limits; some of these fall well below the edge of the plot and so are illustrated as the line of points along the bottom. Also shown are objects selected from the TNG50 (Pillepich et al. 2019; filled squares) and TNG300 (Pillepich et al. 2018; open squares) simulations at z = 3 to cover the same range of stellar mass. Unlike the observed galaxies, simulated objects with stellar masses comparable to real galaxies are mostly forming stars at a rapid pace. In the higher-resolution TNG50, none have quenched as observed.

If we want to hedge, we can note that TNG300 has a few objects that are kinda in the right ballpark. That’s a bit misleading, as the data are mostly upper limits. Moreover, these are the rare objects among a set of objects selected to be rare: it isn’t a resounding success if we have to scrape the bottom of the simulated barrel after cherry-picking which barrel. Worse, these few semi-quenched simulated objects are not present in TNG50. TNG50 is the higher resolution simulation, so presumably provides a better handle on the star formation in individual objects. It is conceivable that TNG300 “wins” by virtue of its larger volume, but that’s just saying we have more space in which to discover very rare entities. The prediction is that massive, quenched galaxies should be exceedingly rare, but in the real universe they seem mundane.

That said, I don’t think this problem is fundamental. Hierarchical assembly is still ongoing at this epoch, bringing with it merger-induced star formation. There’s an easy fix for that: change the star formation prescription. Instead of “wet” mergers with gas that can turn into stars, we just need to form all the stars already early on so that the subsequent mergers are “dry” – at least, for those mergers that build this particular population. One winds up needing a new and different mode of star formation. In addition to what we observe locally, there needs to be a separate mode of super-efficient star formation that somehow turns all of the available baryons into stars as soon as possible. That’s basically what I advocate as the least unreasonable possibility for LCDM in our paper. This is a necessary but not sufficient condition; these early stellar nuggets also need to assemble speedy quick to make really big galaxies. While it is straightforward to mess with the star formation prescription in models (if not in nature), the merger trees dictating the assembly history are less flexible.

Putting all the data together in a single figure, we can get a sense for the evolutionary trajectory of the growth of stellar mass in galaxies across cosmic time. This figure extends from the earliest galaxies so far known at z ~ 14 when the universe was just a few hundred million years old (of order one orbital time in a mature galaxy) to the present over thirteen billion years later. In addition to data discussed previously, it also shows recent data with spectroscopic redshifts from JWST. This is important, as the sense of the figure doesn’t change if we throw away all the photometric redshifts, it just gets a little sparse around z ~ 8.

Figure 10 from McGaugh et al. (2024): The data from Figures 4 and 6 shown together using the same symbols. Additional JWST data with spectroscopic redshifts are shown from Xiao et al. (2023; green triangles) and Carnall et al. (2024). The data of Carnall et al. (2024) distinguish between star-forming galaxies (small blue circles) and quiescent galaxies (red squares); the latter are in good agreement with the typical stellar mass determined from Schechter fits in clusters (large circles). The dashed red lines show the median growth predicted by the Illustris ΛCDM simulation (Rodriguez-Gomez et al. 2016) for model galaxies that reach final stellar masses of M* = 10^10, 10^11, and 10^12 M☉. The solid lines show monolithic models with a final stellar mass of 9 x 10^10 M☉ and ti = τ = 0.3, 0.4, and 0.5 Gyr, as might be appropriate for giant elliptical galaxies. The dotted line shows a model appropriate to a monolithic spiral galaxy with ti = 0.5 Gyr and τ = 13.5 Gyr.

The solid lines are monolithic models we built to represent classical giant elliptical galaxies that form early and quench rapidly. These capture nicely the upper envelope of the data. They form most of their stars at z > 4, producing appropriately old populations at lower redshifts. The individual galaxy data merge smoothly into those for typical galaxies in clusters.

The LCDM prediction as represented by the Illustris suite of simulations is shown as the dashed red lines for objects of several final masses. These are nearly straight lines when log(M*) is plotted against linear z. Objects that end up with a typical L* elliptical galaxy mass at z = 0 deviate from the data almost immediately at z > 1. They disappear at z > 6 as the largest progenitors become tiny.

What can we do to fix this? Massive galaxies get a head start, as it were, by being massive at all epochs. But the shape of the evolutionary trajectory remains wrong. The top red line (for a final stellar mass of 10^12 M☉) corresponds to a typical galaxy at z ~ 2, but it continues to grow to be atypical locally. The data don’t do that. Even with this boost, the largest progenitor is still predicted to be too small at z > 3 where there are now many examples of massive, quiescent galaxies – known both from JWST observations and from Jay Franck’s thesis before it. Again, the distribution of the data does not look like the predictions of LCDM.

One can abandon Illustris as the exemplar of LCDM, but it doesn’t really help. Other models show similar things, differing only in minor details. That’s because the issue is the mass assembly history they all share, not the details of the star formation. The challenge now is to tweak models to make them look more monolithic; i.e., change those red dashed lines into the solid black lines. One will need super-efficient star formation, if it is even possible. I’ll leave discussion of this and other obvious fudges to a future post.

Finally, note that there are a bunch of galaxies with JWST spectroscopic redshifts from 3 < z < 4 that are not exceptionally high mass (the small blue points). These are expected in any paradigm. They can be galaxies that are intrinsically low mass and won’t grow much further, or galaxies that may still grow a lot, just with a longer fuse on their star formation timescale. Such objects are ubiquitous in the local universe as spiral and irregular galaxies. Their location in the diagram above is consistent with the LCDM predictions, but is also readily explained by monolithic models with long star formation timescales. The dotted line shows a monolithic model that forms early (ti = 0.5 Gyr) but converts gas into stars gradually (τ = 13.5 Gyr rather than < 1 Gyr). This is a boilerplate model for a spiral that has been around for as long as the short-τ model for giant ellipticals. So while these lower mass galaxies exist, their location in the M*-z plane doesn’t really add much to this discussion as yet. It is the massive galaxies that form early and become quiescent rapidly that most challenge LCDM.

Measuring the growth of the stellar mass of galaxies over cosmic time

Measuring the growth of the stellar mass of galaxies over cosmic time

This post continues the series summarizing our ApJ paper on high redshift galaxies. To keep it finite, I will focus here on the growth of stellar mass. The earlier post discussed what we expect in theory. This depends on mass assembly (slow in LCDM, fast in MOND), on how the assembled mass is converted into stars, and on how those stars shine in light we can detect. We know a lot about stars and their evolution, so for this post I will assume we know how to convert a given star formation history into the evolution of the light it produces. There are of course caveats to that which we discuss in the paper, and perhaps will get to in a future post. It’s exhausting to be exhaustive, so not today, Satan.

The principal assumption we are obliged to make, at least to start, is that light traces mass. As mass assembles, some of it turns into stars, and those stars produce light. The astrophysics of stars and the light they produce is the same in any structure formation theory, so with this basic assumption, we can test the build-up of mass. In another post we will discuss some of the ways in which we might break this obvious assumption in order to save a favored theory. For now, we assume the obvious assumption holds, and what we see at high redshift provides a picture of how mass assembles.

Before JWST

This is not a new project; people have been doing it for decades. We like to think in terms of individual galaxies, but there are lots out there, so an important concept is the luminosity function, which describes the number of galaxies as a function of how bright they are. Here are some examples:

Figure 3 from Franck & McGaugh (2017) showing the number of galaxies as a function of their brightness in the 4.5 micron band of the Spitzer Space Telescope in candidate protoclusters from z = 2 to 6. Each panel notes the number of galaxies contributing to the Schechter luminosity function+ fit (gray bands), the apparent magnitude m* corresponding to the typical luminosity L*, and the redshift range. The magnitude m* is characteristic of how bright typical galaxies are at each redshift.
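For reference, the Schechter function being fit takes the form

$$\Phi(L)\,dL = \Phi^{*}\left(\frac{L}{L^{*}}\right)^{\alpha} e^{-L/L^{*}}\,\frac{dL}{L^{*}},$$

a power law of slope α at the faint end, cut off exponentially above the characteristic luminosity L*. The apparent magnitude of that knee is m*.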

One reason to construct these luminosity functions is to quantify what is typical. Hundreds of galaxies inform each fit. The luminosity L* is representative of the typical galaxy, not just anecdotal individual examples. At each redshift, L* corresponds to an observed apparent magnitude m*, which we plot here:

Figure 3 from McGaugh et al. (2024): The redshift dependence of the Spitzer [4.5] apparent magnitude m* of Schechter function fits to populations of galaxies in clusters and candidate protoclusters; each point represents the characteristic brightness of the galaxies in each cluster. The apparent brightness of galaxies gets fainter with increasing redshift because galaxies are more distant, with the amount they dim depending also on their evolution (lines). The purple line is the monolithic exponential model we discussed last time. The orange line is the prediction of the Millennium simulation (the state of the art at the time Jay Franck wrote his thesis) and the Munich galaxy formation model based on it. The open squares are the result of applying the same algorithm to the simulation as used on the data; this is what we would have observed if the universe looked like LCDM as depicted by the Munich model. The real universe does not look like that.

We plot faint to bright going up the y-axis; the numbers get smaller because of the backwards definition of the magnitude scale (which dates to ancient times in which the stars that appeared brightest to the human eye were “of the first magnitude,” then the next brightest of the second magnitude, and so on). The x-axis shows redshift. The top axis shows the corresponding age of the universe for vanilla LCDM parameters. Each point shows the apparent magnitude that is typical as informed by observations of dozens to hundreds of individual galaxies. Each galaxy has a spectroscopic redshift, which we made a requirement for inclusion in the sample. These are very accurate; no photometric redshifts are used to make the plot above.
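For readers who don’t live and breathe magnitudes, the scale is logarithmic in flux:

$$m_1 - m_2 = -2.5\,\log_{10}\!\left(\frac{F_1}{F_2}\right),$$

so five magnitudes correspond to a factor of 100 in flux, and brighter objects have numerically smaller magnitudes.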

One thing that impressed me when Jay made the initial version of this plot is how well the models match the evolution of m* at z < 2, which is most of cosmic time (the past ten billion years). This encourages one that the assumption adopted above, that we understand the evolution of stars well enough to do this, might actually be correct. I was, and remain, especially impressed with how well the monolithic model with a simple exponential star formation history matches these data. It’s as if the inferences the community had made about the evolution of giant elliptical galaxies from local observations were correct.

The new thing that Jay’s work showed was that the evolution of typical cluster galaxies at z > 2 persists in tracking the monolithic model that formed early (zf = 10). There is a lot of scatter in the higher redshift data even though there is little at lower redshift. This is to be expected for both observational reasons – the data get rattier at larger distances – and theoretical ones: the exponential star formation history we assume is at best a crude average; at early times when short-lived but bright massive stars are present there will inevitably be stochastic variation around this trend. At later times the law of averages takes over and the scatter should settle down. That’s pretty much what we see.

What we don’t see is the decline in typical brightness predicted by contemporaneous LCDM models. The specific example shown is the Munich galaxy formation model based on the Millennium simulation. However, the prediction is generic: galaxies get faint at high redshift because they haven’t finished assembling yet. This is not a problem of misunderstanding stellar evolution, it is a failure of the hierarchical assembly paradigm.

In order to identify [proto]clusters at high redshift, Jay devised an algorithm to identify galaxies in close proximity on the sky and in redshift space, in excess of the average density around them. One question we had was whether the trend predicted by the LCDM model (the orange line above) would be reproduced in the data when analyzed in this way. To check, Jay made mock observations of a simulated lookback cone using the same algorithm. The results (not previously published) are the open squares in the plot above. These track the “right” answer known directly in the form of the orange line. Consequently, if the universe had looked as predicted, we could tell. It doesn’t.

The above plot is in terms of apparent magnitude. It is interesting to turn this into the corresponding stellar mass. There has also been work done on the subject after Jay’s, so I wanted to include it. An early version of a plot mapping m* to stellar mass and redshift to cosmic time that I came up with was this:

The stellar mass of L* galaxies as a function of cosmic age. Data as noted in the inset. The purple/orange lines represent the monolithic/hierarchical models, as above.

The more recent data (which also predate JWST) follow the same trend as the preceding data. All the data follow the path of the monolithic model. Note that the bulk of the stars are formed in situ in the first few billion years; the stellar mass barely changes after that. There is quite a bit of stellar evolution during this time, which is why m* in the figure above changes in a complicated fashion while the stellar mass remains constant. This again provides some encouragement that we understand how to model stellar populations.

The data in the first billion years are not entirely self-consistent. For example, the yellow points are rather higher in mass than the cyan points. This difference is not one in population modeling, but rather in how much of a correction is made for non-stellar, nebular emission. So as not to go down that rabbit hole, I chose to adopt the lowest stellar mass estimates for the figure that appears in the paper (below). Note that this is the most conservative choice; I’m trying to be as favorable to LCDM as is reasonably plausible.

Figure 4 from McGaugh et al. (2024): The characteristic stellar mass as a function of time with the corresponding redshift noted at the top.

There were more recent models as well as more recent data, so I wanted to include those. There are, in fact, way too many models to illustrate without creating a confusing forest of lines, so in the end I chose a couple of popular ones, Illustris and FIRE. Illustris is the descendant of Millennium, and shows identical behavior. FIRE has a different scheme for forming stars, and does so more rapidly than Illustris. However, its predictions still fall well short of the data. This is because both simulations share the same LCDM cosmology with the same merger tree assembly of structure. Assembling the mass promptly enough is the problem; it isn’t simply a matter of making stars faster.

I’ll show one more version of this plot to illustrate the predicted evolutionary trajectories. In the plots above, I only show models that end up with the mass of a typical local giant elliptical. Galaxies come in a variety of masses, so what does that look like?

The stellar mass of galaxies as a function of cosmic age. Data as above. The orange lines represent the hierarchical models that result in different final masses at z = 0.

The curves of stellar growth predicted by LCDM have pretty much the same shape, just different amplitude. The most massive case illustrated above is reasonable insofar as there are real galaxies that massive, but they are rare. They are also rare in simulations, which makes the predicted curve a bit jagged as there aren’t enough examples to define a smooth trajectory as there are for lower mass objects. More importantly, the shape is wrong. One can imagine that the galaxies we see at high redshift are abnormally massive, but even the most massive galaxies don’t start out that big at high redshift. Moreover, they continue to grow hierarchically in LCDM, so they wind up too big. In contrast, the data look like the monolithic model that we made on a lark, no muss, no fuss, no need to adjust anything.

This really shouldn’t have come as a surprise. We already knew that galaxies were impossibly massive at z ~ 4 before JWST discovered that this was also true at z ~ 10. The a priori prediction that LCDM has made since its inception (earlier models show the same thing) fails. More recent models fail, though I have faith that they will eventually succeed. This is the path theorists have always taken, and the obvious path here, as I remarked previously, is to make star formation (or at least light production) artificially more efficient so that the hierarchical model looks like the monolithic model. For completeness, I indulge in this myself in the paper (section 6.3) as an exercise in what it takes to save the phenomenon.

A two year delay

Regular readers of this blog will recall that in addition to the predictions I emphasized when JWST was launched, I also made a number of posts about the JWST results as they started to come in back in 2022. I had also prepared the above as a science paper that is now sections 1 to 3 of McGaugh et al. (2024). The idea was to have it ready to go so I could add a brief section on the new JWST results and submit right away – back in 2022. The early results were much as expected, but I did not rush to publish. Instead, it has taken over two years since then to complete what turned into a much longer manuscript. There are many reasons for this, but the scientific reason is that I didn’t believe many of the initial reports.

JWST was new and exciting and people fell all over themselves to publish things quickly. Too quickly. To do so, they relied on a calibration of the telescope plus detector system made while it was on the ground prior to launch. This is not the same as calibrating it on the sky, which is essential but takes some time. Consequently, some of the initial estimates were off.

Stellar masses and redshifts of galaxies from Labbé et al. The pink squares are the initial estimates that appeared in their first preprint in July 2022. The black squares with error bars are from the version published in February 2023. The shaded regions represent where galaxies are too massive too early for LCDM. The lighter region is where galaxies shouldn’t exist; the darker region is where they cannot exist.

In the example above, all of the galaxies had both their initial mass and redshift estimates change with the updated calibration. So I was right to be skeptical, and wait for an improved analysis. I was also right that while some cases would change, the basic interpretation would not. All that happened in the example above was that the galaxies moved from the “can’t exist in LCDM” region (dark blue) into the “really shouldn’t exist in LCDM” region (light blue). However, the widespread impression was that we couldn’t trust photometric redshifts at all, so I didn’t see what new I could justifiably add in 2022. This was, after all, the attitude Jay and I had taken in his CCPC survey where we required spectroscopic redshifts.

So I held off. But then it became impossible to keep up with the fire hose of data that ensued. Every time I got the chance to update the manuscript, I found some interesting new result had been published that I had to include. New things were being discovered faster than I could read the literature. I found myself stuck in the Red Queen’s dilemma, running as fast as possible just to stay in place.

Ultimately, I think the delay was worthwhile. Lots new was learned, and actual spectroscopic redshifts began to appear. (Spectroscopy takes more telescope time than photometry – spreading out the light reduces the signal-to-noise per pixel, necessitating longer exposure times, so it always lags behind. One also discovers the galaxies in the same images that are used for photometry, so photometry also gets a head start.) Consequently, there is a lot more in the paper than I had planned on. This is another long blog post, so I will end it where I had planned for the original paper to end, with the updated version of the plot above.

Massive galaxies at high redshift from JWST

The stellar masses of galaxies discovered by JWST as a function of redshift are shown below. Unlike most of the plots above, these are individual galaxies rather than typical L* galaxies. Many are based on photometric redshifts, but those in solid black have spectroscopic redshifts. There are many galaxies that reside in a region they should not, at least according to LCDM models: their mass is too large at the observed redshift.

Figure 6 from McGaugh et al. (2024): Mass estimates for high-redshift galaxies from JWST. Colored points based on photometric redshifts are from Adams et al. (2023; dark blue triangles), Atek et al. (2023; green circles), Labbé et al. (2023; open squares), Naidu et al. (2022; open star), Harikane et al. (2023; yellow diamonds), Casey et al. (2024; light blue left-pointing triangles), and Robertson et al. (2024; orange right-pointing triangles). Black points from Wang et al. (2023; squares), Carniani et al. (2024; triangles), Harikane et al. (2024; circles) and Castellano et al. (2024; star) have spectroscopic redshifts. The upper limit for the most massive galaxy in TNG100 (Springel et al. 2018) as assessed by Keller et al. (2023) is shown by the light blue line. This is consistent with the maximum stellar mass expected from the stellar mass–halo mass relation of Behroozi et al. (2020; solid blue line). These merge smoothly into the trend predicted by Yung et al. (2019b) for galaxies with a space density of 10^−5 dex^−1 Mpc^−3 (dashed blue line), though Yung et al. (2023) have revised this upward by ∼0.4 dex (dotted blue line). This closely follows the most massive objects in TNG300 (Pillepich et al. 2018; red line). The light gray region represents the parameter space in which galaxies were not expected in LCDM. The dark gray area is excluded by the limit on the available baryon mass (Behroozi & Silk 2018; Boylan-Kolchin 2023). [Note added: I copied this from the caption in our paper, but the links all seem to go to that rather than to each of the cited papers. You can get to them from our reference list if you want, but it’ll take some extra clicks. It looks like AAS has set it up this way to combat trawling by bots.]

One can see what I mean about a fire hose of results from the number of references given here. Despite the challenges of keeping track of all this, I take heart in the fact that many different groups are finding similar results. Even the results that were initially wrong remain problematic for LCDM. Despite all the masses and redshifts changing when the calibration was updated, the bulk of the data (the white squares, which are the black squares in the preceding plot) remain in the problematic region. The same result is replicated many times over by others.

The challenge, as usual, is assessing what LCDM actually predicts. The entire region of this plot is well away from the region predicted for typical galaxies. To reside here, a galaxy must be an outlier. But how extreme an outlier?

The dark gray region is the no-go zone. This is where dark matter halos do not have enough baryons to make the observed mass of stars. It should be impossible for galaxies to be here. I can think of ways to get around this, but that’s material for a future post. For now, it suffices to know that there should be no galaxies in the dark gray region. Indeed, there are not. A few straddle the edge, but nothing is definitively in that region given the uncertainties. So LCDM is not outright falsified by these data. This bar is set very low, as the galaxies that do skirt the edge require that basically all of the available baryons have been converted into stars practically instantaneously. This is not reasonable.

Not with ten thousand simulations could you do this.

So what is a reasonable expectation for this diagram? That’s hard to say, but that’s what the white and light gray region attempts to depict. Galaxies might plausibly be in the white region but should not be in the light gray region for any sensible star formation efficiency.

One problem with this statement is that it isn’t clear what a sensible star formation efficiency is. We have a good idea of what it needs to be, on average, at low redshift. There is no clear indication that it changes as a function of redshift – at least until we hit results like this. Then we have to be on guard for confirmation bias in which we simply make the star formation efficiency be what we need it to be. (This is essentially what I advocate as the least unreasonable option in section 6.3 of the ApJ paper.)

OK, but what should the limit be? Keller et al. (2023) made a meta-analysis of the available simulations; I have used their analysis and my own reading of the literature to establish the lower boundary of the light gray area. It is conceivable that you would get the occasional galaxy this massive (the white region is OK), but not more so (the light gray region is not OK). The boundary is the most extreme galaxy in each simulation, so as far from typical as possible. The light gray region is really not OK; the only question is where exactly it sets in.

The exact location of this boundary is not easy to define. Different simulations give different answers for different reasons. These are extremal statistics; we’re asking what the one most massive galaxy is in an entire simulation. Higher resolution simulations resolve the formation of small structures like galaxies sooner, but large simulations have more opportunity for extreme events to happen. Which “wins” in terms of making the rare big galaxy early is a competition between these effects that appears, in my reading, to depend on details of simulation implementation that are unlikely to be representative of physical reality (even assuming LCDM is the correct underlying physics).

To make my own assessment, I reviewed the accessible simulations (they don’t all provide the necessary information) to find the single most massive simulated galaxy as a function of redshift. As ever, I am looking for the case that is most favorable to LCDM. The version I found comes from the large-box, next generation Illustris simulation TNG300. This is the red line a bit into the gray area above. Galaxies really, really should not exist above or to the right of that line. Not only have I adopted the most generous simulation estimate I could find, I have also chosen not to normalize to the area surveyed by JWST. One should do this, but the area so far surveyed is tiny, so doing so would slide the line down. Even if galaxies as massive as this exist in TNG300, we have to have been really lucky to point JWST at that spot on a first go. So the red line is doubly generous, and yet there are still galaxies that exceed this limit.

The bottom line is that yes, JWST data pose a real problem for LCDM. It has been amusing watching this break people’s brains. I’ve seen papers that say this is a problem for LCDM because you’d have to turn more than half of the available baryons into stars and that’s crazy talk, and others that say LCDM is absolutely OK because there are enough baryons. The observational result is the same – galaxies with very high stellar-to-dark halo mass ratios, but the interpretation appears to be different because one group of authors is treating the light gray region as forbidden while the other sets the bar at the dark gray region. So the difference in interpretation is not a conflict in the data, but an inconsistency in what [we think] LCDM predicts.

That’s enough for today. Galaxy data at high redshift are clearly in conflict with the a priori predictions of LCDM. This was true before JWST, and remains true with JWST. Whether the observations can be reconciled with LCDM I leave as an exercise for scientists in the field, or at least until another post.


+A minor technical note: the Schechter function is widely used to describe the luminosity function of galaxies, so it provides a common language with which to quantify both their characteristic luminosity L* and space density Φ*. I make use of it here to quantify the brightness of the typical galaxy. It is, of course, not perfect. As we go from low to high redshift, the luminosity function becomes less Schechter-like and more power law-like, an evolution that you can see in Jay Franck’s plot. We chose to use Schechter fits for consistency with the previous work of Mancone et al. (2010) and Wylezalek et al. (2014), and also to down-weight the influence of the few very bright galaxies should they be active galactic nuclei or some other form of contaminant. Long story short, plausible contaminants (no photometric redshifts were used; sample galaxies all have spectroscopic redshifts) cannot explain the bulk of the data; our estimates of m* are robust and, if anything, underestimate how bright galaxies typically are.

On the timescale for galaxy formation

On the timescale for galaxy formation

I’ve been wanting to expand on the previous post ever since I wrote it, which is over a month ago now. It has been a busy end to the semester. Plus, there’s a lot to say – nothing that hasn’t been said before, somewhere, somehow, yet still a lot to cobble together into a coherent story – if that’s even possible. This will be a long post, and there will be more after to narrate the story of our big paper in the ApJ. My sole ambition here is to express the predictions of galaxy formation theory in LCDM and MOND in the broadest strokes.

A theory is only as good as its prior. We can always fudge things after the fact, so what matters most is what we predict in advance. What do we expect for the timescale of galaxy formation? To tell you what I’m going to tell you, it takes a long time to build a massive galaxy in LCDM, but it happens much faster in MOND.

Basic Considerations

What does it take to make a galaxy? A typical giant elliptical galaxy has a stellar mass of 9 x 10^10 M☉. That’s a bit more than our own Milky Way, which has a stellar mass of 5 or 6 x 10^10 M☉ (depending who you ask) with another 10^10 M☉ or so in gas. So, in classic astronomy/cosmology style, let’s round off and say a big galaxy is about 10^11 M☉. That’s a hundred billion stars, give or take.

An elliptical galaxy (NGC 3379, left) and two spiral galaxies (NGC 628 and NGC 891, right).

How much of the universe does it take to make one big galaxy? The critical density of the universe is the over/under point for whether an expanding universe expands forever, or has enough self-gravity to halt the expansion and ultimately recollapse. Numerically, this quantity is ρ_crit = 3H_0^2/(8πG), which for H_0 = 73 km/s/Mpc works out to 10^-29 g/cm^3 or 1.5 x 10^-7 M☉/pc^3. This is a very small number, but provides the benchmark against which we measure densities in cosmology. The density of any substance X is Ω_X = ρ_X/ρ_crit. The stars and gas in galaxies are made of baryons, and we know the baryon density pretty well from Big Bang Nucleosynthesis: Ω_b = 0.04. That means the average density of normal matter is very low, only about 4 x 10^-31 g/cm^3. That’s less than one hydrogen atom per cubic meter – most of space is an excellent vacuum!

This being the case, we need to scoop up a large volume to make a big galaxy. Going through the math, to gather up enough mass to make a 10^11 M☉ galaxy, we need a sphere with a radius of 1.6 Mpc. That’s in today’s universe; in the past the universe was denser by (1+z)^3, so at z = 10 that’s “only” 140 kpc. Still, modern galaxies are much smaller than that; the effective edge of the disk of the Milky Way is at a radius of about 20 kpc, and most of the baryonic mass is concentrated well inside that: the typical half-light radius of a 10^11 M☉ galaxy is around 6 kpc. That’s a long way to collapse.
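The arithmetic is easy to reproduce. Here is a sketch in cgs units (rounded constants, mean baryon density, no account of how the overdensity actually grows):

```python
import numpy as np

G = 6.674e-8                    # cm^3 g^-1 s^-2
Msun, pc = 1.989e33, 3.086e18   # g, cm

H0 = 73e5 / 3.086e24            # 73 km/s/Mpc in 1/s
rho_crit = 3 * H0**2 / (8 * np.pi * G)    # critical density, g/cm^3
rho_b = 0.04 * rho_crit                   # baryons only (Omega_b = 0.04)
print(f"rho_crit = {rho_crit:.1e} g/cm^3")        # ~1e-29

# Radius of the sphere containing 1e11 Msun of baryons at the mean density
M = 1e11 * Msun
r = (3 * M / (4 * np.pi * rho_b)) ** (1 / 3)      # cm
print(f"r(z=0)  = {r / (1e6 * pc):.1f} Mpc")      # ~1.6 Mpc
print(f"r(z=10) = {r / (1e3 * pc) / (1 + 10):.0f} kpc")  # denser by (1+z)^3
```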

Monolithic Galaxy Formation

Given this much information, an early concept was monolithic galaxy formation. We have a big ball of gas in the early universe that collapses to form a galaxy. Why and how this got started was fuzzy. But we knew how much mass we needed and the volume it had to come from, so we can consider what happens as the gas collapses to create a galaxy.

Here we hit a big astrophysical reality check. Just how does the gas collapse? It has to dissipate energy to do so, and cool to form stars. Once stars form, they may feed energy back into the surrounding gas, reheating it and potentially preventing the formation of more stars. These processes are nontrivial to compute ab initio, and attempting to do so obsesses much of the community. We don’t agree on how these things work, so they are the knobs theorists can turn to change an answer they don’t like.

Even if we don’t understand star formation in detail, we do observe that stars have formed, and can estimate how many. Moreover, we do understand pretty well how stars evolve once formed. Hence a common approach is to build stellar population models with some prescribed star formation history and see what works. Spiral galaxies like the Milky Way formed a lot of stars in the past, and continue to do so today. To make 5 x 10^10 M☉ of stars in 13 Gyr requires an average star formation rate of 4 M☉/yr. The current measured star formation rate of the Milky Way is estimated to be 2 ± 0.7 M☉/yr, so the star formation rate has been nearly constant (averaging over stochastic variations) over time, perhaps with a gradual decline. Giant elliptical galaxies, in contrast, are “red and dead”: they have no current star formation and appear to have made most of their stars long ago. Rather than a roughly constant rate of star formation, they peaked early and declined rapidly. The cessation of star formation is also called quenching.

A common way to formulate the star formation rate in galaxies as a whole is the exponential star formation rate, SFR(t) = SFR_0 e^(-t/τ). A spiral galaxy has a low baseline star formation rate SFR_0 and a long burn time τ ~ 10 Gyr while an elliptical galaxy has a high initial star formation rate and a short e-folding time like τ ~ 1 Gyr. Many variations on this theme are possible, and are of great interest astronomically, but this basic distinction suffices for our discussion here. From the perspective of the observed mass and stellar populations of local galaxies, the standard picture for a giant elliptical was a large, monolithic island universe that formed the vast majority of its stars early on then quenched with a short e-folding timescale.
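A minimal numerical sketch of the two flavors (the normalizations are my illustrative choices, tuned to land near the local masses quoted above):

```python
import numpy as np

def stellar_mass(t, sfr0, tau):
    """Integrate SFR(t) = sfr0 * exp(-t/tau) from 0 to t, ignoring the
    mass returned to the gas by dying stars. t, tau in Gyr; sfr0 in Msun/yr."""
    return sfr0 * 1e9 * tau * (1 - np.exp(-t / tau))

t = 13.0  # Gyr of star formation
print(f"spiral (SFR0 = 7.5 Msun/yr, tau = 10 Gyr):  {stellar_mass(t, 7.5, 10.0):.1e} Msun")
print(f"elliptical (SFR0 = 90 Msun/yr, tau = 1 Gyr): {stellar_mass(t, 90.0, 1.0):.1e} Msun")
# ~5e10 Msun built gradually over a Hubble time vs ~9e10 Msun formed
# almost entirely in the first couple of Gyr, then quenched.
```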

Galaxies as Island Universes

The density parameter Ω provides another useful way to think about galaxy formation. As cosmologists, we obsess about the global value of Ω because it determines the expansion history and ultimate fate of the universe. Here it has a more modest application. We can think of the region in the early universe that will ultimately become a galaxy as its own little closed universe. With a density parameter Ω > 1, it is destined to recollapse.

A fun and funny fact of the Friedmann equation is that the matter density parameter Ω_m → 1 at early times, so the early universe when galaxies form is matter dominated. It is also very uniform (more on that below). So any subset that is a bit more dense than average will have Ω > 1 just because the average is very close to Ω = 1. We can then treat this region as its own little universe (a “top-hat overdensity”) and use the Friedmann equation to solve for its evolution, as in this sketch:

The expansion of the early universe a(t) (blue line). A locally overdense region may behave as a closed universe, recollapsing in a finite time (red line) to potentially form a galaxy.

That’s great, right? We have a simple, analytic solution derived from first principles that explains how a galaxy forms. We can plug in the numbers to find how long it takes to form our basic, big 10^11 M☉ galaxy and… immediately encounter a problem. We need to know how overdense our protogalaxy starts out. Is its effective initial Ω_m = 2? 10? What value, at what time? The higher it is, the faster the evolution from initially expanding along with the rest of the universe to decoupling from the Hubble flow to collapsing. We know the math but we still need to know the initial condition.

Annoying Initial Conditions

The initial condition for galaxy formation is observed in the cosmic microwave background (CMB) at z = 1090. Where today’s universe is remarkably lumpy, the early universe is incredibly uniform. It is so smooth that it is homogeneous and isotropic to one part in a hundred thousand. This is annoyingly smooth, in fact. It would help to have some lumps – primordial seeds with Ω > 1 – from which structure can grow. The observed seeds are too tiny; the typical initial amplitude is 10^-5, so Ω_m = 1.00001. That takes forever to decouple and recollapse; it hasn’t yet had time to happen.
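To put a hedged number on “forever”: treating the overdense region as a closed, matter-only universe, the parametric (cycloid) solution of the Friedmann equation gives a collapse time t_coll = π Ω_i / [H_i (Ω_i − 1)^(3/2)], where Ω_i and H_i are the density parameter and Hubble rate at the initial epoch. Evaluating that at recombination with rounded numbers (ignoring radiation, which does matter at this epoch, so take this as illustrative only):

```python
import numpy as np

def t_collapse_gyr(omega_i, H_i):
    # Cycloid solution of the Friedmann equation for a closed, matter-only
    # region: collapse (theta = 2*pi) at t = pi*Omega/(H*(Omega-1)**1.5).
    return np.pi * omega_i / (H_i * (omega_i - 1) ** 1.5) / 3.156e16  # -> Gyr

# Hubble rate at z = 1090 in the matter-dominated approximation
H0 = 73e5 / 3.086e24                        # 1/s
H_i = H0 * np.sqrt(0.3) * (1 + 1090) ** 1.5

for om in (2.0, 1.001, 1.00001):
    print(f"Omega_i = {om}: collapses in {t_collapse_gyr(om, H_i):.2g} Gyr")
```

An initial Ω_i = 2 collapses within a few Myr, but the observed 10^-5 seeds take tens of thousands of Gyr – vastly longer than the age of the universe.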

The cosmic microwave background as observed by ESA’s Planck satellite. This is an all-sky picture of the relic radiation field – essentially a snapshot of the universe when it was just a few hundred thousand years old. The variations in color are variations in temperature which correspond to variations in density. These variations are tiny, only about one part in 100,000. The early universe was very uniform; the real picture is a boring blank grayscale. We have to crank the contrast way up to see these minute variations.

We would like to know how the big galaxies of today – enormous agglomerations of stars and gas and dust separated by inconceivably vast distances – came to be. How can this happen starting from such homogeneous initial conditions, where all the mass is equally distributed? Gravity is an attractive force that makes the rich get richer, so it will grow the slight initial differences in density, but it is also weak and slow to act. A basic result in gravitational perturbation theory is that overdensities grow at the same rate the universe expands, which is inversely related to redshift. So if we see tiny fluctuations in density with amplitude 10^-5 at z = 1000, they should have only grown by a factor of 1000 and still be small today (10^-2 at z = 0). But we see structures of much higher contrast than that. You can’t get here from there.
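In numbers (the standard linear-theory scaling): overdensities grow in proportion to the expansion factor, δ ∝ a = 1/(1+z), so

$$\delta(z{=}0) \approx \delta_i\,(1+z_i) \approx 10^{-5} \times 10^{3} = 10^{-2} \ll 1,$$

nowhere near the δ ≫ 1 contrast of an actual galaxy.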

The rich large scale structure we see today is impossible starting from the smooth observed initial conditions. Yet here we are, so we have to do something to goose the process. This is one of the original motivations for invoking cold dark matter (CDM). If there is a substance that does not interact with photons, it can start to clump up early without leaving too large a mark on the relic radiation field. In effect, the initial fluctuations in mass are larger, just in the invisible substance. (That’s not to say the CDM doesn’t leave a mark on the CMB; it does, but it is subtle and entirely another story.) So the idea is that dark matter forms gravitational structures first, and the baryons fall in later to make galaxies.

An illustration of the linear growth of overdensities. Structure can grow in the dark matter (long dashed lines) with the baryons catching up only after decoupling (short dashed line). In effect, the dark matter gives structure formation a head start, nicely explaining the apparently impossible growth factor. This has been the standard picture for what seems like forever (illustration from Schramm 1992).

With the right amount of CDM – and it has to be just the right amount of a dynamically cold form of non-baryonic dark matter (stuff we still don’t know actually exists) – we can explain how the growth factor is 10^5 since recombination instead of a mere 10^3. The dark matter got a head start over the stuff we can see; it looks like 10^5 because the normal matter lagged behind, being entangled with the radiation field in a way the dark matter was not.

This has been the imperative need in structure formation theory for so long that it has become undisputed lore; an element of the belief system so deeply embedded that it is practically impossible to question. I risk getting ahead of the story, but it is important to point out that, like the interpretation of so much of the relevant astrophysical data, this belief assumes that gravity is normal. This assumption dictates the growth rate of structure, which in turn dictates the need to invoke CDM to allow structure to form in the available time. If we drop this assumption, then we have to work out what happens in each and every alternative that we might consider. That definitely gets ahead of the story, so first let’s understand what we should expect in LCDM.

Hierarchical Galaxy Formation in LCDM

LCDM predicts some things remarkably well but others not so much. The dark matter is well-behaved, responding only to gravity. Baryons, on the other hand, are messy – one has to worry about hydrodynamics in the gas, star formation, feedback, dust, and probably even magnetic fields. In a nutshell, LCDM simulations are very good at predicting the assembly of dark mass, but converting that into observational predictions relies on our incomplete knowledge of messy astrophysics. We know what the mass should be doing, but we don’t know so well how that translates to what we see. Mass good, light bad.

Starting with the assembly of mass, the first thing we learn is that the story of monolithic galaxy formation outlined above has to be wrong. Early density fluctuations start out tiny, even in dark matter. God didn’t plunk down island universes of galaxy mass then say “let there be galaxies!” The annoying initial conditions mean that little dark matter halos form first. These subsequently merge hierarchically to make ever bigger halos. Rather than top-down monolithic galaxy formation, we have the bottom-up hierarchical formation of dark matter halos.

The hierarchical agglomeration of dark matter halos into ever larger objects is often depicted as a merger tree. Here are four examples from the high resolution Illustris TNG50 simulation (Pillepich et al. 2019; Nelson et al. 2019).

Examples of merger trees from the TNG50-1 simulation (Pillepich et al. 2019; Nelson et al. 2019). Objects have been selected to have very nearly the same stellar mass at z=0. Mass is built up through a series of mergers. One large dark matter halo today (at top) has many antecedents (small halos at bottom). These merge hierarchically as illustrated by the connecting lines. The size of the symbol is proportional to the halo mass. I have added redshift and the corresponding age of the universe for vanilla LCDM in a more legible font. The color bar illustrates the specific star formation rate: the top row has objects that are still actively star forming like spirals; those in the bottom row are “red and dead” – things that have stopped forming stars, like giant elliptical galaxies. In all cases, there is a lot of merging and a modest rate of growth, with the typical object taking about half a Hubble time (~7 Gyr) to assemble half of its final stellar mass.

The hierarchical assembly of mass is generic in CDM. Indeed, it is one of its most robust predictions. Dark matter halos start small, and grow larger by a succession of many mergers. This gradual agglomeration is slow: note how tiny the dark matter halos at z = 10 are.

Strictly speaking, it isn’t even meaningful to talk about a single galaxy over the span of a Hubble time. It is hard to avoid this mental trap: surely the Milky Way has always been the Milky Way? So one imagines its evolution over time. This is monolithic thinking. Hierarchically, “the galaxy” refers at best to the largest progenitor, the object that traces the left edge of the merger trees above. But the other protogalactic chunks that eventually merge together are as much part of the final galaxy as the progenitor that happens to be largest.

This complicated picture is complicated further by the fact that what we can see is stars, not mass. The luminosity we observe forms through a combination of in situ growth (star formation in the largest progenitor) and ex situ growth through merging. There is no reason for some preferred set of protogalaxies to form stars faster than the others (though of course there is some scatter about the mean), so presumably the light traces the mass of stars formed, which in turn traces the underlying dark mass. Presumably.

That we should see lots of little protogalaxies at high redshift is nicely illustrated by this lookback cone from Yung et al. (2022). Here the color and size of each point corresponds to the stellar mass. Massive objects are common at low redshift but become progressively rarer at high redshift, petering out at z > 4 and basically absent at z = 10. This realization of the observable stellar mass tracks the assembly of dark mass seen in merger trees.

Fig. 2 from Yung et al. (2022) illustrating what an observer would see looking back through their simulation to high redshift.

This is what we expect to see in LCDM: lots of small protogalaxies at high redshift; the building blocks of later galaxies that had not yet merged. The observation of galaxies much brighter than this at high redshift by JWST poses a fundamental challenge to the paradigm: mass appears not to be subdivided as expected. So it is entirely justifiable that people have been freaking out that what we see are bright galaxies that are apparently already massive. That shouldn’t happen; it wasn’t predicted to happen; how can this be happening?

That’s all background that is assumed knowledge for our ApJ paper, so we’re only now getting to its Figure 1. This combines one of the merger trees above with its stellar mass evolution. The left panel shows the assembly of dark mass; the right panel shows the growth of stellar mass in the largest progenitor. This is what we expect to see in observations.


Fig. 1 from McGaugh et al. (2024): A merger tree for a model galaxy from the TNG50-1 simulation (Pillepich et al. 2019; Nelson et al. 2019, left panel) selected to have M* ≈ 9 × 10¹⁰ M☉ at z = 0; i.e., the stellar mass of a local L* giant elliptical galaxy (Driver et al. 2022). Mass assembles hierarchically, starting from small halos at high redshift (bottom edge) with the largest progenitor traced along the left edge of the merger tree. The growth of stellar mass of the largest progenitor is shown in the right panel. This example (jagged line) is close to the median (dashed line) of comparable mass objects (Rodriguez-Gomez et al. 2016), and within the range of the scatter (the shaded band shows the 16th – 84th percentiles). A monolithic model that forms at zf = 10 and evolves with an exponentially declining star formation rate with τ = 1 Gyr (purple line) is shown for comparison. The latter model forms most of its stars earlier than occurs in the simulation.

For comparison, we also show the stellar mass growth of a monolithic model for a giant elliptical galaxy. This is the classic picture we had for such galaxies before we realized that galaxy formation had to be hierarchical. This particular monolithic model forms at zf = 10 and follows an exponential star formation rate with τ = 1 Gyr. It is one of the models published by Franck & McGaugh (2017). It is, in fact, the first model I asked Jay to construct when he started the project. Not because we expected it to best describe the data, as it turns out to do, but because the simple exponential model is a touchstone of stellar population modeling. It was a starter model: do this basic thing first to make sure you’re doing it right. We chose τ = 1 Gyr because that was the typical number bandied about for elliptical galaxies, and zf = 10 because that seemed ridiculously early for a massive galaxy to form. At the time we built the model, it was ludicrously early to imagine a massive galaxy would form, from an LCDM perspective. A formation redshift zf = 10 was, less than a decade ago, practically indistinguishable from the beginning of time, so we expected it to provide a limit that the data would not possibly approach.
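The arithmetic of the exponential model is simple enough to check in a few lines. Here is a minimal sketch (my own toy version, not Jay’s stellar population code – it tracks only the stellar mass fraction and ignores gas recycling; the age at zf = 10 is approximate):

    from math import exp, log

    tau = 1.0    # Gyr, e-folding time of the star formation rate
    t_f = 0.48   # Gyr, approximate age of the universe at zf = 10 in vanilla LCDM

    def stellar_mass_fraction(t):
        """Fraction of the final stellar mass in place at cosmic time t (Gyr)."""
        return 0.0 if t < t_f else 1.0 - exp(-(t - t_f) / tau)

    print(f"Half-mass time: {t_f + tau * log(2):.1f} Gyr after the Big Bang")
    for t in (1, 2, 5, 13.8):
        print(f"t = {t:4.1f} Gyr: {stellar_mass_fraction(t):.0%} of final stellar mass")

The half-mass point lands around 1.2 Gyr after the Big Bang (z ≈ 5), compared to the ~7 Gyr typical of the hierarchical merger trees above. That gap is the difference between the purple and red lines in a nutshell.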

In a remarkably short period, JWST has transformed z = 10 from inconceivable to run of the mill. I’m not going to go into the data yet – this all-theory post is already a lot – but to offer one spoiler: the data are consistent with this monolithic model. If we want to “fix” LCDM, we have to make the red line into the purple line for enough objects to explain the data. That proves to be challenging. But that’s moving the goalposts; the prediction was that we should see little protogalaxies at high redshift, not massive, monolith-style objects. Just look at the merger trees at z = 10!

Accelerated Structure Formation in MOND

In order to address these issues in MOND, we have to go back to the beginning. What is the evolution of a spherical region (a top-hat overdensity) that might collapse to form a galaxy? How does a spherical region under the influence of MOND evolve within an expanding universe?

The solution to this problem was first found by Felten (1984), who was trying to play the Newtonian cosmology trick in MOND. In conventional dynamics, one can solve the equation of motion for a point on the surface of a uniform sphere that is initially expanding and recover the essence of the Friedmann equation. It was reasonable to check if cosmology might be that simple in MOND. It was not. The appearance of a0 as a physical scale makes the solution scale-dependent: there is no general solution that one can imagine applies to the universe as a whole.

Felten reasonably saw this as a failure. There were, however, some appealing aspects of his solution. For one, there was no such thing as a critical density. All MOND universes would eventually recollapse irrespective of their density (in the absence of the repulsion provided by a cosmological constant). It could take a very long time, which depended on the density, but the ultimate fate was always the same. There was no special value of Ω, and hence no flatness problem. The latter obsessed people at the time, so I’m somewhat surprised that no one seems to have made this connection. Too soon*, I guess.

There it sat for many years, an obscure solution for an obscure theory to which no one gave credence. When I became interested in the problem a decade later, I started methodically checking all the classic results. I was surprised to find how many things we needed dark matter to explain were just as well (or better) explained by MOND. My exact quote was “surprised the bejeepers out of us.” So, what about galaxy formation?

I started with the top-hat overdensity, and had the epiphany that Felten had already obtained the solution. He had been trying to solve all of cosmology, which didn’t work. But he had solved the evolution of a spherical region that starts out expanding with the rest of the universe but subsequently collapses under the influence of MOND. The overdensity didn’t need to be large, it just needed to be in the low acceleration regime. Something like the red cycloidal line in the second plot above could happen in a finite time. But how much time?

The solution depends on scale and needs to be solved numerically. I am not the greatest programmer, and I had a lot else on my plate at the time. I was in no rush, as I figured I was the only one working on it. This is usually a good assumption with MOND, but not in this case. Bob Sanders had had the same epiphany around the same time, which I discovered when I received his manuscript to referee. So all credit is due to Bob: he said these things first.

First, he noted that galaxy formation in MOND is still hierarchical. Small things form first. Crudely speaking, structure formation is very similar to the conventional case, but now the goose comes from the change in the force law rather than extra dark mass. MOND is nonlinear, so the whole process gets accelerated. To compare with the linear growth of CDM:

A sketch of how structures grow over time under the influence of cold dark matter (left, from Schramm 1992, same as above) and MOND (right, from Sanders & McGaugh 2002; see also this further discussion and previous post). The slow linear growth of CDM (long-dashed line, left panel) is replaced by a rapid, nonlinear growth in MOND (solid lines at right; numbers correspond to different scales). Nonlinear growth moderates after cosmic expansion begins to accelerate (dashed vertical line in right panel).

The net effect is the same. A cosmic web of large scale structure emerges. They look qualitatively similar, but everything happens faster in MOND. This is why observations have persistently revealed structures that are more massive and were in place earlier than expected in contemporaneous LCDM models.

Simulated structure formation in ΛCDM (top) and MOND (bottom) showing the more rapid emergence of similar structures in MOND (note the redshift of each panel). From McGaugh (2015).

In MOND, small objects like globular clusters form first, but galaxies of a range of masses all collapse on a relatively short cosmic timescale. How short? Let’s consider our typical 10¹¹ M☉ galaxy. Solving Felten’s equation for the evolution of a sphere numerically, peak expansion is reached after 300 Myr and collapse happens in a similar time. The whole galaxy is in place speedy quick, and the initial conditions don’t really matter: a uniform, initially expanding sphere in the low acceleration regime will behave this way. From our distant vantage point thirteen billion years later, the whole process looks almost monolithic (the purple line above) even though it is a chaotic hierarchical mess for the first few hundred million years (z > 14). In particular, it is easy to form half of the stellar mass early on: the mass is already assembled.

The evolution of a 10¹¹ M☉ sphere that starts out expanding with the universe but decouples and collapses under the influence of MOND (dotted line). It reaches maximum expansion after 300 Myr and recollapses in a similar time, so the entire object is in place after 600 Myr. (A version of this plot with a logarithmic time axis appears as Fig. 2 in our paper.) The inset shows the evolution of smaller shells within such an object (Fig. 2 from Sanders 2008). The inner regions collapse first followed by outer shells. These oscillate and cross, mixing and ultimately forming a reasonable size galaxy – see Sanders’s Table 1 and also his Fig. 4 for the collapse times for objects of other masses. These early results are corroborated by Eappen et al. (2022), who further demonstrate that the details of feedback are not important in MOND, unlike LCDM.
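If you want to play with the collapse timing yourself, the deep-MOND limit is easy to integrate. Below is a minimal sketch – my own toy, not Felten’s full equation: it starts at turnaround, ignores the cosmological background, and skips the transition back to the Newtonian regime at small radii, which hardly affects the timing. The 50 kpc turnaround radius is an illustrative choice:

    import numpy as np
    from scipy.integrate import solve_ivp

    G = 6.674e-11     # m^3 kg^-1 s^-2
    a0 = 1.2e-10      # m s^-2, the MOND acceleration scale
    Msun = 1.989e30   # kg
    kpc = 3.086e19    # m
    Myr = 3.156e13    # s

    M = 1e11 * Msun        # enclosed mass
    r_max = 50 * kpc       # radius at maximum expansion (illustrative)
    A = np.sqrt(G * M * a0)   # deep-MOND limit: g = sqrt(G*M*a0)/r = A/r

    def rhs(t, y):
        r, v = y
        return [v, -A / r]   # equation of motion for a shell released at turnaround

    # stop once the shell has fallen to 1% of its turnaround radius
    def collapsed(t, y):
        return y[0] - 0.01 * r_max
    collapsed.terminal = True

    sol = solve_ivp(rhs, [0, 1000 * Myr], [r_max, 0.0], events=collapsed, rtol=1e-8)
    print(f"Collapse time from turnaround: {sol.t_events[0][0] / Myr:.0f} Myr")

This returns roughly 300 Myr, in line with the figure: turnaround at ~300 Myr, collapse in a similar time, the whole object in place within ~600 Myr.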

This is what JWST sees: galaxies that are already massive when the universe is just half a billion years old. I’m sure I should say more but I’m exhausted now and you may be too, so I’m gonna stop here by noting that in 1998, when Bob Sanders predicted that “Objects of galaxy mass are the first virialized objects to form (by z=10),” the contemporaneous prediction of LCDM was that “present-day disc [galaxies] were assembled recently (at z<=1)” and “there is nothing above redshift 7.” One of these predictions has been realized. It is rare in science that such a clear a priori prediction comes true, let alone one that seemed so unreasonable at the time, and which took a quarter century to corroborate.


*I am not quite this old: I was still an undergraduate in 1984. I hadn’t even decided to be an astronomer at that point; I certainly hadn’t started following the literature. The first time I heard of MOND was in a graduate course taught by Doug Richstone in 1988. He only mentioned it in passing while talking about dark matter, writing the equation on the board and saying maybe it could be this. I recall staring at it for a long few seconds, then shaking my head and muttering “no way.” I then completely forgot about it, not thinking about it again until it came up in our data for low surface brightness galaxies. I expect most other professionals have the same initial reaction, which is fair. The test of character comes when it crops up in their data, as it is doing now for the high redshift galaxy community.

The MHONGOOSE survey of atomic gas in and around galaxies

The MHONGOOSE survey of atomic gas in and around galaxies

I have been spending a lot of time lately writing up a formal paper on high redshift galaxies, so haven’t had much time to write here. The paper is a lot more involved than an “I told you so,” but yeah, I did. Repeatedly. I do have a start on a post on self-interacting dark matter that I hope eventually to get back to. Today, I want to give a quick note about the MHONGOOSE survey. But first, a non-commercial interruption.


Triton Station joins Rogue Scholar

In internet news, Triton Station has joined Rogue Scholar. The blog itself hasn’t moved; Rogue Scholar is a community of science blogs. It provides some important capabilities, including full-text search, long-term archiving, DOIs, and metadata. The DOIs (Digital Object Identifiers) were of particular interest to me, as they have become the standard for identifying unique articles in regular academic journals now that these have mostly (entirely?) gone on-line. I had not envisioned ever citing this blog in a refereed journal, but a DOI makes it possible to do so. Any scientists who find a post useful are welcome to make use of this feature. I’m inclined to follow the example of JCAP and make the volume and page format year-month and day (YYMM, DD), which comes out to Triton Station (2022), 2201, 03 in the standard astronomy journal format. I do not anticipate continuing to publish in the twenty-second century, so no need for YYYYMM, Y2K experience notwithstanding.

For everyone interested in science, Rogue Scholar is a great place to find new blogs.


MHONGOOSE

In science news, the MHONGOOSE collaboration has released its big survey summary paper. Many survey science papers are in the pipeline. Congratulations to all involved, especially PI Erwin de Blok.

Erwin was an early collaborator of mine who played a pivotal role in measuring the atomic gas properties of low surface brightness galaxies, establishing the cusp-core problem, and that low surface brightness galaxies are dark matter dominated (or at least evince large mass discrepancies, as predicted by MOND). He has done a lot more since then, among them playing a leading role in the large VLA survey of nearby galaxies, THINGS. In astronomy we’re always looking forward to the next big survey – it’s a big universe; there’s always more out there. So after THINGS he conceived and began work on MHONGOOSE. It has been a long road tied to the construction of the MeerKAT array of radio telescopes – a major endeavor on the road to the ambitious Square Kilometer Array.

I was involved in the early phases of the MHONGOOSE project, helping to select the sample of target galaxies (it is really important to cover the full dynamic range of galaxy properties, dwarf to giant) and define the aspirational target sensitivity. HI observations often taper off below a column density of 10²⁰ hydrogen atoms per cm² (about 1 solar mass per square parsec). With work, one can get down to a few times 10¹⁹ cm⁻². We want to go much deeper to see how much farther out the atomic gas extends. It was already known to go further out than the stars, but how far? Is there a hard edge, or just a continuous fall off?
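The parenthetical conversion is worth a quick sanity check; it’s just the hydrogen mass times the column (a sketch in Python, neglecting the ~35% correction for helium):

    m_H = 1.6726e-24   # g, mass of a hydrogen atom
    Msun = 1.989e33    # g
    pc = 3.086e18      # cm

    N_HI = 1e20        # atoms per cm^2
    sigma = N_HI * m_H * pc**2 / Msun
    print(f"{sigma:.2f} Msun per square parsec")   # ~0.8: about 1 Msun/pc^2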

We also hope to detect new dwarf galaxies that are low surface brightness in HI. There could, in theory, be zillions of such things lurking in all the dark matter subhalos that are predicted to exist around big galaxies. Irrespective of theory, are there HI gas-rich galaxies that are entirely devoid of stars? Do such things exist? People have been looking for them a long time, and there are now many examples of galaxies that are well over 95% gas, but there always seem to be at least a few stars associated with them. Is this always true? If we have cases that are 98, 99% gas, why not 100%? Do galaxies with gas always manage to turn at least a little of it into stars? They do have a Hubble time to work on it, so it is also a question why there is so much gas still around in these cases.

And… a lot of other things, but I don’t want to be here all day. So just a few quick highlights from the main survey paper. First, the obligatory sensitivity diagram. This shows how deep the survey reaches (lower column density) as a function of resolution (beam size). You want to see deeply and you want to resolve what you see, so ideally both of these numbers would be small. MHONGOOSE undercuts existing surveys, and is unlikely to be bettered until the full SKA comes on-line, which is still a long way off.

Sensitivity versus resolution in HI surveys.

And here are a couple of individual galaxy observations:

Optical images and the HI moment zero, one, and two maps. The moment zero map of the intensity of 21 cm radiation tells us where the atomic gas is, and how much of it there is. The moment one map is the velocity field from which we can construct a rotation curve. The second moment measures the velocity dispersion of the gas.

These are beautiful data. The spiral arms appear in the HI as well as in starlight, and continue in HI to larger radii. The outer edge of the HI disk is pretty hard; there doesn’t seem to be a lot of extra gas at low column densities extending indefinitely into the great beyond. I’m particularly struck by the velocity dispersion of NGC 1566 tracking the spiral structure: this means the spiral arms have mass, and any stirring caused by star formation is localized to the spirals where much of the star formation goes on. That’s natural, but the surroundings seem relatively unperturbed: feedback is happening locally, but not globally. The velocity field of NGC 5068 has a big twist in the zero velocity contour (the thick line dividing the red receding side from the blue approaching side); this is a signature of non-circular motion, probably caused in this case by the visible bar. These are two-dimensional examples of Renzo’s rule (Sancisi’s Law), in which features in the visible mass distribution correspond to features in the kinematics.
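For readers new to the moment jargon, the maps boil down to intensity-weighted sums over the velocity axis of the data cube. A minimal sketch (mine, not the MHONGOOSE pipeline, which also masks noise and handles the units properly), assuming a cube with axes (velocity, y, x) and channel velocities v in km/s:

    import numpy as np

    def moment_maps(cube, v):
        """cube: (nchan, ny, nx) intensities; v: (nchan,) channel velocities."""
        w = v[:, None, None]
        m0 = cube.sum(axis=0)                    # where the gas is (times the
                                                 # channel width for physical units)
        m1 = (cube * w).sum(axis=0) / m0         # mean velocity: the velocity field
        m2 = np.sqrt((cube * (w - m1)**2).sum(axis=0) / m0)   # velocity dispersion
        return m0, m1, m2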

I’ll end with a quick peek at the environments around some MHONGOOSE target galaxies:

Fields where additional galaxies (in blue) are present around the central target.

This is nifty on many levels. First, some (presumptively satellite) dwarf galaxies are detected. That in itself is a treat to me: once upon a time, Renzo Sancisi asked me to smooth the bejeepers out of the LSB galaxy data cubes to look for satellites. After much work, we found nada. Nothing. Zilch. It turns out that LSB galaxies are among the most isolated galaxy types in the universe. So that we detect some things here is gratifying, even in targets that are not LSBs.

Second, there are not a lot of new detections. The halos of big galaxies are not swimming in heretofore unseen swarms of low column density gas clouds. There can always be more at sensitivities yet unreached, but the data sure don’t encourage that perspective. MHONGOOSE is sensitive to very low mass gas clouds. The exact limit is distance-dependent, but a million solar masses of atomic gas should be readily visible. That’s a tiny amount by extragalactic standards, about one globular cluster’s worth of material. There’s just not a lot there.

Disappointing as the absence of zillions of new detections may be discovery-wise, it does teach us some important lessons. Empirically, galaxies look like island universes in gas as well as stars. There may be a few outlying galaxies, but they are not embedded in an obvious cosmic network of ephemeral cold gas. Nor are there thousands of unseen satellites/subhalos suddenly becoming visible – at least not in atomic gas. Theorists can of course imagine other things, but we observers can only measure one thing at a time, as instrumentation and telescope availability allows. This is a big step forward.

Primer on Galaxy Properties

Primer on Galaxy Properties

When we look up at the sky, we see stars. Stars are the building blocks of galaxies; we can see the stellar disk of the galaxy in which we live as the vault of the Milky Way arching across the sky. When we look beyond the Milky Way, we see galaxies. Just as stars are the building blocks of galaxies, galaxies are the building blocks of the universe. One can no more hope to understand cosmology without understanding galaxies than one can hope to understand galaxies without understanding stars.

Here I give a very brief primer on basic galaxy properties. This is a subject on which entire textbooks are written, so what I say here is necessarily very incomplete. It is a bare minimum to go on for the ensuing discussion.

Galaxy Properties

Cosmology entered the modern era when Hubble (1929) resolved the debate over the nature of spiral nebulae by measuring the distance to Andromeda, establishing that vast stellar systems — galaxies — exist external to and coequal with the Milky Way. Galaxies are the primary type of object observed when we look beyond the confines of our own Milky Way: they are the building blocks of the universe. Consequently, galaxies and cosmology are intertwined: it is impossible to understand one without the other.

Here I sketch a few essential facts about the properties of galaxies. This is far from a comprehensive list (see, for example Binney & Tremaine, 1987) and serves only to provide a minimum framework for the subsequent discussion. The properties of galaxies are often cast in terms of morphological type, starting with Hubble’s tuning fork diagram. The primary distinction is between Early Type Galaxies (ETGs) and Late Type Galaxies (LTGs), which is a matter of basic structure. ETGs, also known as elliptical galaxies, are three dimensional, ellipsoidal systems that are pressure supported: there is more kinetic energy in random motions than in circular motions, a condition described as dynamically hot. The orbits of stars are generally eccentric and oriented randomly with respect to one another, filling out the ellipsoidal shape seen in projection on the sky. LTGs, including spiral and irregular galaxies, are thin, quasi-two dimensional, rotationally supported disks. The majority of their stars orbit in the same plane in the same direction on low eccentricity orbits. The lion’s share of kinetic energy is invested in circular motion, with only small random motions, a condition described as dynamically cold. Examples of early and late type galaxies are shown in Fig. 1.

Fig. 1. Galaxy morphology. These examples show an early type elliptical galaxy (NGC 3379, left), and two late type disk galaxies: a face-on spiral (NGC 628, top right), and an edge-on disk galaxy (NGC 891, bottom right). Elliptical galaxies are quasi-spherical, pressure supported stellar systems that tend to have predominantly old stellar populations, usually lacking young stars or much in the way of the cold interstellar gas from which they might form. In contrast, late type galaxies (spirals and irregulars) are thin, rotationally supported disks. They typically contain a mix of stellar ages and cold interstellar gas from which new stars continue to form. Interstellar dust is also present, being most obvious in the edge-on case. Images from Palomar Observatory, Caltech.

Finer distinctions in morphology can be made within the broad classes of early and late type galaxies, but the basic structural and kinematic differences suffice here. The disordered motion of ETGs is a natural consequence of violent relaxation (Lynden-Bell, 1967) in which a stellar system reaches a state of dynamical equilibrium from a chaotic initial state. This can proceed relatively quickly from a number of conceivable initial conditions, and is a rather natural consequence of the hierarchical merging of sub-clumps expected from the Gaussian initial conditions indicated by observations of the CMB (White, 1996). In contrast, the orderly rotation of dynamically cold LTGs requires a gentle settling of gas into a rotationally supported disk. It is essential that disk formation occur in the gaseous phase, as gas can dissipate and settle to the preferred plane specified by the net angular momentum of the system. Once stars form, their orbits retain a memory of their initial state for a period typically much greater than the age of the universe (Binney & Tremaine, 1987). Consequently, the bulk of the stars in the spiral disk must have formed there after the gas settled.

In addition to the dichotomy in structure, ETGs and LTGs also differ in their evolutionary history. ETGs tend to be ‘red and dead,’ which is to say, dominated by old stars. They typically lack much in the way of recent star formation, and are often devoid of the cold interstellar gas from which new stars can form. Most of their star formation happened in the early universe, and may have involved the merger of multiple protogalactic fragments. Irrespective of these details, massive ETGs appeared early in the universe (Steinhardt et al., 2016), and for the most part seem to have evolved passively since (Franck and McGaugh, 2017).

Again in contrast, LTGs have on-going star formation in interstellar media replete with cold atomic and molecular gas. They exhibit a wide range in stellar ages, from newly formed stars to ancient stars dating to near the beginning of time. Old stars seem to be omnipresent, famously occupying globular clusters but also present in the general disk population. This implies that the gaseous disk settled fairly early, though accretion may continue over a long timescale (van den Bergh, 1962; Henry and Worthey, 1999). Old stars persist in the same orbital plane as young stars (Binney & Merrifield, 1998), which precludes much subsequent merger activity, as the chaos of merging distorts orbits. Disks can be over-heated (Toth and Ostriker, 1992) and transformed by interactions between galaxies (Toomre and Toomre, 1972), even turning into elliptical galaxies during major mergers (Barnes & Hernquist, 1992).

Aside from its morphology, an obvious property of a galaxy is its mass. Galaxies exist over a large range of mass, with a type-dependent characteristic stellar mass of 5 × 10¹⁰ M☉ for disk dominated systems (the Milky Way is very close to this mass: Bland-Hawthorn & Gerhard, 2016) and 10¹¹ M☉ for elliptical galaxies (Moffett et al., 2016). Above this characteristic mass, the number density of galaxies declines sharply, though individual galaxies exceeding a few 10¹¹ M☉ certainly exist. The number density of galaxies increases gradually to lower masses, with no known minimum. The gradual increase in numbers does not compensate for the decrease in mass: integrating over the distribution, one finds that most of the stellar mass is in bright galaxies close to the characteristic mass.

Galaxies have a characteristic size and surface brightness. The same amount of stellar mass can be concentrated in a high surface brightness (HSB) galaxy, or spread over a much larger area in a low surface brightness (LSB) galaxy. For the purposes of this discussion, it suffices to assume that the observed luminosity is proportional to the mass of stars that produces the light. Similarly, the surface brightness measures the surface density of stars. Of the three observable quantities of luminosity, size, and surface brightness, only two are independent: the luminosity is the product of the surface brightness and the area over which it extends. The area scales as the square of the linear size.

The distribution of size and mass of galaxies is shown in Fig. 2. This figure spans the range from tiny dwarf irregular galaxies containing ‘only’ a few hundred thousand stars to giant spirals composed of hundreds of billions of stars with half-light radii ranging from hundreds of parsecs to tens of kpc. The upper boundaries represent real, physical limits on the sizes and masses of galaxies. Bright objects are easy to see; if still higher mass galaxies were common, they would be readily detected and cataloged. In contrast, the lower boundaries are set by the limits of observational sensitivity (“selection effects”): galaxies that are physically small or low in surface brightness are difficult to detect and are systematically under-represented in galaxy catalogs (Allen & Shu, 1979; Disney, 1976; McGaugh et al., 1995a).

Fig. 2. Galaxy size and mass. The radius that contains half of the light is plotted against the stellar mass. Galaxies exist over many decades in mass, and exhibit a considerable variation in size at a given mass. Early and late type galaxies are demarcated with different symbols, as noted. Lines illustrate tracks of constant stellar surface density. The data for ETGs are from the compilation of Dabringhausen and Fellhauer (2016) augmented by dwarf Spheroidal (dSph) galaxies in the Local Group compiled by Lelli et al. (2017). Ultra-diffuse galaxies (UDGs: van Dokkum et al., 2015; Mihos et al., 2015, × and +, respectively) have unsettled kinematic classifications at present, but most seem likely to be pressure supported ETGs. The bulk of the data for LTGs is from the SPARC database (Lelli et al., 2016a), augmented by cases that are noteworthy for their extremity in mass or surface brightness (Brunker et al., 2019; Dalcanton, Spergel, Gunn, et al., 1997; de Blok et al., 1995; McGaugh and Bothun, 1994; Mihos et al., 2018; Rhode et al., 2013; Schombert et al., 2011). The gas content of these star-forming systems adds a third axis, illustrated crudely here by whether an LTG is made more of stars or gas (filled and open symbols, respectively).

Individual galaxies can be early type or late type, high mass or low mass, large or small in linear extent, high or low surface brightness, gas poor or gas rich. No one of these properties is completely predictive of the others: the correlations that do exist tend to have lots of intrinsic scatter. The primary exception to this appears to involve the kinematics. Massive galaxies are fast rotators; low mass galaxies are slow rotators. This Tully-Fisher relation (Tully and Fisher, 1977) is one of the strongest correlations in extragalactic astronomy (Lelli et al., 2016b). It is thus necessary to simultaneously explain both the chaotic diversity of galaxy properties and the orderly nature of their kinematics (McGaugh et al., 2019).

Galaxies do not exist in isolation. Rather than being randomly distributed throughout the universe, they tend to cluster together: the best place to find a galaxy is in the proximity of another galaxy (Rubin, 1954). A common way to quantify the clustering of galaxies is the two-point correlation function ξ(r) (Peebles, 1980). This measures the excess probability of finding a galaxy within a distance r of a reference galaxy relative to a random distribution. The observed correlation function is well approximated as a power law whose slope and normalization vary with galaxy population. ETGs are more clustered than LTGs, having a longer correlation length: r0 ≈ 9 Mpc for red galaxies vs. ~ 5 Mpc for blue galaxies (Zehavi et al., 2011). Here we will find this quantity to be of interest for comparing the distribution of high and low surface brightness galaxies.
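For reference, the standard parameterization (generic to the field, not specific to any one survey) is ξ(r) = (r/r0)^(−γ) with γ ≈ 1.8 typical. The correlation length r0 is the separation at which ξ = 1, i.e., where a galaxy is twice as likely as random to have a neighbor.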


Galaxies are sometimes called island universes. That is partly a hangover from pre-Hubble times during which it was widely believed that the Milky Way contained everything: it was one giant island universe embedded in an indefinite but otherwise empty void. We know that’s not true now – there are lots of stellar systems of similar size to the Milky Way – but they often seem to stand alone even if they are clustered in non-random ways.

For example, here is the spiral galaxy NGC 7757, an island unto itself.

NGC 7757 from the digitized sky survey (© 1994, Association of Universities for Research in Astronomy, Inc).

NGC 7757 is a high surface brightness spiral. It is easy to spot amongst the foreground stars of the Milky Way. In contrast, there are strong selection effects against low surface brightness galaxies, like UGC 1230:

UGC 1230 from the digitized sky survey (© 1994, Association of Universities for Research in Astronomy, Inc).

The LSB galaxy is rather harder to spot. Even when noticed, it doesn’t seem as important as the HSB galaxy. This, in a nutshell, is the history of selection effects in galaxy surveys, which are inevitably biased towards the biggest and the brightest. Advances in detectors (especially the CCD revolution of the 1980s) helped open our eyes to the existence of these LSB galaxies, and allowed us to measure their physical properties. Doing so provided a stringent test of galaxy formation theories, which have scrambled to catch up ever since.

What JWST will see

What JWST will see

Big galaxies at high redshift!

That’s my prediction, anyway. A little context first.

New Year, New Telescope

First, JWST finally launched. This has been a long-delayed NASA mission; the launch had been put off so many times it felt like a living example of Zeno’s paradox: ever closer but never quite there. A successful launch is always a relief – rockets do sometimes blow up on lift off – but there is still sweating to be done: it has one of the most complex deployments of any space mission. This is still a work in progress, but to start the new year, I thought it would be nice to look forward to what we hope to see.

JWST is a major space telescope optimized for observing in the near and mid-infrared. This enables observation of redshifted light from the earliest galaxies, letting us see them as they would appear to our eyes had we been around at the time. And that time is long, long ago, in galaxies very far away: in principle, we should be able to see the first galaxies in their infancy, 13+ billion years ago. So what should we expect to see?

Early galaxies in LCDM

A theory is only as good as its prior. In LCDM, structure forms hierarchically: small objects emerge first, then merge into larger ones. It takes time to build up large galaxies like the Milky Way; the common estimate early on was that it would take at least a billion years to assemble an L* galaxy, and it could easily take longer. Ach, terminology: L* is the characteristic luminosity of the Schechter function we commonly use to describe the number density of galaxies of various luminosities. L* galaxies like the Milky Way are common, but the number of brighter galaxies falls precipitously. Bigger galaxies exist, but they are rare above this characteristic brightness, so L* is shorthand for a galaxy of typical brightness.
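For the record, the Schechter (1976) function is φ(L) dL = φ* (L/L*)^α e^(−L/L*) dL/L*: a power law at the faint end with an exponential cutoff above the characteristic luminosity L*. That exponential is why galaxies much brighter than L* are so rare.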

We expect galaxies to start small and slowly build up in size. This is a very basic prediction of LCDM. The hierarchical growth of dark matter halos is fundamental, and relatively easy to calculate. How this translates to the visible parts of galaxies is more fraught, depending on the details of baryonic infall, star formation, and the many kinds of feedback. [While I am a frequent critic of model feedback schemes implemented in hydrodynamic simulations on galactic scales, there is no doubt that feedback happens on the much smaller scales of individual stars and their nurseries. These are two very different things for which we confusingly use the same word since the former is the aspirational result of the latter.] That said, one only expects to assemble mass so fast, so the natural expectation is to see small galaxies first, with larger galaxies emerging slowly as their host dark matter halos merge together.

Here is an example of a model formation history that results in the brightest galaxy in a cluster (from De Lucia & Blaizot 2007). Little things merge to form bigger things (hence “hierarchical”). This happens a lot, and it isn’t really clear when you would say the main galaxy had formed. The final product (at lookback time zero, at redshift z=0) is a big galaxy composed of old stars – fairly typical for a giant elliptical. But the most massive progenitor is still rather small 8 billion years ago, over 4 billion years after the Big Bang. The final product doesn’t really emerge until the last major merger around 4 billion years ago. This is just one example in one model, and there are many different models, so your mileage will vary. But you get the idea: it takes a long time and a lot of mergers to assemble a big galaxy.

Brightest cluster galaxy merger tree. Time progresses upwards from early in the universe at bottom to the present day at top. Every line is a small galaxy that merges to ultimately form the larger galaxy. Symbols are color-coded by B−V color (red meaning old stars, blue young) and their area scales with the stellar mass (bigger circles being bigger galaxies). From De Lucia & Blaizot (2007).

It is important to note that in a hierarchical model, the age of a galaxy is not the same as the age of the stars that make up the galaxy. According to De Lucia & Blaizot, the stars of the brightest cluster galaxies

“are formed very early (50 per cent at z~5, 80 per cent at z~3)”

but do so

“in many small galaxies”

– i.e., the little progenitor circles in the plot above. The brightest cluster galaxies in their model build up rather slowly, such that

“half their final mass is typically locked-up in a single galaxy after z~0.5.”

De Lucia & Blaizot (2007)

So all the star formation happens early in the little things, but the final big thing emerges later – a lot later, only reaching half its current size when the universe is about 8 Gyr old. (That’s roughly when the solar system formed: we are late-comers to this party.) Given this prediction, one can imagine that JWST should see lots of small galaxies at high redshift, their early star formation popping off like firecrackers, but it shouldn’t see any big galaxies early on – not really at z > 3 and certainly not at z > 5.

Big galaxies in the data at early times?

While JWST is eagerly awaited, people have not been idle about looking into this. There have been many deep surveys made with the Hubble Space Telescope, augmented by the infrared capable (and now sadly defunct) Spitzer Space Telescope. These have already spied a number of big galaxies at surprisingly high redshift. So surprising that Steinhardt et al. (2016) dubbed it “The Impossibly Early Galaxy Problem.” This is their key plot:

The observed (points) and predicted (lines) luminosity functions of galaxies at various redshifts (colors). If all were well, the points would follow the lines of the same color. Instead, galaxies appear to be brighter than expected, already big at the highest redshifts probed. From Steinhardt et al. (2016).

There are lots of caveats to this kind of work. Constructing the galaxy luminosity function is a challenging task at any redshift; getting it right at high redshift especially so. While what counts as “high” varies, I’d say everything on the above plot counts. Steinhardt et al. (2016) worry about these details at considerable length but don’t find any plausible way out.

Around the same time, one of our graduate students, Jay Franck, was looking into similar issues. One of the things he found was that not only were there big galaxies in place early on, but they were also in clusters (or at least protoclusters) early and often. That is to say, not only are the galaxies too big too soon, so are the clusters in which they reside.

Dr. Franck made his own comparison of data to models, using the Millennium simulation to devise an apples-to-apples comparison:

The apparent magnitude m* at 4.5 microns of L* galaxies in clusters as a function of redshift. Circles are data; squares represent the Millennium simulation. These diverge at z > 2: galaxies are brighter (smaller m*) than predicted (Fig. 5.5 from Franck 2017).

The result is that the data look more like big galaxies formed early on, already as big galaxies. The solid lines are “passive evolution” models in which all the stars form in a short period starting at z=10. This starting point is an arbitrary choice, but there is little cosmic time between z = 10 and 20 – just a few hundred million years, barely one spin around the Milky Way. This is a short time in stellar evolution, so is practically the same as starting right at the beginning of time. As Jay put it,

“High redshift cluster galaxies appear to be consistent with an old stellar population… they do not appear to be rapidly assembling stellar mass at these epochs.”

Franck 2017

We see old stars, but we don’t see the predicted assembly of galaxies via mergers, at least not at the expected time. Rather, it looks like some galaxies were already big very early on.
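As an aside, the cosmic clock quoted above is easy to check for yourself (assuming Planck parameters via astropy; the numbers shift only slightly for other vanilla LCDM parameter choices):

    from astropy.cosmology import Planck18

    for z in (20, 10, 0):
        print(f"z = {z:2d}: age = {Planck18.age(z):.2f}")
    # Only ~0.3 Gyr elapses between z = 20 and z = 10.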

As someone who has worked mostly on well resolved, relatively nearby galaxies, all this makes me queasy. Jay, and many others, have worked desperately hard to squeeze knowledge from the faint smudges detected by first generation space telescopes. JWST should bring these into much better focus.

Early galaxies in MOND

To go back to the first line of this post, big galaxies at high redshift did not come as a surprise to me. It is what we expect in MOND.

Structure formation is generally considered a great success of LCDM. It is straightforward and robust to calculate on large scales in linear perturbation theory. Individual galaxies, on the other hand, are highly non-linear objects, making them hard beasts to tame in a model. In MOND, it is the other way around – predicting the behavior of individual galaxies is straightforward – only the observed distribution of mass matters, not all the details of how it came to be that way – but what happens as structure forms in the early universe is highly non-linear.

The non-linearity of MOND makes it hard to work with computationally. It is also crucial to how structure forms. I provide here an outline of how I expect structure formation to proceed in MOND. This page is now old, even ancient in internet time, as the golden age for this work was 15 – 20 years ago, when all the essential predictions were made and I was naive enough to think cosmologists were amenable to reason. Since the horizon of scientific memory is shorter than that, I felt it necessary to review in 2015. That is now itself over the horizon, so with the launch of JWST, it seems appropriate to remind the community yet again that these predictions exist.

This 1998 paper by Bob Sanders is a foundational paper in this field (see also Sanders 2001 and the other references given on the structure formation page). He says, right in the abstract,

“Objects of galaxy mass are the first virialized objects to form (by z = 10), and larger structure develops rapidly.”

Sanders (1998)

This was a remarkable prediction to make in 1998. Galaxies, much less larger structures, were supposed to take much longer to form. It takes time to go from the small initial perturbations that we see in the CMB at z=1000 to large objects like galaxies. Indeed, it takes at least a few hundred million years simply in free fall time to assemble a galaxy’s worth of mass, a hard limit. Here Sanders was saying that an L* galaxy might assemble as early as half a billion years after the Big Bang.
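To put a number on that hard limit: the free-fall time of a uniform sphere is t_ff = √(3π/(32Gρ)). Taking illustrative numbers (mine, not from any particular paper) of 10¹¹ M☉ spread over a sphere of 50 kpc radius gives ρ ≈ 1.4 × 10⁻²³ kg/m³ and t_ff ≈ 560 Myr – a few hundred million years before one even worries about cooling the gas and forming the stars.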

So how can this happen? Without dark matter to lend a helping hand, structure formation in the very early universe is inhibited by the radiation field. This inhibition is removed around z ~ 200, with the exact epoch being very sensitive to the baryon density. At this point, the baryon perturbations suddenly find themselves deep in the MOND regime, and behave as if there is a huge amount of dark matter. Structure formation proceeds hierarchically, as it must, but on a highly compressed timescale. To distinguish it from LCDM hierarchical galaxy formation, let’s call it prompt structure formation. In prompt structure formation, we expect

  • Early reionization (z ~ 20)
  • Some L* galaxies by z ~ 10
  • Early emergence of the cosmic web
  • Massive clusters already at z > 2
  • Large, empty voids
  • Large peculiar velocities
  • A very large homogeneity scale, maybe fractal over 100s of Mpc

There are already indications of all of these things, nearly all of which were predicted in advance of the relevant observations. I could elaborate, but that is beyond the scope of this post. People should read the references* if they’re keen.

*Reading the science papers is mandatory for the pros, who often seem fond of making straw man arguments about what they imagine MOND might do without bothering to check. I once referred some self-styled experts in structure formation to Sanders’s work. They promptly replied “That would mean structures of 10¹⁸ M☉!” when what he said was

“The largest objects being virialized now would be clusters of galaxies with masses in excess of 10¹⁴ M☉. Superclusters would only now be reaching maximum expansion.”

Sanders (1998)

The exact numbers are very sensitive to cosmological parameters, as Sanders discussed, but I have no idea where the “experts” got 10¹⁸, other than just making stuff up. More importantly, Sanders’s statement clearly presaged the observation of very massive clusters at surprisingly high redshift and the discovery of the Laniakea Supercluster.

These are just the early predictions of prompt structure formation, made in the same spirit that enabled me to predict the second peak of the microwave background and the absorption signal observed by EDGES at cosmic dawn. Since that time, at least two additional schools of thought as to how MOND might impact cosmology have emerged. One of them is the sterile neutrino MOND cosmology suggested by Angus and being actively pursued by the Bonn-Prague research group. Very recently, there is of course the new relativistic theory of Skordis & Złośnik which fits the cosmologists’ holy grail of the power spectrum in both the CMB at z = 1090 and galaxies at z = 0. There should be an active exchange and debate between these approaches, with perhaps new ones emerging.

Instead, we lack critical mass. Most of the community remains entirely obsessed with pursuing the vain chimera of invisible mass. I fear that this will eventually prove to be one of the greatest wastes of brainpower (some of it my own) in the history of science. I can only hope I’m wrong, as many brilliant people seem likely to waste their career running garbage in-garbage out computer simulations or at the bottom of a mine shaft failing to detect what isn’t there.

A beautiful mess

JWST can’t answer all of these questions, but it will help enormously with galaxy formation, which is bound to be messy. It’s not like L* galaxies are going to spring fully formed from the void like Athena from the forehead of Zeus. The early universe must be a chaotic place, with clumps of gas condensing to form the first stars that irradiate the surrounding intergalactic gas with UV photons before detonating as the first supernovae, and the clumps of stars merging to form giant elliptical galaxies while elsewhere gas manages to pool and settle into the large disks of spiral galaxies. When all this happens, how it happens, and how big galaxies get how fast are all to be determined – but now accessible to direct observation thanks to JWST.

It’s going to be a confusing, beautiful mess, in the best possible way – one that promises to test and challenge our predictions and preconceptions about structure formation in the early universe.